I am a data journalist trying to scrape all of the comments on Xvideos, so that it becomes easier to find victims of leaked personal videos. I have the following code in R, but I'm stuck: I don't know how to click the "comment" button, or how to change the URL so the comments are shown by default. Could you give me a hand? Thank you.
library(tidyverse)
library(rvest)

url <- "https://www.xvideos.com/new/1"

# Grab every <a> href on the listing page and keep only the video links
links <- url %>%
  read_html() %>%
  html_nodes("a") %>%
  html_attr("href") %>%
  as.data.frame() %>%
  `colnames<-`("link") %>%
  filter(str_detect(link, "/video"))
I'm not sure why you would necessarily use R for this; for such a workload I would much rather suggest the Selenium framework. The comment section is loaded by JavaScript through an XHR, so it will not be parseable with read_html(), which does not execute the site's code.
But nonetheless you can also reverse-engineer the requests. If you want to stay in R, here is a solution concept that will work:
You get a list of the videos with your code, so you should have URLs like this:
https://de.xvideos.com/video52314867/...
You can use a regular expression like \/video(\d+)\/ to extract the ID from there, and then request the comments URL:
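In R that extraction could look something like the sketch below, using stringr's str_match to capture the digits after /video. The URL suffix ("clip") is a made-up placeholder, since the real URLs are truncated above:

library(stringr)

# Hypothetical example URL; the path after the ID is a placeholder
video_url <- "https://de.xvideos.com/video52314867/clip"

# Capture group 1 holds the digits between "/video" and the next "/"
video_id <- str_match(video_url, "/video(\\d+)/")[, 2]
video_id
#> "52314867"

The same call vectorises, so you can run it over the whole link column from the question's data frame in one go.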
POST https://de.xvideos.com/threads/video-comments/get-posts/top/52314867/0/0
I guess you can see where the ID belongs. This way you will get the video comments directly in the response, without executing any JavaScript.
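As a minimal sketch of that request with httr: the endpoint path and the trailing /top/{id}/0/0 segments are taken from the answer above; the assumption that the response is JSON is mine, so inspect the raw body first if it turns out to be something else:

library(httr)
library(jsonlite)

video_id <- "52314867"

# Build the comments endpoint from the answer's URL pattern
comments_url <- sprintf(
  "https://de.xvideos.com/threads/video-comments/get-posts/top/%s/0/0",
  video_id
)

resp <- POST(comments_url)

# Assumed to be JSON; fall back to print(content(resp, as = "text"))
# to check the actual structure before parsing further
comments <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))

Looping this over the IDs extracted from your links data frame (with a polite Sys.sleep() between requests) would give you the comments for every video on the page.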