I'm trying to scrape all comments from a subreddit. I've found a library called PRAW, which gives this example:
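```python
import praw

# PRAW 3.x-style API; the string is the user agent sent to reddit.
r = praw.Reddit('Comment parser example by u/_Daimon_')
subreddit = r.get_subreddit("python")
comments = subreddit.get_comments()  # lazily yields the newest comments
```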
However, this returns only the 25 most recent comments. How can I parse all the comments in the subreddit? The Reddit web interface has a "next" button, so it should be possible to go back through the history page by page.
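On reddit's side, the "next" button corresponds to listing pagination: each listing's JSON carries an `after` cursor (the fullname of the last item, `t1_...` for comments), which is passed back as the `after` query parameter to request the following page. Below is a minimal sketch that builds the next-page URL from an already-parsed listing. The helper name `next_page_url` and the sample cursor are hypothetical, and nothing is fetched over the network.

```python
from urllib.parse import urlencode


def next_page_url(subreddit, listing, limit=100):
    """Return the URL of the next page of comments, or None on the last page.

    `listing` is assumed to be the parsed JSON of a reddit listing, shaped like
    {"data": {"after": "t1_abc123", "children": [...]}}.
    """
    after = listing["data"].get("after")
    if after is None:
        # reddit sets "after" to null when there are no more pages.
        return None
    query = urlencode({"limit": limit, "after": after})
    return f"https://www.reddit.com/r/{subreddit}/comments.json?{query}"


# Hypothetical listing standing in for a real API response.
sample = {"data": {"after": "t1_abc123", "children": []}}
print(next_page_url("python", sample))
# https://www.reddit.com/r/python/comments.json?limit=100&after=t1_abc123
```

Repeating this with each new response's `after` value walks back through the history one page at a time, just like clicking "next".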
From the docs:
See UnauthenticatedReddit.get_comments() for complete usage.
That function takes *args and **kwargs, and its documentation notes:
The additional parameters are passed directly into get_content(). Note: the url parameter cannot be altered.
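That forwarding is why a keyword such as `limit` can be passed to `get_comments()` at all, and also why `url` cannot be overridden: the wrapper already supplies it. The following is a simplified, hypothetical illustration of the pattern, not PRAW's actual source; both functions here are stand-ins, and the URL is made up for the example.

```python
def get_content(url, limit=0, params=None):
    """Hypothetical stand-in for PRAW's get_content(); just echoes its inputs."""
    return {"url": url, "limit": limit, "params": params}


def get_comments(subreddit, *args, **kwargs):
    """Hypothetical stand-in for get_comments(): the URL is fixed here."""
    url = f"https://www.reddit.com/r/{subreddit}/comments/"
    # Everything else the caller passes is forwarded untouched.
    return get_content(url, *args, **kwargs)


print(get_comments("python", limit=None))
# {'url': 'https://www.reddit.com/r/python/comments/', 'limit': None, 'params': None}

try:
    get_comments("python", url="https://example.com")
except TypeError as exc:
    # url is already supplied positionally, so a keyword url is rejected.
    print("TypeError:", exc)
```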
Therefore, I looked at get_content(). One of its arguments is limit:
limit – the number of content entries to fetch. If limit <= 0, fetch the default for your account (25 for unauthenticated users). If limit is None, then fetch as many entries as possible (reddit returns at most 100 per request, however, PRAW will automatically make additional requests as necessary).
So my test was:
```python
comments = subreddit.get_comments(limit=None)
```
This returned 30+ comments (probably up to the 100-per-request limit, but I was going through them manually, so I decided 30 was enough to show it went past the default 25).
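The `limit` semantics quoted from the docs can be sketched as a small fetch loop. This is a simplified illustration, not PRAW's actual implementation: it requests at most 100 items per call, follows the `after` cursor until reddit stops returning one or `limit` items have been yielded, and treats `limit <= 0` as the default of 25. `iter_content`, `fetch_page`, and `make_fake_fetcher` are hypothetical names, and the fake fetcher replaces real HTTP requests so the example runs offline. Counting the yielded items with `sum(1 for _ in ...)` also avoids checking the results by hand.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical fetcher: takes (after_cursor, page_size), returns (items, next_cursor).
PageFetcher = Callable[[Optional[str], int], Tuple[List[str], Optional[str]]]

MAX_PER_REQUEST = 100  # reddit's per-request cap, per the docs quoted above
DEFAULT_LIMIT = 25  # default for unauthenticated users, per the same docs


def iter_content(fetch_page: PageFetcher, limit: Optional[int] = 0) -> Iterator[str]:
    """Yield up to `limit` items, or every available item when limit is None."""
    if limit is not None and limit <= 0:
        limit = DEFAULT_LIMIT
    yielded = 0
    after: Optional[str] = None
    while True:
        if limit is None:
            page_size = MAX_PER_REQUEST
        else:
            page_size = min(MAX_PER_REQUEST, limit - yielded)
        items, after = fetch_page(after, page_size)
        for item in items:
            yield item
            yielded += 1
        # Stop when there is no further page or the requested limit is reached.
        if after is None or not items or (limit is not None and yielded >= limit):
            return


def make_fake_fetcher(total: int) -> PageFetcher:
    """Serve `total` fake comments from memory instead of making HTTP requests."""
    comments = [f"comment {i}" for i in range(total)]

    def fetch_page(after: Optional[str], page_size: int) -> Tuple[List[str], Optional[str]]:
        start = 0 if after is None else int(after)
        page = comments[start:start + page_size]
        end = start + len(page)
        return page, (str(end) if end < total else None)

    return fetch_page


fetch = make_fake_fetcher(250)
print(sum(1 for _ in iter_content(fetch)))              # 25
print(sum(1 for _ in iter_content(fetch, limit=None)))  # 250
```

With PRAW 3.x the equivalent count would be `sum(1 for _ in subreddit.get_comments(limit=None))`, assuming the call returns a lazy iterator as above. Counting consumes the iterator, so store the items in a list first if you also need the comments themselves.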