How to scrape all comments from a subreddit on Reddit?

siamii · Jun 28, 2015

I'm trying to scrape all comments from a subreddit. I found a library called PRAW, which gives this example:

import praw
r = praw.Reddit('Comment parser example by u/_Daimon_')
subreddit = r.get_subreddit("python")
comments = subreddit.get_comments()

However, this returns only the 25 most recent comments. How can I retrieve all comments in the subreddit? The Reddit interface has a next button, so it should be possible to go back through the history page by page.

Answer

IronManMark20 · Jun 28, 2015

From the docs:

See UnauthenticatedReddit.get_comments() for complete usage.

That function accepts *args and **kwargs, and its documentation notes:

The additional parameters are passed directly into get_content(). Note: the url parameter cannot be altered.

So I looked at get_content(). One of its arguments is limit:

limit – the number of content entries to fetch. If limit <= 0, fetch the default for your account (25 for unauthenticated users). If limit is None, then fetch as many entries as possible (reddit returns at most 100 per request, however, PRAW will automatically make additional requests as necessary).

(Emphasis added). So my test was:

comments = subreddit.get_comments(limit=None)

That returned more than the default 25. I counted 30+ comments by hand and stopped there, though the full result was probably 100 (the per-request limit).
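To picture what "PRAW will automatically make additional requests as necessary" means, here is a toy model of following a cursor from one page to the next. It is only a sketch and not PRAW's actual implementation: fetch_page and iter_all are hypothetical names, the listing is faked in memory so the code runs without network access, and the total of 250 is made up for illustration. It also shows that the result can be counted with len() rather than by hand.

```python
def fetch_page(after=None, page_size=100, total=250):
    """Hypothetical stand-in for one listing request.

    Returns (items, next_cursor); next_cursor is None on the last page.
    The page size of 100 mirrors the per-request maximum quoted above.
    """
    start = 0 if after is None else after
    end = min(start + page_size, total)
    items = ["comment %d" % i for i in range(start, end)]
    next_cursor = end if end < total else None
    return items, next_cursor


def iter_all(fetch=fetch_page):
    """Yield every item, requesting a new page whenever one runs out."""
    after = None
    while True:
        items, after = fetch(after)
        for item in items:
            yield item
        if after is None:  # no further page to request
            break


comments = list(iter_all())
print(len(comments))  # 250 (three requests: 100 + 100 + 50)
```

With a real listing, the same list(...) and len(...) pattern should apply to the value returned by subreddit.get_comments(limit=None), assuming it can be iterated like any other Python iterable.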