How to scrape all comments from a subreddit on Reddit?

Question 1

python reddit praw

siamii · Jun 28, 2015 · Viewed 8.8k times

Answer

From the docs:

See UnauthenticatedReddit.get_comments() for complete usage.

That function accepts *args and **kwargs, and its documentation notes:

The additional parameters are passed directly into get_content(). Note: the url parameter cannot be altered.

So I looked up get_content() in the PRAW documentation. One of its arguments is limit:

limit – the number of content entries to fetch. If limit <= 0, fetch the default for your account (25 for unauthenticated users). If limit is None, then fetch as many entries as possible (reddit returns at most 100 per request, however, PRAW will automatically make additional requests as necessary).

The important part is the last sentence: with limit=None, PRAW fetches as many entries as possible. So my test was:

comments = subreddit.get_comments(limit=None)

That returned more than the default 25. I got 30+ comments (probably 100, the per-request maximum; I was counting by hand and stopped once I was past 30).
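To avoid counting by hand, the result can be drained into a list. This is only a minimal sketch: collect_comments is a hypothetical helper name, not part of PRAW, and it assumes the subreddit object from the question plus the id, author, and body attributes on each comment. It also assumes that author is None for deleted accounts.

```python
def collect_comments(subreddit, limit=None):
    """Drain subreddit.get_comments() into a list of (id, author, body) tuples.

    `subreddit` is assumed to be the object returned by r.get_subreddit(...)
    in the question. limit=None asks PRAW to keep requesting pages for as
    long as reddit returns results.
    """
    rows = []
    for comment in subreddit.get_comments(limit=limit):
        # Assumption: author is None when the account has been deleted.
        if comment.author is not None:
            author = str(comment.author)
        else:
            author = "[deleted]"
        rows.append((comment.id, author, comment.body))
    return rows
```

With the objects from the question, rows = collect_comments(subreddit) followed by len(rows) would give the count directly.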

Question 2

I'm trying to scrape all comments from a subreddit. I've found a library called PRAW. Its documentation gives this example:

import praw
r = praw.Reddit('Comment parser example by u/_Daimon_')
subreddit = r.get_subreddit("python")
comments = subreddit.get_comments()

However, this returns only the most recent 25 comments. How can I parse all comments in the subreddit? On the Reddit interface, there's a next button, so it should be possible to go back in history page by page.
