How do I return more than 100 Twitter search results with Twython?

Clay picture Clay · Jan 10, 2014 · Viewed 7.2k times · Source

Twitter only returns 100 tweets per "page" when returning search results on the API. They provide the max_id and since_id in the returned search_metadata that can be used as parameters to get earlier/later tweets.

Twython 3.1.2 documentation suggests that this pattern is the "old way" to search:

results = twitter.search(q="xbox",count=423,max_id=421482533256044543)
for tweet in results['statuses']:
    ... do something

and that this is the "new way":

results = twitter.cursor(t.search,q='xbox',count=375)
for tweet in results:
    ... do something

When I do the latter, it appears to endlessly iterate over the same search results. I'm trying to push them to a CSV file, but it pushes a ton of duplicates.

What is the proper way to search for a large number of tweets, with Twython, and iterate through the set of unique results?

Edit: Another issue here is that when I try to iterate with the generator (for tweet in results:), it loops repeatedly, without stopping. Ah -- this is a bug... https://github.com/ryanmcgrath/twython/issues/300

Answer

Eugene picture Eugene · Jan 13, 2014

I had the same problem, but it seems that you should just loop through a user's timeline in batches using the max_id parameter. The batches should be 100 as per Terence's answer (but actually, for user_timeline 200 is the max count), and just set the max_id to the last id in the previous set of returned tweets minus one (because max_id is inclusive). Here's the code:

'''
Get all tweets from a given user.
Batch size of 200 is the max for user_timeline.
'''
from twython import Twython, TwythonError
tweets = []
# Requires Authentication as of Twitter API v1.1
twitter = Twython(PUT YOUR TWITTER KEYS HERE!)
try:
    user_timeline = twitter.get_user_timeline(screen_name='eugenebann',count=200)
except TwythonError as e:
    print e
print len(user_timeline)
for tweet in user_timeline:
    # Add whatever you want from the tweet, here we just add the text
    tweets.append(tweet['text'])
# Count could be less than 200, see:
# https://dev.twitter.com/discussions/7513
while len(user_timeline) != 0: 
    try:
        user_timeline = twitter.get_user_timeline(screen_name='eugenebann',count=200,max_id=user_timeline[len(user_timeline)-1]['id']-1)
    except TwythonError as e:
        print e
    print len(user_timeline)
    for tweet in user_timeline:
        # Add whatever you want from the tweet, here we just add the text
        tweets.append(tweet['text'])
# Number of tweets the user has made
print len(tweets)