Twitter only returns 100 tweets per "page" when returning search results on the API. They provide the max_id
and since_id
in the returned search_metadata
that can be used as parameters to get earlier/later tweets.
Twython 3.1.2 documentation suggests that this pattern is the "old way" to search:
results = twitter.search(q="xbox",count=423,max_id=421482533256044543)
for tweet in results['statuses']:
... do something
and that this is the "new way":
results = twitter.cursor(t.search,q='xbox',count=375)
for tweet in results:
... do something
When I do the latter, it appears to endlessly iterate over the same search results. I'm trying to push them to a CSV file, but it pushes a ton of duplicates.
What is the proper way to search for a large number of tweets, with Twython, and iterate through the set of unique results?
Edit: Another issue here is that when I try to iterate with the generator (for tweet in results:
), it loops repeatedly, without stopping. Ah -- this is a bug... https://github.com/ryanmcgrath/twython/issues/300
I had the same problem, but it seems that you should just loop through a user's timeline in batches using the max_id parameter. The batches should be 100 as per Terence's answer (but actually, for user_timeline 200 is the max count), and just set the max_id to the last id in the previous set of returned tweets minus one (because max_id is inclusive). Here's the code:
'''
Get all tweets from a given user.
Batch size of 200 is the max for user_timeline.
'''
from twython import Twython, TwythonError
tweets = []
# Requires Authentication as of Twitter API v1.1
twitter = Twython(PUT YOUR TWITTER KEYS HERE!)
try:
user_timeline = twitter.get_user_timeline(screen_name='eugenebann',count=200)
except TwythonError as e:
print e
print len(user_timeline)
for tweet in user_timeline:
# Add whatever you want from the tweet, here we just add the text
tweets.append(tweet['text'])
# Count could be less than 200, see:
# https://dev.twitter.com/discussions/7513
while len(user_timeline) != 0:
try:
user_timeline = twitter.get_user_timeline(screen_name='eugenebann',count=200,max_id=user_timeline[len(user_timeline)-1]['id']-1)
except TwythonError as e:
print e
print len(user_timeline)
for tweet in user_timeline:
# Add whatever you want from the tweet, here we just add the text
tweets.append(tweet['text'])
# Number of tweets the user has made
print len(tweets)