I pulled the max amount of tweets allowed from USATODAY which was 3000.
Now I want to create a script to automatically pull USATODAY's tweets at 11:59PM of every day.
I was going to use the stream api but then I will have to keep it running the whole day.
Can I get some insight on how to create a script where it runs the REST API every night at 11:59PM to pull the day's tweets? If not does anyone know how to pull tweets based on date?
I was thinking about placing an ifelse statement in my for loop but that seems inefficient, because it will have to search through 3000 tweets every night.
This is what I have now:
client = MongoClient('localhost', 27017)
db = client['twitter_db']
collection = db['usa_collection']
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token_key, access_token_secret)
api = tweepy.API(auth)
for tweet in tweepy.Cursor(api.user_timeline,id='USATODAY').items():
collection.insert(tweet._json)
You can simply retrieve the tweets with the help of pages, Now on each page received you iterate over the tweets and extract the creation time of that tweet which is accessed using tweet.created_at
and the you find the difference between the extracted date and the current date, if the difference is less than 1 day then it is a favourable tweet else you just exit out of the loop.
import tweepy, datetime, time
def get_tweets(api, username):
page = 1
deadend = False
while True:
tweets = api.user_timeline(username, page = page)
for tweet in tweets:
if (datetime.datetime.now() - tweet.created_at).days < 1:
#Do processing here:
print tweet.text.encode("utf-8")
else:
deadend = True
return
if not deadend:
page+=1
time.sleep(500)
get_tweets(api, "anmoluppal366")
Note: you are not accessing all 3000 tweets of that person, you only iterate over those tweets which were created within the span of 24 hours at the time of launching your application.