I'd like the Tweepy Streaming API to stop pulling in tweets after I have stored X tweets in MongoDB.
I have tried IF and WHILE statements with counters inside the class definition, but I cannot get it to stop after a given number of tweets. This is a real head-banger for me. I found this thread: https://groups.google.com/forum/#!topic/tweepy/5IGlu2Qiug4 but my efforts to replicate it have failed. It always tells me that __init__ needs an additional argument. I believe we have our Tweepy auth set up differently, so it is not apples to apples.
Any thoughts?
from __future__ import print_function  # print() behaves the same on Python 2 and 3
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json, time, sys
import tweepy

# CONSUMER_KEY, CONSUMER_SECRET, OAUTH_TOKEN, OAUTH_TOKEN_SECRET and
# `collection` (a pymongo collection) are defined elsewhere.
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(OAUTH_TOKEN, OAUTH_TOKEN_SECRET)

class StdOutListener(StreamListener):

    def on_status(self, status):
        text = status.text
        created = status.created_at
        record = {'Text': text, 'Created At': created}
        print(record)  # See Tweepy documentation to learn how to access other fields
        collection.insert(record)

    def on_error(self, status):
        print('Error on status', status)

    def on_limit(self, status):
        print('Limit threshold exceeded', status)

    def on_timeout(self):
        # Tweepy 3.x calls on_timeout() with no extra argument
        print('Stream disconnected; continuing...')

stream = Stream(auth, StdOutListener())
stream.filter(track=['tv'])
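You need to add a counter to your class in __init__ and increment it in on_status each time a record is stored. Once the counter reaches 20, return False from on_status: Tweepy's StreamListener treats a False return from on_status as a signal to disconnect the stream, so no more tweets are pulled in. This could be done as shown below: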
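class StdOutListener(StreamListener):

    def __init__(self, api=None):
        # Forward api so the base StreamListener is initialised as usual
        super(StdOutListener, self).__init__(api)
        self.num_tweets = 0  # number of records stored so far

    def on_status(self, status):
        record = {'Text': status.text, 'Created At': status.created_at}
        print(record)  # See Tweepy documentation to learn how to access other fields
        collection.insert(record)
        self.num_tweets += 1
        # Returning False makes Tweepy disconnect the stream, so it stops
        # right after the 20th record has been stored
        return self.num_tweets < 20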
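Returning False is what actually ends the stream, and the limit can be a constructor parameter instead of a hard-coded 20. The sketch below shows both without a Twitter connection. It is a minimal simulation under stated assumptions: `FakeStatus`, `LimitedListener` and `run_fake_stream` are hypothetical names, the loop only imitates how a Tweepy 3.x `Stream` stops when `on_status` returns False, and a plain list's `append` stands in for MongoDB. Giving `max_tweets` a default also keeps `LimitedListener(save)` working; an `__init__` parameter without a default has to be passed at every call site, which is one common cause of an error saying `__init__` needs an additional argument.

```python
# Hypothetical stand-ins: FakeStatus and run_fake_stream only mimic the
# relevant part of Tweepy 3.x (stop once on_status returns False);
# they are not part of Tweepy.


class FakeStatus:
    """Minimal stand-in for a Tweepy Status object."""

    def __init__(self, text, created_at):
        self.text = text
        self.created_at = created_at


class LimitedListener:
    """Saves statuses until max_tweets records have been stored."""

    def __init__(self, save, max_tweets=20):
        # save: any callable that stores one record (hypothetical parameter).
        # The default for max_tweets means LimitedListener(save) still works.
        self.save = save
        self.max_tweets = max_tweets
        self.num_tweets = 0

    def on_status(self, status):
        record = {'Text': status.text, 'Created At': status.created_at}
        self.save(record)
        self.num_tweets += 1
        # False tells the stream loop to disconnect.
        return self.num_tweets < self.max_tweets


def run_fake_stream(listener, statuses):
    """Feed statuses to the listener, stopping when on_status returns False."""
    delivered = 0
    for status in statuses:
        delivered += 1
        if listener.on_status(status) is False:
            break
    return delivered


stored = []
listener = LimitedListener(stored.append, max_tweets=5)
tweets = (FakeStatus('tweet %d' % i, i) for i in range(100))
delivered = run_fake_stream(listener, tweets)
print(len(stored), delivered)
```

Even though 100 fake tweets are available, only five are delivered and stored before the loop stops. In a real setup (assuming Tweepy 3.x and PyMongo 3 or later) the listener would subclass `StreamListener`, forward `api` to the base `__init__`, and could be given `save=collection.insert_one`.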