I'm hoping to track tweets that contain a certain set of words, but not others. For example, if my filter is "taco" AND ("chicken" OR "beef"), it should return these tweets:
-I am eating a chicken taco.
-I am eating a beef taco.
It should not return these tweets:
-I am eating a taco.
-I am eating a pork taco.
Here is the code I'm currently running:
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import time
import json

# authentication data- get this info from twitter after you create your application
ckey = '...'     # consumer key, AKA API key
csecret = '...'  # consumer secret, AKA API secret
atoken = '...'   # access token
asecret = '...'  # access secret

# define listener class
class listener(StreamListener):

    def on_data(self, data):
        try:
            print(data)  # write the whole tweet to terminal
            return True
        except BaseException as e:
            print('failed on data, ' + str(e))  # if there is an error, show what it is
            time.sleep(5)  # one error could be that you're rate-limited; this will cause the script to pause for 5 seconds

    def on_error(self, status):
        print(status)

# authenticate yourself
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["taco"])  # track what you want to search for!
The last line of the code is the part I'm struggling with; if I use:
twitterStream.filter(track=["taco","chicken","beef"])
it will return all tweets containing any of the three words. Other things I've tried, such as:
twitterStream.filter(track=(["taco"&&("chicken","beef")])
return a syntax error.
I'm fairly new to both Python and Tweepy. Both this and this seem like similar queries, but they are related to tracking multiple terms simultaneously, rather than tracking a subset of tweets containing a term. I haven't been able to find anything in the tweepy documentation.
I know another option would be tracking all tweets containing "taco" then filtering by "chicken" or "beef" into my database, but I'm worried about running up against the 1% streaming rate limit if I do a general search and then filter it down within Python, so I'd prefer only streaming the terms I want in the first place from Twitter.
Thanks in advance-
Sam
Twitter does not allow you to be very precise in how keywords are matched. However, the track parameter documentation states that spaces within a keyword are equivalent to logical ANDs, while the separate terms you specify are OR'd together.
So, to achieve your "taco" AND ("chicken" OR "beef") example, you could try the parameters [taco chicken, taco beef]. This would match tweets containing the words taco and chicken, or taco and beef. However, this isn't a perfect solution, as a tweet containing taco, chicken, and beef would also be matched.
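As a rough sketch of that idea, the snippet below expands a required word and a list of alternatives into track phrases, then checks a few sample tweets locally. The helper names build_track and matches_track are hypothetical, not part of tweepy, and the local matcher is only a simplified approximation of the AND/OR behavior described above, not Twitter's actual matching logic.

```python
import re


def build_track(required, alternatives):
    """Expand `required AND (alt1 OR alt2 ...)` into one phrase per alternative."""
    return ["{} {}".format(required, alt) for alt in alternatives]


def matches_track(text, track):
    """Simplified local check (an assumption, not Twitter's real matcher):
    a phrase matches when every one of its words appears in the text (AND),
    and the track list matches when any phrase matches (OR)."""
    words = set(re.findall(r"\w+", text.lower()))
    return any(
        all(term in words for term in phrase.lower().split())
        for phrase in track
    )


track = build_track("taco", ["chicken", "beef"])
# track is now ["taco chicken", "taco beef"]

samples = [
    "I am eating a chicken taco.",
    "I am eating a beef taco.",
    "I am eating a taco.",
    "I am eating a pork taco.",
]
for tweet in samples:
    print(matches_track(tweet, track), tweet)
```

With the stream object from the question, the resulting list would be passed as twitterStream.filter(track=track). The same matches_track check could also be run inside on_data as a second, client-side filter if the server-side match turns out to be too loose.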