Logical Operators in Tweepy Filter

Sam Zipper · Mar 12, 2014

I'm hoping to track tweets that contain a certain set of words, but not others. For example, suppose my filter is "taco" AND ("chicken" OR "beef").

It should return these tweets:

-I am eating a chicken taco.
-I am eating a beef taco.

It should not return these tweets:

-I am eating a taco.
-I am eating a pork taco.

Here is the code I'm currently running:

from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import time
import json

# authentication data- get this info from twitter after you create your application
ckey = '...'                # consumer key, AKA API key
csecret = '...'             # consumer secret, AKA API secret
atoken = '...'   # access token
asecret = '...'     # access secret

# define listener class
class listener(StreamListener): 

    def on_data(self, data):
        try:
            print(data)   # write the whole tweet to terminal
            return True
        except BaseException as e:
            print('failed on data, ', str(e))  # if there is an error, show what it is
            time.sleep(5)  # one error could be that you're rate-limited; this will cause the script to pause for 5 seconds

    def on_error(self, status):
        print(status)

# authenticate yourself
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["taco"])  # track what you want to search for!

The last line of the code is the part I'm struggling with; if I use:

twitterStream.filter(track=["taco","chicken","beef"])

it will return all tweets containing any of the three words. Other things I've tried, such as:

 twitterStream.filter(track=(["taco"&&("chicken","beef")])

return a syntax error.

I'm fairly new to both Python and Tweepy. I found two similar-looking questions, but they are about tracking multiple terms simultaneously, rather than tracking a subset of tweets containing a term. I haven't been able to find anything in the Tweepy documentation.

I know another option would be to track all tweets containing "taco" and then keep only those mentioning "chicken" or "beef" before writing to my database. However, I'm worried about running up against the 1% streaming rate limit if I do a general search and then filter it down within Python, so I'd prefer to stream only the tweets I want from Twitter in the first place.

Thanks in advance,

Sam

Answer

Aaron Hill · Mar 13, 2014

Twitter does not allow you to be very precise in how keywords are matched. However, the track parameter documentation states that spaces within a keyword phrase are equivalent to logical ANDs, while the separate phrases you specify are OR'd together.
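To make that rule concrete, here is a small local sketch of the matching as the documentation describes it. The name matches_track is hypothetical, and splitting on non-word characters only approximates whatever tokenization Twitter actually applies.

```python
import re

def matches_track(text, phrases):
    """Approximate the documented track semantics locally (hypothetical helper).

    A phrase matches when every space-separated term in it appears in the
    text, in any order, ignoring case. The phrases themselves are OR'd.
    """
    # Rough tokenization; Twitter's own rules are more involved.
    words = set(re.findall(r"\w+", text.lower()))
    return any(
        all(term in words for term in phrase.lower().split())
        for phrase in phrases
    )

tweet = "I am eating a taco."
print(matches_track(tweet, ["taco", "chicken"]))  # two phrases: taco OR chicken
print(matches_track(tweet, ["taco chicken"]))     # one phrase: taco AND chicken
```

This only mimics the filter for experimenting offline; the real matching happens on Twitter's side.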

So, to achieve your "taco" AND ("chicken" OR "beef") example, you could try track=["taco chicken", "taco beef"]. This would match tweets containing the words taco and chicken, or taco and beef. However, this isn't a perfect solution, as a tweet containing taco, chicken, and beef would also be matched.
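If the list of alternatives grows, the phrases could be generated rather than typed by hand. This is only a sketch: build_track is a hypothetical helper, and the filter call in the closing comment is the one from the question's code.

```python
def build_track(required, alternatives):
    """Expand `required AND (alt1 OR alt2 ...)` into track phrases.

    Hypothetical helper: each phrase pairs the required term with one
    alternative, so the space acts as AND and the list entries are OR'd.
    """
    return ["{} {}".format(required, alt) for alt in alternatives]

track = build_track("taco", ["chicken", "beef"])
print(track)  # ['taco chicken', 'taco beef']

# With the stream object from the question, this would be passed as:
# twitterStream.filter(track=track)
```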