I've been learning Python for a couple of months through online courses and would like to take it further with a real-world mini project.
For this project, I would like to collect tweets from the Twitter streaming API and store them in JSON format. I could save only the key fields such as status.text and status.id, but I've been advised that it's better to save all the data and do the processing afterwards. However, once I add on_data(), the code stops working. Would someone be able to assist, please? I'm also open to suggestions on the best way to store and process tweets! My end goal is to track tweets by demographic variables (e.g., country, user profile age) and to measure sentiment towards particular brands (e.g., Apple, HTC, Samsung).
In addition, I would also like to try filtering tweets by location AND keywords. In a separate script, I've adapted the code from "How to add a location filter to tweepy module". It works with a few keywords, but it stops when the number of keywords grows. I presume my code is inefficient. Is there a better way of doing it?
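To make the "location AND keywords" requirement concrete, here is a minimal sketch of a client-side check that could be applied to each tweet after it arrives. It assumes the tweet has already been parsed into a dict, that the text lives under "text", and that geotagged tweets carry a GeoJSON-style "coordinates" field with longitude first. The helper names (in_bounding_box, matches) and the sample bounding box are made up for illustration, and the code uses Python 3 syntax. As far as I understand, the streaming filter endpoint combines track and locations with OR rather than AND, which is why a second check like this may be needed.

```python
def in_bounding_box(lon, lat, box):
    """Return True if the point lies inside box = (west, south, east, north)."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def matches(tweet, keywords, box):
    """Return True only if the tweet mentions a keyword AND is geotagged inside box."""
    text = (tweet.get("text") or "").lower()
    if not any(keyword.lower() in text for keyword in keywords):
        return False
    coords = tweet.get("coordinates")
    if not coords:
        # Tweets without a geotag cannot satisfy the location condition.
        return False
    lon, lat = coords["coordinates"]  # GeoJSON order: longitude, then latitude
    return in_bounding_box(lon, lat, box)


# Hypothetical example values: a rough box around London and three brand keywords.
london_box = (-0.6, 51.2, 0.3, 51.7)
brands = ["apple", "htc", "samsung"]

inside = {"text": "My new Samsung just arrived",
          "coordinates": {"type": "Point", "coordinates": [-0.12, 51.5]}}
outside = {"text": "My new Samsung just arrived",
           "coordinates": {"type": "Point", "coordinates": [2.35, 48.85]}}
no_geo = {"text": "Apple keynote tonight", "coordinates": None}
```

Because the check runs in plain Python on each tweet, the keyword list can grow without changing the request sent to the API.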
### code to save tweets in json###
import sys
import tweepy
import json

consumer_key=" "
consumer_secret=" "
access_key = " "
access_secret = " "

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
file = open('today.txt', 'a')

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print status.text

    def on_data(self, data):
        json_data = json.loads(data)
        file.write(str(json_data))

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True  # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True  # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['twitter'])
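Two things in the listener above look worth checking. As far as I know, in tweepy versions before 4.0 the base StreamListener.on_data is the method that parses each raw message and calls on_status, so overriding it without calling the parent means on_status never fires. Separately, str(json_data) writes the repr of a Python dict, which is not valid JSON. Below is a minimal sketch of the storage step on its own, writing one JSON object per line. The helper name append_tweet is hypothetical, the in-memory buffer stands in for a real file, and the code uses Python 3 syntax.

```python
import io
import json


def append_tweet(raw, out):
    """Parse one raw streaming message and write it to `out` as a single JSON line."""
    record = json.loads(raw)
    out.write(json.dumps(record) + "\n")
    return record


# An in-memory buffer stands in for open('today.txt', 'a') in this sketch.
buffer = io.StringIO()
append_tweet('{"id": 1, "text": "hello"}', buffer)
append_tweet('{"id": 2, "text": "world"}', buffer)
lines = buffer.getvalue().splitlines()
```

With one object per line, the file can later be read back line by line with json.loads, and appending new tweets never invalidates what is already stored.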
I found a way to save the tweets to a JSON file. Happy to hear how it can be improved!
# initialize blank list to contain tweets
tweets = []
# the file name is the first argument; 'a' opens the file in append mode
save_file = open('9may.json', 'a')

class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        # name this class in super() so that StreamListener.__init__ runs and stores api
        super(CustomStreamListener, self).__init__(api)
        self.tweets = tweets

    def on_data(self, tweet):
        # keep a parsed copy in memory and echo the raw message
        self.tweets.append(json.loads(tweet))
        print tweet
        # write the raw JSON message on its own line
        save_file.write(tweet.strip() + '\n')
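For the "process afterwards" step, here is a rough sketch of reading the saved tweets back and counting brand mentions. It assumes the file holds one JSON object per line and that each tweet has a "text" field. The function names (load_tweets, brand_mentions) are made up for illustration, the sample list stands in for the lines of a real file, and the code uses Python 3 syntax. Substring matching is only a starting point, not real sentiment analysis.

```python
import json
from collections import Counter


def load_tweets(lines):
    """Yield one parsed tweet per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def brand_mentions(tweets, brands):
    """Count how many tweets mention each brand (case-insensitive substring match)."""
    counts = Counter()
    for tweet in tweets:
        text = (tweet.get("text") or "").lower()
        for brand in brands:
            if brand.lower() in text:
                counts[brand] += 1
    return counts


# Sample lines standing in for the contents of a saved file such as 9may.json.
sample = [
    '{"id": 1, "text": "Love my new Apple phone"}',
    '',
    '{"id": 2, "text": "HTC or Samsung?"}',
]
counts = brand_mentions(load_tweets(sample), ["Apple", "HTC", "Samsung"])
```

With a real file, the same functions could be called as brand_mentions(load_tweets(open('9may.json')), brands), since iterating over a file object yields its lines.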