How do I save streaming tweets in json via tweepy?

python json tweepy

Eugene Yan · May 8, 2014 · Viewed 19.7k times

Answer

I found a way to save the tweets to a json file. Happy to hear how it can be improved!

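import json
import tweepy

# initialize a blank list to hold the parsed tweets
tweets = []
# the file name is the first argument to open(); 'a' appends to the file
save_file = open('9may.json', 'a')

class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, api=None):
        # pass this class (not tweepy.StreamListener) to super() so that
        # StreamListener.__init__ actually runs and sets self.api
        super(CustomStreamListener, self).__init__(api)
        self.tweets = tweets

    def on_data(self, tweet):
        # keep a parsed copy in memory
        self.tweets.append(json.loads(tweet))
        print(tweet)
        # tweet is already a JSON string; write one tweet per line
        save_file.write(tweet.strip() + '\n')
        return True  # keep the stream running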
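One possible improvement, sketched under the assumption that each stream message reaches `on_data` as a raw JSON string (as in tweepy 3.x): the stream also delivers non-tweet messages such as deletion and limit notices, so a small helper can skip anything without a `text` field and write every tweet as one JSON object per line. The name `save_tweet_line` is hypothetical, not part of tweepy.

```python
import io
import json


def save_tweet_line(raw_data, out_file):
    """Append one tweet from the stream to out_file as a single JSON line.

    Hypothetical helper (not part of tweepy). Returns True if a tweet was
    written, False if the message was skipped because it was not a tweet.
    """
    message = json.loads(raw_data)
    # Tweets carry a 'text' field; control messages such as
    # {"delete": ...} or {"limit": ...} do not.
    if 'text' not in message:
        return False
    # json.dumps re-serialises to valid JSON; one object per line keeps
    # the file easy to read back line by line.
    out_file.write(json.dumps(message) + '\n')
    return True


# Usage: an in-memory buffer stands in for open('9may.json', 'a').
buf = io.StringIO()
save_tweet_line('{"id": 1, "text": "hello"}', buf)
save_tweet_line('{"delete": {"status": {"id": 1}}}', buf)
print(buf.getvalue(), end='')
```

In the listener, `on_data` could then reduce to `save_tweet_line(tweet, save_file)` followed by `return True`.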

Question

I've been learning Python for a couple of months through online courses and would like to further my learning with a real-world mini-project.

For this project, I would like to collect tweets from the Twitter streaming API and store them in JSON format. (You could save just the key fields, such as status.text and status.id, but I've been advised that the best approach is to save all the data and do the processing afterwards.) However, after I added the on_data() method, the code stopped working. Could someone help, please? I'm also open to suggestions on the best way to store and process tweets. My end goal is to track tweets by demographic variables (e.g., country, user profile age) and by sentiment towards particular brands (e.g., Apple, HTC, Samsung).
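Following the save-everything-then-process advice, here is a minimal sketch of the processing step. It assumes the tweets were stored one JSON object per line and follow Twitter's v1.1 tweet layout, where `place` (often `null`) holds a `country_code`. The function name `brand_mentions_by_country` and the sample data are illustrative, and plain substring matching is a deliberately crude stand-in for real keyword or sentiment analysis.

```python
import json
from collections import Counter


def brand_mentions_by_country(lines, brands):
    """Count (country_code, brand) pairs across line-delimited tweet JSON."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        text = tweet.get('text', '').lower()
        # 'place' is null for most tweets; fall back to 'unknown'.
        place = tweet.get('place') or {}
        country = place.get('country_code') or 'unknown'
        for brand in brands:
            if brand.lower() in text:
                counts[(country, brand)] += 1
    return counts


sample = [
    '{"text": "New Apple phone!", "place": {"country_code": "US"}}',
    '{"text": "HTC or Samsung?", "place": null}',
]
print(brand_mentions_by_country(sample, ['Apple', 'HTC', 'Samsung']))
```

With a file written one tweet per line, `lines` can simply be the open file object.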

In addition, I would like to filter tweets by location AND keywords. I've separately adapted the code from "How to add a location filter to tweepy module". However, while it works with a few keywords, it stops working as the number of keywords grows. I presume my code is inefficient. Is there a better way of doing it?
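As far as I know, the streaming `filter` endpoint combines `track` and `locations` with OR rather than AND, so a tweet matching either one is delivered. A common workaround is to stream on one criterion (for example `locations`) and apply the other on your side. The sketch below does that for a single bounding box given as `(sw_lon, sw_lat, ne_lon, ne_lat)`, the south-west-first, longitude-first order the `locations` parameter uses. It only checks the tweet's exact `coordinates` field (a GeoJSON point ordered `[longitude, latitude]`), which many tweets lack; the function name and the box values are hypothetical.

```python
def matches_keywords_and_box(tweet, keywords, box):
    """Return True only if the tweet mentions a keyword AND lies inside box.

    box is (sw_lon, sw_lat, ne_lon, ne_lat). Uses the GeoJSON 'coordinates'
    field, whose point is ordered [longitude, latitude].
    """
    text = tweet.get('text', '').lower()
    # Case-insensitive substring match against every keyword.
    if not any(k.lower() in text for k in keywords):
        return False
    coords = tweet.get('coordinates')
    if not coords:
        return False  # no exact location attached to this tweet
    lon, lat = coords['coordinates']
    sw_lon, sw_lat, ne_lon, ne_lat = box
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


# Hypothetical box roughly around San Francisco.
SF_BOX = (-122.75, 36.8, -121.75, 37.8)
tweet = {'text': 'Loving my new Samsung',
         'coordinates': {'type': 'Point', 'coordinates': [-122.4, 37.7]}}
print(matches_keywords_and_box(tweet, ['apple', 'samsung'], SF_BOX))  # True
```

Inside `on_data`, you would parse the data with `json.loads` and only save the tweet when this check returns `True`.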

### code to save tweets in json ###
import sys
import json
import tweepy

# fill in your own credentials
consumer_key = " "
consumer_secret = " "
access_key = " "
access_secret = " "

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
file = open('today.txt', 'a')

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)

    # In tweepy 3.x, overriding on_data replaces the default dispatcher,
    # so on_status above is no longer called once this method exists.
    def on_data(self, data):
        json_data = json.loads(data)
        file.write(str(json_data))

    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream

    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['twitter'])
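A likely reason the output file is not usable JSON: calling `str()` on the dictionary returned by `json.loads` gives Python's repr (single quotes, `True`, `None`), which `json.loads` cannot read back. A minimal check, using a made-up tweet fragment:

```python
import json

raw = '{"text": "hi", "truncated": false, "place": null}'
parsed = json.loads(raw)

as_repr = str(parsed)         # Python repr, not JSON
as_json = json.dumps(parsed)  # valid JSON again

print(as_repr)  # {'text': 'hi', 'truncated': False, 'place': None}
print(as_json)  # {"text": "hi", "truncated": false, "place": null}

try:
    json.loads(as_repr)
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    print('str() output is not valid JSON')
```

Writing the raw `data` string, or `json.dumps(json_data)`, followed by a newline keeps the file readable one tweet per line.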