I am trying to use tweepy to download tweets by tweet location rather than by user location. Currently I can download tweets with the user location, but I cannot get the tweet location even if geo_enabled returns True.
For example, suppose user_a is from New York but tweets from California. I want both the user location (New York) and the tweet location (California).
Code:
import tweepy
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import pandas as pd
import json
import csv
import sys
reload(sys)
sys.setdefaultencoding('utf8')

ckey = 'key'
csecret = 'secret'
atoken = 'token'
asecret = 'secret'
#csvfile = open('StreamSearch.csv','a')
#csvwriter = csv.writer(csvfile, delimiter = ',')

class StdOutListener(StreamListener):
    def __init__(self, api=None):
        super(StdOutListener, self).__init__()
        self.num_tweets = 0

    def on_data(self, data):
        self.num_tweets += 1
        if self.num_tweets < 5: #Remove the limit of no. of tweets to 5
            print data
            return True
        else:
            return False

    def on_error(self, status):
        print status

l = StdOutListener()
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
stream = Stream(auth, l)
stream.filter(locations = [80.10,12.90,80.33,13.24] ) #user location
Output
userLocation, userTimezone, Coordinates,GeoEnabled, Language, TweetPlace
London,UK Amsterdam FALSE en null
Aachen,Germany Berlin TRUE de null
Kewaunee Wi TRUE en null
Connecticut, Eastern Time (US & Canada) TRUE en null
TRUE en null
Lahore, City of Gardens London TRUE en null
NAU class of 2018. Arizona FALSE en null
FALSE en null
Pacific Time (US & Canada) FALSE en null
The output above is a cleaned-up version of the much larger raw data. Even though geolocation is enabled, I cannot get the tweet location or the coordinates.
Why does geo_enabled == True not give the tweet location?
According to this, if place or coordinates is None, it means the user did not give permission for that tweet. Users with geo_enabled turned on still have to give explicit permission for their exact location to be displayed. Also, the documentation states:
geo_enabled: When true, indicates that the user has enabled the possibility of geotagging their Tweets. This field must be true for the current user to attach geographic data when using POST statuses/update.
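As a rough sketch of how the two kinds of location can be read from one raw streaming payload, the helper below takes the profile location from user.location and the per-tweet location from coordinates and place. The function name extract_locations and the output keys are made up for illustration, and the field names assume the classic v1.1 tweet JSON layout, where coordinates is a GeoJSON point in [longitude, latitude] order.

```python
import json


def extract_locations(raw):
    """Return the profile location and the per-tweet location of one tweet.

    `raw` is the JSON string that a listener's on_data receives.
    Missing or null fields come back as None.
    """
    tweet = json.loads(raw)
    user = tweet.get("user") or {}
    coords = tweet.get("coordinates") or {}
    place = tweet.get("place") or {}
    # GeoJSON points are ordered [longitude, latitude]
    point = coords.get("coordinates")
    return {
        "user_location": user.get("location"),
        "geo_enabled": user.get("geo_enabled"),
        "tweet_lon": point[0] if point else None,
        "tweet_lat": point[1] if point else None,
        "tweet_place": place.get("full_name"),
    }


# Hypothetical payload: geo_enabled is true, but this tweet was not geotagged.
sample = json.dumps({
    "text": "hello",
    "user": {"location": "New York", "geo_enabled": True},
    "coordinates": None,
    "place": None,
})
print(extract_locations(sample))
```

With a payload like the sample, geo_enabled is True while both tweet-level fields are None, which matches the output shown in the question.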
If you filter by location, only tweets falling within the requested bounding boxes are included; the user's location field is not used to filter tweets. If both coordinates and place are empty, the tweet will not pass the filter.
# filter all tweets from San Francisco (the keyword is locations, not location)
myStream.filter(locations=[-122.75, 36.8, -121.75, 37.8])
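To make the bounding-box order concrete, here is a minimal sketch of the containment check, assuming the box is given as [sw_longitude, sw_latitude, ne_longitude, ne_latitude], i.e. longitude first and the south-west corner first. The helper in_bounding_box is hypothetical and only approximates what the service does on its side; it does not handle boxes that cross the antimeridian.

```python
def in_bounding_box(lon, lat, box):
    """Check whether a point lies inside a box given as
    [sw_longitude, sw_latitude, ne_longitude, ne_latitude]."""
    sw_lon, sw_lat, ne_lon, ne_lat = box
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


SAN_FRANCISCO = [-122.75, 36.8, -121.75, 37.8]   # box from the example above
QUESTION_BOX = [80.10, 12.90, 80.33, 13.24]      # box used in the question

# A point in downtown San Francisco (longitude, latitude)
print(in_bounding_box(-122.42, 37.77, SAN_FRANCISCO))
# The same point tested against the box from the question
print(in_bounding_box(-122.42, 37.77, QUESTION_BOX))
```

Swapping latitude and longitude is an easy mistake with this format, so a local check like this can help verify a box before passing it to the filter.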
You can capture the tweets from the filter and then check the authors' location to match your area of interest.
import json
from tweepy.streaming import StreamListener

class StdOutListener(StreamListener):
    def __init__(self, api=None):
        super(StdOutListener, self).__init__()
        self.num_tweets = 0

    def on_data(self, data):
        # on_data receives the raw JSON string, so parse it before reading fields
        tweet = json.loads(data)
        location = (tweet.get('user') or {}).get('location')
        # first check the location is not None
        if location and 'New York' in location:
            self.num_tweets += 1
            print(data)
        # stops after 5 matching tweets; remove this check to lift the limit
        if self.num_tweets < 5:
            return True
        else:
            return False

    def on_error(self, status):
        print(status)
Remember that the filter lets a tweet through as long as it matches any one of the parameters, so if you need to be more restrictive, add conditional clauses in def on_data(self, data), as I did in (3) for the author location.
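One possible shape for such a stricter condition is sketched below: it keeps a tweet only when the tweet itself carries a location (coordinates or place) and the author's profile location contains a keyword. The name keep_tweet and its parameters are hypothetical, and the field names again assume the v1.1 tweet JSON layout.

```python
import json


def keep_tweet(raw, author_keyword):
    """Return True only if the tweet is geotagged AND the author's
    profile location contains `author_keyword` (case-insensitive)."""
    tweet = json.loads(raw)
    # user or location may be missing or null, so fall back to safe defaults
    location = (tweet.get("user") or {}).get("location") or ""
    has_tweet_location = bool(tweet.get("coordinates") or tweet.get("place"))
    return has_tweet_location and author_keyword.lower() in location.lower()
```

Inside on_data this could be used as `if keep_tweet(data, 'New York'):` before counting and printing the tweet.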