Scrape tweets by tweet location and user location

Sitz Blogz · Dec 3, 2015

I am trying to use tweepy to download tweets by tweet location rather than by user location. Currently, I can download tweets with the user location, but I am not able to get the tweet location even if geo_enabled returns True.

For example, suppose user_a is from New York but he tweets from California. I want both the user location, New York, and the tweet location, California.

Code:

import tweepy
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import pandas as pd
import json
import csv
import sys
reload(sys)
sys.setdefaultencoding('utf8')

ckey = 'key'
csecret = 'secret'
atoken = 'token'
asecret = 'secret'
#csvfile = open('StreamSearch.csv','a')
#csvwriter = csv.writer(csvfile, delimiter = ',')

class StdOutListener(StreamListener):
    def __init__(self, api=None):
        super(StdOutListener, self).__init__()
        self.num_tweets = 0

    def on_data(self, data):
        self.num_tweets += 1
        if self.num_tweets < 5: #Remove the limit of no. of tweets to 5
            print data
            return True
        else:
            return False

    def on_error(self, status):
        print status


l = StdOutListener()
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
stream = Stream(auth, l)
stream.filter(locations = [80.10,12.90,80.33,13.24] ) #user location 

Output

userLocation             userTimezone                Coordinates  GeoEnabled  Language  TweetPlace
London,UK                Amsterdam                                FALSE       en        null
Aachen,Germany           Berlin                                   TRUE        de        null
Kewaunee Wi                                                       TRUE        en        null
Connecticut,             Eastern Time (US & Canada)               TRUE        en        null
                                                                  TRUE        en        null
Lahore, City of Gardens  London                                   TRUE        en        null
NAU class of 2018.       Arizona                                  FALSE       en        null
                                                                  FALSE       en        null
                         Pacific Time (US & Canada)               FALSE       en        null

The output above is a cleaned-up version of the massive raw data. Even though geolocation is enabled, I am not able to get either the tweet location or the coordinates.

Answer

ilyas patanam · Dec 11, 2015
  1. Why do tweets with geo_enabled == True not give the tweet location?

According to this, if place or coordinates is None, it means the user did not give location permission for that tweet. Users with geo_enabled turned on still have to give explicit permission for their exact location to be displayed. Also, the documentation states:

geo_enabled: When true, indicates that the user has enabled the possibility of geotagging their Tweets. This field must be true for the current user to attach geographic data when using POST statuses/update.
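To see which of these fields a given tweet actually carries, a minimal sketch along the following lines may help. It assumes the raw JSON string that on_data receives follows the standard v1.1 tweet layout (user.location, coordinates as a GeoJSON point, place.full_name). The helper name extract_locations and the sample payloads are hypothetical, made up for illustration.

```python
import json


def extract_locations(raw_tweet):
    """Pull the profile location and the tweet-level location out of one tweet.

    raw_tweet is a JSON string such as the one on_data receives.
    (extract_locations is a hypothetical helper, not part of tweepy.)
    """
    tweet = json.loads(raw_tweet)
    user = tweet.get("user") or {}
    coords = tweet.get("coordinates")  # GeoJSON point, or None if not shared
    place = tweet.get("place")         # place object, or None if not shared
    return {
        "user_location": user.get("location"),
        "geo_enabled": user.get("geo_enabled"),
        # GeoJSON order is [longitude, latitude]
        "tweet_coordinates": coords.get("coordinates") if coords else None,
        "tweet_place": place.get("full_name") if place else None,
    }


# Made-up payload: geo_enabled user who tagged this tweet with a location.
tagged = json.dumps({
    "user": {"location": "New York", "geo_enabled": True},
    "coordinates": {"type": "Point", "coordinates": [-122.42, 37.77]},
    "place": {"full_name": "San Francisco, CA"},
})

# Made-up payload: geo_enabled user who did not tag this tweet.
untagged = json.dumps({
    "user": {"location": "New York", "geo_enabled": True},
    "coordinates": None,
    "place": None,
})

tagged_info = extract_locations(tagged)
untagged_info = extract_locations(untagged)
```

In the second payload geo_enabled is True while both tweet-level fields come back as None, which is the situation described in the question.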

  2. How to filter by tweet location? Check here

If you filter by location, only tweets falling within the requested bounding boxes are included; the user's location field is not used to filter tweets. If both coordinates and place are empty, the tweet will not pass the filter.

# filter all tweets from San Francisco
# (the keyword is locations, plural; the box is [west_lon, south_lat, east_lon, north_lat])
myStream.filter(locations=[-122.75, 36.8, -121.75, 37.8])
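The bounding box is given as longitude/latitude pairs with the south-west corner first. As a rough sketch of what "falling within the box" means for a tweet with exact coordinates, the hypothetical helper below checks a single point. It only illustrates the coordinate order; it is not Twitter's actual matching logic (which also considers the place field), and it assumes the box does not cross the 180° meridian.

```python
def in_bounding_box(lon, lat, box):
    """Return True if the point (lon, lat) lies inside box.

    box uses the same order as the locations parameter:
    [west_lon, south_lat, east_lon, north_lat].
    (in_bounding_box is a hypothetical helper, not part of tweepy.)
    """
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


SAN_FRANCISCO = [-122.75, 36.8, -121.75, 37.8]

inside = in_bounding_box(-122.42, 37.77, SAN_FRANCISCO)  # True: downtown San Francisco
outside = in_bounding_box(80.27, 13.08, SAN_FRANCISCO)   # False: a point in Chennai
```

A common mistake is passing latitude before longitude, which silently produces a box somewhere else on the map.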
  3. How to filter by user location and tweet location?

You can capture the tweets from the filter and then check the authors' location to match your area of interest.

class StdOutListener(StreamListener):
    def __init__(self, api=None):
        super(StdOutListener, self).__init__()
        self.num_tweets = 0

    # on_status receives a parsed Status object; on_data only gets the raw JSON string
    def on_status(self, status):
        # first check the location is not None
        if status.author.location and 'New York' in status.author.location:
            self.num_tweets += 1
            print(status.text)
        # keep streaming until 5 matching tweets have been printed
        if self.num_tweets < 5:
            return True
        else:
            return False

    def on_error(self, status):
        print(status)
  4. How to not restrict ourselves to the Twitter API filters?

Remember that the filter lets a tweet through as long as it matches any one of the parameters, so if you need to be more restrictive, just add conditional clauses in your listener callback, as I did in (3) for the author location.
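One way to keep such conditions manageable is to collect them in a single predicate and call it from the callback. The sketch below is only an illustration: keep_tweet and its parameters are hypothetical names, it works on the decoded tweet dictionary (for example json.loads(data)), and it matches the profile location case-insensitively, which is a choice of this sketch rather than something the API does.

```python
def keep_tweet(tweet, user_keyword, lang=None):
    """Return True only if every condition holds (logical AND).

    tweet is a decoded tweet dictionary; keep_tweet is a hypothetical
    helper, not part of tweepy or the Twitter API.
    """
    # Condition 1: the author's profile location mentions the keyword.
    user_location = (tweet.get("user") or {}).get("location") or ""
    if user_keyword.lower() not in user_location.lower():
        return False
    # Condition 2 (optional): the tweet language matches.
    if lang is not None and tweet.get("lang") != lang:
        return False
    # Condition 3: the tweet itself carries a location, not just the profile.
    return tweet.get("coordinates") is not None or tweet.get("place") is not None


# Made-up examples for illustration.
new_yorker_in_sf = {
    "user": {"location": "New York, NY"},
    "lang": "en",
    "coordinates": None,
    "place": {"full_name": "San Francisco, CA"},
}
no_tweet_location = {
    "user": {"location": "New York, NY"},
    "lang": "en",
    "coordinates": None,
    "place": None,
}

kept = keep_tweet(new_yorker_in_sf, "new york", lang="en")      # True
dropped = keep_tweet(no_tweet_location, "new york", lang="en")  # False
```

Inside the listener, the callback would then print or store the tweet only when the predicate returns True.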