I am using Tweepy API for extracting Twitter feeds. I want to extract all Twitter feeds of a specific language only. The language filter works only if track
filter is provided. The following code returns 406 error:
l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, l)
stream.filter(languages=["en"])
How can I extract all the tweets from certain language using Tweepy?
You can't (without special access). Streaming all the tweets (unfiltered) requires a connection to the firehose, which is granted only in specific use cases by Twitter. Honestly, the firehose isn't really necessary--proper use of track
can get you more tweets than you know what to do with.
Try using something like this:
stream.filter(languages=["en"], track=["a", "the", "i", "you", "u"]) # etc
Filtering by words like that will get you many, many tweets. If you want real data for the most-used words, check out this article from Time: The 500 Most Frequently Used Words on Twitter. You can use up to 400 keywords, but that will likely approach the 1% limit of tweets at a given time interval. If your track
parameter matches 60% of all tweets at a given time, you will still only get 1% (which is a LOT of tweets).