I want to collect tweets about aggressive dogs. My keywords are specified in the code; all of them refer to the German shepherd (in Spanish, "pastor alemán"). For instance, among other tweets I expect to get this one, which fits the keywords perfectly and was posted on 23 Feb 2015. I ran the code below, and after about an hour of waiting the following error appeared:
requests.packages.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='stream.twitter.com', port=443): Read timed out.
It seems there is a problem with port 443. How can I solve this?
P.S. The code works fine with keywords like "python, javascript".
UPDATE: I noticed that the code retrieves some tweets if I write keywords in English, like "German shepherd aggressive". But then I receive another error message:
socket.error: [Errno 10054] An existing connection was forcibly closed by the remote host
My code:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import time

CONSUMER_KEY = "..."
CONSUMER_SECRET = "..."
ACCESS_TOKEN = "..."
ACCESS_TOKEN_SECRET = "..."

class listener(StreamListener):

    def on_data(self, data):
        try:
            print data
            saveFile = open('raw_tweets.json', 'a')
            saveFile.write(data)
            saveFile.write('\n')
            saveFile.close()
            return True
        except BaseException, e:
            print 'failed ondata,', str(e)
            time.sleep(10)
            pass

    def on_error(self, status):
        print status
        if status == 420:
            return False

if __name__ == '__main__':
    auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    stream = Stream(auth, listener())
    keywords = ['pastor aleman agresivo','pastor aleman muerde',
                'pastor aleman mata','pastor aleman muerte',
                'pastor aleman peligroso','pastor aleman peligro',
                'pastor aleman adiestramiento']
    stream.filter(track=keywords)
Catch these errors and restart the stream. The errors are normal: connections may break for a number of reasons you have no control over. Also, Twitter will close the connection if there has been no activity for 90 seconds, which probably explains why rarely matched keywords time out while busy ones like "python" do not.
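A minimal sketch of that catch-and-restart idea follows. It is not taken from tweepy: `run_with_restart` and its parameters (`max_restarts`, `base_delay`, `max_delay`, `sleep`, `retry_on`) are hypothetical names chosen for illustration, and the doubling delay is just one reasonable choice. It catches `Exception` broadly by default because the two errors in the question come from different libraries; you may want to narrow that. It is written so that it should run on both Python 2 and Python 3.

```python
import time


def run_with_restart(start_stream, max_restarts=None, base_delay=5,
                     max_delay=320, sleep=time.sleep,
                     retry_on=(Exception,)):
    """Call start_stream() and call it again whenever it raises.

    Hypothetical helper, not part of tweepy. Returns the number of
    restarts performed once start_stream() returns normally.
    """
    restarts = 0
    delay = base_delay
    while True:
        try:
            start_stream()       # blocks until the stream ends or breaks
            return restarts      # normal return: stop looping
        except retry_on as exc:
            # Give up (re-raise) once the optional restart budget is spent.
            if max_restarts is not None and restarts >= max_restarts:
                raise
            restarts += 1
            print('stream broke (%r), restarting in %s s' % (exc, delay))
            sleep(delay)
            # Wait longer after each failure, up to max_delay seconds.
            delay = min(delay * 2, max_delay)


# Possible usage with the objects from the question (an assumption:
# a fresh Stream is built on every attempt to avoid reusing a broken one):
#
#   run_with_restart(lambda: Stream(auth, listener()).filter(track=keywords))
```

Because `KeyboardInterrupt` is not a subclass of `Exception`, Ctrl-C still stops the loop. When `on_error` returns False (for example on status 420), the stream call returns normally and the helper stops instead of reconnecting.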
EDIT: Someone posted an example using tweepy that does something similar to what you need.