I am writing a Python script using pycurl to consume Twitter's Streaming API. Here's a short snippet that does exactly that (simply put in your Twitter login/password to test it):
import pycurl
user = 'USER'
password = 'PWD'
def handleData(data):
    print(data)
conn = pycurl.Curl()
conn.setopt(pycurl.USERPWD, "%s:%s" % (user, password))
conn.setopt(pycurl.URL, 'https://stream.twitter.com/1/statuses/sample.json')
conn.setopt(pycurl.WRITEFUNCTION, handleData)
conn.perform()
The problem is that because the script consumes a stream, conn.perform() never returns (or only very rarely). I therefore sometimes need to interrupt the script, and the KeyboardInterrupt is caught by the perform() method.
However, perform() does not handle it well: it prints an ugly error and raises a different exception.
^CTraceback (most recent call last):
  File "test.py", line 6, in handleData
    def handleData(data):
KeyboardInterrupt
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    conn.perform()
pycurl.error: (23, 'Failed writing body (0 != 2203)')
The cURL FAQ says that to interrupt an ongoing transfer, one of the callback functions (in my case handleData) should return a special value. This is great, but the KeyboardInterrupt is not caught by any of the callback functions!
How can I do this neatly?
EDIT: I know that you can catch exceptions, but pycurl still does some funny things:
If I do:
try:
    conn.perform()
except BaseException as e:
    print('We caught the exception')
    print(type(e))
I get:
^CTraceback (most recent call last):
  File "test.py", line 6, in handleData
    def handleData(data):
KeyboardInterrupt
We caught the exception
<class 'pycurl.error'>
This means that internally, pycurl does some kind of catching, prints an ugly error message, and then raises a pycurl.error.
You need to catch Ctrl+C and handle that signal yourself.
Original: Example 1
Original: Example 2
Example 1
#!/usr/bin/env python
import signal
import sys

def signal_handler(signum, frame):
    # Runs when SIGINT (Ctrl+C) is delivered
    print('You pressed Ctrl+C!')
    sys.exit(0)

# Install the handler, then block until a signal arrives
signal.signal(signal.SIGINT, signal_handler)
print('Press Ctrl+C')
signal.pause()
Example 2
import signal, os

def handler(signum, frame):
    print('Signal handler called with signal', signum)
    raise IOError("Couldn't open device!")

# Set the signal handler and a 5-second alarm
signal.signal(signal.SIGALRM, handler)
signal.alarm(5)

# This open() may hang indefinitely
fd = os.open('/dev/ttyS0', os.O_RDWR)

signal.alarm(0)  # Disable the alarm
Also, at least something is not working with that Twitter link.
It's also helpful to have debug mode enabled when testing:
import pycurl

username = 'your_user_name'
password = 'your_password'

def body(buf):
    # pycurl hands the callback raw bytes; decode before splitting into lines
    for item in buf.decode('utf-8', 'replace').strip().split('\n'):
        if item.strip():
            print(item)

def test(debug_type, debug_msg):
    # Print only short debug messages, prefixed with their numeric type
    if len(debug_msg) < 300:
        print("debug(%d): %s" % (debug_type, debug_msg.decode('utf-8', 'replace').strip()))

conn = pycurl.Curl()
conn.setopt(pycurl.USERNAME, username)
conn.setopt(pycurl.PASSWORD, password)
#conn.setopt(pycurl.SSL_VERIFYPEER, False)
conn.setopt(pycurl.FOLLOWLOCATION, True)
conn.setopt(pycurl.VERBOSE, True)
conn.setopt(pycurl.URL, 'https://stream.twitter.com/1.1/statuses/sample.json')
conn.setopt(pycurl.DEBUGFUNCTION, test)
conn.setopt(pycurl.WRITEFUNCTION, body)
conn.perform()
conn.close()
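The number passed as debug_type is libcurl's info type for the message. As a rough sketch, a small helper can turn it into a readable label. The numeric values below follow libcurl's curl_infotype enum; the helper name format_debug and the label strings are my own choices for illustration and are not part of pycurl.

```python
# Numeric values of libcurl's curl_infotype enum; the label strings are illustrative.
INFO_TYPES = {
    0: "text",
    1: "header-in",
    2: "header-out",
    3: "data-in",
    4: "data-out",
    5: "ssl-data-in",
    6: "ssl-data-out",
}

def format_debug(debug_type, debug_msg):
    """Return one debug message prefixed with a readable label (hypothetical helper)."""
    if isinstance(debug_msg, bytes):
        # Replace undecodable bytes instead of raising inside a callback
        debug_msg = debug_msg.decode("utf-8", "replace")
    label = INFO_TYPES.get(debug_type, "unknown(%d)" % debug_type)
    return "%s: %s" % (label, debug_msg.strip())
```

It could be wired in with something like conn.setopt(pycurl.DEBUGFUNCTION, lambda t, m: print(format_debug(t, m))), which makes it easier to tell request headers from response headers at a glance.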
Here is a working test example you can copy and paste:
➜ ~ hcat twitter.py
import pycurl
import signal
import sys
from time import sleep

username = 'bubudee'
password = 'deebubu'

def body(buf):
    # pycurl hands the callback raw bytes; decode before splitting into lines
    for item in buf.decode('utf-8', 'replace').strip().split('\n'):
        if item.strip():
            print(item)

def test(debug_type, debug_msg):
    # Print only short debug messages, prefixed with their numeric type
    if len(debug_msg) < 300:
        print("debug(%d): %s" % (debug_type, debug_msg.decode('utf-8', 'replace').strip()))

def handle_ctrl_c(signum, frame):
    print("Got ctrl+c, going down!")
    sys.exit(0)

# Replace the default SIGINT behaviour (KeyboardInterrupt) with our handler
signal.signal(signal.SIGINT, handle_ctrl_c)

conn = pycurl.Curl()
conn.setopt(pycurl.USERNAME, username)
conn.setopt(pycurl.PASSWORD, password)
#conn.setopt(pycurl.SSL_VERIFYPEER, False)
conn.setopt(pycurl.FOLLOWLOCATION, True)
conn.setopt(pycurl.VERBOSE, True)
conn.setopt(pycurl.URL, 'https://stream.twitter.com/1.1/statuses/sample.json')
conn.setopt(pycurl.DEBUGFUNCTION, test)
conn.setopt(pycurl.WRITEFUNCTION, body)
conn.perform()

print("Who let the dogs out?:p")
sleep(10)

conn.close()
➜ ~ python twitter.py
debug(0): About to connect() to stream.twitter.com port 443 (#0)
debug(0): Trying 199.16.156.110...
debug(0): Connected to stream.twitter.com (199.16.156.110) port 443 (#0)
debug(0): Initializing NSS with certpath: sql:/etc/pki/nssdb
debug(0): CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
debug(0): SSL connection using SSL_RSA_WITH_RC4_128_SHA
debug(0): Server certificate:
debug(0): subject: CN=stream.twitter.com,OU=Twitter Security,O="Twitter, Inc.",L=San Francisco,ST=California,C=US
debug(0): start date: Oct 09 00:00:00 2013 GMT
debug(0): expire date: Dec 30 23:59:59 2016 GMT
debug(0): common name: stream.twitter.com
debug(0): issuer: CN=VeriSign Class 3 Secure Server CA - G3,OU=Terms of use at https://www.verisign.com/rpa (c)10,OU=VeriSign Trust Network,O="VeriSign, Inc.",C=US
debug(0): Server auth using Basic with user 'bubudee'
debug(2): GET /1.1/statuses/sample.json HTTP/1.1
Authorization: Basic YnVidWRlZTpkZWVidWJ1
User-Agent: PycURL/7.29.0
Host: stream.twitter.com
Accept: */*
debug(1): HTTP/1.1 401 Unauthorized
debug(0): Authentication problem. Ignoring this.
debug(1): WWW-Authenticate: Basic realm="Firehose"
debug(1): Content-Type: text/html
debug(1): Cache-Control: must-revalidate,no-cache,no-store
debug(1): Content-Length: 1243
debug(1): Connection: close
debug(1):
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<title>Error 401 Unauthorized</title>
</head>
<body>
<h2>HTTP ERROR: 401</h2>
<p>Problem accessing '/1.1/statuses/sample.json'. Reason:
<pre> Unauthorized</pre>
</body>
</html>
debug(0): Closing connection 0
Who let the dogs out?:p
^CGot ctrl+c, going down!
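In the run above, Ctrl+C was pressed during sleep(10), after perform() had already returned. To stop a transfer that is still running, one option that follows the cURL FAQ hint from the question is to let the signal handler only set a flag and have the write callback abort the transfer. The following is a minimal sketch that I have not tested against the live API. The names stop_requested, request_stop, handle_data and stream are made up for illustration, and it relies on pycurl treating a write-callback return value other than the chunk length (or None) as a write error.

```python
import signal

stop_requested = False  # set by the signal handler, read by the write callback

def request_stop(signum, frame):
    """SIGINT handler: only record the request, do not raise."""
    global stop_requested
    stop_requested = True

def handle_data(data):
    """Write callback: abort the transfer once a stop was requested."""
    if stop_requested:
        # A return value that differs from len(data) makes libcurl abort.
        return -1
    print(data.decode("utf-8", "replace"), end="")
    # Falling through returns None, which pycurl reads as "chunk fully consumed".

def stream(url, user, password):
    """Run the transfer until Ctrl+C; assumes pycurl is installed."""
    import pycurl  # imported here so the helpers above work without it

    signal.signal(signal.SIGINT, request_stop)
    conn = pycurl.Curl()
    conn.setopt(pycurl.USERPWD, "%s:%s" % (user, password))
    conn.setopt(pycurl.URL, url)
    conn.setopt(pycurl.WRITEFUNCTION, handle_data)
    try:
        conn.perform()
    except pycurl.error:
        # The deliberate abort surfaces as a pycurl.error; re-raise anything else.
        if not stop_requested:
            raise
    finally:
        conn.close()
```

Calling stream('https://stream.twitter.com/1.1/statuses/sample.json', username, password) should then return quietly after Ctrl+C. Because the flag is only checked in the write callback, the transfer stops when the next chunk arrives, not at the instant the key is pressed.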