Python POST binary data

Mac picture Mac · Jan 16, 2013 · Viewed 113.1k times · Source

I am writing some code to interface with redmine and I need to upload some files as part of the process, but I am not sure how to do a POST request from python containing a binary file.

I am trying to mimic the commands here:

curl --data-binary "@image.png" -H "Content-Type: application/octet-stream" -X POST -u login:password http://redmine/uploads.xml

In python (below), but it does not seem to work. I am not sure if the problem is somehow related to encoding the file or if something is wrong with the headers.

import urllib2, os

FilePath = "C:\somefolder\somefile.7z"
FileData = open(FilePath, "rb")
length = os.path.getsize(FilePath)

password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_manager.add_password(None, 'http://redmine/', 'admin', 'admin')
auth_handler = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
request = urllib2.Request( r'http://redmine/uploads.xml', FileData)
request.add_header('Content-Length', '%d' % length)
request.add_header('Content-Type', 'application/octet-stream')
try:
    response = urllib2.urlopen( request)
    print response.read()
except urllib2.HTTPError as e:
    error_message = e.read()
    print error_message

I have access to the server and it looks like a encoding error:

...
invalid byte sequence in UTF-8
Line: 1
Position: 624
Last 80 unconsumed characters:
7z¼¯'ÅÐз2^Ôøë4g¸R<süðí6kĤª¶!»=}jcdjSPúá-º#»ÄAtD»H7Ê!æ½]j):

(further down)

Started POST "/uploads.xml" for 192.168.0.117 at 2013-01-16 09:57:49 -0800
Processing by AttachmentsController#upload as XML
WARNING: Can't verify CSRF token authenticity
  Current user: anonymous
Filter chain halted as :authorize_global rendered or redirected
Completed 401 Unauthorized in 13ms (ActiveRecord: 3.1ms)

Answer

Piotr Dobrogost picture Piotr Dobrogost · Jan 22, 2013

Basically what you do is correct. Looking at redmine docs you linked to, it seems that suffix after the dot in the url denotes type of posted data (.json for JSON, .xml for XML), which agrees with the response you get - Processing by AttachmentsController#upload as XML. I guess maybe there's a bug in docs and to post binary data you should try using http://redmine/uploads url instead of http://redmine/uploads.xml.

Btw, I highly recommend very good and very popular Requests library for http in Python. It's much better than what's in the standard lib (urllib2). It supports authentication as well but I skipped it for brevity here.

import requests
with open('./x.png', 'rb') as f:
    data = f.read()
res = requests.post(url='http://httpbin.org/post',
                    data=data,
                    headers={'Content-Type': 'application/octet-stream'})

# let's check if what we sent is what we intended to send...
import json
import base64

assert base64.b64decode(res.json()['data'][len('data:application/octet-stream;base64,'):]) == data

UPDATE

To find out why this works with Requests but not with urllib2 we have to examine the difference in what's being sent. To see this I'm sending traffic to http proxy (Fiddler) running on port 8888:

Using Requests

import requests

data = 'test data'
res = requests.post(url='http://localhost:8888',
                    data=data,
                    headers={'Content-Type': 'application/octet-stream'})

we see

POST http://localhost:8888/ HTTP/1.1
Host: localhost:8888
Content-Length: 9
Content-Type: application/octet-stream
Accept-Encoding: gzip, deflate, compress
Accept: */*
User-Agent: python-requests/1.0.4 CPython/2.7.3 Windows/Vista

test data

and using urllib2

import urllib2

data = 'test data'    
req = urllib2.Request('http://localhost:8888', data)
req.add_header('Content-Length', '%d' % len(data))
req.add_header('Content-Type', 'application/octet-stream')
res = urllib2.urlopen(req)

we get

POST http://localhost:8888/ HTTP/1.1
Accept-Encoding: identity
Content-Length: 9
Host: localhost:8888
Content-Type: application/octet-stream
Connection: close
User-Agent: Python-urllib/2.7

test data

I don't see any differences which would warrant different behavior you observe. Having said that it's not uncommon for http servers to inspect User-Agent header and vary behavior based on its value. Try to change headers sent by Requests one by one making them the same as those being sent by urllib2 and see when it stops working.