urllib.urlretrieve with custom header

realUser404 · Jul 22, 2017 · Viewed 13.9k times

I am trying to retrieve a file using urlretrieve, while adding a custom header.

While checking the source code of urllib.request, I realized that urlopen accepts a Request object as its parameter instead of just a string, which lets me set the header I want. But if I try the same with urlretrieve, I get a TypeError: expected string or bytes-like object, as mentioned in this other post.

What I ended up doing is writing my own version of urlretrieve without the line that throws the error (that line is irrelevant to my use case).

It works fine but I am wondering if there is a better/cleaner way of doing it, rather than rewriting my own urlretrieve. If it is possible to pass a custom header to urlopen, it feels like it should be possible to do the same with urlretrieve?

Answer

Lost Crotchet · Oct 1, 2017

I found a way where you only have to add a few extra lines of code...

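import urllib.request

# Build an opener whose default headers are sent with every request it makes.
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
# Install it globally: urlopen and urlretrieve will both use it from now on.
urllib.request.install_opener(opener)
urllib.request.urlretrieve("type URL here", "path/file_name")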

If you want the details, see the Python documentation: https://docs.python.org/3/library/urllib.request.html
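
If you would rather not install a global opener (it affects every later urlopen call in the process), here is a minimal sketch of an alternative. It passes a Request object to urlopen, which the question notes is allowed, and copies the response to disk itself. The function name download_with_headers and its parameters are my own, not part of urllib, and unlike urlretrieve it does not return the response headers or accept a reporthook.

```python
import shutil
import urllib.request


def download_with_headers(url, dest_path, headers):
    """Fetch url with custom request headers and save the body to dest_path.

    Hypothetical helper for illustration; not part of the standard library.
    """
    # Request accepts a dict of headers; they apply to this request only.
    request = urllib.request.Request(url, headers=headers)
    # Stream the response to disk in chunks rather than reading it all at once.
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return dest_path


# Example call (placeholders, as in the snippet above):
# download_with_headers("type URL here", "path/file_name", {"User-Agent": "Mozilla/5.0"})
```

The headers stay scoped to the one request, so other code in the same program that calls urlopen is not affected.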