I want to check whether a URL is valid, before I open it to read data.
I was using the function urlparse from the urlparse module:
import urllib2
import urlparse
if urlparse.urlparse(url).netloc:
    data = urllib2.urlopen(url).read()  # URL has a host: open and read it with urllib2
However, I noticed that some valid URLs are treated as broken, for example:
url = "upload.wikimedia.org/math/8/8/d/88d27d47cea8c88adf93b1881eda318d.png"
This URL is valid (I can open it using my browser).
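The behaviour can be reproduced with a minimal Python 3 sketch (in Python 3 the urlparse module became urllib.parse). The helper name has_netloc is hypothetical. It shows that without a leading "//" no host is recognized, so the whole string ends up in path and netloc is empty:

```python
from urllib.parse import urlparse

def has_netloc(url: str) -> bool:
    """Hypothetical helper: True if urlparse found a network location (host)."""
    return bool(urlparse(url).netloc)

url = "upload.wikimedia.org/math/8/8/d/88d27d47cea8c88adf93b1881eda318d.png"
parts = urlparse(url)

print(repr(parts.netloc))           # '' : no "//", so no host was recognized
print(parts.path == url)            # True: the whole string was parsed as a path
print(has_netloc(url))              # False
print(has_netloc("http://" + url))  # True
```
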
Is there a better way to check if the URL is valid?
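urlparse only recognizes a network location when it is introduced by "//", so for a string without a scheme the host ends up in path and netloc is empty. You can check whether the URL has a scheme: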
>>> import urlparse
>>> url = "no.scheme.com/math/12345.png"
>>> parsed_url = urlparse.urlparse(url)
>>> bool(parsed_url.scheme)
False
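A Python 3 sketch of the same check, assuming you only intend to fetch http and https URLs. The helper has_web_scheme and the set ALLOWED_SCHEMES are hypothetical names. Comparing against an explicit set is stricter than bool(scheme), which would also accept schemes such as mailto:

```python
from urllib.parse import urlparse

# Hypothetical allow-list: only the schemes we actually intend to open.
ALLOWED_SCHEMES = {"http", "https"}

def has_web_scheme(url: str) -> bool:
    """True if url carries an http or https scheme (urlparse lowercases it)."""
    return urlparse(url).scheme in ALLOWED_SCHEMES

print(has_web_scheme("no.scheme.com/math/12345.png"))         # False
print(has_web_scheme("http://no.scheme.com/math/12345.png"))  # True
print(has_web_scheme("HTTPS://example.com/"))                 # True
```
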
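If it doesn't, setting the scheme with _replace is not enough: the host is still stored in path and netloc stays empty, so geturl() produces three slashes and still no usable URL:
>>> parsed_url.geturl()
'no.scheme.com/math/12345.png'
>>> parsed_url._replace(scheme="http").geturl()
'http:///no.scheme.com/math/12345.png'
Instead, prepend the scheme to the string and parse it again, so that the host is read into netloc:
>>> parsed_url = urlparse.urlparse("http://" + url)
>>> parsed_url.netloc
'no.scheme.com'
>>> parsed_url.geturl()
'http://no.scheme.com/math/12345.png'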
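The approach of adding a missing scheme can be wrapped in a small Python 3 helper. This is a sketch under stated assumptions: normalize_url and its default_scheme parameter are hypothetical, and it assumes that scheme-less input like "host/path" should be read as default_scheme://host/path. It re-parses the string rather than calling _replace, then rejects anything that still has no host:

```python
from typing import Optional
from urllib.parse import urlparse

def normalize_url(url: str, default_scheme: str = "http") -> Optional[str]:
    """Hypothetical helper: add default_scheme if missing; None if no host is found."""
    parts = urlparse(url)
    if not parts.scheme:
        # Re-parse with "//" so the host goes into netloc instead of path;
        # setting the scheme with _replace alone would leave the host in path.
        parts = urlparse(f"{default_scheme}://{url}")
    if not parts.netloc:
        return None  # still no host, e.g. an empty string or "mailto:..."
    return parts.geturl()

print(normalize_url("upload.wikimedia.org/math/8/8/d/88d27d47cea8c88adf93b1881eda318d.png"))
# http://upload.wikimedia.org/math/8/8/d/88d27d47cea8c88adf93b1881eda318d.png
print(normalize_url(""))  # None
```

Note that this is only a syntactic check: a well-formed URL can still fail to open, so the call to urlopen (urllib.request.urlopen in Python 3) should still be wrapped in a try/except for urllib.error.URLError.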