Python: How to check if a string is a valid IRI?

Eduard Florinescu · Sep 24, 2012 · Viewed 9.8k times

Is there a standard function to check whether a string is a valid IRI? To check a URL, apparently I can use:

import urlparse

parts = urlparse.urlsplit(url)
if not parts.scheme or not parts.netloc:
    pass  # apparently not a URL

I tried the above with a URL containing Unicode characters:

# -*- coding: utf-8 -*-
import urlparse
url = "http://fdasdf.fdsfîășîs.fss/ăîăî"
parts = urlparse.urlsplit(url)
if not parts.scheme or not parts.netloc:
    print "not an url"
else:
    print "yes an url"

and what I get is "yes an url". Does this mean I'm good and this tests for a valid IRI? Is there another way?

Answer

Martijn Pieters · Sep 24, 2012

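Using urlparse is not sufficient to test for a valid IRI: urlsplit only splits the string into components and does not check that each component contains only characters the IRI grammar allows.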
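
To illustrate why the scheme/netloc test is too permissive, here is a minimal sketch. It assumes Python 3, where the urlparse module lives on as urllib.parse, and the helper name looks_like_url is only illustrative. The test from the question accepts a string with spaces and angle brackets in it, which are not legal in a URL or an IRI:

```python
from urllib.parse import urlsplit


def looks_like_url(candidate):
    """Hypothetical helper: the scheme/netloc test from the question."""
    parts = urlsplit(candidate)
    # Only checks that both components are non-empty, not that they are valid.
    return bool(parts.scheme and parts.netloc)


# The Unicode URL from the question passes the test...
print(looks_like_url("http://fdasdf.fdsfîășîs.fss/ăîăî"))
# ...but so does a string with characters that no URL or IRI may contain.
print(looks_like_url("http://exa mple.com/a b<>"))
# A string without a scheme is the kind of input this test does reject.
print(looks_like_url("no-scheme-here"))
```

So a positive result only tells you that the string has something that looks like a scheme and an authority part.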

Use the rfc3987 package instead:

from rfc3987 import parse

parse('http://fdasdf.fdsfîășîs.fss/ăîăî', rule='IRI')