Java : File.toURI().toURL() on Windows file

glmxndr · Jul 15, 2009

The system I'm running on is Windows XP, with JRE 1.6.

I do this :

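import java.io.File;

public class Main {
    public static void main(String[] args) {
        try {
            // Go through toURI() first so the space in the name is escaped as %20
            System.out.println(new File("C:\\test a.xml").toURI().toURL());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}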

and I get this : file:/C:/test%20a.xml

Why doesn't the resulting URL have two slashes before the C:? I expected file://C:.... Is this normal behaviour?


EDIT :

From the Java source code, in java.net.URLStreamHandler.toExternalForm(URL):

    result.append(":");
    if (u.getAuthority() != null && u.getAuthority().length() > 0) {
        result.append("//");
        result.append(u.getAuthority());
    }

It seems that the Authority part of a file URL is null or empty, and thus the double slash is skipped. So what is the authority part of a URL and is it really absent from the file protocol?
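One way to check this is to ask java.net.URI for the authority of a few file URIs. The following is only a sketch: the class name FileUriAuthority, the helper authorityOf and the host name "server" are made up for illustration. It parses fixed strings rather than a real File, so it does not depend on running on Windows.

```java
import java.net.URI;

class FileUriAuthority {
    // Returns the authority component of the given URI string,
    // or null when the URI does not define one.
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    public static void main(String[] args) {
        String[] samples = {
            "file:/C:/test%20a.xml",            // the form printed in the question
            "file:///C:/test%20a.xml",          // "//" followed by an empty authority
            "file://server/share/test%20a.xml"  // hypothetical host called "server"
        };
        for (String s : samples) {
            URI u = URI.create(s);
            System.out.println(s + " -> authority=" + u.getAuthority()
                    + ", path=" + u.getPath());
        }
    }
}
```

With java.net.URI, the first two forms should both report a null authority, and only the third reports one ("server").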

Answer

Powerlord · Jul 15, 2009

That's an interesting question.

First things first: I get the same results on JRE6. I even get that when I lop off the toURL() part.
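For anyone who wants to reproduce that comparison, here is a small sketch. The class name UriVersusUrl and the relative file name are assumptions chosen so the code runs on any platform; the file does not need to exist.

```java
import java.io.File;
import java.net.MalformedURLException;

class UriVersusUrl {
    // String form of the URI produced by File.toURI().
    static String uriString(File f) {
        return f.toURI().toString();
    }

    // String form after the extra toURL() step.
    static String urlString(File f) {
        try {
            return f.toURI().toURL().toString();
        } catch (MalformedURLException e) {
            // Not expected for a file: URI; rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical relative path; it is resolved against the working directory.
        File f = new File("test a.xml");
        System.out.println(uriString(f));
        System.out.println(urlString(f));
    }
}
```

Both lines should print the same single-slash string, which suggests the missing slashes come from File.toURI() rather than from the conversion to URL.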

RFC2396 does not actually require two slashes. According to section 3:

The URI syntax is dependent upon the scheme. In general, absolute URI are written as follows:

<scheme>:<scheme-specific-part>

Having said that, RFC2396 has been superseded by RFC3986, which states:

The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment.

  URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

  hier-part   = "//" authority path-abempty
              / path-absolute
              / path-rootless
              / path-empty

The scheme and path components are required, though the path may be empty (no characters). When authority is present, the path must either be empty or begin with a slash ("/") character. When authority is not present, the path cannot begin with two slash characters ("//"). These restrictions result in five different ABNF rules for a path (Section 3.3), only one of which will match any given URI reference.

So, there you go. The file URI that Java produces has no authority component, so under RFC3986 it cannot start with //; the single slash is simply the start of the path.
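In practice the single-slash spelling and the spelling with // and an empty authority point at the same file. A minimal sketch, assuming the made-up class name FileUriForms and helper toFile, converts both strings back with the File(URI) constructor:

```java
import java.io.File;
import java.net.URI;

class FileUriForms {
    // Converts a file: URI string back into a File. The File(URI) constructor
    // throws IllegalArgumentException if the URI carries an authority,
    // a query, or a fragment.
    static File toFile(String uri) {
        return new File(URI.create(uri));
    }

    public static void main(String[] args) {
        File single = toFile("file:/C:/test%20a.xml");
        File triple = toFile("file:///C:/test%20a.xml");
        System.out.println(single.equals(triple)); // prints true
        System.out.println(single.getName());      // prints test a.xml
    }
}
```

The resulting path is platform dependent (a drive-letter path on Windows, a path under / elsewhere), but both spellings give the same File on a given machine.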

However, that RFC didn't come around until 2005, and Java references RFC2396, so I don't know why it's following this convention, as file URLs before the new RFC have always had two slashes.