The system I'm running on is Windows XP, with JRE 1.6.
I do this :
public static void main(String[] args) {
try {
System.out.println(new File("C:\\test a.xml").toURI().toURL());
} catch (Exception e) {
e.printStackTrace();
}
}
and I get this : file:/C:/test%20a.xml
How come the given URL doesn't have two slashes before the C:
? I expected file://C:...
. Is it normal behaviour?
EDIT :
From Java source code : java.net.URLStreamHandler.toExternalForm(URL)
result.append(":");
if (u.getAuthority() != null && u.getAuthority().length() > 0) {
result.append("//");
result.append(u.getAuthority());
}
It seems that the Authority part of a file URL is null or empty, and thus the double slash is skipped. So what is the authority part of a URL and is it really absent from the file protocol?
That's an interesting question.
First things first: I get the same results on JRE6. I even get that when I lop off the toURL() part.
RFC2396 does not actually require two slashes. According to section 3:
The URI syntax is dependent upon the scheme. In general, absolute URI are written as follows:
<scheme>:<scheme-specific-part>
Having said that, RFC2396 has been superseded by RFC3986, which states
The generic URI syntax consists of a hierarchical sequence of omponents referred to as the scheme, authority, path, query, and fragment.
URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] hier-part = "//" authority path-abempty / path-absolute / path-rootless / path-empty
The scheme and path components are required, though the path may be empty (no characters). When authority is present, the path must either be empty or begin with a slash ("/") character. When authority is not present, the path cannot begin with two slash characters ("//"). These restrictions result in five different ABNF rules for a path (Section 3.3), only one of which will match any given URI reference.
So, there you go. Since file URIs have no authority segment, they're forbidden from starting with //.
However, that RFC didn't come around until 2005, and Java references RFC2396, so I don't know why it's following this convention, as file URLs before the new RFC have always had two slashes.