I can't understand why Java's HttpURLConnection does not follow an HTTP redirect from an HTTP URL to an HTTPS URL. I use the following code to get the page at https://httpstat.us/:
import java.net.URL;
import java.net.HttpURLConnection;
import java.io.InputStream;

public class Tester {
    public static void main(String[] args) throws Exception {
        InputStream is = null;
        try {
            // This URL answers with a 301 whose Location header points to HTTPS.
            String httpUrl = "http://httpstat.us/301";
            URL resourceUrl = new URL(httpUrl);
            HttpURLConnection conn = (HttpURLConnection) resourceUrl.openConnection();
            conn.setConnectTimeout(15000);
            conn.setReadTimeout(15000);
            conn.connect();
            is = conn.getInputStream();
            // If the redirect had been followed, getURL() would show the new location.
            System.out.println("Original URL: " + httpUrl);
            System.out.println("Connected to: " + conn.getURL());
            System.out.println("HTTP response code received: " + conn.getResponseCode());
            System.out.println("HTTP response message received: " + conn.getResponseMessage());
        } finally {
            if (is != null) is.close();
        }
    }
}
The output of this program is:
Original URL: http://httpstat.us/301
Connected to: http://httpstat.us/301
HTTP response code received: 301
HTTP response message received: Moved Permanently
A request to http://httpstat.us/301 returns the following (shortened) response (which seems absolutely right!):
HTTP/1.1 301 Moved Permanently
Cache-Control: private
Content-Length: 21
Content-Type: text/plain; charset=utf-8
Location: https://httpstat.us
Unfortunately, Java's HttpURLConnection does not follow the redirect!
Note that if you change the original URL to HTTPS (https://httpstat.us/301), Java will follow the redirect as expected!?
HttpURLConnection follows a redirect only if the target uses the same protocol as the original request. (See the followRedirect() method in the source.) There is no way to disable this check.
Even though we know HTTPS mirrors HTTP, from the HTTP protocol's point of view HTTPS is just another, completely different, unknown protocol. It would be unsafe to follow the redirect without user approval.
For example, suppose the application is set up to perform client authentication automatically. The user expects to be surfing anonymously because he's using HTTP. But if his client follows HTTPS without asking, his identity is revealed to the server.
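If you have decided that crossing from HTTP to HTTPS is acceptable for your application, one workaround is to turn off automatic redirects and follow the Location header yourself. The following is only a minimal sketch under some assumptions: the class name RedirectFollower, its helper methods, and the limit of 5 hops are hypothetical names and values chosen for illustration, it only handles GET requests, and it refuses to downgrade from HTTPS to HTTP.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RedirectFollower {

    // Hypothetical cap on redirect hops, chosen for illustration only.
    static final int MAX_REDIRECTS = 5;

    /** True for the 3xx status codes that normally carry a Location header. */
    public static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a Location value (absolute or relative) against the current URL. */
    public static URL resolveLocation(URL base, String location) throws IOException {
        return new URL(base, location);
    }

    /** Allows only http/https targets and refuses an https-to-http downgrade. */
    public static boolean isAllowedHop(URL from, URL to) {
        String f = from.getProtocol();
        String t = to.getProtocol();
        if (!t.equalsIgnoreCase("http") && !t.equalsIgnoreCase("https")) {
            return false;
        }
        return !(f.equalsIgnoreCase("https") && t.equalsIgnoreCase("http"));
    }

    /** Issues GET requests, following redirects by hand, including http to https. */
    public static HttpURLConnection openFollowingRedirects(String startUrl) throws IOException {
        URL url = new URL(startUrl);
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setConnectTimeout(15000);
            conn.setReadTimeout(15000);
            // Disable the built-in handling so every hop is decided here.
            conn.setInstanceFollowRedirects(false);

            int status = conn.getResponseCode();
            if (!isRedirect(status)) {
                return conn; // final response; the caller reads and closes it
            }

            String location = conn.getHeaderField("Location");
            conn.disconnect();
            if (location == null) {
                throw new IOException("Redirect without Location header from " + url);
            }
            URL next = resolveLocation(url, location);
            if (!isAllowedHop(url, next)) {
                throw new IOException("Refusing redirect from " + url + " to " + next);
            }
            url = next;
        }
        throw new IOException("More than " + MAX_REDIRECTS + " redirects");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java RedirectFollower <url>");
            return;
        }
        HttpURLConnection conn = openFollowingRedirects(args[0]);
        System.out.println("Connected to: " + conn.getURL());
        System.out.println("HTTP response code received: " + conn.getResponseCode());
        conn.disconnect();
    }
}
```

Running it with http://httpstat.us/301 as the argument should take the hop that the built-in handling refuses. Keep the concern above in mind: by following the redirect yourself you are giving the approval on the user's behalf, so do this only where revealing the client to the HTTPS endpoint is acceptable.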