I'm trying to parse a webpage using Java with URLConnection. I try to set up the user-agent like this:
java.net.URLConnection c = url.openConnection();
c.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
But the resulting user agent is the one I specify, with "Java/1.5.0_19" appended to the end. Is there a way to truly set the user agent without this addition?
Just for clarification: setRequestProperty("User-Agent", "Mozilla ...")
now works just fine and doesn't append java/xx
at the end! At least with Java 1.6.30 and newer.
I listened on my machine with netcat(a port listener):
$ nc -l -p 8080
It simply listens on the port, so you see anything which gets requested, like raw http-headers.
And got the following http-headers without setRequestProperty:
GET /foobar HTTP/1.1
User-Agent: Java/1.6.0_30
Host: localhost:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
And WITH setRequestProperty:
GET /foobar HTTP/1.1
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2
Host: localhost:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
As you can see the user agent was properly set.
Full example:
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
public class TestUrlOpener {
public static void main(String[] args) throws IOException {
URL url = new URL("http://localhost:8080/foobar");
URLConnection hc = url.openConnection();
hc.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
System.out.println(hc.getContentType());
}
}