I'm trying to download this file (http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar) with the following method, but it doesn't work: I get an empty/corrupt file.
String link = "http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar";
String fileName = "ChampionHelper-4.jar";
URL url = new URL(link);
URLConnection c = url.openConnection();
c.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 1.2.30703)");
InputStream input;
input = c.getInputStream();
byte[] buffer = new byte[4096];
int n = -1;
OutputStream output = new FileOutputStream(new File(fileName));
while ((n = input.read(buffer)) != -1) {
    if (n > 0) {
        output.write(buffer, 0, n);
    }
}
output.close();
But I can successfully download the following file from my Dropbox (http://dl.dropbox.com/u/13226123/ChampionHelper-4.jar) with the same method.
So somehow GitHub knows that I'm not a regular user trying to download a file. I already tried changing the user agent, but that didn't help either.
So how should I download a file that is hosted on my GitHub account using Java?
EDIT: I tried Apache Commons IO for this, but I get the same result: an empty/corrupt file.
It looks like GitHub answers this request with a chain of redirects, and according to this StackOverflow article, URLConnection will not automatically follow redirects that change the protocol (for example, from HTTP to HTTPS). Here is what I am seeing with curl:
First Request:
curl -v http://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
* About to connect() to github.com port 80 (#0)
* Trying 207.97.227.239... connected
* Connected to github.com (207.97.227.239) port 80 (#0)
> GET /downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: github.com
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Server: nginx
< Date: Sun, 18 Nov 2012 15:56:36 GMT
< Content-Type: text/html
< Content-Length: 178
< Connection: close
< Location: https://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
<
<html>
<head><title>301 Moved Permanently</title></head>
<body bgcolor="white">
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx</center>
</body>
</html>
* Closing connection #0
Second request, to the URL from that Location header:
curl -v https://github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
* About to connect() to github.com port 443 (#0)
* Trying 207.97.227.239... connected
* Connected to github.com (207.97.227.239) port 443 (#0)
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using RC4-SHA
* Server certificate:
* subject: businessCategory=Private Organization; 1.3.6.1.4.1.311.60.2.1.3=US; 1.3.6.1.4.1.311.60.2.1.2=California; serialNumber=C3268102; C=US; ST=California; L=San Francisco; O=GitHub, Inc.; CN=github.com
* start date: 2011-05-27 00:00:00 GMT
* expire date: 2013-07-29 12:00:00 GMT
* subjectAltName: github.com matched
* issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=DigiCert High Assurance EV CA-1
* SSL certificate verify ok.
> GET /downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: github.com
> Accept: */*
>
< HTTP/1.1 302 Found
< Server: nginx
< Date: Sun, 18 Nov 2012 15:58:56 GMT
< Content-Type: text/html; charset=utf-8
< Connection: keep-alive
< Status: 302 Found
< Strict-Transport-Security: max-age=2592000
< Cache-Control: no-cache
< X-Runtime: 48
< Location: http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar
< X-Frame-Options: deny
< Content-Length: 149
<
* Connection #0 to host github.com left intact
* Closing connection #0
* SSLv3, TLS alert, Client hello (1):
<html><body>You are being <a href="http://cloud.github.com/downloads/TheHolyWaffle/ChampionHelper/ChampionHelper-4.jar">redirected</a>.</body></html>
The Location header in this response points to the actual file. You may want to use Apache HttpClient for the download; it can be configured to follow these 301 and 302 redirects during the GET.
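If you would rather not add a dependency, the redirect chain shown above can also be followed by hand with plain HttpURLConnection. The following is only a minimal sketch under stated assumptions: the class name RedirectingDownloader, its helper methods, and the limit of 5 hops are made up for illustration and are not part of any library. It turns off automatic redirect handling, reads each Location header, and requests the next URL until a 200 response arrives.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, named for this example only.
class RedirectingDownloader {

    // Arbitrary cap so a redirect loop cannot run forever (an assumption, not a GitHub rule).
    private static final int MAX_REDIRECTS = 5;

    /** Returns true for status codes that point to another URL via a Location header. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /** Resolves a Location header (absolute or relative) against the URL that returned it. */
    static String resolve(String base, String location) {
        return URI.create(base).resolve(location).toString();
    }

    /** Downloads link to target, following redirects even when the protocol changes. */
    static void download(String link, Path target) throws IOException {
        String current = link;
        for (int hop = 0; hop <= MAX_REDIRECTS; hop++) {
            HttpURLConnection c = (HttpURLConnection) new URL(current).openConnection();
            // Handle redirects ourselves so http -> https -> http hops are not dropped.
            c.setInstanceFollowRedirects(false);
            int status = c.getResponseCode();
            if (isRedirect(status)) {
                String location = c.getHeaderField("Location");
                c.disconnect();
                if (location == null) {
                    throw new IOException("Redirect without Location header from " + current);
                }
                current = resolve(current, location);
                continue;
            }
            if (status != HttpURLConnection.HTTP_OK) {
                c.disconnect();
                throw new IOException("Unexpected HTTP status " + status + " from " + current);
            }
            // Final response: stream the body to disk and close the stream afterwards.
            try (InputStream input = c.getInputStream()) {
                Files.copy(input, target, StandardCopyOption.REPLACE_EXISTING);
            }
            return;
        }
        throw new IOException("Too many redirects, last URL: " + current);
    }
}
```

With this sketch, the download from the question would be started with RedirectingDownloader.download(link, Paths.get(fileName)). Checking the status code before reading the body also prevents the small HTML redirect page from being saved under the .jar name, which is one plausible source of the corrupt file.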