I use the ROME library (rome.dev.java.net) to fetch RSS feeds.
The code is:
    URL feedUrl = new URL("http://planet.rubyonrails.ru/xml/rss");
    SyndFeedInput input = new SyndFeedInput();
    SyndFeed feed = input.build(new XmlReader(feedUrl));
You can check that http://planet.rubyonrails.ru/xml/rss is a valid URL and that the page displays in a browser.
But my application throws this exception:
    java.io.FileNotFoundException: http://planet.rubyonrails.ru/xml/rss
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1311)
        at com.sun.syndication.io.XmlReader.<init>(XmlReader.java:237)
        at com.sun.syndication.io.XmlReader.<init>(XmlReader.java:213)
        at rssdaemonapp.ValidatorThread.run(ValidatorThread.java:32)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
I don't use a proxy. I get this exception on my PC and on the production server, and only for this URL; other URLs work.
The code that is throwing that exception looks like this ... assuming I've got the right version:
    if (respCode >= 400) {
        if (respCode == 404 || respCode == 410) {
            throw new FileNotFoundException(url.toString());
        } else {
            throw new java.io.IOException(
                "Server returned HTTP"
                + " response code: " + respCode
                + " for URL: " + url.toString());
        }
    }
In other words, when you do the GET from Java, you get a 404 or 410 response. Now when I do the request using the wget utility, I get a 200 response. So my guess is that the server is responding differently depending on something in the request itself, most likely the User-Agent header.
Other possibilities are that they are doing some kind of server-side filtering on IP addresses or that there is some DNS problem that is causing your requests to go to a different IP address. But both of these seem to be contradicted by the fact that you can access the feed in your browser.
If this is the User-Agent, take a look at their terms of service to see if they have banned certain kinds of use of their site / RSS feed.
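
One way to test the User-Agent theory is to send the same GET twice, once with Java's default header and once with a browser-like one, and compare the status codes. The following is only a rough sketch using plain JDK classes. The class name UserAgentCheck, the helper names, the "Mozilla/5.0" string, and the local test server are all made up for illustration; the server is just a stand-in that I assume behaves the way the real site might, so the example runs without touching their feed.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class UserAgentCheck {

    /** Issues a GET and returns the HTTP status code; a null userAgent keeps Java's default header. */
    static int fetchStatus(URL url, String userAgent) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            // getResponseCode() hands back 404 as a number; getInputStream() would throw instead.
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** Hypothetical stand-in server: answers 404 to Java's default agent and 200 to anything else. */
    static HttpServer startPickyServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/xml/rss", exchange -> {
            String agent = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean looksLikeJava = agent == null || agent.contains("Java/");
            byte[] body = (looksLikeJava ? "not found" : "<rss/>").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(looksLikeJava ? 404 : 200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startPickyServer();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/xml/rss");
            System.out.println("default agent: " + fetchStatus(url, null));
            System.out.println("browser-like agent: " + fetchStatus(url, "Mozilla/5.0"));
        } finally {
            server.stop(0);
        }
    }
}
```

To try it against the real feed, call fetchStatus with new URL("http://planet.rubyonrails.ru/xml/rss") instead of the local address. If the header does turn out to matter and their terms allow it, you could open the URLConnection yourself, set the header, and pass the connection to XmlReader, assuming your ROME version has the XmlReader(URLConnection) constructor.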