How to collect data from a website

Mr. MonoChrome picture Mr. MonoChrome · Dec 18, 2011 · Viewed 24.7k times · Source

Preface: I have a broad, college knowledge, of a handful of languages (C++, VB,C#,Java, many web languages), so go with which ever you like.

I want to make an android app that compares numbers, but in order to do that I need a database. I'm a one man team, and the numbers get updated biweekly so I want to grab those numbers off of a wiki that gets updated as well.

So my question is: how can I access information from a website using one of the languages above?

Answer

JP Beaudry picture JP Beaudry · Dec 18, 2011

What I understand the problem to be: Some entity generates a data set (i.e. numbers) every other week and you have a need to download that data set for treatment (e.g. sorting).

Ideally, the web site maintaining the wiki would provide a Service, like a RESTful interface, to easily gather the data. If that were the case, I'd go with any language that provides easy manipulation of HTTP request & response, and makes your data manipulation easy. As a previous poster said, Java would work well.

If you are stuck with the wiki page, you have a couple of options. You can parse the HTML your browser receives (Perl comes to mind as a decent language for that). Or you can use tools built for that purpose such as the aforementioned Jsoup.

Your question also mentions some implementation details such as needing a database. Evidently, there isn't enough contextual information for me to know whether that's optimal, so I won't address this aspect of the problem.