I need to extract the exchange rate of USD to another currency (say, EUR) for a long list of historical dates.
The www.xe.com website provides a historical lookup tool, and with a detailed URL one can get the rate table for a specific date without filling in the Date: and From: boxes. For example, the URL http://www.xe.com/currencytables/?from=USD&date=2012-10-15 gives the table of conversion rates from USD to other currencies on Oct. 15th, 2012.
Now, assuming I have a list of dates, I can loop through the list and change the date part of that URL to get the required page. If I can extract the rates list, then a simple grep EUR will give me the relevant exchange rate (I can use awk to extract just the rate).
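The loop described above could be sketched roughly as follows. This only prints the URL for each date and does not fetch anything; `build_url` is a hypothetical helper name, and the dates in the loop are just examples.

```shell
#!/bin/sh
# Build the lookup URL for one date (YYYY-MM-DD), following the URL pattern
# given in the question. build_url is a hypothetical helper name.
build_url() {
    printf 'http://www.xe.com/currencytables/?from=USD&date=%s\n' "$1"
}

# Example dates; in practice these could be read from a file, one per line.
for d in 2012-10-15 2012-10-16 2012-10-17; do
    build_url "$d"
done
```

Using printf with a single-quoted format string keeps the & in the URL from being interpreted by the shell.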
The question is, how can I get the page(s) from the Linux command line? I tried wget but it did not do the job.
If not via the CLI, is there an easy and straightforward way to do that programmatically (i.e., one that takes less time than copy-pasting the dates into the browser's address bar)?
UPDATE 1:
When running:
$ wget 'http://www.xe.com/currencytables/?from=USD&date=2012-10-15'
I get a file which contains:
<HTML>
<HEAD><TITLE>Autoextraction Prohibited</TITLE></HEAD>
<BODY>
Automated extraction of our content is prohibited. See <A HREF="http://www.xe.com/errors/noautoextract.htm">http://www.xe.com/errors/noautoextract.htm</A>.
</BODY>
</HTML>
so it seems the server can identify the type of request and blocks wget. Is there any way around this?
UPDATE 2:
After reading the response to the wget command and the comments/answers, I checked the ToS of the website and found this clause:
You agree that you shall not:
...
f. use any automatic or manual process to collect, harvest, gather, or extract
information about other visitors to or users of the Services, or otherwise
systematically extract data or data fields, including without limitation any
financial and/or currency data or e-mail addresses;
which, I guess, concludes the efforts on this front.
Now, out of curiosity: if wget generates an HTTP request, how does the server know that it came from a command-line tool and not from a browser?
You need to use -O- to write the page to STDOUT, and quote the URL because it contains &:
wget -O- 'http://www.xe.com/currencytables/?from=USD&date=2012-10-15'
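As a small illustration of why the quotes around the URL matter, the sketch below uses echo as a stand-in for wget (so nothing is downloaded) and runs the same URL unquoted and quoted through sh. The variable names are only for illustration.

```shell
#!/bin/sh
# Unquoted: the shell splits the line at '&'. The first part runs in the
# background with a truncated URL; the rest becomes a variable assignment.
unquoted=$(sh -c 'echo http://www.xe.com/currencytables/?from=USD&date=2012-10-15
wait
echo "leftover assignment: date=$date"')

# Quoted: the URL reaches the command as a single, intact argument.
quoted=$(sh -c "echo 'http://www.xe.com/currencytables/?from=USD&date=2012-10-15'")

printf 'unquoted:\n%s\n\nquoted:\n%s\n' "$unquoted" "$quoted"
```

In the unquoted case the command only ever sees the URL up to from=USD, so the date never reaches the server.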
But it looks like xe.com does not want you to do automated downloads, so I would suggest not doing them there.
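As for how a server can tell wget from a browser: every HTTP client normally sends a User-Agent header, and wget by default identifies itself with a value starting with Wget/. A server can simply inspect that header. The sketch below shows what such a check might look like; the function name and the patterns are hypothetical, and whether xe.com uses this particular signal (or others as well) is an assumption.

```shell
#!/bin/sh
# Hypothetical server-side check: classify a request by its User-Agent value.
# The patterns are illustrative only, not xe.com's actual rules.
classify_user_agent() {
    case "$1" in
        Wget/*|curl/*|"") echo "automated" ;;
        *)                echo "browser-like" ;;
    esac
}

# Example User-Agent strings of the kind wget and a desktop browser send.
classify_user_agent 'Wget/1.13.4 (linux-gnu)'
classify_user_agent 'Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0'
```

The first call prints automated and the second prints browser-like.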