Using: Delphi 2010, latest version of Indy
I am trying to scrape data from Google's AdSense web page in order to retrieve the reports, but so far without success. It stops after the first request and does not proceed.
Using Fiddler to inspect the traffic to the Google AdSense website while loading the AdSense page in a web browser, I can see that the browser's request goes through a number of redirects before the page is loaded.
However, my Delphi application generates only a couple of requests before it stops.
Here are the steps I have followed:
Finally I have this code:
procedure TfmMain.GetUrlToFile(AURL, AFile : String);
var
  Output : TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    // Request the URL passed in (AURL), not the FURL base field
    IdHTTP1.Get(AURL, Output);
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;
However, it does not reach the login page as expected. I would expect it to behave like a web browser and follow the redirects until it arrives at the final page.
This is the output of the headers from Fiddler:
HTTP/1.1 302 Found
Location: https://encrypted.google.com/
Cache-Control: private
Content-Type: text/html; charset=UTF-8
Set-Cookie: PREF=ID=5166063f01b64b03:FF=0:TM=1293571783:LM=1293571783:S=a5OtsOqxu_GiV3d6; expires=Thu, 27-Dec-2012 21:29:43 GMT; path=/; domain=.google.com
Set-Cookie: NID=42=XFUwZdkyF0TJKmoJjqoGgYNtGyOz-Irvz7ivao2z0--pCBKPpAvCGUeaa5GXLneP41wlpse-yU5UuC57pBfMkv434t7XB1H68ET0ZgVDNEPNmIVEQRVj7AA1Lnvv2Aez; expires=Wed, 29-Jun-2011 21:29:43 GMT; path=/; domain=.google.com; HttpOnly
Date: Tue, 28 Dec 2010 21:29:43 GMT
Server: gws
Content-Length: 226
X-XSS-Protection: 1; mode=block
Firstly, is there anything wrong with this output?
Is there something more that I should do to get the IdHTTP component to keep pursuing the redirects until the final page?
IdHTTP component property values prior to making the call:
Name := 'IdHTTP1';
IOHandler := IdSSLIOHandlerSocketOpenSSL1;
AllowCookies := True;
HandleRedirects := True;
RedirectMaximum := 35;
Request.UserAgent := 'Mozilla/5.0 (Windows NT 5.1; rv:2.0b8) Gecko/20100101 Firefox/4.0b8';
HTTPOptions := [hoForceEncodeParams];
OnRedirect := IdHTTP1Redirect;
CookieManager := IdCookieManager1;
Redirect event handler:
procedure TfmMain.IdHTTP1Redirect(Sender: TObject; var dest: string;
  var NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
begin
  Handled := True;
end;
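To see where the redirect chain stops, a variant of the handler that logs each hop might look like the sketch below. It assumes SysUtils is in the uses clause (for Format) and that the form has a TMemo named Memo1; Memo1 is hypothetical and not part of the code above.

```delphi
procedure TfmMain.IdHTTP1Redirect(Sender: TObject; var dest: string;
  var NumRedirect: Integer; var Handled: Boolean; var VMethod: string);
begin
  // Memo1 is a hypothetical TMemo used only for logging.
  // Record the hop number, the HTTP method and the redirect target.
  Memo1.Lines.Add(Format('Redirect %d (%s) -> %s',
    [NumRedirect, VMethod, dest]));
  // As in the handler above, let the component follow the redirect.
  Handled := True;
end;
```

Comparing the logged targets with the sequence Fiddler shows for the browser should indicate at which hop the two diverge.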
Making the call:
FURL := 'https://www.google.com';
GetUrlToFile( (FURL + '/adsense/'), 'a.html');
procedure TfmMain.GetUrlToFile(AURL, AFile : String);
var
  Output : TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    try
      IdHTTP1.Get(AURL, Output);
      IdHTTP1.Disconnect;
    except
    end;
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;
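Because the empty except block above discards any error, a diagnostic variant that reports what went wrong may help. This is only a sketch: the method name GetUrlToFileLogged and the TMemo named Memo1 are hypothetical, and it assumes IdHTTP and SysUtils are in the uses clause.

```delphi
// Hypothetical diagnostic variant; would also need declaring in TfmMain.
procedure TfmMain.GetUrlToFileLogged(AURL, AFile : String);
var
  Output : TMemoryStream;
begin
  Output := TMemoryStream.Create;
  try
    try
      IdHTTP1.Get(AURL, Output);
    except
      // Raised by TIdHTTP for HTTP-level error responses
      on E: EIdHTTPProtocolException do
        Memo1.Lines.Add(Format('HTTP %d: %s', [E.ErrorCode, E.Message]));
      // Anything else, e.g. socket or SSL failures
      on E: Exception do
        Memo1.Lines.Add(E.ClassName + ': ' + E.Message);
    end;
    // Status line and Location header of the last response received
    Memo1.Lines.Add(IdHTTP1.ResponseText);
    Memo1.Lines.Add('Location: ' + IdHTTP1.Response.Location);
    Output.SaveToFile(AFile);
  finally
    Output.Free;
  end;
end;
```

It would be called the same way as the original, for example GetUrlToFileLogged(FURL + '/adsense/', 'a.html').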
Here's the (request and response headers) output from Fiddler: