Using cURL to download a site's HTML source, but getting different file than intended

user954912 picture user954912 · Nov 11, 2011 · Viewed 12.6k times · Source

I'm trying to use cURL and PHP to download the HTML source (as it appears in the browser) of here. But instead of the actual source code, this is returned instead (a meta refresh link set to 0).

<html>
    <head><title>Object moved</title></head>
    <body>
        <h2>Object moved to <a href="https://login.live.com/login.srf?wa=wsignin1.0&amp;rpsnv=11&amp;checkda=1&amp;ct=1321044850&amp;rver=6.1.6195.0&amp;wp=MBI&amp;wreply=http:%2F%2Fwww.windowsphone.com%2Fen-US%2Fapps%2Fea39f002-ac30-e011-854c-00237de2db9e&amp;lc=1033&amp;id=268289">here</a>.
        </h2>
    </body>
</html>

I'm trying to spoof the referral header to be the site, but it seems I'm doing it wrong. Code is below. Any suggestions? Thanks

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, 'http://www.windowsphone.com/en-US/apps/ea39f002-ac30-e011-854c-00237de2db9e');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.6 (KHTML, like Gecko) Chrome/16.0.897.0 Safari/535.6'); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_1);
curl_setopt($ch, CURLOPT_AUTOREFERER, false);
curl_setopt($ch, CURLOPT_REFERER, "http://www.windowsphone.com/en-US/apps/ea39f002-ac30-e011-854c-00237de2db9e");

$html = curl_exec($ch);
curl_close($ch);

Answer

jli picture jli · Nov 11, 2011

Add the curl option to follow redirects:

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

If it is a meta refresh and not an HTTP moved header, see: PHP: Can CURL follow meta redirects

As mentioned by flesk, you may also need to store the cookies.