Sites not accepting wget user agent header

Joe Mornin picture Joe Mornin · Jun 19, 2013 · Viewed 104.6k times · Source

When I run this command:

wget --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0"  http://yahoo.com

...I get this result (with nothing else in the file):

<!-- hw147.fp.gq1.yahoo.com uncompressed/chunked Wed Jun 19 03:42:44 UTC 2013 -->

But when I run wget http://yahoo.com with no --user-agent option, I get the full page.

The user agent is the same header that my current browser sends. Why does this happen? Is there a way to make sure the user agent doesn't get blocked when using wget?

Answer

filip26 picture filip26 · Jun 30, 2013

It seems Yahoo server does some heuristic based on User-Agent in a case Accept header is set to */*.

Accept: text/html

did the trick for me.

e.g.

wget  --header="Accept: text/html" --user-agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:21.0) Gecko/20100101 Firefox/21.0"  http://yahoo.com

Note: if you don't declare Accept header then wget automatically adds Accept:*/* which means give me anything you have.