OK, here's the header(just an example) info I got from Live HTTP Header while logging into an account:
http://example.com/login.html
POST /login.html HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://example.com
Cookie: blahblahblah; blah = blahblah
Content-Type: application/x-www-form-urlencoded
Content-Length: 39
username=shane&password=123456&do=login
HTTP/1.1 200 OK
Date: Sat, 18 Dec 2010 15:41:02 GMT
Server: Apache/2.2.3 (CentOS)
X-Powered-By: PHP/5.2.14
Set-Cookie: blah = blahblah_blah; expires=Sun, 18-Dec-2011 15:41:02 GMT; path=/; domain=.example.com; HttpOnly
Set-Cookie: blah = blahblah; expires=Sun, 18-Dec-2011 15:41:02 GMT; path=/; domain=.example.com; HttpOnly
Set-Cookie: blah = blahblah; expires=Sun, 18-Dec-2011 15:41:02 GMT; path=/; domain=.example.com; HttpOnly
Cache-Control: private, no-cache="set-cookie"
Expires: 0
Pragma: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 4135
Keep-Alive: timeout=10, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=UTF-8
Normally I would code like this:
import mechanize
import urllib2
MechBrowser = mechanize.Browser()
LoginUrl = "http://example.com/login.html"
LoginData = "username=shane&password=123456&do=login"
LoginHeader = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 GTB7.1 (.NET CLR 3.5.30729)", "Referer": "http://example.com"}
LoginRequest = urllib2.Request(LoginUrl, LoginData, LoginHeader)
LoginResponse = MechBrowser.open(LoginRequest)
Above code works fine. My question is, do I also need to add these following lines (and more in previous header infos) in LoginHeader
to make it really looks like firefox's surfing, not mechanize?
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
What parts/how many of header info need to be spoofed to make it looks "real"?
It depends on what you're trying to 'fool'. You can try some online services that do simple User Agent sniffing to gauge your success:
http://browserspy.dk/browser.php
http://www.browserscope.org (look for 'We think you're using...')
http://www.browserscope.org/ua
http://panopticlick.eff.org/ -> will help you to pick some 'too common to track' options
http://networking.ringofsaturn.com/Tools/browser.php
I believe a determined programmer could detect your game, but many log parsers and tools wouldn't once you echo what your real browser sends.
One thing you should consider is that lack of JS might raise red flags, so capture sent headers with JS disabled too.