How to detect browser spoofing and robots from a user agent string in PHP

user1422508 · Nov 14, 2012

So far I can detect robots by matching each user agent string against a list of known robot user agents. This finds fewer bots than I expected, so I am wondering what other methods there are to do this in PHP.

I am also looking to find out how to detect if a browser or robot is spoofing another browser using a user agent string.

Any advice is appreciated.

EDIT: This has to be done using a log file with lines as follows:

129.173.129.168 - - [11/Oct/2011:00:00:05 -0300] "GET /cams/uni_ave2.jpg?time=1318302291289 HTTP/1.1" 200 20240 "http://faculty.dentistry.dal.ca/loanertracker/webcam.html" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.23) Gecko/20110920 Firefox/3.6.23"

This means I can't check user behaviour aside from access times.

Answer

laifukang · Nov 14, 2012

In addition to filtering on keywords in the user agent string, I have had luck with putting a hidden honeypot link on all pages:

<a style="display:none" href="autocatch.php">A</a>

Then in "autocatch.php", record the session (or IP address) as a bot. The link is invisible to human visitors, but a bot that follows every link it finds will hopefully not notice that it is hidden. Moving the style attribute out of the tag and into a CSS file might help even more, since the hiding is then no longer visible in the HTML itself.
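
Since the question has to work from a log file, the same idea can be applied after the fact: any IP address that requested the honeypot URL shows up in the access log. Below is a minimal sketch of that, assuming the log is in the Apache "combined" format shown in the question and that the honeypot file is named "autocatch.php" as above. The function names parseLogLine and findHoneypotIps are made up for this example, and the regular expression is a simplification that does not handle escaped quotes inside a field.

```php
<?php
// Hypothetical helper: parse one line of an Apache "combined" access log.
// Returns an associative array, or null if the line does not match.
function parseLogLine(string $line): ?array
{
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/';
    if (!preg_match($pattern, trim($line), $m)) {
        return null;
    }
    return [
        'ip'      => $m[1],
        'time'    => $m[2],
        'method'  => $m[3],
        'path'    => $m[4],        // request target, including any query string
        'status'  => (int) $m[5],
        'referer' => $m[7],
        'agent'   => $m[8],
    ];
}

// Hypothetical helper: return the unique IPs that requested the honeypot file.
// $trap is the honeypot file name (an assumption taken from the link above).
function findHoneypotIps(iterable $lines, string $trap = 'autocatch.php'): array
{
    $bots = [];
    foreach ($lines as $line) {
        $entry = parseLogLine($line);
        if ($entry === null) {
            continue; // skip lines that are not in the expected format
        }
        // Drop the query string, then compare only the file name.
        $path = (string) parse_url($entry['path'], PHP_URL_PATH);
        if (basename($path) === $trap) {
            $bots[$entry['ip']] = true; // array keys keep the list unique
        }
    }
    return array_keys($bots);
}

// Usage (the file name access.log is a placeholder):
// $botIps = findHoneypotIps(file('access.log', FILE_IGNORE_NEW_LINES));
```

Every other log line from a flagged IP can then be treated as bot traffic, whatever user agent string it sent. This only catches bots that actually follow the hidden link, so it complements the user agent matching rather than replacing it.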