So far I can detect robots by matching user agent strings against a list of known bot user agents, but this method is catching fewer bots than expected, so I am wondering what other detection methods are available in PHP.
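For reference, a minimal sketch of the matching approach, assuming a simple case-insensitive substring check (the keyword list below is illustrative, not exhaustive):

&lt;?php
// Sketch only: check the user agent against a hypothetical list of
// substrings commonly found in bot user agents.
function isKnownBot($userAgent)
{
    $botKeywords = array('bot', 'crawler', 'spider', 'slurp', 'curl', 'wget');
    foreach ($botKeywords as $keyword) {
        if (stripos($userAgent, $keyword) !== false) {
            return true;
        }
    }
    return false;
}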
I would also like to know how to detect whether a browser or robot is spoofing another browser through its user agent string.
Any advice is appreciated.
EDIT: This has to be done using a log file with lines as follows:
129.173.129.168 - - [11/Oct/2011:00:00:05 -0300] "GET /cams/uni_ave2.jpg?time=1318302291289 HTTP/1.1" 200 20240 "http://faculty.dentistry.dal.ca/loanertracker/webcam.html" "Mozilla/5.0 (Macintosh; U; PPC Mac OS X 10.4; en-US; rv:1.9.2.23) Gecko/20110920 Firefox/3.6.23"
This means I can't check user behaviour aside from access times.
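Given that constraint, here is a rough sketch of how each log line could be parsed and its user agent checked, reusing the isKnownBot() helper from above (the regex assumes well-formed combined-log-format lines like the example, and "access.log" is a placeholder filename):

&lt;?php
// Sketch: extract the IP, timestamp, and user agent from each
// combined-log-format line, then run the user agent past the matcher.
$pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d+ \S+ "[^"]*" "([^"]*)"/';
foreach (file('access.log') as $line) {
    if (preg_match($pattern, $line, $m)) {
        list(, $ip, $time, $userAgent) = $m;
        if (isKnownBot($userAgent)) {
            echo "$ip looks like a bot ($userAgent)\n";
        }
    }
}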
In addition to filtering on keywords in the user agent string, I have had luck with placing a hidden honeypot link on all pages:
<a style="display:none" href="autocatch.php">A</a>
Then in "autocatch.php" record the session (or IP address) as a bot. This link is invisible to users but it's hidden characteristic would hopefully not be realized by bots. Taking the style attribute out and putting it into a CSS file might help even more.