I'm very curious to know how this process works. These sites (http://www.sharkscope.com and http://www.pokertableratings.com) data mine thousands of hands per day from secure poker networks, such as PokerStars and Full Tilt.
Do they have a farm of servers running applications that open hundreds of tables (windows) and then somehow spider/datamine the hands that are being played?
How does this work, programming wise?
There are a few options. I've been researching it since I wanted to implement some of this functionality in a web app I'm working on. I'll use PokerStars for example, since they have, by far, the best security of any online poker site.
First, realize that there is no way for a developer to rip real time information from the PokerStars application itself. You can't access the API. You can, though, do the following:
Screen Scraping/OCR
PokerStars does its best to sabotage screen/text scraping of their application (by doing simple things like pixel level color fluctuations) but with enough motivation you can easily get around this. Google AutoHotkey combined with ImageSearch.
API Access and XML Feeds
PokerStars doesn't offer public access to its API. But it does offer an XML feed to developers who are pre-approved. This XML feed offers:
PokerStars Site Summary - shows player, table, and tournament counts
PokerStars Current Tournament data - files with information about upcoming and active tournaments. The data is provided in two files:
PokerStars Tournament Results - provides information about completed tournaments. The data is provided in two files:
PokerStars Tournament Leaders Board - provides information about top PokerStars players ranked using PokerStars Tournament Ranking System
PokerStars Tournament Leaders Board BOP - provides information about top PokerStars players ranked using PokerStars Battle Of Planets Ranking System
Team PokerStars – provides information about Team PokerStars players and their online activity
It's highly unlikely that these sites have access to the XML feed (or an improved one which would provide all the functionality they need) since PokerStars isn't exactly on good terms with most of these sites.
This leaves two options. Scraping the network connection for said data, which I think is borderline impossible (I don't have experience with this so I'm not sure; I've heard it's highly encrypted and not easy to tinker with, but I'm not sure) and, mentioned above, screen scraping/OCR.
Option #2 is easy enough to implement and, with some work, can avoid detection. From what I've been able to gather, this is the only way they could be doing such massive data mining of PokerStars (I haven't looked into other sites but I've heard security on anything besides PokerStars/Full Tilt is quite horrendous).
[edit] Reread your question and realized I didn't unambiguously answer it.
Yes, they likely have a massive amount of servers running watching all currently running tables, tournaments, etc. Realize that there is a decent amount of money in what they're doing.
This, for instance, could be how they do it (speculation):
Said bot applications watch the tables and data mine all information that gets "posted" to the chat log. They do this by already having a table of images that correspond to, for example, all letters of the alphabet (since PokerStars doesn't post their text as... text. All text in their software is actually an image). So, the bot then rips an image of the chat log, matches it against the store, converts the data to a format they can work with, and throws it in a database. Done.
[edit] No, the data isn't sold to them by the poker sites themselves. This would be a PR nightmare if it ever got out, which it would. And it wouldn't account for the functionality of these sites, which appears to be instantaneous. OPR, Sharkscope, etc. There are, without a doubt, applications running that are ripping the data real time from the poker software, likely using the methods I listed.