I'm interested in getting access to a full WHOIS database in order to expand on a domain-profile project I'm working on. I know ARIN provides this database only to non-commercial researchers and every WHOIS provider I know of (including ARIN itself) has rate-limiting.
I also know, however, of some existing commercial services (like the registrant lookup section of domaintools.com, which can search for domains by registrant name) that would be impossible unless the site had direct access to a cached copy of the WHOIS database.
Any idea how they got ahold of their data?
If that is what you mean, there is no single, complete, publicly known "WHOIS database". The data is split across several layers:
ICANN organizes and keeps track of which registry is responsible for each top-level domain and which registrars are accredited for which TLDs (e.g. com, net, co.uk, ...).
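To see this top layer in practice: IANA runs a WHOIS server at whois.iana.org that answers a TLD query with a "refer:" line naming that TLD's registry WHOIS server. Below is a minimal Python sketch of the raw WHOIS protocol (RFC 3912: a plain-text query over TCP port 43, terminated by CRLF) plus parsing of that line. The function names are hypothetical, and the parser assumes IANA's current "key: value" output format.

```python
import socket
from typing import Optional

IANA_WHOIS = "whois.iana.org"  # IANA's WHOIS server; TLD queries get a "refer:" line


def whois_query(server: str, query: str, timeout: float = 10.0) -> str:
    """Send one WHOIS query (RFC 3912: TCP port 43, query + CRLF), return the raw reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when the reply is complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_refer(reply: str) -> Optional[str]:
    """Return the WHOIS server named on the first 'refer:' line, or None."""
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "refer":
            return value.strip() or None
    return None


def tld_whois_server(tld: str) -> Optional[str]:
    """Ask IANA which registry WHOIS server is responsible for a TLD (network call)."""
    return parse_refer(whois_query(IANA_WHOIS, tld))
```

Usage would be something like `tld_whois_server("com")`, which returns whatever registry server IANA currently lists for that TLD.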
Registries (e.g. Verisign, national registries, ...) keep the basic availability and registration information for the domains under their assigned TLDs. Some generic and country-code TLD registries also keep the detailed WHOIS information, because a separate registrar layer has not yet been (or never will be) fully developed for them.
Finally, registrars (GoDaddy, Enom, country-code registrars where applicable, ...) hold the detailed WHOIS information for the domains registered through them.
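For a registry that keeps only the basic record (a "thin" registry, as Verisign has historically been for com/net), the detailed data has to be fetched from the registrar. The sketch below queries the registry and then follows the "Registrar WHOIS Server:" line that Verisign-style replies include. The helper names are hypothetical, and both the default server name and the field label are assumptions, since output formats differ between registries and registrars.

```python
import socket
from typing import Optional, Tuple


def whois_query(server: str, query: str, timeout: float = 10.0) -> str:
    """Send one WHOIS query over TCP port 43 and return the raw reply text."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # connection closed: reply complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_registrar_server(reply: str) -> Optional[str]:
    """Return the value of the first 'Registrar WHOIS Server:' line, or None."""
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "registrar whois server":
            return value.strip() or None
    return None


def registrar_whois(domain: str,
                    registry_server: str = "whois.verisign-grs.com",  # assumed default
                    max_hops: int = 2) -> Tuple[str, str]:
    """Query the registry, then follow referrals toward the registrar's WHOIS server."""
    server = registry_server
    reply = whois_query(server, domain)
    for _ in range(max_hops):
        nxt = parse_registrar_server(reply)
        if not nxt or nxt.lower() == server.lower():  # no referral, or points to itself
            break
        server = nxt
        reply = whois_query(server, domain)
    return server, reply
```

Usage would be something like `server, text = registrar_whois("example.com")`. Capping the number of hops avoids looping when a registrar's reply names itself or yet another server.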
So which one of these do you need?
For example, Verisign, the biggest registry, which is responsible for com/net/..., has no rate limit AFAIK; at least, I have never hit one at up to ~40k queries/day.
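The absence of a Verisign rate limit is only the answerer's own experience, and other registries and registrars do throttle, so pacing queries on the client side is a prudent assumption. A budget of ~40k queries/day spread evenly works out to one query every 86400 / 40000 = 2.16 seconds. A minimal sketch of such a pacer (the class name is hypothetical):

```python
import time


class DailyThrottle:
    """Space calls evenly so that at most `per_day` happen in 24 hours."""

    def __init__(self, per_day: int):
        if per_day <= 0:
            raise ValueError("per_day must be positive")
        self.interval = 86_400 / per_day  # seconds between consecutive queries
        self._next = time.monotonic()     # earliest time the next query may start

    def wait(self) -> None:
        """Block until the next query is allowed, then reserve the following slot."""
        now = time.monotonic()
        if now < self._next:
            time.sleep(self._next - now)
            now = self._next
        self._next = now + self.interval


throttle = DailyThrottle(40_000)  # interval is 2.16 s
```

Call `throttle.wait()` before each WHOIS query. Spacing queries evenly, rather than sending them in bursts, keeps the instantaneous rate at the daily average.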