I'm generating statistics for some English-language text and would like to skip uninteresting words such as "a" and "the".
Update: these are apparently called "stop words", not "skip words".
The magic word to put into Google is "stop words". This turns up a reasonable-looking list.
MySQL also has a built-in list of stop words, but it is far too comprehensive for my taste. For example, at our university library we had problems because "third" in "third world" was treated as a stop word.
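For what it's worth, here is a minimal sketch of how a stop-word list might be applied when counting words. The `STOP_WORDS` set and the `word_counts` function are made up for illustration and don't come from any library; in practice you would load whichever list you settled on.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word set (an assumption for this
# sketch); a real list would be longer and loaded from a file.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "to"}

def word_counts(text, stop_words=STOP_WORDS):
    """Count words in text, ignoring case and skipping stop words."""
    # Lowercase first, then pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

counts = word_counts("The third world and the first world")
print(counts.most_common())
```

Keeping the list small and under your own control avoids the "third world" problem: a word like "third" is only dropped if you put it in the set yourself.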