"Stop words" list for English?

language-agnostic indexing filtering stop-words nlp

Mark Harrison · Aug 2, 2009 · Viewed 20.2k times · Source

I'm generating some statistics for some English-language text and I would like to skip uninteresting words such as "a" and "the".

Where can I find some lists of these uninteresting words?
Is a list of these words the same as a list of the most frequently used words in English?

Update: these are apparently called "stop words", not "skip words".

Answer

The magic word to put into Google is "stop words". This turns up a reasonable-looking list.

MySQL also has a built-in list of stop words, but it is far too comprehensive for my taste. For example, at our university library we ran into problems because "third" in "third world" was treated as a stop word.
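For the statistics use case in the question, here is a minimal sketch of how a stop list might be applied when counting words. The `STOP_WORDS` set is a tiny hand-picked illustration (an assumption, not MySQL's list or any standard one), and `word_counts` is a hypothetical helper name. Swap in whichever list you settle on.

```python
import re
from collections import Counter

# Illustrative, hand-picked stop list (an assumption, not a standard list).
# Note that "third" is deliberately left out, so "third world" survives.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}

def word_counts(text, stop_words=STOP_WORDS):
    """Count the words in text, skipping anything in stop_words."""
    # Lowercase first so "The" and "the" are treated as the same word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

counts = word_counts("The third world is a term used in the study of the world.")
print(counts.most_common(3))
```

Keeping the list as a plain set makes it easy to trim entries that turn out to matter in your domain, which is the problem the "third world" example runs into with a fixed built-in list.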