How to remove list of words from a list of strings

prabhu · Aug 18, 2010 · Viewed 17.8k times

Sorry if the question is a bit confusing. It is similar to this question

I think the above question is close to what I want, but it is in Clojure.

There is another question

I need something like that, but instead of '[br]' as in that question, I have a list of strings that need to be searched for and removed.

Hope I made myself clear.

I think that this is due to the fact that strings in Python are immutable.

I have a list of noise words that need to be removed from a list of strings.

If I use a list comprehension, I end up searching the same string again and again, so only "of" gets removed and not "the". So my modified list looks like this:

places = ['New York', 'the New York City', 'at Moscow']  # and many more

noise_words_list = ['of', 'the', 'in', 'for', 'at']

for place in places:
    stuff = [place.replace(w, "").strip() for w in noise_words_list if place.startswith(w)]

I would like to know what mistake I'm making.

Answer

Tony Veijalainen · Aug 18, 2010

Without regular expressions, you could do it like this:

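places = ['of New York', 'of the New York']

# A set gives fast membership tests.
noise_words_set = {'of', 'the', 'at', 'for', 'in'}
# Split each place into words, drop the noise words (case-insensitively), and rejoin.
stuff = [' '.join(w for w in place.split() if w.lower() not in noise_words_set)
         for place in places]
print(stuff)  # ['New York', 'New York']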
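
As for what went wrong in the question's loop: each `place.replace(w, "")` call starts again from the original `place`, so the removals are never combined, and `stuff` is overwritten on every pass through the `for` loop. In addition, `str.replace` removes the substring wherever it occurs, including inside longer words.

For comparison, a regular-expression variant might look like the sketch below. It is only an illustration under a few assumptions: the helper name `remove_noise_words` and the constants `NOISE_WORDS` and `NOISE_RE` are made up for this example, the noise words are the ones from the question, and matching is case-insensitive, as in the code above.

```python
import re

# Noise words taken from the question; the names below are hypothetical.
NOISE_WORDS = ['of', 'the', 'in', 'for', 'at']

# \b anchors each alternative at word boundaries, so 'at' does not
# match inside a longer word such as 'Manhattan'.
NOISE_RE = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, NOISE_WORDS)) + r')\b',
    re.IGNORECASE,
)

def remove_noise_words(places):
    """Return a new list with every noise word removed from each string."""
    cleaned = []
    for place in places:
        # Replace each noise word with a space, then collapse the
        # leftover whitespace by splitting and rejoining.
        without_noise = NOISE_RE.sub(' ', place)
        cleaned.append(' '.join(without_noise.split()))
    return cleaned

places = ['of New York', 'of the New York', 'at Moscow', 'Manhattan']
print(remove_noise_words(places))
# ['New York', 'New York', 'Moscow', 'Manhattan']
```

Both versions build a new list rather than modifying the strings in place, which is the usual approach since Python strings are immutable.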