I have some issues with Pandas and NLTK. I am new to programming, so excuse me if I ask questions that are easy to solve. I have a CSV file with three columns (Id, Title, Body) and about 15,000 rows.
My goal is to remove the stopwords from this CSV file. Lowercasing and splitting work well, but I cannot figure out why the stopwords are not removed. What am I missing?
import pandas as pd
from nltk.corpus import stopwords
df = pd.read_csv("test10in.csv", encoding="utf-8")
df.columns = ['Id','Title','Body']
df['Title'] = df['Title'].str.lower().str.split()
df['Body'] = df['Body'].str.lower().str.split()
stop = stopwords.words('english')
df['Title'].apply(lambda x: [item for item in x if item not in stop])
df['Body'].apply(lambda x: [item for item in x if item not in stop])
df.to_csv("test10out.csv")
You are expecting apply to modify the columns in place, but it returns a new Series and leaves df unchanged. Assign the result back to each column:
df['Title'] = df['Title'].apply(lambda x: [item for item in x if item not in stop])
df['Body'] = df['Body'].apply(lambda x: [item for item in x if item not in stop])
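A minimal sketch of the whole pipeline with the assignment fixed. It rests on a few assumptions. STOP is a small hypothetical stand-in for stopwords.words('english'), so the example runs without downloading the NLTK corpus (nltk.download('stopwords')). An in-memory DataFrame replaces test10in.csv. The helper name remove_stopwords is illustrative. Storing the stopwords in a set makes each "in" check a hash lookup instead of a scan over a list. Joining the tokens back with str.join is an optional design choice: without it, to_csv writes each list as its Python representation.

```python
import pandas as pd

# Hypothetical stand-in for stopwords.words('english'); use NLTK's list in real code.
# A set gives constant-time membership tests, unlike a list.
STOP = {"the", "is", "a", "of", "how", "do", "i"}


def remove_stopwords(tokens):
    """Return the tokens not in STOP; pass missing values (None/NaN) through unchanged."""
    if not isinstance(tokens, list):
        return tokens
    return [t for t in tokens if t not in STOP]


# In-memory example data instead of pd.read_csv("test10in.csv", encoding="utf-8").
df = pd.DataFrame({
    "Id": [1, 2],
    "Title": ["How do I merge a DataFrame", "The end of the file"],
    "Body": ["This is a test", None],
})

for col in ["Title", "Body"]:
    tokens = df[col].str.lower().str.split()
    # apply() returns a NEW Series, so assign it back or the filtering is lost.
    df[col] = tokens.apply(remove_stopwords)
    # Optional: join the token lists into plain strings for the CSV output.
    df[col] = df[col].str.join(" ")

# Pass a path such as "test10out.csv" instead to write a file.
print(df.to_csv(index=False))
```

The check for non-list values matters on real data. An empty Title or Body cell becomes a missing value rather than a list, and iterating over it in the original lambda would raise a TypeError.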