Stemming Pandas Dataframe 'float' object has no attribute 'split'

python pandas dataframe stem

Ashfaq Ali Shafin · Nov 7, 2017

import pandas as pd
from nltk.stem import PorterStemmer, WordNetLemmatizer
porter_stemmer = PorterStemmer()

df = pd.read_csv("last1.csv",sep=',',header=0,encoding='utf-8')

df['rev'] = df['reviewContent'].apply(lambda x : filter(None,x.split(" ")))

Dataset

I am trying to stem a text column of my dataframe. While tokenizing, this line raises an error:

df['rev'] = df['reviewContent'].apply(lambda x : filter(None,x.split(" ")))

AttributeError: 'float' object has no attribute 'split'

When I apply the stemmer, I run into the same float problem:

df['reviewContent'] = df["reviewContent"].apply(lambda x: [porter_stemmer.stem(y) for y in x])

TypeError: 'float' object is not iterable

What can I do?

Answer

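Both errors most likely come from missing values: pandas reads empty reviewContent cells as NaN, which is a float, so x.split fails on those rows. When tokenising your data, you don't need the apply call; the vectorised str.split does the job once the column is cast to str. Called without arguments, it splits on runs of whitespace, so you don't have to filter out empty strings.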

df['rev'] = df['reviewContent'].astype(str).str.split()
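
To see where the float comes from, here is a minimal sketch on a made-up three-row frame (the sample strings are assumptions, not rows from last1.csv). It checks the type of the missing entry and shows one alternative to the cast, fillna(""), which gives missing reviews an empty token list. Depending on the pandas version, astype(str) may instead turn a missing value into the literal text "nan", which would then be tokenised like any other word.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reviewContent column, with one missing value.
df = pd.DataFrame({"reviewContent": ["good  food", np.nan, "slow service"]})

# The missing entry is NaN, a float, which is why x.split(...) fails on it.
missing_is_float = isinstance(df["reviewContent"].iloc[1], float)

# Replace missing reviews with an empty string, then split on whitespace runs.
tokens = df["reviewContent"].fillna("").str.split()

print(missing_is_float)
print(tokens.tolist())
```

Whether an empty list or a "nan" token is preferable depends on what you do with the tokens afterwards; dropping the rows with dropna(subset=['reviewContent']) is another option.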

You can now run your stemmer as before:

df['rev'] = df['rev'].apply(lambda x: [porter_stemmer.stem(y) for y in x])
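
Putting the two steps together, the following is a rough end-to-end sketch rather than a drop-in replacement. The two sample reviews are invented to stand in for last1.csv, and fillna("") is used here as an assumption about how missing reviews should be treated.

```python
import numpy as np
import pandas as pd
from nltk.stem import PorterStemmer

porter_stemmer = PorterStemmer()

# Hypothetical sample data standing in for last1.csv.
df = pd.DataFrame({"reviewContent": ["loved the running shoes", np.nan]})

# Tokenise on whitespace; missing reviews become empty token lists.
df["rev"] = df["reviewContent"].fillna("").str.split()

# Stem every token; an empty list simply stays empty.
df["rev"] = df["rev"].apply(
    lambda tokens: [porter_stemmer.stem(t) for t in tokens]
)

print(df["rev"].tolist())
```

Because every row now holds a list, the list comprehension never sees a float, so the "'float' object is not iterable" error should not come up.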
