ValueError: could not convert string to float, Python

Anagha picture Anagha · Mar 21, 2017 · Viewed 11.7k times · Source

I have a dataframe on which i'm trying to implement feature selection. There are 45 columns of types, integer, float and object.

But I'm unable to fit any feature selection model since its throwing vale Error. Please help me out

Dataframe :

member_id   loan_amnt   funded_amnt funded_amnt_inv term        batch_enrolled   int_rate   grade
58189336    14350       14350       14350           36 months                    19.19      E
70011223    4800        4800        4800            36 months   BAT1586599       10.99      B

 sub_grade  emp_title   emp_length  home_ownership  annual_inc  verification_status pymnt_plan  desc                purpose title      zip_code addr_state   dti
 E3         clerk       9 years     OWN             28700       Source Verified     n           debt_consolidation  Debt consolidation 349xx    FL        33.88
 B4         HR          < 1 year    MORTGAGE        65000       Source Verified     n           home_improvement    Home improvement    209xx   MD      3.64

 last_week_pay  loan_status
 44th week          0
 9th week           1

Code:

 import numpy
 from pandas import read_csv
 from sklearn.decomposition import PCA
 # load data
 df = pd.read_csv("C:/Users/anagha/Documents/Python  Scripts/train_indessa.csv")
 array = df.values
 X = array[:,0:44]
 Y = array[:,44]
 # feature extraction
 pca = PCA(n_components=3)
 fit = pca.fit(X)

Error:

 Traceback (most recent call last):

 File "<ipython-input-8-20f3863fd66e>", line 2, in <module>
 fit = pca.fit(X)

 File "C:\Users\anagha\Anaconda3\lib\site-  packages\sklearn\decomposition\pca.py", line 301, in fit
self._fit(X)

File "C:\Users\anagha\Anaconda3\lib\site-packages\sklearn\decomposition\pca.py", line 333, in _fit
copy=self.copy)

File "C:\Users\anagha\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 382, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)

ValueError: could not convert string to float: '44th week'

Answer

Miriam Farber picture Miriam Farber · Mar 21, 2017

You cannot fit PCA on a non-numeric data. PCA involves matrix decomposition, and since some of your data is not numeric, you cannot apply PCA on it. So in order to proceed with PCA you should either ignore non-numeric columns , or transforming them into numeric columns.