I have a dataframe on which I'm trying to implement feature selection. There are 45 columns of types integer, float, and object.
But I'm unable to fit any feature selection model since it throws a ValueError. Please help me out.
Dataframe :
member_id loan_amnt funded_amnt funded_amnt_inv term batch_enrolled int_rate grade
58189336 14350 14350 14350 36 months 19.19 E
70011223 4800 4800 4800 36 months BAT1586599 10.99 B
sub_grade emp_title emp_length home_ownership annual_inc verification_status pymnt_plan desc purpose title zip_code addr_state dti
E3 clerk 9 years OWN 28700 Source Verified n debt_consolidation Debt consolidation 349xx FL 33.88
B4 HR < 1 year MORTGAGE 65000 Source Verified n home_improvement Home improvement 209xx MD 3.64
last_week_pay loan_status
44th week 0
9th week 1
Code:
import numpy
import pandas as pd
from sklearn.decomposition import PCA
# load data
df = pd.read_csv("C:/Users/anagha/Documents/Python Scripts/train_indessa.csv")
array = df.values
X = array[:,0:44]
Y = array[:,44]
# feature extraction
pca = PCA(n_components=3)
fit = pca.fit(X)
Error:
Traceback (most recent call last):
File "<ipython-input-8-20f3863fd66e>", line 2, in <module>
fit = pca.fit(X)
File "C:\Users\anagha\Anaconda3\lib\site-packages\sklearn\decomposition\pca.py", line 301, in fit
self._fit(X)
File "C:\Users\anagha\Anaconda3\lib\site-packages\sklearn\decomposition\pca.py", line 333, in _fit
copy=self.copy)
File "C:\Users\anagha\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 382, in check_array
array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: '44th week'
You cannot fit PCA on non-numeric data. PCA involves matrix decomposition, which is only defined for numeric values, and some of your columns are not numeric (the traceback shows the string '44th week' from last_week_pay). To proceed with PCA, you should either drop the non-numeric columns or transform them into numeric columns.
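Here is a minimal sketch of both options. It uses a small made-up frame that borrows a few column names from the question; the values are illustrative only, and the choice of encodings (pulling the week number out of last_week_pay, one-hot encoding grade) is an assumption about what makes sense for this data, not the only way to do it. Standardizing before PCA is likewise a common habit rather than something the error requires.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the real CSV; the values are made up for illustration.
df = pd.DataFrame({
    "loan_amnt": [14350, 4800, 10000, 15000, 7200, 20000],
    "int_rate": [19.19, 10.99, 7.26, 19.72, 13.50, 9.80],
    "annual_inc": [28700, 65000, 45000, 105000, 52000, 88000],
    "grade": ["E", "B", "A", "D", "C", "B"],
    "last_week_pay": ["44th week", "9th week", "9th week",
                      "135th week", "96th week", "22nd week"],
    "loan_status": [0, 1, 0, 0, 1, 0],
})

# Separate the target by name instead of by column position.
X = df.drop(columns="loan_status")
y = df["loan_status"]

# Option 1: ignore the non-numeric columns entirely.
X_numeric = X.select_dtypes(include="number")

# Option 2: turn the non-numeric columns into numbers.
X_encoded = X.copy()
# "44th week" -> 44.0 (assumes every entry contains a week number)
X_encoded["last_week_pay"] = (
    X_encoded["last_week_pay"].str.extract(r"(\d+)", expand=False).astype(float)
)
# One-hot encode the categorical grade column.
X_encoded = pd.get_dummies(X_encoded, columns=["grade"], dtype=float)

# PCA is scale-sensitive, so standardize first (a common choice, not mandatory).
X_scaled = StandardScaler().fit_transform(X_encoded)

pca = PCA(n_components=3)
components = pca.fit_transform(X_scaled)
print(components.shape)
```

On the real data you would need to decide column by column: high-cardinality text such as emp_title or desc is usually better dropped or handled separately than one-hot encoded, and any missing values have to be filled or removed before PCA will accept the matrix.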