I want to normalize my data and compute a Pearson correlation. Without normalization this works, but with normalization I get this error message: AttributeError: 'numpy.ndarray' object has no attribute 'corr'. What can I do to solve this problem?
import numpy as np
import pandas as pd
filename_train = r'C:\Users\xxx.xxx\workspace\Dataset\!train_data.csv'  # raw string, so backslashes are not read as escapes
names = ['a', 'b', 'c', 'd', 'e', ...]
df_train = pd.read_csv(filename_train, names=names)
from sklearn.preprocessing import Normalizer
normalizeddf_train = Normalizer().fit_transform(df_train)
#pearson correlation
pd.set_option('display.width', 100)
pd.set_option('display.precision', 2)
print(normalizeddf_train.corr(method='pearson'))
You need the DataFrame constructor, because the output of fit_transform is a NumPy array, and corr is a DataFrame method (DataFrame.corr):
df_train = pd.DataFrame({'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9],
'D':[1,3,5],
'E':[5,3,6],
'F':[7,4,3]})
print (df_train)
   A  B  C  D  E  F
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3
from sklearn.preprocessing import Normalizer
normalizeddf_train = Normalizer().fit_transform(df_train)
print (normalizeddf_train)
[[ 0.08421519 0.33686077 0.58950634 0.08421519 0.42107596 0.58950634]
[ 0.1774713 0.44367825 0.70988521 0.26620695 0.26620695 0.3549426 ]
[ 0.21428571 0.42857143 0.64285714 0.35714286 0.42857143 0.21428571]]
print(pd.DataFrame(normalizeddf_train).corr(method='pearson'))
          0         1         2         3         4         5
0  1.000000  0.917454  0.646946  0.998477 -0.203152 -0.994805
1  0.917454  1.000000  0.896913  0.894111 -0.575930 -0.872187
2  0.646946  0.896913  1.000000  0.603899 -0.878063 -0.565959
3  0.998477  0.894111  0.603899  1.000000 -0.148832 -0.998906
4 -0.203152 -0.575930 -0.878063 -0.148832  1.000000  0.102420
5 -0.994805 -0.872187 -0.565959 -0.998906  0.102420  1.000000
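The correlation matrix above is labelled 0 to 5 because the NumPy array carries no column names. If you want the original labels back, one option is to pass the columns and index of the original frame to the constructor. This is a minimal sketch using the same sample data; the names normalized, normalized_df and corr are only illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import Normalizer

df_train = pd.DataFrame({'A': [1, 2, 3],
                         'B': [4, 5, 6],
                         'C': [7, 8, 9],
                         'D': [1, 3, 5],
                         'E': [5, 3, 6],
                         'F': [7, 4, 3]})

# fit_transform returns a plain NumPy array, so the labels are lost here
normalized = Normalizer().fit_transform(df_train)

# Wrap the array in a DataFrame again, reusing the original labels
normalized_df = pd.DataFrame(normalized,
                             columns=df_train.columns,
                             index=df_train.index)

# corr is now available, and the result is labelled A to F
corr = normalized_df.corr(method='pearson')
print(corr.round(2))
```

Keep in mind that Normalizer rescales each row (sample) to unit norm rather than each column, which is why the correlations between columns change after the transform.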