I have a dataframe df:
import pandas as pd

df = pd.read_csv(loggerfile, header=2)   # read_csv already returns a DataFrame
values = df.as_matrix()                  # deprecated; use df.values / df.to_numpy() in newer pandas
df2 = pd.DataFrame.from_records(values, index=datetimeIdx, columns=Columns)
EDIT:
Now reading the data this way as suggested:
df2 = pd.read_csv(loggerfile, header=None, skiprows=[0, 1, 2])
Sample:
0 1 2 3 4 5 6 7 8 \
0 2014-03-19T12:44:32.695Z 1395233072695 703425 0 2 1 13 5 21
1 2014-03-19T12:44:32.727Z 1395233072727 703425 0 2 1 13 5 21
9 10 11 12 13 14 15 16
0 25 0 25 209 0 145 0 0
1 25 0 25 209 0 146 0 0
The columns are all of type int64 (except the first one):
print(df2.dtypes)
0 object
1 int64
2 int64
3 int64
4 int64
5 int64
6 int64
7 int64
8 int64
9 int64
10 int64
11 int64
12 int64
13 int64
14 int64
15 int64
16 int64
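If the goal from the first snippet was still to index by timestamp, here is a minimal sketch, assuming column 0 holds the ISO timestamps shown in the sample above (this is not part of the original code):

df2[0] = pd.to_datetime(df2[0])   # parse the ISO 8601 strings into datetimes
df2 = df2.set_index(0)            # use the timestamps as the index

After this, the remaining data columns are all int64, which is what df2.corr() works with.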
But in my correlation matrix, some columns are entirely NaN.
df2.corr()
1 2 3 4 5 6 7 8 ...
1 1.000000 NaN 0.018752 -0.550307 NaN NaN 0.075191 0.775725
2 NaN NaN NaN NaN NaN NaN NaN NaN
3 0.018752 NaN 1.000000 -0.067293 NaN NaN -0.579651 0.004593
...
Those columns do not change in value right now, yes.
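One quick way to confirm that directly on df2 (just a sketch, not from the original post):

print(df2.std(numeric_only=True))   # constant columns show a standard deviation of 0
print(df2.nunique())                # constant columns have exactly 1 unique value

Any numeric column with a standard deviation of 0 will appear as a NaN row/column in df2.corr().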
As Joris points out, you would expect NaN
if the values do not vary. To see why, take a look at the correlation formula:
cor(i,j) = cov(i,j)/[stdev(i)*stdev(j)]
If the values of the i-th or j-th variable do not vary, then the respective standard deviation will be zero and so will the denominator of the fraction, making the ratio 0/0. Thus, the correlation will be NaN.
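To see this in action, here is a minimal, self-contained example (made-up data, not the logger file) with one constant column:

import pandas as pd

demo = pd.DataFrame({
    'a': [1, 2, 3, 4],   # varies
    'b': [5, 5, 5, 5],   # constant -> standard deviation of 0
    'c': [2, 4, 6, 8],   # varies
})
print(demo.corr())
# every entry involving 'b' is NaN; 'a' and 'c' correlate perfectly (1.0)

If you want a correlation matrix without those NaN entries, you can drop the constant columns first, e.g. demo.loc[:, demo.std() > 0].corr().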