I'm working with both R and Python, and I want to write one of my pandas DataFrames to a Feather file so I can work with it more easily in R. However, when I try to write it, I get the following error:
ArrowInvalid: trying to convert NumPy type float64 but got float32
I double-checked my column types and they are already float64:
In[1]
df.dtypes
Out[1]
id Object
cluster int64
vector_x float64
vector_y float64
I get the same error whether I use feather.write_dataframe(df, "path/df.feather") or df.to_feather("path/df.feather").
I saw this on GitHub but didn't understand if it was related or not: https://issues.apache.org/jira/browse/ARROW-1345 and https://github.com/apache/arrow/issues/1430
In the end, I can just save it as a csv and change the columns in R (or just do the whole analysis in Python), but I was hoping to use this.
Edit 1:
I'm still having the same issue despite the great advice below, so I'm updating this with what I've tried.
df[['vector_x', 'vector_y', 'cluster']] = df[['vector_x', 'vector_y', 'cluster']].astype(float)
df[['doc_id', 'text']] = df[['doc_id', 'text']].astype(str)
df[['doc_vector', 'doc_vectors_2d']] = df[['doc_vector', 'doc_vectors_2d']].astype(list)
df.dtypes
Out[1]:
doc_id object
text object
doc_vector object
cluster float64
doc_vectors_2d object
vector_x float64
vector_y float64
dtype: object
Edit 2:
After much searching, it appears that the issue is that my cluster column is a list type made up of int64 integers. So I guess the real question is: does the Feather format support lists?
Edit 3:
Just to tie this in a bow, feather does not support nested data types like lists, at least not yet.
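For anyone hitting the same wall, one possible workaround is to expand a list column into plain scalar columns before writing. This is only a sketch: it assumes every list has the same length, and the sample values are invented for illustration. The column names are the ones from my DataFrame above.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question.
df = pd.DataFrame({
    "doc_id": ["a", "b"],
    "doc_vectors_2d": [[0.1, 0.2], [0.3, 0.4]],
})

# Turn each fixed-length list into one scalar column per element.
expanded = pd.DataFrame(
    df["doc_vectors_2d"].tolist(),
    columns=["vector_x", "vector_y"],
    index=df.index,
)

# Drop the nested column and attach the flat columns instead.
flat = pd.concat([df.drop(columns=["doc_vectors_2d"]), expanded], axis=1)
print(flat.dtypes)
```

The resulting frame holds only strings and floats, so flat.to_feather("path/df.feather") should no longer trip over a nested type. Lists of varying length would need a different approach, such as serializing each list to a string.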
The problem in your case is the id Object column. These are Python objects, and they cannot be represented in a language-neutral format. Feather (actually the underlying Apache Arrow / pyarrow) tries to guess the data type of the id column from the first objects it sees in that column. These are float64 NumPy scalars. Later in the column, you have float32 scalars. Rather than coercing them to a common type, Arrow is strict about types and fails.
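To check whether a column really mixes scalar types, you can list the Python types stored in each object column. This is a minimal sketch: the helper name python_types_per_column and the sample values are hypothetical, and only the id and cluster column names come from your question.

```python
import numpy as np
import pandas as pd


def python_types_per_column(df: pd.DataFrame) -> dict:
    """Map each object-dtype column to the set of type names it holds."""
    report = {}
    for name in df.columns:
        if df[name].dtype == object:
            report[name] = {type(value).__name__ for value in df[name]}
    return report


# Hypothetical frame that mimics the situation: an object column whose
# first value is a float64 scalar and whose second is a float32 scalar.
df = pd.DataFrame({
    "id": pd.Series([np.float64(1.0), np.float32(2.0)], dtype=object),
    "cluster": [0, 1],
})

mixed = python_types_per_column(df)
print(mixed)
```

Any object column that reports more than one type name is a likely candidate for the error.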
You should be able to work around this problem by ensuring that all columns have a non-object dtype with df['id'] = df['id'].astype(float).
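Spelled out, the cast could look like the sketch below. The sample frame and the helper name cast_column_to_float are made up for illustration, and it assumes every value in the column is numeric; astype raises an error otherwise.

```python
import numpy as np
import pandas as pd


def cast_column_to_float(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with the given column converted to float64."""
    result = df.copy()
    result[column] = result[column].astype(float)
    return result


# Hypothetical frame with mixed float64/float32 scalars in an object column.
df = pd.DataFrame({
    "id": pd.Series([np.float64(1.0), np.float32(2.0)], dtype=object),
    "cluster": [0, 1],
})

fixed = cast_column_to_float(df, "id")
print(fixed.dtypes)

# With a uniform dtype in place, the write can then be attempted:
# fixed.to_feather("path/df.feather")
```

The helper works on a copy so the original frame stays available for comparison if the cast changes values in a way you did not expect.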