How to convert string into float value in the dataframe

Ashim Sinha · May 8, 2015 · Viewed 26.3k times

We get an error when a column has the string datatype and holds values like:

col1 col2
1    .89

We are using the following code:

def azureml_main(dataframe1 = None, dataframe2 = None):

    # Execution logic goes here
    print('Input pandas.DataFrame #1:')
    import pandas as pd
    import numpy as np
    from sklearn.kernel_approximation import RBFSampler
    # Select the feature columns by position
    x = dataframe1.iloc[:, 2:1080]
    print(x)
    df1 = dataframe1[['colname']]

    # Flatten the single column into a 1-D array
    change = np.array(df1)
    b = change.ravel()
    print(b)
    rbf_feature = RBFSampler(gamma=1, n_components=100, random_state=1)
    print(rbf_feature)
    print("test")
    X_features = rbf_feature.fit_transform(x)

After this, we get an error along the lines of: cannot convert non-int into type float

Answer

EdChum · May 8, 2015

Use astype(float) e.g.:

df['col'] = df['col'].astype(float)

or convert_objects:

df = df.convert_objects(convert_numeric=True)

Example:

In [379]:

df = pd.DataFrame({'a':['1.23', '0.123']})
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 1 columns):
a    2 non-null object
dtypes: object(1)
memory usage: 32.0+ bytes
In [380]:

df['a'].astype(float)
Out[380]:
0    1.230
1    0.123
Name: a, dtype: float64

In [382]:

df = df.convert_objects(convert_numeric=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 1 columns):
a    2 non-null float64
dtypes: float64(1)
memory usage: 32.0 bytes
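
For the situation in the question, where a whole range of columns is passed to an estimator, the same conversion can be applied to several columns at once. A minimal sketch, assuming a small made-up frame (the column names id, f1, and f2 are hypothetical) whose feature columns hold numbers stored as strings:

```python
import pandas as pd

# Hypothetical data: the feature columns hold numbers stored as strings
df = pd.DataFrame({
    'id': [1, 2],
    'f1': ['.89', '1.5'],
    'f2': ['2', '0.25'],
})

# Select the feature columns by position and convert them in one call
features = df.iloc[:, 1:3].astype(float)

print(features.dtypes)
```

The converted frame can then be used in place of the original slice. Note that astype(float) raises an error if any value in the selected columns cannot be parsed as a number.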

UPDATE

If you're running version 0.17.0 or later, convert_objects is deprecated in favor of the top-level functions pd.to_numeric, pd.to_datetime, and pd.to_timedelta, so instead of:

df['col'] = df['col'].astype(float)

you can do:

df['col'] = pd.to_numeric(df['col'])

Note that by default any non-convertible value raises an error. If you want such values to be forced to NaN instead, pass errors='coerce':

df['col'] = pd.to_numeric(df['col'], errors='coerce')
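
To illustrate the difference between the two error modes, here is a minimal sketch using a made-up Series (the values are hypothetical) in which one entry cannot be parsed as a number:

```python
import pandas as pd

# Hypothetical data: 'abc' cannot be parsed as a number
s = pd.Series(['1.23', 'abc', '0.123'])

# Default behaviour (errors='raise'): the unparseable value raises ValueError
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True

# errors='coerce': the unparseable value becomes NaN, the rest are converted
converted = pd.to_numeric(s, errors='coerce')

print(raised)
print(converted)
```

The resulting NaN entries can then be handled explicitly, for example with dropna() or fillna(), before the data is passed on.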