get datatype of column using pyspark

Sreenuvasulu · Jul 11, 2017

We are reading data from a MongoDB collection. A given column can hold values of two different types (e.g. (bson.Int64, int) or (int, float)).

I am trying to get the datatype of a column using PySpark.

My problem is that some columns have mixed datatypes.

Assume quantity and weight are the columns:

quantity           weight
---------          --------
12300              656
123566000000       789.6767
1238               56.22
345                23
345566677777789    21

We didn't actually define a data type for any column of the Mongo collection.

When I query the count from the PySpark DataFrame:

dataframe.count()

I get an exception like this:

"Cannot cast STRING into a DoubleType (value: BsonString{value='200.0'})"

Answer

eliasah · Jul 11, 2017

Your question is broad, thus my answer will also be broad.

To get the data types of your DataFrame columns, you can use dtypes, i.e.:

>>> df.dtypes
[('age', 'int'), ('name', 'string')]

This means your column age is of type int and name is of type string.
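If you only need the type of one specific column, the same dtypes list (or the DataFrame schema) can be indexed directly, and a mixed-typed column can be cast to a single type before aggregating. Here is a minimal sketch; the sample data and column names are hypothetical, chosen to mirror the question:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame mirroring the question: weight arrives as strings
df = spark.createDataFrame(
    [(12300, "656"), (1238, "56.22"), (345, "23")],
    ["quantity", "weight"],
)

# Look up a single column's type from dtypes ...
print(dict(df.dtypes)["weight"])        # 'string'
# ... or from the schema, which gives the Spark SQL type object
print(df.schema["weight"].dataType)     # StringType()

# Cast the column to one numeric type before counting/aggregating
df = df.withColumn("weight", col("weight").cast("double"))
print(df.dtypes)                        # [('quantity', 'bigint'), ('weight', 'double')]
print(df.count())                       # 3

If the data comes in through the MongoDB Spark connector, passing an explicit schema to the reader (DataFrameReader.schema(...)) is another way to avoid inferred mixed types, though the exact read options depend on the connector version you use.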