I am using PySpark 2.0 to create a DataFrame object by reading a CSV file with:
data = spark.read.csv('data.csv', header=True)
I find the type of the data using
type(data)
The result is
pyspark.sql.dataframe.DataFrame
I am trying to convert some of the columns in data to LabeledPoint objects in order to apply a classifier.
from pyspark.sql.types import *
from pyspark.sql.functions import col
from pyspark.mllib.regression import LabeledPoint
data.select(['label','features']).map(lambda row: LabeledPoint(row.label, row.features))
I ran into this error:
AttributeError: 'DataFrame' object has no attribute 'map'
Any idea what causes this error? Is there a way to generate a LabeledPoint from a DataFrame in order to perform classification?
Use .rdd.map:
>>> data.select(...).rdd.map(...)
DataFrame.map has been removed in Spark 2.
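For a fuller picture, here is a minimal sketch of the conversion going through .rdd. Note that spark.read.csv without inferSchema=True reads every column as a string, so the values probably need casting to float first. The helper names (to_label_and_features, to_labeled_points) and the feature column names 'f1' and 'f2' are hypothetical, chosen only for illustration; the sketch assumes each feature is stored in its own numeric column.

```python
def to_label_and_features(row, label_col, feature_cols):
    """Return (label, [features...]) as floats from a Row-like object.

    Works with anything that supports row[name], e.g. a pyspark Row or a dict.
    """
    label = float(row[label_col])
    features = [float(row[c]) for c in feature_cols]
    return label, features


def to_labeled_points(df, label_col, feature_cols):
    """Convert a Spark DataFrame into an RDD of LabeledPoint (needs pyspark)."""
    # Imported here so the helper above can be used without pyspark installed.
    from pyspark.mllib.regression import LabeledPoint

    def convert(row):
        label, features = to_label_and_features(row, label_col, feature_cols)
        return LabeledPoint(label, features)

    # DataFrame has no .map in Spark 2, so go through the underlying RDD.
    return df.select([label_col] + list(feature_cols)).rdd.map(convert)
```

With the DataFrame from the question, this could be called as points = to_labeled_points(data, 'label', ['f1', 'f2']), where 'f1' and 'f2' stand in for whatever feature columns the CSV actually has.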