I want to convert a string column of a data frame to a list. What I can find from the Dataframe
API is RDD, so I tried converting it back to RDD first, and then apply toArray
function to the RDD. In this case, the length and SQL work just fine. However, the result I got from RDD has square brackets around every element like this [A00001]
. I was wondering if there's an appropriate way to convert a column to a list or a way to remove the square brackets.
Any suggestions would be appreciated. Thank you!
This should return the collection containing single list:
dataFrame.select("YOUR_COLUMN_NAME").rdd.map(r => r(0)).collect()
Without the mapping, you just get a Row object, which contains every column from the database.
Keep in mind that this will probably get you a list of Any type. Ïf you want to specify the result type, you can use .asInstanceOf[YOUR_TYPE] in r => r(0).asInstanceOf[YOUR_TYPE]
mapping
P.S. due to automatic conversion you can skip the .rdd
part.