Saving a dataframe result value to a string variable?

oharr · Apr 20, 2018 · Viewed 7.9k times

I created a DataFrame in Spark that finds the max date, and I want to save that value to a variable. I'm trying to figure out how to get the result, which is a string, out of the DataFrame and into a variable.

Code so far:

sqlDF = spark.sql("SELECT MAX(date) FROM account")
sqlDF.show()

What the result looks like:

+--------------------+
|           max(date)|
+--------------------+
|2018-04-19T14:11:...|
+--------------------+

Thanks.

Answer

Josh Rosen · Apr 20, 2018

Assuming you're computing a global aggregate (where the output will have a single row) and are using PySpark, the following should work:

spark.sql("SELECT MAX(date) as maxDate FROM account").first()["maxDate"]
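To actually keep the value, assign the result of that expression to a variable. Here is a minimal sketch of one way to wrap it. The helper name fetch_max_date and its parameters are hypothetical, and spark is assumed to be an existing SparkSession. The table and column names come from the question.

```python
def fetch_max_date(spark, table="account", column="date"):
    """Return MAX(column) from table as a plain Python value on the driver.

    `spark` is assumed to be an existing SparkSession. The default table and
    column names are the ones used in the question.
    """
    query = f"SELECT MAX({column}) AS maxDate FROM {table}"
    # A global aggregate produces exactly one row, so first() is enough.
    row = spark.sql(query).first()
    # The value is None when the table is empty or the column is all NULL.
    return row["maxDate"]


# Usage, assuming a live SparkSession named `spark`:
# max_date = fetch_max_date(spark)
```

Since the aggregate may come back as NULL, it is probably worth checking the variable for None before using it.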

I believe this will return a datetime object, but you can either convert it to a string in your driver code or run SELECT CAST(MAX(date) AS string) instead.
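The driver-side conversion could look something like the sketch below. It uses only the Python standard library. The helper name as_string is hypothetical, and the sample timestamp is made up for illustration rather than taken from the question's data. Because the actual return type depends on the column's type, the helper also passes through values that are already strings.

```python
from datetime import date, datetime


def as_string(value):
    """Convert a value pulled from a Row into a string (or None)."""
    if value is None:
        # MAX() over an empty table or an all-NULL column comes back as None.
        return None
    if isinstance(value, (datetime, date)):
        # ISO 8601 text, e.g. 2020-01-02T03:04:05 for a datetime.
        return value.isoformat()
    # Already a string (or some other type): fall back to str().
    return str(value)


# Stand-in for the value returned by .first()["maxDate"]; made up for illustration.
sample = datetime(2020, 1, 2, 3, 4, 5)
max_date_str = as_string(sample)
print(max_date_str)  # 2020-01-02T03:04:05
```

If a different layout is needed, sample.strftime("%Y-%m-%d %H:%M:%S") gives explicit control over the format.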