I'm trying to get the Unix time in milliseconds (13 digits) from a timestamp field, but currently it comes back in seconds (10 digits).
scala> import org.apache.spark.sql.functions._
import org.apache.spark.sql.functions._
scala> var df = Seq("2017-01-18 11:00:00.000", "2017-01-18 11:00:00.123", "2017-01-18 11:00:00.882", "2017-01-18 11:00:02.432").toDF()
df: org.apache.spark.sql.DataFrame = [value: string]
scala> df = df.selectExpr("value timeString", "cast(value as timestamp) time")
df: org.apache.spark.sql.DataFrame = [timeString: string, time: timestamp]
scala> df = df.withColumn("unix_time", unix_timestamp(df("time")))
df: org.apache.spark.sql.DataFrame = [timeString: string, time: timestamp ... 1 more field]
scala> df.take(4)
res63: Array[org.apache.spark.sql.Row] = Array(
[2017-01-18 11:00:00.000,2017-01-18 11:00:00.0,1484758800],
[2017-01-18 11:00:00.123,2017-01-18 11:00:00.123,1484758800],
[2017-01-18 11:00:00.882,2017-01-18 11:00:00.882,1484758800],
[2017-01-18 11:00:02.432,2017-01-18 11:00:02.432,1484758802])
Even though 2017-01-18 11:00:00.123 and 2017-01-18 11:00:00.000 are different, I get the same Unix time back: 1484758800. What am I missing?
unix_timestamp() returns the Unix timestamp in seconds, so the millisecond part is dropped.
The last 3 digits of a millisecond timestamp are the same as the last 3 digits of the timestamp string (1.999 sec = 1999 milliseconds), so just take the last 3 characters of the timestamp string and append them to the end of the seconds value.
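Something like this (a minimal sketch of that idea; dfMs and unix_time_ms are just illustrative names, and it assumes timeString always ends with a 3-digit fraction like "…:00.123"):

import org.apache.spark.sql.functions._

// Append the last 3 characters of the timestamp string (the milliseconds)
// to the seconds value from unix_timestamp(), then cast back to a number.
val dfMs = df.withColumn(
  "unix_time_ms",
  concat(
    col("unix_time").cast("string"),       // e.g. "1484758800"
    substring(col("timeString"), -3, 3)    // e.g. "123"
  ).cast("long")                           // e.g. 1484758800123
)

For the .123 row that gives "1484758800" + "123" = 1484758800123, i.e. the 13-digit value you were after. (If the fraction can have fewer than 3 digits, you'd need to pad it first.)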