Day_Date,timeofday_desc,Timeofday_hour,Timeofday_minute,Timeofday_second,value
2017-12-18,12:21:02 AM,0,21,2,“1.779209040E+08”
2017-12-19,12:21:02 AM,0,21,2,“1.779209040E+08”
2017-12-20,12:30:52 AM,0,30,52,“1.779209040E+08”
2017-12-21,12:30:52 AM,0,30,52,“1.779209040E+08”
2017-12-22,12:47:10 AM,0,47,10,“1.779209040E+08”
2017-12-23,12:47:10 AM,0,47,10,“1.779209040E+08”
2017-12-24,02:46:59 AM,2,46,59,“1.779209040E+08”
2017-12-25,02:46:59 AM,2,46,59,“1.779209040E+08”
2017-12-26,03:10:27 AM,3,10,27,“1.779209040E+08”
2017-12-27,03:10:27 AM,3,10,27,“1.779209040E+08”
2017-12-28,03:52:08 AM,3,52,8,“1.779209040E+08”
I am trying to convert value
column to 177920904
val df1 = df.withColumn("s", 'value.cast("Decimal(10,4)")).drop("value").withColumnRenamed("s", "value")
also tried casting value as Float
, Double
. Always get null as output
df1.select("value").show()
+-----------+
| value |
+-----------+
| null|
| null|
| null|
| null|
| null|
| null|
| null|
| null|
df.printSchema
root
|-- Day_Date: string (nullable = true)
|-- timeofday_desc: string (nullable = true)
|-- Timeofday_hour: string (nullable = true)
|-- Timeofday_minute: string (nullable = true)
|-- Timeofday_second: string (nullable = true)
|-- value: string (nullable = true)
Just need to cast it to decimal with enough room to fit the number.
Decimal is Decimal(precision, scale), so Decimal(10, 4) means 10 digits in total, 6 at the left of the dot, and 4 to the right, so the number does not fit in your Decimal type.
From the documentation
precision represents the total number of digits that can be represented
scale represents the number of fractional digits. This value must be less than or equal to precision. A scale of 0 produces integral values, with no fractional part
Since you don't want any number to the right, you can try this
df.withColumn("s", 'value.cast("Decimal(10,0)"))
If you want to keep 4 decimal digits, you can just change it to
df.withColumn("s", 'value.cast("Decimal(14,4)"))
INPUT
df.show
+---------------+
| value|
+---------------+
|1.779209040E+08|
+---------------+
OUTPUT
scala> df.withColumn("s", 'value.cast("Decimal(10,0)")).show
+---------------+---------+
| value| s|
+---------------+---------+
|1.779209040E+08|177920904|
+---------------+---------+
FULL SOLUTION
Without dropping nor renamig
val df1 = df.withColumn("value", 'value.cast("Decimal(10,0)"))
FIX INPUT DATA
As I said in the comment, the problem is that your numbers contain some weird characters around them, you should remove it before casting
ORIGINAL
scala> df.show
+----------+--------------+--------------+----------------+----------------+-----------------+
| Day_Date|timeofday_desc|Timeofday_hour|Timeofday_minute|Timeofday_second| value|
+----------+--------------+--------------+----------------+----------------+-----------------+
|2017-12-18| 12:21:02 AM| 0| 21| 2| ?1.779209040E+08|
|2017-12-19| 12:21:02 AM| 0| 21| 2|?1.779209040E+08?|
|2017-12-20| 12:30:52 AM| 0| 30| 52| ?1.779209040E+08|
|2017-12-21| 12:30:52 AM| 0| 30| 52| ?1.779209040E+08|
|2017-12-22| 12:47:10 AM| 0| 47| 10| ?1.779209040E+08|
|2017-12-23| 12:47:10 AM| 0| 47| 10| ?1.779209040E+08|
|2017-12-24| 02:46:59 AM| 2| 46| 59| ?1.779209040E+08|
|2017-12-25| 02:46:59 AM| 2| 46| 59| ?1.779209040E+08|
|2017-12-26| 03:10:27 AM| 3| 10| 27| ?1.779209040E+08|
|2017-12-27| 03:10:27 AM| 3| 10| 27| ?1.779209040E+08|
|2017-12-28| 03:52:08 AM| 3| 52| 8| ?1.779209040E+08|
+----------+--------------+--------------+----------------+----------------+-----------------+
There are many ways to remove them, a quick one is with an UDF and a regular expression to remove everything but numbers, letters, dot, + and -
def clean(input: String) = input.replaceAll("[^a-zA-Z0-9\\+\\.-]", "")
val cleanUDF = udf(clean _ )
df.withColumn("value", cleanUDF($"value").cast(DecimalType(10,0))).show
+----------+--------------+--------------+----------------+----------------+---------+
| Day_Date|timeofday_desc|Timeofday_hour|Timeofday_minute|Timeofday_second| value|
+----------+--------------+--------------+----------------+----------------+---------+
|2017-12-18| 12:21:02 AM| 0| 21| 2|177920904|
|2017-12-19| 12:21:02 AM| 0| 21| 2|177920904|
|2017-12-20| 12:30:52 AM| 0| 30| 52|177920904|
|2017-12-21| 12:30:52 AM| 0| 30| 52|177920904|
|2017-12-22| 12:47:10 AM| 0| 47| 10|177920904|
|2017-12-23| 12:47:10 AM| 0| 47| 10|177920904|
|2017-12-24| 02:46:59 AM| 2| 46| 59|177920904|
|2017-12-25| 02:46:59 AM| 2| 46| 59|177920904|
|2017-12-26| 03:10:27 AM| 3| 10| 27|177920904|
|2017-12-27| 03:10:27 AM| 3| 10| 27|177920904|
|2017-12-28| 03:52:08 AM| 3| 52| 8|177920904|
+----------+--------------+--------------+----------------+----------------+---------+