Dataset<Tweet> ds = sc.read().json("/path").as(Encoders.bean(Tweet.class));
Tweet class:
long id;
String user;
String text;
ds.printSchema();
Output:
root
|-- id: string (nullable = true)
|-- text: string (nullable = true)
|-- user: string (nullable = true)
The JSON file stores every field as a string.
My question: I am reading the input and encoding it as Tweet.class. The type declared for id in the class is long, but when the schema is printed it shows up as string.
Does printSchema report the schema according to how the file was read, or according to the encoding applied (here Tweet.class)?
I don't know the exact reason why your code is not working, but if you want to change the field type you can write a custom schema:
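import org.apache.spark.sql.types._

// Explicit schema: id is declared as a long instead of being inferred
val schema = StructType(List(
  StructField("id", LongType, nullable = true),
  StructField("text", StringType, nullable = true),
  StructField("user", StringType, nullable = true)
))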
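You can then apply the schema when reading the file. Note that json() returns a Dataset<Row>, so the bean encoder is still needed to get a Dataset<Tweet>:
Dataset<Tweet> ds = sc.read().schema(schema).json("/path").as(Encoders.bean(Tweet.class));
ds.printSchema();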
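
Since the question uses the Java API, here is a rough Java-only sketch of the same idea. As far as I understand it, as(...) only changes the typed view of the data, and printSchema reports whatever schema the reader produced (here, the one inferred from the JSON), which would explain the output in the question. The class name TweetSchemaExample, the helper tweetSchema() and the local[*] master are assumptions made for illustration, and the Tweet bean is reconstructed from the field list in the question.

```java
import java.io.Serializable;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class TweetSchemaExample {

    // Bean reconstructed from the question; Encoders.bean needs getters and setters.
    public static class Tweet implements Serializable {
        private long id;
        private String user;
        private String text;
        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getUser() { return user; }
        public void setUser(String user) { this.user = user; }
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
    }

    // Hypothetical helper: explicit schema with id declared as long.
    static StructType tweetSchema() {
        return DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("id", DataTypes.LongType, true),
            DataTypes.createStructField("text", DataTypes.StringType, true),
            DataTypes.createStructField("user", DataTypes.StringType, true)
        });
    }

    public static void main(String[] args) {
        // Assumption: a local session, just for trying this out.
        SparkSession spark = SparkSession.builder()
            .appName("tweet-schema")
            .master("local[*]")
            .getOrCreate();

        // The explicit schema replaces inference; the encoder only adds the typed view.
        Dataset<Tweet> ds = spark.read()
            .schema(tweetSchema())
            .json("/path")
            .as(Encoders.bean(Tweet.class));

        ds.printSchema();
        spark.stop();
    }
}
```

One caveat: if the ids are stored as quoted strings in the JSON file, reading them with LongType may give nulls rather than numbers. In that case it may be safer to let Spark infer the schema and cast the column before applying the encoder, e.g. spark.read().json("/path").withColumn("id", functions.col("id").cast("long")).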