Printschema() in Apache Spark

Question 1

apache-spark spark-dataframe apache-spark-dataset

rushikesh jachak · Apr 30, 2018 · Viewed 43.7k times · Source

Answer

I don't know the exact reason why your code is not working, but if you want to change the field type, you can define a custom schema.

import org.apache.spark.sql.types._

val schema = StructType(List(
  StructField("id", LongType, nullable = true),
  StructField("text", StringType, nullable = true),
  StructField("user", StringType, nullable = true)
))

You can apply the schema when reading the data, as follows:

Dataset<Tweet> ds = sc.read().schema(schema).json("/path").as(Encoders.bean(Tweet.class));

ds.printSchema();
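
Since the question uses the Java API while the schema above is written in Scala, here is a minimal sketch of the same three-field schema built from Java. It assumes Spark SQL is on the classpath; the class name TweetSchema and the method build() are made up for illustration and are not part of Spark.

```java
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper class (name chosen for illustration only).
public class TweetSchema {

    // Builds a schema with id as long, and text and user as strings,
    // all nullable, mirroring the Scala StructType in the answer.
    public static StructType build() {
        return DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("id", DataTypes.LongType, true),
            DataTypes.createStructField("text", DataTypes.StringType, true),
            DataTypes.createStructField("user", DataTypes.StringType, true)
        });
    }
}
```

It could then be passed to the reader in place of the Scala value, for example sc.read().schema(TweetSchema.build()).json("/path"), assuming sc is the same session object used in the question.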

Question 2

Dataset<Tweet> ds = sc.read().json("/path").as(Encoders.bean(Tweet.class));



Tweet class :-
long id;
String user;
String text;


ds.printSchema();

Output:-

root
  |-- id: string (nullable = true)
  |-- text: string (nullable = true)  
  |-- user: string (nullable = true)

The JSON file has all fields as strings.

My question: I am reading the input and encoding it as Tweet.class. The data type declared for id is long, but when the schema is printed it shows as string.

Does printSchema() report the schema according to how Spark reads the file, or according to the encoding we apply (here Tweet.class)?
