Top "apache-spark-dataset" questions

A Spark Dataset is a strongly typed collection of objects mapped to a relational schema.

Add UUID to Spark Dataset

I am trying to add a UUID column to my dataset. getDataset(Transaction.class)).withColumn("uniqueId", functions.lit(UUID.randomUUID().…

apache-spark apache-spark-dataset spark-csv
How can I add a column with a value to a new Dataset in Spark Java?

I'm creating some Datasets with the Java Spark API. These Datasets are populated from a Hive table, using the spark.…

java apache-spark dataset apache-spark-dataset bigdata
Overwrite only some partitions in a partitioned Spark Dataset

How can we overwrite a partitioned dataset, but only the partitions we are going to change? For example, recomputing last …

apache-spark hive apache-spark-dataset
Mapping Spark DataSet row values into new hash column

Given the following DataSet values as inputData:

column0  column1  column2  column3
A        88       text     99
Z        12       test     200
T        120      foo      12

In Spark, what …

scala apache-spark spark-dataframe apache-spark-dataset
Spark Dataset select with TypedColumn

Looking at the select() function on the Spark Dataset, there are various generated function signatures: (c1: TypedColumn[MyClass, U1], c2: …

scala apache-spark apache-spark-dataset
Perform a typed join in Scala with Spark Datasets

I like Spark Datasets as they give me analysis errors and syntax errors at compile time and also allow me …

scala apache-spark join apache-spark-sql apache-spark-dataset
How to drop malformed rows while reading CSV with a schema in Spark?

When I use a Spark Dataset to load a CSV file, I prefer to specify the schema explicitly. But I find there …

apache-spark apache-spark-dataset
Spark 2 Dataset Null value exception

Getting this null error in Spark Dataset.filter.

Input CSV:

name,age,stat
abc,22,m
xyz,,s

Working code: case …

scala apache-spark apache-spark-sql apache-spark-dataset
How to read multiple Excel files and concatenate them into one Apache Spark DataFrame?

Recently I wanted to do the Spark Machine Learning Lab from Spark Summit 2016. The training video is here and the exported notebook is …

excel scala apache-spark apache-spark-dataset spark-excel
Array Intersection in Spark SQL

I have a table with an array-type column named writer, which has values like array[value1, value2], array[…

apache-spark apache-spark-sql spark-dataframe hiveql apache-spark-dataset