Top "Apache-spark-dataset" questions

A Spark Dataset is a strongly typed collection of objects mapped to a relational schema.

How to name aggregate columns?

I'm using Spark in Scala and my aggregated columns are anonymous. Is there a convenient way to rename multiple columns …

scala apache-spark apache-spark-dataset
Difference between SparkContext, JavaSparkContext, SQLContext, and SparkSession?

What is the difference between SparkContext, JavaSparkContext, SQLContext and SparkSession? Is there any method to convert or create a Context …

java scala apache-spark rdd apache-spark-dataset
Spark DataFrames - Reducing By Key

Let's say I have a data structure like this, where ts is some timestamp: case class Record(ts: Long, id: …

scala apache-spark apache-spark-sql apache-spark-dataset
How to read a ".gz" compressed file using Spark DF or DS?

I have a compressed file in .gz format. Is it possible to read the file directly using Spark DF/DS? …

apache-spark apache-spark-sql spark-dataframe gzip apache-spark-dataset
Spark Dataset filter performance

I have been experimenting with different ways to filter a typed Dataset. It turns out the performance can be quite …

apache-spark apache-spark-sql spark-dataframe apache-spark-dataset
How to use both dataset.select and selectExpr in Apache Spark

I want the data shown below using a Spark (2.2) Dataset:
Name  Age  Age+5
A     10   15
B     5    10
C     25   30
I tried using the following: dataset.…

apache-spark apache-spark-dataset
Spark Error: Unable to find encoder for type stored in a Dataset

I am using Spark in a Zeppelin notebook, and groupByKey() does not seem to be working. This code: df.groupByKey(…

scala apache-spark apache-spark-dataset apache-spark-encoders
How to lower the case of column names of a data frame but not its values?

How to lower the case of column names of a data frame but not its values? Using raw Spark SQL …

apache-spark apache-spark-sql apache-spark-dataset
Create Spark Dataset from a CSV file

I would like to create a Spark Dataset from a simple CSV file. Here are the contents of the CSV …

apache-spark apache-spark-dataset
Create DataFrame with null values for a few columns

I am trying to create a DataFrame from an RDD. First I am creating an RDD using the code below: val …

scala apache-spark spark-dataframe apache-spark-dataset