Top "Apache-spark-dataset" questions

A Spark Dataset is a strongly typed collection of objects mapped to a relational schema.

How to traverse/iterate a Dataset in Spark Java?

I am trying to traverse a Dataset to do some string similarity calculations like Jaro-Winkler or cosine similarity. I …

java apache-spark iterator apache-spark-2.0 apache-spark-dataset
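
Rather than looping over a Dataset on the driver, one common approach is to express the per-row work as a transformation such as `map`, and to pull rows back only when really needed (`toLocalIterator` fetches one partition at a time). A minimal Scala sketch, assuming the strings to compare are already paired up in rows; the `StringPair`/`Scored` classes and the character-frequency `cosine` function are hypothetical stand-ins for a real Jaro-Winkler or cosine-similarity implementation.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical row types; case classes get encoders from spark.implicits._
case class StringPair(left: String, right: String)
case class Scored(left: String, right: String, score: Double)

object TraverseExample {
  // Hypothetical similarity: cosine of the two strings' character-frequency vectors
  def cosine(a: String, b: String): Double = {
    val fa = a.groupBy(identity).map { case (c, s) => c -> s.length.toDouble }
    val fb = b.groupBy(identity).map { case (c, s) => c -> s.length.toDouble }
    def norm(m: Map[Char, Double]): Double = math.sqrt(m.values.map(v => v * v).sum)
    if (fa.isEmpty || fb.isEmpty) 0.0
    else {
      val dot = fa.keySet.intersect(fb.keySet).toSeq.map(c => fa(c) * fb(c)).sum
      dot / (norm(fa) * norm(fb))
    }
  }

  // Runs on the executors: one Scored row per input StringPair
  def score(spark: SparkSession, pairs: Dataset[StringPair]): Dataset[Scored] = {
    import spark.implicits._
    pairs.map(p => Scored(p.left, p.right, cosine(p.left, p.right)))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("traverse").getOrCreate()
    import spark.implicits._
    val pairs = Seq(StringPair("spark", "spork"), StringPair("abc", "xyz")).toDS()
    // Driver-side iteration, fetching one partition at a time
    val it = score(spark, pairs).toLocalIterator()
    while (it.hasNext) println(it.next())
    spark.stop()
  }
}
```

The same methods exist on the Java `Dataset` API; in Java the `map` call additionally takes an explicit encoder argument.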

How to get keys and values from MapType column in SparkSQL DataFrame

I have data in a parquet file which has 2 fields: object_id: String and alpha: Map<>. It is …

scala apache-spark dataframe apache-spark-sql apache-spark-dataset
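
Since Spark 2.3, `map_keys` and `map_values` in `org.apache.spark.sql.functions` return a map column's keys and values as arrays, and `explode` on a map column produces one row per entry with columns named `key` and `value`. A minimal sketch; the `Entry` case class and its `Map[String, Int]` value type are assumptions, since the question elides the map's type.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, map_keys, map_values}

// Hypothetical schema mirroring the question: object_id plus a map column
case class Entry(object_id: String, alpha: Map[String, Int])

object MapColumnExample {
  // One row per object, with the keys and the values as two array columns
  def keysAndValues(df: DataFrame): DataFrame =
    df.select(col("object_id"), map_keys(col("alpha")).as("keys"), map_values(col("alpha")).as("values"))

  // One row per map entry; explode on a map yields columns named key and value
  def entries(df: DataFrame): DataFrame =
    df.select(col("object_id"), explode(col("alpha")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("map-column").getOrCreate()
    import spark.implicits._
    val df = Seq(Entry("o1", Map("a" -> 1, "b" -> 2)), Entry("o2", Map("c" -> 3))).toDF()
    keysAndValues(df).show(false)
    entries(df).show(false)
    spark.stop()
  }
}
```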

Convert Scala list to DataFrame or Dataset

I am new to Scala. I am trying to convert a Scala list (which is holding the results of some …

scala apache-spark apache-spark-sql apache-spark-dataset apache-spark-encoders
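
With `import spark.implicits._` in scope, a local `Seq` (a `List` is one) of a case class or primitive type gains `toDF()` and `toDS()`. A minimal sketch; the `Result` case class is a hypothetical stand-in for whatever the list actually holds.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical element type for the list
case class Result(name: String, score: Double)

object ListToDataFrame {
  def toFrame(spark: SparkSession, results: List[Result]): DataFrame = {
    import spark.implicits._
    results.toDF() // column names come from the case class fields
  }

  def toDataset(spark: SparkSession, results: List[Result]): Dataset[Result] = {
    import spark.implicits._
    results.toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("list-to-df").getOrCreate()
    import spark.implicits._
    val results = List(Result("a", 0.5), Result("b", 0.9))
    toFrame(spark, results).show()
    // Lists of primitives work too; toDF can name the single column
    List(1, 2, 3).toDF("n").show()
    spark.stop()
  }
}
```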

How to create a Spark Dataset from an RDD

I have an RDD[LabeledPoint] intended to be used within a machine learning pipeline. How do we convert that RDD …

scala apache-spark dataset apache-spark-dataset
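
An `RDD` whose element type has an encoder converts with `toDS()` (or `spark.createDataset(rdd)`) once `spark.implicits._` is imported. A minimal sketch; the `Labeled` case class with an `Array[Double]` feature field is a hypothetical substitute for `LabeledPoint`, used here to keep the example free of ML vector types.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical stand-in for LabeledPoint
case class Labeled(label: Double, features: Array[Double])

object RddToDataset {
  def fromRdd(spark: SparkSession, rdd: RDD[Labeled]): Dataset[Labeled] = {
    import spark.implicits._ // provides Encoder[Labeled] and rdd.toDS()
    rdd.toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rdd-to-ds").getOrCreate()
    val rdd = spark.sparkContext.parallelize(Seq(Labeled(1.0, Array(0.1, 0.2)), Labeled(0.0, Array(0.3, 0.4))))
    fromRdd(spark, rdd).printSchema()
    spark.stop()
  }
}
```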

Why do I get the error "Unable to find encoder for type stored in a Dataset" when encoding JSON using case classes?

I've written a Spark job: object SimpleApp { def main(args: Array[String]) { val conf = new SparkConf().setAppName("Simple Application").setMaster("local") …

scala apache-spark apache-spark-dataset apache-spark-encoders
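
This error typically appears when no implicit `Encoder` for the case class is in scope: either `import spark.implicits._` is missing, or the case class is declared inside a method such as `main`, where Spark cannot derive an encoder for it. A minimal sketch of a working layout, assuming JSON records with `name` and `age` fields (hypothetical, since the original job is truncated) and using `SparkSession` rather than the question's `SparkConf` setup; the case class lives at the top level and the implicits are imported before `.as[...]`.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Declared at the top level, not inside main, so an encoder can be derived
case class JsonPerson(name: String, age: Long)

object SimpleApp {
  def parse(spark: SparkSession, lines: Seq[String]): Dataset[JsonPerson] = {
    import spark.implicits._ // Encoder[JsonPerson] and Encoder[String]
    spark.read.json(lines.toDS()).as[JsonPerson]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("Simple Application").getOrCreate()
    parse(spark, Seq("""{"name":"Ann","age":31}""", """{"name":"Bo","age":27}""")).show()
    spark.stop()
  }
}
```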

How to create a custom Encoder in Spark 2.X Datasets?

Spark Datasets move away from Rows to Encoders for POJOs/primitives. The Catalyst engine uses an ExpressionEncoder to convert columns …

scala apache-spark apache-spark-dataset apache-spark-encoders
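
For a class that is neither a case class nor a supported primitive, one option is a generic binary encoder: `Encoders.kryo[T]` (or `Encoders.javaSerialization[T]`) stores each object in a single binary `value` column, trading columnar access and Catalyst optimizations for convenience. A minimal sketch; the `Legacy` class is hypothetical.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical plain class: Spark cannot derive an encoder for it
class Legacy(val id: Int, val tag: String)

object KryoEncoderExample {
  // Each Legacy instance is serialized with Kryo into one binary column
  implicit val legacyEncoder: Encoder[Legacy] = Encoders.kryo[Legacy]

  def build(spark: SparkSession, items: Seq[Legacy]): Dataset[Legacy] =
    spark.createDataset(items)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("kryo-encoder").getOrCreate()
    val ds = build(spark, Seq(new Legacy(1, "a"), new Legacy(2, "b")))
    ds.printSchema() // a single binary column named value
    spark.stop()
  }
}
```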

Encoder for Row Type Spark Datasets

I would like to write an encoder for a Row type in Dataset, for a map operation that I am …

java apache-spark apache-spark-sql apache-spark-dataset apache-spark-encoders

How to convert DataFrame to Dataset in Apache Spark in Java?

I can convert a DataFrame to a Dataset in Scala very easily: case class Person(name:String, age:Long) val df = ctx.…

java apache-spark spark-dataframe apache-spark-dataset

Partition data for efficient joining for Spark dataframe/dataset

I need to join many DataFrames together based on some shared key columns. For a key-value RDD, one can specify …

apache-spark apache-spark-sql spark-dataframe partitioning apache-spark-dataset
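
DataFrames do not take a custom `Partitioner` the way key-value RDDs do, but `repartition(numPartitions, cols*)` hash-partitions rows by column expressions. If both sides of a join are repartitioned on the join key with the same number of partitions, the planner can usually reuse that partitioning instead of adding another shuffle for the join, and caching the repartitioned frames helps when they are joined repeatedly. A minimal sketch; the effect should be checked on real data with `explain()`, and for data written to tables `bucketBy` is another option.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object PartitionedJoin {
  // Hash-partition both sides on the join key with the same partition count
  def joinOnKey(left: DataFrame, right: DataFrame, key: String, numPartitions: Int): DataFrame = {
    val l = left.repartition(numPartitions, col(key))
    val r = right.repartition(numPartitions, col(key))
    l.join(r, Seq(key))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("partitioned-join").getOrCreate()
    import spark.implicits._
    val left = Seq((1, "a"), (2, "b")).toDF("k", "x")
    val right = Seq((1, "c"), (1, "d"), (3, "e")).toDF("k", "y")
    val joined = joinOnKey(left, right, "k", 8)
    // Inspect the Exchange nodes; inputs this small may be broadcast instead
    joined.explain()
    joined.show()
    spark.stop()
  }
}
```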

DataFrame / Dataset groupBy behaviour/optimization

Suppose we have a DataFrame df consisting of the following columns: Name, Surname, Size, Width, Length, Weight. Now we want to …

performance apache-spark dataframe apache-spark-sql apache-spark-dataset