Top "spark-dataframe" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

Programmatically generate the schema AND the data for a dataframe in Apache Spark

I would like to dynamically generate a dataframe containing a header record for a report, so creating a dataframe from …

apache-spark dataframe spark-dataframe rdd spark-csv
Array Intersection in Spark SQL

I have a table with an array-type column named writer which has values like array[value1, value2], array[…

apache-spark apache-spark-sql spark-dataframe hiveql apache-spark-dataset
Getting last value of group in Spark

I have a SparkR DataFrame as shown below: #Create R data.frame custId <- c(rep(1001, 5), rep(1002, 3), 1003) date <…

apache-spark pyspark spark-dataframe sparkr
Spark Dataframe validating column names for parquet writes (scala)

I'm processing events using DataFrames converted from a stream of JSON events, which eventually get written out as Parquet …

apache-spark apache-spark-sql spark-streaming spark-dataframe parquet
Spark DataFrame: How to specify schema when writing as Avro

I want to write a DataFrame in Avro format using a provided Avro schema rather than Spark's auto-generated schema. How …

apache-spark spark-dataframe spark-avro
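A sketch assuming the Avro data source's `avroSchema` write option (supported by the built-in `avro` module in Spark 2.4+, which ships as an external package that must be on the classpath). The record name, namespace, and fields below are hypothetical:

```python
import json

# A hand-written Avro schema to impose on the output instead of the
# schema Spark would derive from the DataFrame (hypothetical fields).
avro_schema = json.dumps({
    "type": "record", "name": "Event", "namespace": "com.example",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "payload", "type": ["null", "string"], "default": None},
    ],
})

# With the spark-avro source on the classpath, the schema is passed as an
# option on write (not runnable here without that package):
# df.write.format("avro").option("avroSchema", avro_schema).save("/tmp/out")
```

The DataFrame's columns must be convertible to the provided schema; mismatched nullability or missing defaults will fail the write.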
Spark DataFrame InsertIntoJDBC - TableAlreadyExists Exception

Using Spark 1.4.0, I am trying to insert data from a Spark DataFrame into a MemSQL database (which should be exactly …

mysql apache-spark spark-dataframe singlestore
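A sketch of the commonly suggested fix: `insertIntoJDBC` was deprecated around Spark 1.4, and the `DataFrameWriter` API with an explicit save mode avoids the table-existence path that raises `TableAlreadyExists` when an append is intended. The connection string and credentials below are hypothetical.

```python
# Hypothetical MemSQL/MySQL-protocol connection details.
jdbc_url = "jdbc:mysql://host:3306/db"
props = {"user": "u", "password": "p", "driver": "com.mysql.jdbc.Driver"}

# mode("append") inserts into the existing table instead of trying to
# create it; mode("overwrite") drops and recreates it. Not runnable here
# without a live database:
# df.write.mode("append").jdbc(jdbc_url, "events", properties=props)
```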
Saving a dataframe result value to a string variable?

I created a dataframe in Spark; when I find the max date I want to save it to a variable. Just …

python dataframe spark-dataframe pyspark-sql databricks
Spark: Read an inputStream instead of File

I'm using SparkSQL in a Java application to do some processing on CSV files using Databricks for parsing. The data …

java apache-spark apache-spark-sql spark-dataframe databricks
Normalize column with Spark

I have a data file with three columns, and I want to normalize the last column to apply ALS with …

scala apache-spark spark-dataframe apache-spark-ml normalize
Spark: group concat equivalent in scala rdd

I have the following DataFrame:

|-----id-------|----value------|-----desc------|
|      1       |      v1       |      d1       |
|      1       |      v2       |      d2       |
|      2       |      v21      |      d21      |
|      2       |      v22      |      d22      |
|--------------|---------------|---------------|

I want …

scala apache-spark group-concat rdd spark-dataframe