Top "Spark-dataframe" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

Programmatically generate the schema AND the data for a dataframe in Apache Spark

I would like to dynamically generate a dataframe containing a header record for a report, so creating a dataframe from …

apache-spark dataframe spark-dataframe rdd spark-csv
Array Intersection in Spark SQL

I have a table with an array-type column named writer, which has values like array[value1, value2], array[…

apache-spark apache-spark-sql spark-dataframe hiveql apache-spark-dataset
Getting last value of group in Spark

I have a SparkR DataFrame as shown below: #Create R data.frame custId <- c(rep(1001, 5), rep(1002, 3), 1003) date <…

apache-spark pyspark spark-dataframe sparkr
Spark Dataframe validating column names for parquet writes (scala)

I'm processing events using Dataframes converted from a stream of JSON events which eventually gets written out as Parquet …

apache-spark apache-spark-sql spark-streaming spark-dataframe parquet
Spark DataFrame: How to specify schema when writing as Avro

I want to write a DataFrame in Avro format using a provided Avro schema rather than Spark's auto-generated schema. How …

apache-spark spark-dataframe spark-avro
Spark DataFrame InsertIntoJDBC - TableAlreadyExists Exception

Using Spark 1.4.0, I am trying to insert data from a Spark DataFrame into a MemSQL database (which should be exactly …

mysql apache-spark spark-dataframe singlestore
Saving a dataframe result value to a string variable?

I created a dataframe in Spark; when I find the max date, I want to save it to a variable. Just …

python dataframe spark-dataframe pyspark-sql databricks