Top "Apache-spark-sql" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

How to delete columns in pyspark dataframe

>>> a DataFrame[id: bigint, julian_date: string, user_id: bigint] >>> b DataFrame[id: bigint, …

apache-spark apache-spark-sql pyspark
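The usual fix is DataFrame.drop, which returns a new DataFrame without the named column. A minimal PySpark sketch with made-up sample data:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    a = spark.createDataFrame([(1, "2457389", 10)], ["id", "julian_date", "user_id"])
    # drop() does not mutate a; it returns a new DataFrame minus the column
    b = a.drop("julian_date")
    b.printSchema()  # only id and user_id remain
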
Convert pyspark string to date format

I have a pyspark dataframe with a string date column in the format MM-dd-yyyy and I am attempting to …

apache-spark pyspark apache-spark-sql pyspark-sql
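to_date with an explicit pattern handles this; note that the two-argument form needs Spark 2.2+. A minimal sketch, assuming the string column is called date_str:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("12-25-2021",)], ["date_str"])
    # parse MM-dd-yyyy strings into a proper DateType column
    df = df.withColumn("date", F.to_date("date_str", "MM-dd-yyyy"))
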
Extract column values of Dataframe as List in Apache Spark

I want to convert a string column of a data frame to a list. What I can find from the …

scala apache-spark apache-spark-sql
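The question is tagged Scala, but the idea translates directly; a PySpark sketch: collect the single column and unpack the Row objects.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a",), ("b",)], ["name"])
    # collect() yields Row objects; unpack them into plain values
    names = [row["name"] for row in df.select("name").collect()]
    # equivalent route through the RDD API:
    names = df.select("name").rdd.flatMap(lambda r: r).collect()
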
Renaming column names of a DataFrame in Spark Scala

I am trying to convert all the headers / column names of a DataFrame in Spark-Scala. As of now I come …

scala apache-spark dataframe apache-spark-sql
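toDF(*names) renames every column in one shot, while withColumnRenamed handles a single column. A PySpark sketch of both, with hypothetical column names:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 2)], ["old_a", "old_b"])
    renamed_all = df.toDF("a", "b")                   # rename all headers at once
    renamed_one = df.withColumnRenamed("old_a", "a")  # rename a single column
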
How to filter out a null value from a Spark dataframe

I created a dataframe in spark with the following schema: root |-- user_id: long (nullable = false) |-- event_id: …

scala apache-spark apache-spark-sql spark-dataframe
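Column.isNotNull (or an equivalent SQL predicate) is the standard filter. A minimal PySpark sketch matching the schema in the question:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, None), (2, 101)], ["user_id", "event_id"])
    # keep only rows whose event_id is not null
    df.filter(df["event_id"].isNotNull()).show()
    # the SQL-string form works too: df.filter("event_id IS NOT NULL")
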
How to save DataFrame directly to Hive?

Is it possible to save a DataFrame in Spark directly to Hive? I have tried converting the DataFrame to an RDD and …

scala apache-spark hive apache-spark-sql
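Yes, via DataFrameWriter.saveAsTable on a Hive-enabled session; no RDD detour is needed. A PySpark sketch in which my_db.events is a hypothetical table name:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    df = spark.createDataFrame([(1, "x")], ["id", "val"])
    # creates (or overwrites) a managed table in the Hive metastore
    df.write.mode("overwrite").saveAsTable("my_db.events")
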
How to add a constant column in a Spark DataFrame?

I want to add a column in a DataFrame with some arbitrary value (that is the same for each row). …

python apache-spark dataframe pyspark apache-spark-sql
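functions.lit wraps a Python literal as a Column, which withColumn can then attach to every row. A minimal sketch:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,), (2,)], ["id"])
    # every row gets the same constant value 10
    df = df.withColumn("constant", F.lit(10))
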
How to define partitioning of DataFrame?

I've started using Spark SQL and DataFrames in Spark 1.4.0. I want to define a custom partitioner on DataFrames, in Scala, …

scala apache-spark dataframe apache-spark-sql partitioning
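The DataFrame API has no pluggable Partitioner, but from Spark 1.6 on you can hash-partition by column with repartition and keep each partition sorted. A PySpark sketch, assuming a user_id column:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["user_id", "val"])
    # hash-partition into 8 partitions by user_id, then sort within each
    df = df.repartition(8, "user_id").sortWithinPartitions("user_id")
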
Convert date from String to Date format in Dataframes

I am trying to convert a column which is in String format to Date format using the to_date function …

apache-spark apache-spark-sql
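On versions where to_date takes no format argument (pre-2.2), the usual workaround is unix_timestamp plus a cast. A sketch, assuming dd-MM-yyyy strings in a hypothetical date_str column:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("25-12-2021",)], ["date_str"])
    # parse to epoch seconds, format back to a timestamp string, cast to date
    df = df.withColumn("date", F.from_unixtime(F.unix_timestamp("date_str", "dd-MM-yyyy")).cast("date"))
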
Best way to get the max value in a Spark dataframe column

I'm trying to figure out the best way to get the largest value in a Spark dataframe column. Consider the …

python apache-spark pyspark apache-spark-sql
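Aggregating with functions.max returns a one-row DataFrame, so only the scalar crosses to the driver. A sketch, assuming a numeric column A:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,), (5,), (3,)], ["A"])
    # aggregate first, then pull out the single value
    max_val = df.agg(F.max("A")).collect()[0][0]
    # dictionary form: df.agg({"A": "max"}).first()[0]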