Top "apache-spark-sql" questions

Apache Spark SQL is Spark's module for "SQL and structured data processing"; Spark itself is a fast, general-purpose cluster computing system.

How to perform union on two DataFrames with different numbers of columns in Spark?

I have two DataFrames as follows: I need a union like this: The unionAll function doesn't work because the number and the …

apache-spark pyspark apache-spark-sql
PySpark - rename more than one column using withColumnRenamed

I want to change the names of two columns using the Spark withColumnRenamed function. Of course, I can write: data = sqlContext.createDataFrame([(1,2), (3,4)], […

apache-spark pyspark apache-spark-sql rename
get datatype of column using pyspark

We are reading data from a MongoDB collection. The collection column has two different values (e.g.: (bson.Int64,int) (int,float)). …

apache-spark pyspark apache-spark-sql databricks
Get current number of partitions of a DataFrame

Is there any way to get the current number of partitions of a DataFrame? I checked the DataFrame javadoc (Spark 1.6) …

apache-spark dataframe apache-spark-sql
Save Spark dataframe as dynamic partitioned table in Hive

I have a sample application that reads from CSV files into a dataframe. The dataframe can be stored to …

hadoop apache-spark hive apache-spark-sql spark-dataframe
aggregate function Count usage with groupBy in Spark

I'm trying to perform multiple operations in one line of code in PySpark, and I'm not sure if that's possible for …

java scala apache-spark pyspark apache-spark-sql
pyspark dataframe filter or include based on list

I am trying to filter a dataframe in PySpark using a list. I want to either filter based on the …

apache-spark filter pyspark apache-spark-sql
How to construct a DataFrame from an Excel (xls, xlsx) file in Scala Spark?

I have a large Excel (xlsx and xls) file with multiple sheets, and I need to convert it to an RDD or …

excel scala apache-spark apache-spark-sql spark-excel
Create Spark DataFrame. Can not infer schema for type: <type 'float'>

Could someone help me solve this problem I have with Spark DataFrame? When I do myFloatRDD.toDF() I get an …

python apache-spark dataframe pyspark apache-spark-sql
Overwrite specific partitions in spark dataframe write method

I want to overwrite specific partitions instead of all in spark. I am trying the following command: df.write.orc(…

apache-spark apache-spark-sql spark-dataframe