Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
I have 2 DataFrames as follows: I need a union like this: The unionAll function doesn't work because the number and the …
I want to change the names of two columns using Spark's withColumnRenamed function. Of course, I can write: data = sqlContext.createDataFrame([(1,2), (3,4)], […
We are reading data from a MongoDB collection. A column in the collection holds values of two different types (e.g. (bson.Int64, int) and (int, float)). …
Is there any way to get the current number of partitions of a DataFrame? I checked the DataFrame javadoc (Spark 1.6) …
I have a sample application that reads CSV files into a DataFrame. The DataFrame can be stored to …
I'm trying to perform multiple operations in one line of code in PySpark, and I'm not sure if that's possible for …
I am trying to filter a DataFrame in PySpark using a list. I want to either filter based on the …
I have a large Excel (xlsx and xls) file with multiple sheets, and I need to convert it to an RDD or …
Could someone help me solve this problem I have with a Spark DataFrame? When I do myFloatRDD.toDF() I get an …
I want to overwrite specific partitions instead of all partitions in Spark. I am trying the following command: df.write.orc(…