Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
I work on a dataframe with two columns, mvv and count.
+---+-----+
|mvv|count|
+---+-----+
|  1|    5|
|  2|    9|
|  3|    3|
|  4|    1|
I would like …
python apache-spark pyspark spark-dataframe

I created a dataframe in Spark with the following schema:
root
 |-- user_id: long (nullable = false)
 |-- event_id: …
scala apache-spark apache-spark-sql spark-dataframe

I'm trying to convert a Pandas DF into a Spark one. DF head:
10000001,1,0,1,12:35,OK,10002,1,0,9,f,NA,24,24,0,3,9,0,0,1,1,0,0,4,543
10000001,2,0,1,12:36,OK,10002,1,0,9,f,NA,24,24,0,3,9,2,1,1,3,1,3,2,611
10000002,1,0,4,12:19,PA,10003,1,1,7,f,NA,74,74,0,2,15,2,0,2,3,1,2,2,691
…
python pandas apache-spark spark-dataframe

Looking at the new Spark DataFrame API, it is unclear whether it is possible to modify DataFrame columns. How would …
python apache-spark pyspark apache-spark-sql spark-dataframe

I've seen various people suggesting that DataFrame.explode is a useful way to do this, but it results in more …
apache-spark pyspark apache-spark-sql spark-dataframe pyspark-sql

I am using PySpark to read a Parquet file like below:
my_df = sqlContext.read.parquet('hdfs://myPath/myDB.db/…
python pandas pyspark spark-dataframe

I am new to Spark and Spark SQL. How does createOrReplaceTempView work in Spark? If we register an RDD of …
apache-spark apache-spark-sql spark-dataframe

I am trying to convert the Spark RDD to a DataFrame. I have seen the documentation and example where the …
python apache-spark pyspark spark-dataframe

I have this code:
l = [('Alice', 1), ('Jim', 2), ('Sandra', 3)]
df = sqlContext.createDataFrame(l, ['name', 'age'])
df.withColumn('age2', df.age + 2).…
python apache-spark-sql spark-dataframe

Suppose I have a defined schema for loading 10 CSV files in a folder. Is there a way to automatically load …
apache-spark apache-spark-sql spark-dataframe