Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
I'm a beginner with Python and Spark. After creating a DataFrame from a CSV file, I would like to know how I …
apache-spark pyspark apache-spark-sql trim pyspark-sql

I have a dataframe with schema as such: [visitorId: string, trackingIds: array&lt;string&gt;, emailIds: array&lt;string&gt;] …
scala apache-spark apache-spark-sql

I want to access the first 100 rows of a Spark DataFrame and write the result back to a CSV …
apache-spark apache-spark-sql limit

I am starting to use Spark DataFrames and I need to be able to pivot the data to create multiple …
scala apache-spark dataframe apache-spark-sql pivot

I'm running Spark locally and want to access Hive tables, which are located in the remote Hadoop cluster. I'm …
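For reaching a remote Hive metastore from a local session, the usual ingredients are Hive support plus the metastore URI; placing the cluster's hive-site.xml on the classpath is a common alternative. A configuration sketch — the host name below is a placeholder, not from the question:

```python
from pyspark.sql import SparkSession

# "metastore-host" is a placeholder; point this at the cluster's
# actual Hive metastore Thrift endpoint.
spark = (SparkSession.builder
         .appName("remote-hive")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("SHOW DATABASES").show()
```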
apache-spark hive apache-spark-sql spark-thriftserver

How can I query an RDD with complex types such as maps/arrays? For example, when I was writing this …
sql scala apache-spark dataframe apache-spark-sql

I'm trying to compare different ways to aggregate my data. This is my input data with 2 elements (page, visitor): (PAG1,…
count apache-spark distinct dataframe apache-spark-sql

I've been trying to find a reasonable way to test SparkSession with the JUnit testing framework. While there seem to …
scala unit-testing apache-spark junit apache-spark-sql

val items = List("a", "b", "c")

sqlContext.sql("select c1 from table")
  .filter($"c1".isin(items))
  .collect
  .foreach(println)

The …
scala apache-spark apache-spark-sql

I need to convert my dataframe to a dataset and I used the following code:

val final_df = Dataframe.withColumn( "…
scala apache-spark apache-spark-sql apache-spark-encoders