Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
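As a minimal sketch of what that means in practice (PySpark shown; the table name and rows are invented for illustration), a DataFrame can be registered as a temporary view and then queried with plain SQL:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("spark-sql-intro").getOrCreate()

    df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
    df.createOrReplaceTempView("users")            # expose the DataFrame to SQL
    spark.sql("SELECT name FROM users WHERE id > 1").show()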
>>> a
DataFrame[id: bigint, julian_date: string, user_id: bigint]
>>> b
DataFrame[id: bigint, …
(tags: apache-spark, apache-spark-sql, pyspark)
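The excerpt cuts off before the actual question, but it shows two DataFrames, a and b, that share an id column. A hedged sketch, assuming the goal is to bring columns of b alongside a, is a join on id; b's remaining columns are unknown, so extra_col below is a made-up placeholder:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Column names for `a` mirror the printed schema; `extra_col` in `b` is invented,
    # since the excerpt truncates b's schema after `id`.
    a = spark.createDataFrame([(1, "2457800", 10)], ["id", "julian_date", "user_id"])
    b = spark.createDataFrame([(1, "something")], ["id", "extra_col"])

    joined = a.join(b, on="id", how="left")   # keep every row of `a`
    joined.show()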
I have a PySpark DataFrame with a string date column in the format MM-dd-yyyy and I am attempting to … (tags: apache-spark, pyspark, apache-spark-sql, pyspark-sql)
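A minimal sketch of one way to parse such a column, assuming Spark 2.2+ where to_date accepts a format string (the column names below are placeholders, not the asker's):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import to_date, col

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("07-04-2016",)], ["date_str"])

    # Parse the MM-dd-yyyy string into a proper date column.
    df = df.withColumn("parsed_date", to_date(col("date_str"), "MM-dd-yyyy"))
    df.printSchema()   # parsed_date: date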
I want to convert a string column of a DataFrame to a list. What I can find from the … (tags: scala, apache-spark, apache-spark-sql)
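The question is tagged scala, but to keep all sketches in one language, here is the PySpark equivalent: collecting a single column and unpacking the Row objects is one common approach (the DataFrame contents are invented):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("a",), ("b",), ("c",)], ["value"])

    # collect() returns Row objects; pull the single field out of each one.
    values = [row["value"] for row in df.select("value").collect()]
    print(values)   # ['a', 'b', 'c']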
I am trying to rename all the headers / column names of a DataFrame in Spark-Scala. As of now I come … (tags: scala, apache-spark, dataframe, apache-spark-sql)
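The original asks about Spark-Scala; shown here as a PySpark sketch for consistency with the other examples. toDF with a full list of new names renames every column at once (the names below are examples only):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "x")], ["Old_Id", "Old_Value"])

    renamed = df.toDF("id", "value")                         # positional rename of all columns
    normalised = df.toDF(*[c.lower() for c in df.columns])   # or derive the new names programmatically
    renamed.printSchema()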
I created a DataFrame in Spark with the following schema:
root
 |-- user_id: long (nullable = false)
 |-- event_id: …
(tags: scala, apache-spark, apache-spark-sql, spark-dataframe)
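The question text is cut off right after the schema, so only a narrow sketch is possible: how a schema with the two visible fields can be declared explicitly in PySpark (event_id's type is assumed, since the excerpt truncates there):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, LongType

    spark = SparkSession.builder.getOrCreate()

    schema = StructType([
        StructField("user_id", LongType(), nullable=False),
        StructField("event_id", LongType(), nullable=True),  # type assumed; the excerpt truncates here
    ])
    df = spark.createDataFrame([(1, 100)], schema)
    df.printSchema()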
Is it possible to save a DataFrame in Spark directly to Hive? I have tried converting the DataFrame to an RDD and … (tags: scala, apache-spark, hive, apache-spark-sql)
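The original is Scala-tagged; as a PySpark sketch assuming Spark 2.x+, the usual route is saveAsTable on a Hive-enabled session rather than going through an RDD (the database and table names are placeholders):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("save-to-hive")
             .enableHiveSupport()      # requires Spark built/configured with Hive support
             .getOrCreate())

    df = spark.createDataFrame([(1, "x")], ["id", "value"])
    df.write.mode("overwrite").saveAsTable("some_db.some_table")   # managed Hive table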
I want to add a column to a DataFrame with some arbitrary value (that is the same for each row). … (tags: python, apache-spark, dataframe, pyspark, apache-spark-sql)
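A short sketch of the usual approach, lit() inside withColumn (the column name and value are arbitrary):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import lit

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,), (2,)], ["id"])

    df = df.withColumn("constant", lit(10))   # same literal value on every row
    df.show()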
I've started using Spark SQL and DataFrames in Spark 1.4.0. I want to define a custom partitioner on DataFrames, in Scala, … (tags: scala, apache-spark, dataframe, apache-spark-sql, partitioning)
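As far as I know the DataFrame API does not accept an RDD-style custom Partitioner; a hedged PySpark sketch of the closest built-in alternative, repartitioning by column (available from Spark 1.6 onward; the column name is invented):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b"), (1, "c")], ["key", "value"])

    # Hash-partition the DataFrame into 8 partitions by `key`; rows with the same
    # key land in the same partition, which is often what a custom partitioner is for.
    by_key = df.repartition(8, "key")
    print(by_key.rdd.getNumPartitions())   # 8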
I am trying to convert a column which is in String format to Date format using the to_date function … (tags: apache-spark, apache-spark-sql)
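A complementary sketch to the earlier MM-dd-yyyy example: called without a format argument, to_date expects ISO-style yyyy-MM-dd strings (the column names are placeholders):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import to_date

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("2017-03-15",)], ["raw"])

    df = df.withColumn("as_date", to_date("raw"))   # no format given: assumes yyyy-MM-dd style input
    df.printSchema()                                # as_date: date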
I'm trying to figure out the best way to get the largest value in a Spark DataFrame column. Consider the … (tags: python, apache-spark, pyspark, apache-spark-sql)
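One common way to bring the maximum of a single column back to the driver is an aggregation plus collect (the column name is a placeholder):

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import max as spark_max

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,), (5,), (3,)], ["value"])

    # agg() returns a one-row DataFrame; collect()[0][0] extracts the scalar.
    largest = df.agg(spark_max("value")).collect()[0][0]
    print(largest)   # 5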