Top "Apache-spark-sql" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

How to create SparkSession with Hive support (fails with "Hive classes are not found")?

I'm getting an error while trying to run the following code: import org.apache.spark.sql.Dataset; import org.apache.…

java apache-spark hive apache-spark-sql
Including null values in an Apache Spark Join

I would like to include null values in an Apache Spark join. Spark doesn't include rows with null by default. …

sql scala apache-spark join apache-spark-sql
Convert null values to empty array in Spark DataFrame

I have a Spark data frame where one column is an array of integers. The column is nullable because it …

apache-spark dataframe apache-spark-sql apache-spark-1.5
Spark DataFrame: does groupBy after orderBy maintain that order?

I have a Spark 2.0 dataframe example with the following structure:
id, hour, count
id1, 0, 12
id1, 1, 55
..
id1, 23, 44
id2, 0, 12
id2, 1, 89
..
id2, 23, 34
etc. …

scala apache-spark apache-spark-sql spark-streaming spark-dataframe
Reading Avro File in Spark

I have read an Avro file into a Spark RDD and need to convert that into a SQL DataFrame. How do …

scala apache-spark apache-spark-sql apache-zeppelin
Should we parallelize a DataFrame like we parallelize a Seq before training?

Consider the code given here, https://spark.apache.org/docs/1.2.0/ml-guide.html import org.apache.spark.ml.classification.LogisticRegression val …

scala apache-spark pyspark apache-spark-sql apache-spark-ml
How to create a udf in PySpark which returns an array of strings?

I have a UDF which returns a list of strings. This should not be too hard. I pass in the …

python apache-spark pyspark apache-spark-sql user-defined-functions
How to split Vector into columns - using PySpark

Context: I have a DataFrame with 2 columns: word and vector, where the column type of "vector" is VectorUDT. An example: …

python apache-spark pyspark apache-spark-sql apache-spark-ml
How to connect HBase and Spark using Python?

I have an embarrassingly parallel task for which I use Spark to distribute the computations. These computations are in Python, …

python apache-spark hbase pyspark apache-spark-sql
How do I call a UDF on a Spark DataFrame using Java?

Similar question as here, but I don't have enough points to comment there. According to the latest Spark documentation a UDF …

java apache-spark apache-spark-sql user-defined-functions