Top "Apache-spark-sql" questions

Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.

How to control partition size in Spark SQL

I have a requirement to load data from a Hive table using the Spark SQL HiveContext and load it into HDFS. By …

apache-spark hive apache-spark-sql partitioning
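The usual levers here are the spark.sql.shuffle.partitions setting and an explicit repartition before writing. A minimal sketch, assuming Spark 2.x with a Hive-enabled SparkSession; the table name, partition counts, and HDFS path are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("partition-size-sketch")
  .enableHiveSupport()
  .getOrCreate()

// Number of partitions produced by shuffles (joins, aggregations) in Spark SQL.
spark.conf.set("spark.sql.shuffle.partitions", "200")

// Placeholder table name.
val df = spark.sql("SELECT * FROM some_hive_table")

// Explicitly set the partition count of the output before writing to HDFS.
df.repartition(50)
  .write
  .mode("overwrite")
  .parquet("hdfs:///tmp/some_output_path")
```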
Parsing JSON in Spark

I was using the JSON Scala library to parse a JSON file from a local drive in a Spark job: val requestJson=JSON.…

scala apache-spark apache-spark-sql apache-spark-2.0
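Rather than a separate Scala JSON library, Spark SQL can read JSON straight into a DataFrame. A minimal sketch, assuming Spark 2.x and a placeholder file path; the multiLine option mentioned in the comment is only needed for pretty-printed documents and requires Spark 2.2+:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("json-sketch").getOrCreate()

// Reads JSON into a DataFrame with an inferred schema; the path is a placeholder.
// For a single pretty-printed JSON document (Spark 2.2+), add .option("multiLine", true).
val requestDF = spark.read.json("file:///path/to/request.json")

requestDF.printSchema()
requestDF.show()
```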
Difference between createOrReplaceTempView and registerTempTable

I am new to Spark and was trying out a few commands in Spark SQL using Python when I came across …

apache-spark pyspark apache-spark-sql pyspark-sql sparkr
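createOrReplaceTempView is the Spark 2.x replacement for registerTempTable; both register a DataFrame as a session-scoped temporary view that SQL can query. A minimal Scala sketch (the question itself uses Python, but the method names are the same):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("temp-view-sketch").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

// Spark 2.x: registers (or replaces) a session-scoped temporary view.
df.createOrReplaceTempView("my_view")

// Pre-2.0 name for essentially the same operation; deprecated since Spark 2.0.
// df.registerTempTable("my_view")

spark.sql("SELECT value FROM my_view WHERE id = 1").show()
```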
Apache Spark throws NullPointerException when encountering missing feature

I have a bizarre issue with PySpark when indexing a column of strings in features. Here is my tmp.csv file: …

python apache-spark apache-spark-sql pyspark apache-spark-ml
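This typically comes from StringIndexer hitting null values in the input column. One common workaround is to fill or drop the nulls before indexing; newer Spark versions can also keep invalid values via handleInvalid. A minimal sketch with hypothetical data:

```scala
import org.apache.spark.ml.feature.StringIndexer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("indexer-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data with a missing (null) categorical value.
val df = Seq(
  ("a", Some("red")),
  ("b", Some("blue")),
  ("c", None: Option[String])
).toDF("id", "color")

// Filling (or dropping) nulls before indexing avoids the NullPointerException.
val cleaned = df.na.fill(Map("color" -> "missing"))

val indexer = new StringIndexer()
  .setInputCol("color")
  .setOutputCol("colorIndex")
  .setHandleInvalid("keep") // "keep" requires Spark 2.2+; older versions only support "skip"/"error"

indexer.fit(cleaned).transform(cleaned).show()
```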
SparkException: Values to assemble cannot be null

I want to use StandardScaler to normalize the features. Here is my code: val Array(trainingData, testData) = dataset.randomSplit(Array(0.7,0.3)) val …

apache-spark apache-spark-sql apache-spark-ml
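VectorAssembler raises this error when any of its input columns contains a null, so the usual fix is to fill or drop nulls before assembling and scaling. A minimal sketch with hypothetical columns f1 and f2:

```scala
import org.apache.spark.ml.feature.{StandardScaler, VectorAssembler}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("assembler-sketch").getOrCreate()
import spark.implicits._

// Hypothetical numeric features; the second row contains a null.
val df = Seq[(Option[Double], Option[Double])](
  (Some(1.0), Some(2.0)),
  (Some(3.0), None)
).toDF("f1", "f2")

// VectorAssembler rejects nulls ("Values to assemble cannot be null"),
// so fill (or drop) them first.
val cleaned = df.na.fill(0.0)

val assembled = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))
  .setOutputCol("features")
  .transform(cleaned)

// StandardScaler then normalizes the assembled feature vectors.
val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithMean(true)
  .setWithStd(true)

scaler.fit(assembled).transform(assembled).show()
```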
com.mysql.jdbc.Driver not found on classpath while starting Spark SQL and Thrift server

I am receiving the following errors on starting the spark-sql shell. But when I start the shell using the command …

mysql apache-spark hive apache-spark-sql mysql-connector
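The shell and the Thrift server only see the MySQL driver if its jar is on the driver (and executor) classpath. A sketch of the usual options, with a placeholder jar path:

```bash
# Put the MySQL connector jar on the classpath when launching the shell or the
# Thrift server; the jar path is a placeholder.
spark-sql --driver-class-path /path/to/mysql-connector-java.jar \
          --jars /path/to/mysql-connector-java.jar

# Or make it permanent in conf/spark-defaults.conf:
#   spark.driver.extraClassPath   /path/to/mysql-connector-java.jar
#   spark.executor.extraClassPath /path/to/mysql-connector-java.jar
```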
Spark DataFrames: reducing by key

Let's say I have a data structure like this, where ts is some timestamp: case class Record(ts: Long, id: …

scala apache-spark apache-spark-sql apache-spark-dataset
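With DataFrames/Datasets, the reduceByKey pattern is usually expressed as groupBy plus an aggregate. A minimal sketch, assuming a hypothetical Record with one extra value field (only ts and id appear in the question); here the latest timestamp per id is kept:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

// Hypothetical record type; the value field is illustrative.
case class Record(ts: Long, id: Int, value: Double)

val spark = SparkSession.builder().appName("reduce-by-key-sketch").getOrCreate()
import spark.implicits._

val records = Seq(
  Record(1L, 1, 10.0),
  Record(5L, 1, 20.0),
  Record(3L, 2, 30.0)
).toDS()

// The DataFrame analogue of reduceByKey: group by the key column and aggregate.
records.groupBy($"id")
  .agg(max($"ts").as("latestTs"))
  .show()
```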
What is the relationship between Spark, Hadoop and Cassandra

My understanding was that Spark is an alternative to Hadoop. However, when trying to install Spark, the installation page asks …

hadoop cassandra apache-spark apache-spark-sql
How to read a ".gz" compressed file using a Spark DataFrame or Dataset?

I have a compressed file in .gz format. Is it possible to read the file directly using a Spark DataFrame/Dataset? …

apache-spark apache-spark-sql spark-dataframe gzip apache-spark-dataset
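Spark's built-in readers decompress .gz files transparently based on the file extension, though a single gzip file is not splittable and therefore lands in one partition. A minimal sketch with placeholder paths:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("gzip-sketch").getOrCreate()

// The text/CSV/JSON readers decompress .gz input transparently based on the
// file extension; a single gzip file is read into a single partition.
val textDF = spark.read.text("hdfs:///data/logs/sample.txt.gz")
val csvDF  = spark.read.option("header", "true").csv("hdfs:///data/input.csv.gz")

textDF.show(5)
csvDF.printSchema()
```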
Spark 1.6: filtering DataFrames generated by describe()

The problem arises when I call the describe function on a DataFrame: val statsDF = myDataFrame.describe() Calling the describe function yields the …

apache-spark apache-spark-sql apache-zeppelin
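describe() returns an ordinary DataFrame whose first column, summary, holds the statistic name (count, mean, stddev, min, max) as a string, so it can be filtered like any other DataFrame. A minimal sketch with a hypothetical single-column DataFrame:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("describe-sketch").getOrCreate()
import spark.implicits._

val myDataFrame = Seq(1.0, 2.0, 3.0, 4.0).toDF("x")

// describe() returns a regular DataFrame, so standard filters apply;
// here only the "mean" row is kept.
val statsDF = myDataFrame.describe()
statsDF.filter($"summary" === "mean").show()
```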