Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
I have a requirement to load data from a Hive table using Spark SQL HiveContext and load it into HDFS. By …
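The question is truncated, but the usual pattern for this task is to query the Hive table through a HiveContext and write the result out to HDFS. A minimal sketch of that pattern (the database, table, and output path are placeholders, not taken from the question):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Minimal sketch: read a Hive table via HiveContext (Spark 1.x API)
// and write the result to HDFS. "mydb.mytable" and the output path
// are placeholder names.
object HiveToHdfs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveToHdfs"))
    val hiveContext = new HiveContext(sc)

    // Any HiveQL the metastore supports can go here, including
    // partition filters in the WHERE clause.
    val df = hiveContext.sql("SELECT * FROM mydb.mytable")

    // Write out as Parquet; any DataFrameWriter format would do.
    df.write.parquet("hdfs:///user/out/mytable_parquet")

    sc.stop()
  }
}
```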
Tags: apache-spark, hive, apache-spark-sql, partitioning

I was using the json scala library to parse a json file from a local drive in a spark job: val requestJson=JSON.…
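The excerpt cuts off at `JSON.…`, but note that `scala.util.parsing.json.JSON` is deprecated and not thread-safe, so a different technique is usually recommended in a Spark job: let Spark SQL parse the file itself. A hedged sketch of that alternative (the path is a placeholder; `multiLine` is only needed when the file is one pretty-printed JSON document rather than newline-delimited JSON):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: parse the JSON with Spark SQL instead of the deprecated
// scala.util.parsing.json.JSON parser. Path is a placeholder.
val spark = SparkSession.builder()
  .appName("ReadJson")
  .master("local[*]")
  .getOrCreate()

val requestJson = spark.read
  .option("multiLine", "true")   // for a single JSON document per file
  .json("file:///path/to/request.json")

requestJson.printSchema()
```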
Tags: scala, apache-spark, apache-spark-sql, apache-spark-2.0

I am new to Spark and was trying out a few commands in Spark SQL using Python when I came across …
Tags: apache-spark, pyspark, apache-spark-sql, pyspark-sql, sparkr

I have a bizarre issue with PySpark when indexing a column of strings in features. Here is my tmp.csv file: …
Tags: python, apache-spark, apache-spark-sql, pyspark, apache-spark-ml

I want to use StandardScaler to normalize the features. Here is my code: val Array(trainingData, testData) = dataset.randomSplit(Array(0.7, 0.3)) val …
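The code is cut off after the split, but a minimal sketch of how this usually continues, assuming `dataset` already has a Vector column named "features" (e.g. from a VectorAssembler): StandardScaler is an Estimator, so it is fit on the training split and the same fitted model is applied to both splits so the test data is scaled consistently.

```scala
import org.apache.spark.ml.feature.StandardScaler

// Sketch; `dataset` and its "features" Vector column are assumed.
val Array(trainingData, testData) = dataset.randomSplit(Array(0.7, 0.3))

val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures")
  .setWithMean(true)   // center to zero mean (densifies sparse vectors)
  .setWithStd(true)    // scale to unit standard deviation

// Fit once on training data, reuse the model on both splits.
val scalerModel = scaler.fit(trainingData)
val scaledTrain = scalerModel.transform(trainingData)
val scaledTest  = scalerModel.transform(testData)
```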
Tags: apache-spark, apache-spark-sql, apache-spark-ml

I am receiving the following errors when starting the spark-sql shell. But when I start the shell using the command …
Tags: mysql, apache-spark, hive, apache-spark-sql, mysql-connector

Let's say I have a data structure like this, where ts is some timestamp: case class Record(ts: Long, id: …
Tags: scala, apache-spark, apache-spark-sql, apache-spark-dataset

My understanding was that Spark is an alternative to Hadoop. However, when trying to install Spark, the installation page asks …
Tags: hadoop, cassandra, apache-spark, apache-spark-sql

I have a compressed file in .gz format. Is it possible to read the file directly using a Spark DataFrame/Dataset? …
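Yes: Spark selects the decompression codec from the file extension, so a .gz file can be read directly with no explicit decompression step. A minimal sketch (path and options are placeholders); one caveat is that gzip is not a splittable format, so each .gz file is read by a single task:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: read a gzip-compressed CSV straight into a DataFrame.
// Spark decompresses it transparently based on the ".gz" extension.
val spark = SparkSession.builder()
  .appName("ReadGzip")
  .master("local[*]")
  .getOrCreate()

val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/data.csv.gz")   // placeholder path

df.show()
```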
Tags: apache-spark, apache-spark-sql, spark-dataframe, gzip, apache-spark-dataset

The problem arises when I call the describe function on a DataFrame: val statsDF = myDataFrame.describe() Calling the describe function yields the …
Tags: apache-spark, apache-spark-sql, apache-zeppelin
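The excerpt is cut off before the actual error, but for context, a sketch of what describe() produces (assuming `myDataFrame` exists, as in the question): it returns a new DataFrame rather than printing anything, so it must be shown or collected explicitly.

```scala
// describe() returns a new DataFrame whose first column is "summary"
// (rows: count, mean, stddev, min, max) and whose remaining columns
// hold the corresponding statistic for each numeric/string column,
// all typed as strings. `myDataFrame` is assumed to exist.
val statsDF = myDataFrame.describe()

statsDF.printSchema()   // every statistic column is a string
statsDF.show()          // nothing is displayed until an action runs
```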