Apache Spark SQL is a tool for "SQL and structured data processing" on Spark, a fast and general-purpose cluster computing system.
Spark version: spark-2.0.1-bin-hadoop2.7, Scala: 2.11.8. I am loading a raw CSV into a DataFrame. In the CSV, although the column is …
scala apache-spark apache-spark-sql spark-dataframe spark-csv
I'm perplexed by the behaviour of the numPartitions parameter in the following methods: DataFrameReader.jdbc, Dataset.repartition. The official docs of …
apache-spark dataframe spark-dataframe spark-jdbc
I need to read some JSON data from a web service that provides REST interfaces to query the data from …
apache-spark-sql spark-dataframe azure-hdinsight
There are several similar-yet-different concepts in Spark-land surrounding how work gets farmed out to different nodes and executed concurrently. Specifically, …
apache-spark spark-dataframe distributed-computing partitioning bigdata
I have had a Spark job fail with a trace like this one: ./containers/application_1455622885057_0016/container_1455622885057_0016_01_000001/stderr-Container id: container_1455622885057_0016_01_000008 ./containers/application_1455622885057_0016/…
apache-spark yarn spark-dataframe
I have a compressed file in .gz format. Is it possible to read the file directly using Spark DF/DS? …
apache-spark apache-spark-sql spark-dataframe gzip apache-spark-dataset
Given 1 billion records containing the following information:
ID  x1   x2    x3   ...  x100
1   0.1  0.12  1.3  ...  -2.00
2   -1   1.2   2    ...  3
...
For each ID above, I want to …
apache-spark pyspark spark-dataframe nearest-neighbor euclidean-distance
I have been experimenting with different ways to filter a typed data set. It turns out the performance can be quite …
apache-spark apache-spark-sql spark-dataframe apache-spark-dataset
I am relatively new to Spark and Scala. I am starting with the following dataframe (single column made out of …
scala apache-spark rdd spark-dataframe apache-spark-mllib
I'm using Spark 2.0. I have a column of my dataframe containing a WrappedArray of WrappedArrays of Float. An example of …
arrays scala casting spark-dataframe apache-spark-2.0