Top "Apache-spark-1.6" questions

Use for questions specific to Apache Spark 1.6. For general questions related to Apache Spark, use the tag [apache-spark].

How to use collect_set and collect_list functions in windowed aggregation in Spark 1.6?

In Spark 1.6.0 / Scala, is there a way to get collect_list("colC") or collect_set("colC").over(Window.partitionBy("colA").…

scala apache-spark apache-spark-sql apache-spark-1.6
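
A common Spark 1.6 workaround, since collect_list/collect_set are only available as Hive-backed aggregates there, is to aggregate with groupBy and join the result back. A minimal sketch, assuming spark-shell with Hive support and the column names from the question:

```scala
import org.apache.spark.sql.functions.collect_set
// In spark-shell (Spark 1.6 built with Hive), sqlContext is a HiveContext,
// which collect_set/collect_list require in this version.
import sqlContext.implicits._

val df = Seq(("a", 1, "x"), ("a", 2, "y"), ("b", 3, "z")).toDF("colA", "colB", "colC")

// Aggregate per key, then join the set back onto the original rows, which
// mimics collect_set("colC").over(Window.partitionBy("colA")).
val sets = df.groupBy($"colA").agg(collect_set($"colC").as("colC_set"))
val result = df.join(sets, "colA")
result.show()
```
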
How to enable or disable Hive support in spark-shell through a Spark property (Spark 1.6)?

Is there any configuration property we can set to explicitly enable / disable Hive support through spark-shell in Spark 1.6? I …

apache-spark hive apache-spark-sql apache-spark-1.6
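
There is no property shown in the excerpt, so the sketch below only checks what spark-shell created and falls back to a plain SQLContext by hand; no configuration keys are assumed:

```scala
// In spark-shell (Spark 1.6), sqlContext is a HiveContext when the build
// includes Hive; this checks which one the shell actually created.
println(sqlContext.isInstanceOf[org.apache.spark.sql.hive.HiveContext])

// To work without Hive support regardless, a plain SQLContext can be
// created explicitly and used instead of the pre-built one.
val plainSqlContext = new org.apache.spark.sql.SQLContext(sc)
```
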
Reading CSV into a Spark Dataframe with timestamp and date types

It's CDH with Spark 1.6. I am trying to import this hypothetical CSV into an Apache Spark DataFrame: $ hadoop fs -cat …

apache-spark apache-spark-sql apache-spark-1.6
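
One commonly used route in 1.6 is the spark-csv package with an explicit schema; a sketch where the package coordinates, column names, path, and the dateFormat pattern are all assumptions:

```scala
// spark-shell --packages com.databricks:spark-csv_2.10:1.5.0   (assumed coordinates)
import org.apache.spark.sql.types._

// Declaring the schema up front makes spark-csv parse these columns as
// DateType / TimestampType instead of leaving them as strings.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = true),
  StructField("event_date", DateType, nullable = true),
  StructField("event_ts", TimestampType, nullable = true)
))

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("dateFormat", "yyyy-MM-dd HH:mm:ss")   // spark-csv option; the pattern here is an assumption
  .schema(schema)
  .load("hdfs:///path/to/file.csv")

df.printSchema()
```
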
Get first non-null values in group by (Spark 1.6)

How can I get the first non-null values from a group by? I tried using first with coalesce, F.first(…

apache-spark pyspark spark-dataframe apache-spark-1.6
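
One way to get a first non-null value per group in 1.6 is to drop the nulls before aggregating; a sketch in Scala with made-up column names (groups whose values are all null disappear and would need a join back if they must be kept):

```scala
import org.apache.spark.sql.functions.first
import sqlContext.implicits._

val df = Seq(("a", Some("x")), ("a", None), ("b", None)).toDF("key", "value")

// first() without an ordering is not deterministic, but after the null filter
// whatever it picks is guaranteed to be non-null.
val firstNonNull = df
  .filter($"value".isNotNull)
  .groupBy($"key")
  .agg(first($"value").as("first_value"))
```
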
Where is the reference for options for writing or reading per format?

I use Spark 1.6.1. We are trying to write an ORC file to HDFS using HiveContext and DataFrameWriter. While we can …

apache-spark apache-spark-sql apache-spark-1.6
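
Option keys in 1.6 are format-specific and mostly documented in each data source's scaladoc or source rather than in one reference, so the sketch below only illustrates the mechanism; the "orc.compress" key and the table and path names are assumptions:

```scala
import org.apache.spark.sql.hive.HiveContext

val hiveCtx = new HiveContext(sc)              // ORC in 1.6 goes through HiveContext
val df = hiveCtx.table("some_db.some_table")   // hypothetical source table

df.write
  .format("orc")
  .option("orc.compress", "SNAPPY")            // assumed key, passed through as a data source parameter
  .save("hdfs:///tmp/out_orc")
```
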
Why does a Spark application on YARN fail with FetchFailedException due to Connection refused?

I am using Spark version 1.6.3 and YARN version 2.7.1.2.3, which comes with HDP-2.3.0.0-2557. Because the Spark version is too old in the …

apache-spark hadoop-yarn apache-spark-1.6
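
Connection-refused fetch failures usually mean an executor serving shuffle blocks died (often killed by YARN for exceeding memory) or timed out; a sketch of the configuration knobs typically adjusted first, with illustrative values only:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// These keys all exist in Spark 1.6; the values are examples, not recommendations.
val conf = new SparkConf()
  .setAppName("shuffle-tuning-sketch")
  .set("spark.network.timeout", "600s")               // default 120s
  .set("spark.shuffle.io.maxRetries", "10")           // default 3
  .set("spark.shuffle.io.retryWait", "30s")           // default 5s
  .set("spark.yarn.executor.memoryOverhead", "2048")  // in MB; raise if YARN kills executors

val sc = new SparkContext(conf)
```
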
What to do with "WARN TaskSetManager: Stage contains a task of very large size"?

I use Spark 1.6.1. My Spark application reads more than 10000 Parquet files stored in S3. val df = sqlContext.read.option("mergeSchema", "…

apache-spark apache-spark-1.6
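
The warning fires when a serialized task exceeds roughly 100 KB, which usually means a large driver-side object was captured in a closure; a sketch of the broadcast-variable fix, with a made-up lookup map and path:

```scala
// A large local structure that would otherwise be shipped inside every task.
val bigLookup: Map[String, String] =
  (1 to 1000000).map(i => i.toString -> ("value" + i)).toMap

val bcLookup = sc.broadcast(bigLookup)

val df = sqlContext.read.option("mergeSchema", "false").parquet("s3://bucket/prefix/")
val mapped = df.rdd.map { row =>
  // Referencing the broadcast handle keeps the per-task payload small.
  bcLookup.value.getOrElse(row.getString(0), "unknown")
}
mapped.count()
```
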
PySpark serialization EOFError

I am reading in a CSV as a Spark DataFrame and performing machine learning operations upon it. I keep getting …

python apache-spark pyspark apache-spark-1.6
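
Not a fix for the EOFError itself, but one way to check whether the CSV read (rather than the Python worker serializer) is the problem is to run the same read on the JVM side; the package coordinates and path below are assumptions:

```scala
// spark-shell --packages com.databricks:spark-csv_2.10:1.5.0   (assumed)
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("hdfs:///data/train.csv")

df.printSchema()
println(df.count())   // forces a full pass over the file without involving Python workers
```
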
How to replace NULL with 0 in a left outer join in Spark DataFrame v1.6

I am working with Spark v1.6. I have the following two DataFrames and I want to convert the null to 0 in …

scala apache-spark apache-spark-sql apache-spark-1.6
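
A sketch of the usual 1.6 approach: do the left outer join, then replace the resulting nulls with na.fill(0) (or coalesce on a specific column); the keys and column names are made up:

```scala
import org.apache.spark.sql.functions.{coalesce, lit}
import sqlContext.implicits._

val left = Seq(("a", 1), ("b", 2)).toDF("key", "l_val")
val right = Seq(("a", 10)).toDF("key", "r_val")

// Unmatched right-side columns come back as null; na.fill(0) replaces
// nulls in numeric columns with 0.
val joined = left
  .join(right, left("key") === right("key"), "left_outer")
  .drop(right("key"))
  .na.fill(0)

// Alternative that targets one column explicitly instead of na.fill:
//   joined.withColumn("r_val", coalesce($"r_val", lit(0)))
```
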
Removing NULL, NaN, and empty space from a PySpark DataFrame

I have a DataFrame in PySpark which contains empty space, Null, and NaN. I want to remove rows which have …

apache-spark pyspark apache-spark-1.6
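
A sketch of one approach with the DataFrame API (shown in Scala): in 1.6, na.drop() already removes rows with nulls or NaN, and a trim-based filter handles empty or whitespace-only strings; the column names are assumptions:

```scala
import org.apache.spark.sql.functions.{col, trim}
import sqlContext.implicits._

val df = Seq(("a", Some(1.0)), (" ", Some(2.0)), ("b", None), ("c", Some(Double.NaN)))
  .toDF("name", "score")

// na.drop() removes rows containing any null or NaN value; the filter then
// removes rows whose name is empty or whitespace-only ("!==" is 1.6's not-equal).
val cleaned = df.na.drop()
  .filter(trim(col("name")) !== "")

cleaned.show()
```
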