Top "Apache-spark" questions

Apache Spark is an open-source distributed data processing engine written in Scala. It provides a unified API and distributed datasets for both batch and streaming processing.

How to construct a DataFrame from an Excel (xls, xlsx) file in Scala Spark?

I have a large Excel (xlsx and xls) file with multiple sheets and I need to convert it to an RDD or …

excel scala apache-spark apache-spark-sql spark-excel
PySpark: How to fillna values in dataframe for specific columns?

I have the following sample DataFrame:

a    | b    | c
1    | 2    | 4
0    | null | null
null | 3    | 4

And I want to replace null values only …

apache-spark pyspark spark-dataframe
Create Spark DataFrame. Can not infer schema for type: <type 'float'>

Could someone help me solve this problem I have with Spark DataFrame? When I do myFloatRDD.toDF() I get an …

python apache-spark dataframe pyspark apache-spark-sql
Unable to infer schema when loading Parquet file

response = "mi_or_chd_5"
outcome = sqlc.sql("""select eid,{response} as response from outcomes where {response} IS NOT NULL""".format(…

apache-spark pyspark parquet
Where are logs in Spark on YARN?

I'm new to Spark. I can now run Spark 0.9.1 on YARN (2.0.0-cdh4.2.1), but there is no log after execution. The …

hadoop logging apache-spark cloudera yarn
Spark union of multiple RDDs

In my Pig code I do this: all_combined = Union relation1, relation2, relation3, relation4, relation5, relation6. I want to do …

python apache-spark pyspark rdd
Overwrite specific partitions in spark dataframe write method

I want to overwrite specific partitions instead of all of them in Spark. I am trying the following command: df.write.orc(…

apache-spark apache-spark-sql spark-dataframe
How does the Distinct() function work in Spark?

I'm a newbie to Apache Spark and was learning the basic functionality. I have a small doubt. Suppose I have an RDD …

apache-spark distinct
Cannot Read a file from HDFS using Spark

I have installed Cloudera CDH 5 using Cloudera Manager. I can easily do hadoop fs -ls /input/war-and-peace.txt hadoop …

hadoop apache-spark cloudera-cdh
How to pass -D parameter or environment variable to Spark job?

I want to change the Typesafe config of a Spark job in a dev/prod environment. It seems to me that the …

scala apache-spark