Top "EMR" questions

Amazon Elastic MapReduce (Amazon EMR) is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.

How to handle changing parquet schema in Apache Spark

I have run into a problem where I have Parquet data as daily chunks in S3 (in the form of …

apache-spark apache-spark-sql spark-dataframe emr parquet
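
For this question, the usual Spark feature is Parquet schema merging. You can enable it per read with `spark.read.option("mergeSchema", "true").parquet(...)` or globally with `spark.sql.parquet.mergeSchema`. Running that needs a Spark session, so the sketch below is only a stdlib illustration of the idea. It assumes each daily chunk's schema has been reduced to a hypothetical `{column: type-name}` dict. Columns are unioned in first-seen order, and a type conflict is rejected, roughly as Spark refuses incompatible types. It does not reproduce Spark's handling of nested structs or decimals.

```python
from typing import Dict, List


def merge_schemas(schemas: List[Dict[str, str]]) -> Dict[str, str]:
    """Union hypothetical {column: type} maps from several daily chunks.

    Columns keep their first-seen order. A column that shows up with two
    different type names raises ValueError rather than being widened.
    """
    merged: Dict[str, str] = {}
    for schema in schemas:
        for column, dtype in schema.items():
            if column not in merged:
                merged[column] = dtype
            elif merged[column] != dtype:
                raise ValueError(
                    f"incompatible types for {column!r}: {merged[column]} vs {dtype}"
                )
    return merged


# Hypothetical schemas: a "country" column appears in a later daily chunk.
day1 = {"id": "long", "event": "string"}
day2 = {"id": "long", "event": "string", "country": "string"}
print(merge_schemas([day1, day2]))
# {'id': 'long', 'event': 'string', 'country': 'string'}
```

When Spark reads with a merged schema, rows from the older chunks come back with `country` set to null.
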
Boosting spark.yarn.executor.memoryOverhead

I'm trying to run a (py)Spark job on EMR that will process a large amount of data. Currently my …

amazon-web-services apache-spark pyspark emr amazon-emr
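
On YARN, each executor container is sized as executor heap plus overhead. Spark's default overhead is `max(384 MiB, factor * executor memory)`. The sketch assumes a factor of 0.10, since some Spark 1.x releases used 0.07. Spark before 2.3 reads the override from `spark.yarn.executor.memoryOverhead`, in MiB. Later versions use `spark.executor.memoryOverhead`. The helper names are hypothetical; the sketch only shows the arithmetic and the resulting `spark-submit` flags.

```python
from typing import List, Optional

MIN_OVERHEAD_MB = 384            # floor Spark applies on YARN
DEFAULT_OVERHEAD_FACTOR = 0.10   # assumed default; some Spark 1.x releases used 0.07


def default_overhead_mb(executor_memory_mb: int,
                        factor: float = DEFAULT_OVERHEAD_FACTOR) -> int:
    """Overhead Spark adds when none is set: max(384 MiB, factor * executor memory)."""
    return max(MIN_OVERHEAD_MB, int(executor_memory_mb * factor))


def container_request_mb(executor_memory_mb: int,
                         overhead_mb: Optional[int] = None) -> int:
    """Memory asked of YARN per executor container: JVM heap plus overhead."""
    if overhead_mb is None:
        overhead_mb = default_overhead_mb(executor_memory_mb)
    return executor_memory_mb + overhead_mb


def submit_conf_args(executor_memory_mb: int, overhead_mb: int,
                     legacy_name: bool = True) -> List[str]:
    """spark-submit --conf flags; the spark.yarn.* name is for Spark < 2.3."""
    key = ("spark.yarn.executor.memoryOverhead" if legacy_name
           else "spark.executor.memoryOverhead")
    return ["--conf", f"spark.executor.memory={executor_memory_mb}m",
            "--conf", f"{key}={overhead_mb}"]


print(default_overhead_mb(8192))         # 819
print(container_request_mb(8192))        # 9011
print(container_request_mb(8192, 2048))  # 10240
print(" ".join(submit_conf_args(8192, 2048)))
```

Raising the overhead makes each container larger, so fewer executors fit per node. YARN may also round each request up to a multiple of `yarn.scheduler.minimum-allocation-mb`.
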
Spark + EMR using Amazon's "maximizeResourceAllocation" setting does not use all cores/vcores

I'm running an EMR cluster (version emr-4.2.0) for Spark using the Amazon-specific maximizeResourceAllocation flag as documented here. According to …

apache-spark yarn emr amazon-emr elastic-map-reduce
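
One commonly cited explanation for this symptom is the CapacityScheduler's default resource calculator. It accounts only for memory, so YARN reports one vcore per container regardless of `spark.executor.cores`. Treat this as an assumption to verify on your cluster. The sketch builds an EMR `Configurations` list that sets `maximizeResourceAllocation` in the `spark` classification. Optionally, it also switches the `capacity-scheduler` classification to `DominantResourceCalculator`. The helper name is hypothetical.

```python
import json
from typing import Dict, List


def emr_spark_configurations(use_dominant_calculator: bool = True) -> List[Dict]:
    """Build an EMR 'Configurations' list to supply at cluster creation."""
    configs: List[Dict] = [
        # Amazon-specific flag: EMR sizes Spark executors from the instance type.
        {"Classification": "spark",
         "Properties": {"maximizeResourceAllocation": "true"}},
    ]
    if use_dominant_calculator:
        # Ask the CapacityScheduler to account for vcores as well as memory.
        configs.append({
            "Classification": "capacity-scheduler",
            "Properties": {
                "yarn.scheduler.capacity.resource-calculator":
                    "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
            },
        })
    return configs


print(json.dumps(emr_spark_configurations(), indent=2))
```

You can save the printed JSON to a file and pass it with `aws emr create-cluster --configurations file://config.json`.
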
Where are the Spark logs on EMR?

I'm not able to locate error logs or messages from println calls in Scala while running jobs on Spark in …

scala apache-spark emr
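
In YARN mode, a `println` inside a transformation runs on an executor. Its output therefore lands in that executor container's stdout, not in the driver's console. Assuming YARN log aggregation is enabled, `yarn logs -applicationId <id>` on the master node dumps all container logs. EMR also archives container logs under the cluster's S3 log URI, and the `containers/` layout used below is an assumption to check against your bucket. The helper names and IDs are hypothetical.

```python
from typing import List


def yarn_logs_command(application_id: str) -> List[str]:
    """argv to fetch aggregated container logs (run on the master node)."""
    return ["yarn", "logs", "-applicationId", application_id]


def emr_container_log_prefix(log_uri: str, cluster_id: str,
                             application_id: str) -> str:
    """Assumed S3 layout for per-container logs under the cluster's log URI."""
    return f"{log_uri.rstrip('/')}/{cluster_id}/containers/{application_id}/"


def lines_with_marker(log_text: str, marker: str) -> List[str]:
    """Pick out println output tagged with a marker from a large log dump."""
    return [line for line in log_text.splitlines() if marker in line]


app_id = "application_1450000000000_0001"  # hypothetical application ID
print(yarn_logs_command(app_id))
print(emr_container_log_prefix("s3://my-log-bucket/emr/", "j-EXAMPLE", app_id))
```

On the master node, you could pass the command to `subprocess.run(..., capture_output=True, text=True)`. Then filter its `stdout` with `lines_with_marker`, using a prefix you added to your `println` calls.
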
Spark - Which instance type is preferred for an AWS EMR cluster?

I am running some machine learning algorithms on EMR Spark cluster. I am curious about which kind of instance to …

amazon-ec2 apache-spark emr
EMR Spark - TransportClient: Failed to send RPC

I'm getting this error. I tried to increase memory on cluster instances and in the executor and driver parameters without …

apache-spark yarn emr
Spark resources not fully allocated on Amazon EMR

I'm trying to maximize cluster usage for a simple task. The cluster is 1+2 x m3.xlarge, running Spark 1.3.1, Hadoop 2.4, Amazon AMI 3.7 …

apache-spark yarn emr
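
A quick way to see why resources stay idle is to work out how many executor containers fit on one worker. Whichever of memory or vcores runs out first sets the limit. If the scheduler only counts memory, vcores do not constrain anything. The node figures below are hypothetical, because the YARN memory EMR gives an m3.xlarge depends on the AMI's defaults. The overhead rule assumes Spark's `max(384 MiB, 10%)` default.

```python
import math


def executors_per_node(yarn_memory_mb: int, yarn_vcores: int,
                       executor_memory_mb: int, executor_cores: int,
                       min_allocation_mb: int = 1024,
                       count_vcores: bool = True) -> int:
    """Rough count of executor containers that fit on one node.

    yarn_memory_mb / yarn_vcores: what the NodeManager offers to YARN.
    min_allocation_mb: yarn.scheduler.minimum-allocation-mb; each request is
    rounded up to a multiple of it.
    count_vcores=False mimics a scheduler that only looks at memory.
    """
    overhead_mb = max(384, int(executor_memory_mb * 0.10))  # assumed default overhead
    request_mb = executor_memory_mb + overhead_mb
    container_mb = math.ceil(request_mb / min_allocation_mb) * min_allocation_mb
    by_memory = yarn_memory_mb // container_mb
    if not count_vcores:
        return by_memory
    return min(by_memory, yarn_vcores // executor_cores)


# Hypothetical node: 12 GiB and 4 vcores handed to YARN.
print(executors_per_node(12288, 4, 4096, 2))  # 2
print(executors_per_node(12288, 4, 2048, 1))  # 4
```

Multiply by the number of worker nodes (two here). Then leave room for the YARN application master container, which also takes memory on one of the nodes.
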
Any Scala SDK or interface for AWS?

Does anyone know of a Scala SDK for Amazon Web Services? I am particularly interested in the EMR jobs.

scala amazon-web-services emr amazon-emr
Spark in YARN mode ends with "Exit status: -100. Diagnostics: Container released on a *lost* node"

I am trying to load a database with 1 TB of data into Spark on AWS using the latest EMR release. And the …

apache-spark yarn emr
How to set a custom environment variable in EMR so that it is available to a Spark application

I need to set a custom environment variable in EMR to be available when running a Spark application. I have …

amazon-web-services hadoop apache-spark environment-variables emr
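
There are two common routes, and which one reaches your code depends on where that code runs.

- **Cluster-wide:** the EMR `spark-env` classification with a nested `export` classification writes the variable into `spark-env.sh`.
- **Per job:** the Spark properties `spark.yarn.appMasterEnv.<NAME>` (the driver in YARN cluster mode) and `spark.executorEnv.<NAME>` (executors) set it on the YARN containers.

The sketch builds both forms. The helper names and the variable `MY_SETTING` are hypothetical.

```python
import json
from typing import Dict, List


def spark_env_classification(env: Dict[str, str]) -> List[Dict]:
    """EMR configuration that exports variables from spark-env.sh."""
    return [{
        "Classification": "spark-env",
        "Properties": {},
        "Configurations": [
            {"Classification": "export", "Properties": dict(env)},
        ],
    }]


def spark_env_conf_args(env: Dict[str, str]) -> List[str]:
    """spark-submit --conf flags that set the variables on YARN containers."""
    args: List[str] = []
    for name, value in env.items():
        # Driver, when it runs inside the application master (YARN cluster mode).
        args += ["--conf", f"spark.yarn.appMasterEnv.{name}={value}"]
        # Every executor.
        args += ["--conf", f"spark.executorEnv.{name}={value}"]
    return args


env = {"MY_SETTING": "staging"}  # hypothetical variable name and value
print(json.dumps(spark_env_classification(env), indent=2))
print(spark_env_conf_args(env))
```

Inside the application, the value can then be read with `os.environ.get("MY_SETTING")` in PySpark or `sys.env.get("MY_SETTING")` in Scala.
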