Top "Amazon-emr" questions

Amazon Elastic MapReduce (Amazon EMR) is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.

Spark 2.2.0 FileOutputCommitter

DirectFileOutputCommitter is no longer available in Spark 2.2.0. This means writing to S3 takes an insanely long time (3 hours vs. 2 minutes). I'm …

hadoop apache-spark amazon-s3 apache-spark-sql amazon-emr

Optimizing GC on EMR cluster

I am running a Spark job written in Scala on EMR, and the stdout of each executor is filled with …

apache-spark garbage-collection jvm emr amazon-emr