Top "Amazon-emr" questions

Amazon Elastic MapReduce (Amazon EMR) is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.

How do you make a Hive table out of JSON data?

I want to create a Hive table out of some nested JSON data and run queries on it. Is this …

json hadoop hive amazon-emr emr
"Container killed by YARN for exceeding memory limits. 10.4 GB of 10.4 GB physical memory used" on an EMR cluster with 75GB of memory

I'm running a 5-node Spark cluster on AWS EMR, each node sized m3.xlarge (1 master, 4 slaves). I successfully ran through a 146…

apache-spark emr amazon-emr bigdata
Application report for application_ (state: ACCEPTED) never ends for Spark Submit (with Spark 1.2.0 on YARN)

I am running a Kinesis plus Spark application (https://spark.apache.org/docs/1.2.0/streaming-kinesis-integration.html). I am running it with the command below …

apache-spark yarn amazon-emr amazon-kinesis
Does Hive have something equivalent to DUAL?

I'd like to run statements like SELECT date_add('2008-12-31', 1) FROM DUAL. Does Hive (running on Amazon …

hadoop hive amazon-emr
Pyspark - Load file: Path does not exist

I am a newbie to Spark. I'm trying to read a local CSV file within an EMR cluster. The file …

apache-spark pyspark emr amazon-emr pyspark-sql
Amazon EC2 vs. Amazon EMR

I have implemented a task in Hive. Currently it is working fine on my single node cluster. Now I am …

amazon-ec2 amazon-web-services hive amazon-emr
How to select a file from AWS S3 by using a wildcard character

I have many files in an S3 bucket and I want to copy those files which have a start date of 2012. …

amazon-web-services amazon-s3 amazon-emr
Boosting spark.yarn.executor.memoryOverhead

I'm trying to run a (py)Spark job on EMR that will process a large amount of data. Currently my …

amazon-web-services apache-spark pyspark emr amazon-emr
Extremely slow S3 write times from EMR/Spark

I'm writing to see if anyone knows how to speed up S3 write times from Spark running in EMR? My …

amazon-web-services apache-spark amazon-s3 amazon-emr