Top "Amazon-emr" questions

Amazon Elastic MapReduce (Amazon EMR) is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.

How do you automate PySpark jobs on EMR using boto3 (or otherwise)?

I am creating a job to parse massive amounts of server data, and then upload it into a Redshift database. …

python amazon-s3 apache-spark pyspark amazon-emr
Saving dataframe to local file system results in empty results

We are running Spark 2.3.0 on AWS EMR. The following DataFrame "df" is non-empty and of modest size: scala> …

apache-spark amazon-emr
EMRFS file sync with S3 not working

After running a Spark job on an Amazon EMR cluster, I deleted the output files directly from S3 and tried …

amazon-s3 pyspark amazon-emr
HIVE External Table - Set Empty Strings to NULL

Currently I have a Hive 0.7 instance on Amazon EMR. I am trying to create a duplicate of this instance on …

hadoop hive hdfs amazon-emr external-tables
Python pip install pyarrow error, unable to execute 'cmake'

I'm trying to install pyarrow on a master instance of my EMR cluster, but I always receive this error. […

python-3.x cmake pip amazon-emr pyarrow
Dealing with a large gzipped file in Spark

I have a large (about 85 GB compressed) gzipped file from S3 that I am trying to process with Spark on …

apache-spark gzip amazon-emr
FAILED: ParseException: cannot recognize input near 'exchange' 'string' ',' in column specification

I am using the latest AWS Hive version, 0.13.0. FAILED: ParseException: cannot recognize input near 'exchange' 'string' ',' in column specification …

hadoop amazon-web-services hive amazon-emr hadoop-partitioning
Any Scala SDK or interface for AWS?

Does anyone know of a Scala SDK for Amazon Web Services? I am particularly interested in the EMR jobs.

scala amazon-web-services emr amazon-emr
AWS CLI - No JSON object could be decoded

I'm using the AWS CLI to create a cluster and use the parameters from a JSON file. Here is …

amazon-web-services aws-cli amazon-emr
Does an EMR master node know its cluster ID?

I want to be able to create EMR clusters, and for those clusters to send messages back to some central …

amazon-web-services hadoop amazon-emr