Can we consider AWS Glue as a replacement for EMR?

Yuva · Jan 12, 2018

Just a quick question for the experts. AWS Glue, as an ETL tool, offers companies benefits such as minimal or no server maintenance and cost savings from avoiding over- or under-provisioning of resources, and it also runs on Spark. Given that, can AWS Glue replace EMR?

If both can coexist, what role can EMR play alongside AWS Glue?

Thanks & regards

Yuva

Answer

ctrl-c · Jan 13, 2018

As far as I understand, Glue cannot replace EMR, although it really depends on your use case. Glue ETL has some limitations:

  • It does not support --packages (the spark-submit option that resolves Maven dependencies at submit time).
  • There is no internal storage for temporary data.
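A rough sketch of what the --packages limitation means in practice, written as the request dictionaries you might pass to boto3 (emr.add_job_flow_steps and glue.create_job). On EMR, spark-submit can pull Maven coordinates itself; a Glue job instead has to point at JAR files already uploaded to S3 through its --extra-jars job parameter. The bucket, script path, role name and Maven coordinate are hypothetical placeholders, and nothing here calls AWS.

```python
# Hypothetical placeholders: substitute your own bucket, script, role and dependency.
SCRIPT = "s3://my-bucket/scripts/job.py"
MAVEN_COORD = "com.example:my-connector_2.11:1.0.0"
JAR_ON_S3 = "s3://my-bucket/jars/my-connector_2.11-1.0.0.jar"


def emr_spark_step(script, packages):
    """One EMR step (for boto3's emr.add_job_flow_steps) in which
    spark-submit resolves Maven dependencies itself via --packages."""
    return {
        "Name": "spark-job-with-packages",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--packages", ",".join(packages), script],
        },
    }


def glue_job_definition(name, role, script, jars):
    """Keyword arguments for boto3's glue.create_job. Glue has no --packages,
    so dependencies must already be JARs in S3, listed in --extra-jars."""
    return {
        "Name": name,
        "Role": role,
        "Command": {"Name": "glueetl", "ScriptLocation": script},
        "DefaultArguments": {"--extra-jars": ",".join(jars)},
    }


if __name__ == "__main__":
    step = emr_spark_step(SCRIPT, [MAVEN_COORD])
    job = glue_job_definition("my-glue-job", "MyGlueServiceRole", SCRIPT, [JAR_ON_S3])
    print(step["HadoopJarStep"]["Args"])
    print(job["DefaultArguments"])
```

The practical difference is who fetches the dependency: on EMR the cluster downloads it from a Maven repository, while with Glue you download and stage the JAR in S3 yourself.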

With the Glue Data Catalog you can query your data in Athena, but Athena also has a few limitations: for example, you cannot run CREATE TABLE AS SELECT or CREATE VIEW. You can use the Glue Data Catalog from EMR to work around these Athena limitations.
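A minimal sketch of pointing EMR at the Glue Data Catalog, assuming the cluster is launched with boto3's emr.run_job_flow: the hive-site and spark-hive-site configuration classifications set hive.metastore.client.factory.class to the Glue catalog client factory, so Hive and Spark SQL on the cluster see the same tables as Athena. The CTAS statement and the table names analytics.raw_events and analytics.daily_totals are hypothetical illustrations of the kind of statement the answer says Athena cannot run.

```python
# Client factory that makes Hive/Spark SQL on EMR use the Glue Data Catalog
# as their metastore instead of a metastore local to the cluster.
GLUE_CATALOG_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations():
    """'Configurations' list for boto3's emr.run_job_flow that points both
    Hive (hive-site) and Spark SQL (spark-hive-site) at the Glue Data Catalog."""
    return [
        {
            "Classification": classification,
            "Properties": {"hive.metastore.client.factory.class": GLUE_CATALOG_FACTORY},
        }
        for classification in ("hive-site", "spark-hive-site")
    ]


# Hypothetical CTAS statement to run from Spark SQL on EMR against
# catalog tables (table names are placeholders).
CTAS_EXAMPLE = """
CREATE TABLE analytics.daily_totals
STORED AS PARQUET
AS SELECT event_date, COUNT(*) AS events
FROM analytics.raw_events
GROUP BY event_date
"""

if __name__ == "__main__":
    for conf in glue_catalog_configurations():
        print(conf["Classification"], conf["Properties"])
```

On such a cluster, a SparkSession built with enableHiveSupport() could run spark.sql(CTAS_EXAMPLE), and the new table would be registered in the Glue Data Catalog where Athena can query it.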

So, for now, Glue can serve as a replacement for a persistent metadata store (such as a Hive metastore), rather than as a replacement for EMR itself.