Does an EMR master node know its cluster ID?

bstempi · Nov 26, 2013

I want to be able to create EMR clusters, and for those clusters to send messages back to some central queue. In order for this to work, I need to have some sort of agent running on each master node. Each one of those agents will have to identify itself in this message so that the recipient knows which cluster the message is about.

Does the master node know its ID (j-*************)? If not, then is there some other piece of identifying information that could allow the message recipient to infer this ID?

I've taken a look through the config files in /home/hadoop/conf, and I haven't found anything useful. I found the ID in /mnt/var/log/instance-controller/instance-controller.log, but it looks like it'll be difficult to grep for. I'm wondering where instance-controller might get that ID from in the first place.

Answer

jc mannem · Apr 9, 2015

Look in /mnt/var/lib/info/ on the master node; it holds a lot of information about your EMR cluster setup. More specifically, /mnt/var/lib/info/job-flow.json contains the jobFlowId, which is the cluster ID.

You can use the pre-installed JSON parser (jq) to extract the job flow ID:

jq -r '.jobFlowId' /mnt/var/lib/info/job-flow.json

(updated as per @Marboni)
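
For an agent like the one described in the question, the lookup can be wrapped in a small helper that fails loudly when the ID cannot be read. This is only a sketch: the function name get_cluster_id and the JOB_FLOW_FILE variable are made up for illustration, and the sed fallback assumes the ID is stored as a plain quoted string on a single line.

```shell
#!/usr/bin/env bash
# Path reported in the answer above; JOB_FLOW_FILE is a hypothetical
# override so the helper can be tried off-cluster.
JOB_FLOW_FILE="${JOB_FLOW_FILE:-/mnt/var/lib/info/job-flow.json}"

# Print the cluster ID (jobFlowId), or return non-zero if it cannot be read.
get_cluster_id() {
  local file="${1:-$JOB_FLOW_FILE}"
  local id

  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi

  if command -v jq >/dev/null 2>&1; then
    # "// empty" yields no output instead of the string "null" when the key is missing.
    id=$(jq -r '.jobFlowId // empty' "$file")
  else
    # Fallback without jq: assumes "jobFlowId": "<value>" sits on one line.
    id=$(sed -n 's/.*"jobFlowId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$file" | head -n 1)
  fi

  if [ -z "$id" ]; then
    echo "jobFlowId not found in $file" >&2
    return 1
  fi

  printf '%s\n' "$id"
}
```

The agent could then call CLUSTER_ID=$(get_cluster_id) || exit 1 once at startup and include $CLUSTER_ID in every message it sends to the queue.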