How do I set up cloud-init on custom AMIs in AWS? (CentOS)

whereswalden · May 1, 2014 · Viewed 49.9k times

Defining userdata for instances in AWS seems really useful for doing all kinds of bootstrap-type actions. Unfortunately, for PCI reasons I have to use a custom CentOS AMI that didn't originate from one of the provided AMIs, so cloud-init is not already installed and configured. I only really want it to set a hostname and run a small bash script. How do I get it working?

Answer

whereswalden · May 1, 2014

cloud-init is a very powerful but poorly documented tool. Even once it's installed, there are a lot of modules active by default that overwrite things you may have already defined on your AMI. Here are instructions for a minimal setup from scratch:

Instructions

  1. Install cloud-init from a standard repository. If you're worried about PCI, you probably don't want to use AWS's custom repositories.

    # rpm -Uvh https://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
    # yum install cloud-init
    
  2. Edit /etc/cloud/cloud.cfg, a YAML file, to reflect your desired configuration. Below is a minimal configuration with documentation for each module.

    #If this is not explicitly false, cloud-init will change things so that root
    #login via ssh is disabled. If you don't want it to do anything, set it false.
    disable_root: false
    
    #Set this if you want cloud-init to manage hostname. The current
    #/etc/hosts file will be replaced with the one in /etc/cloud/templates.
    manage_etc_hosts: true
    
    #Since cloud-init runs at multiple stages of boot, this needs to be set so
    #it can log in all of them to /var/log/cloud-init.log.
    syslog_fix_perms: null
    
    #This is the bit that makes userdata work. You need this to have userdata
    #scripts be run by cloud-init.
    datasource_list: [Ec2]
    datasource:
      Ec2:
        metadata_urls: ['http://169.254.169.254']
    
    #modules that run early in boot
    cloud_init_modules:
     - bootcmd  #for running commands in pre-boot. Commands can be defined in cloud-config userdata.
     - set-hostname  #These 3 make hostname setting work
     - update-hostname
     - update-etc-hosts
    
    #modules that run after boot
    cloud_config_modules:
     - runcmd  #like bootcmd, but runs after boot. Use this instead of bootcmd unless you have a good reason for doing so.
    
    #modules that run at some point after config is finished
    cloud_final_modules:
     - scripts-per-once  #all of these run scripts at specific events. Like bootcmd, can be defined in cloud-config.
     - scripts-per-boot
     - scripts-per-instance
     - scripts-user
     - phone-home  #if defined, can make a post request to a specified url when done booting
     - final-message  #if defined, can write a specified message to the log
     - power-state-change  #if power_state is defined, can shut down or reboot the instance once cloud-init has finished
    
    system_info:
      #works because amazon's linux AMI is based on CentOS
      distro: amazon
    
  3. If there is a defaults.cfg in /etc/cloud/cloud.cfg.d/, delete it.

  4. To take advantage of this configuration, define the following userdata for new instances:

    #cloud-config
    hostname: myhostname
    fqdn: myhostname.mydomain.com
    runcmd:
     - echo "I did this thing post-boot"
     - echo "I did this too"
    

    You can also simply run a bash script by replacing #cloud-config with #!/bin/bash and putting the bash script in the body, but if you do, you should remove all of the hostname-related modules from cloud_init_modules.
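
Since the first line of the userdata decides whether it is treated as cloud-config or as a plain script, it can be worth checking a file before attaching it to an instance. The sketch below is only an approximation of that choice, not cloud-init's own logic: `userdata_kind` is a hypothetical helper name, and it looks at nothing but the two prefixes mentioned in step 4.

```shell
#!/bin/bash
# Hypothetical helper: classify a userdata file by its first line.
# It only knows the two prefixes discussed above; anything else is "unknown".
userdata_kind() {
  local first_line=""
  # read returns non-zero on a file without a trailing newline, but still
  # fills the variable, so the failure is ignored on purpose.
  IFS= read -r first_line < "$1" || true
  case "$first_line" in
    '#cloud-config'*) echo "cloud-config" ;;
    '#!'*)            echo "script" ;;
    *)                echo "unknown" ;;
  esac
}
```

For example, `userdata_kind my-userdata.txt` (where `my-userdata.txt` is whatever file you plan to paste into the userdata field) should report `cloud-config` for the YAML in step 4 and `script` for a file that starts with `#!/bin/bash`.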


Additional Notes

Note that this is a minimal configuration, and cloud-init is capable of managing users, ssh keys, mount points, etc. Look at the references below for more documentation on those specific features.

In general, it seems that cloud-init acts based on the modules specified. Some modules, like "disable-ec2-metadata", act simply by being listed. Others, like "runcmd", only do something if their parameters are specified, either in cloud.cfg or in cloud-config userdata. Most of the documentation below only tells you what parameters are possible for each module, not what the module is called, but the default cloud.cfg should have a complete module list to begin with. The best way I've found to disable a module is simply to remove it from the list.
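
To see which modules are actually enabled before removing any, something like the following can print one stage's list from cloud.cfg. This is a rough sketch, not a YAML parser: `list_modules` is a hypothetical helper name, and it assumes the block-style `- module` layout used in the configuration above (an inline `[a, b]` list would not be picked up).

```shell
#!/bin/bash
# Hypothetical helper: print the modules listed under one stage of a cloud.cfg.
# Usage: list_modules /etc/cloud/cloud.cfg cloud_init_modules
list_modules() {
  local file="$1" stage="$2"
  awk -v key="${stage}:" '
    $1 == key                   { in_list = 1; next }  # header of the requested list
    in_list && $1 == "-"        { print $2; next }     # "- module-name  #comment"
    in_list && NF && $1 !~ /^#/ { in_list = 0 }        # next non-comment key ends the list
  ' "$file"
}
```

Running it for `cloud_init_modules`, `cloud_config_modules` and `cloud_final_modules` in turn gives a quick picture of what will run at each stage.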

In some cases, "rhel" may work better for the "distro" tag than "amazon". I haven't really figured out when.


References