Immutable Infrastructure with AWS and Ansible – Part 3 – Autoscaling


In part 1 and part 2 of this series, we set up our workstation so that it could provision infrastructure with AWS and Ansible, and then created a simple play that provisioned an Ubuntu workstation in EC2.  That’s a great first step, but it’s not immutable.  The workstation we provisioned in EC2 is just like any other instance: it’s not resilient to failure, and if it gets terminated for any reason, nothing replaces it.  In this part, we’re going to extend our Deploy Workstation play to create an AMI (Amazon Machine Image) from our workstation after it’s configured, create a launch configuration pointing at that AMI, and then create an autoscaling group that points at the launch configuration.

Even if we’re just running a single instance, autoscaling groups are beneficial because they declare the desired state of your system.  You can tell the group you want a minimum of 1 instance and a maximum of 1 instance, and AWS will ensure you always have 1 running.  This makes your instance resilient to instance-level failures: if it gets terminated for any reason, the group launches a replacement from the same image.  If you need to do upgrades, they can be done in a rolling manner, so you always have at least 1 healthy running instance.
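The desired-state idea can be sketched in a few lines of Python.  This is only an illustration of the reasoning, not how AWS implements it; the function name reconcile and its parameters are made up for this example.

```python
def reconcile(running: int, min_size: int, max_size: int) -> int:
    """Return how many instances to launch (positive) or terminate (negative).

    Hypothetical helper: it only models the "keep the count between
    min and max" rule described above.
    """
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    if running < min_size:
        return min_size - running   # too few: launch the difference
    if running > max_size:
        return max_size - running   # too many: terminate the surplus
    return 0                        # already in the desired range


# A group pinned to exactly one instance (min=1, max=1).
print(reconcile(running=0, min_size=1, max_size=1))  # instance was lost
print(reconcile(running=1, min_size=1, max_size=1))  # nothing to do
```

With min and max both set to 1, losing the only instance always yields a request for exactly one replacement.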

Update:  The source code for this playbook is on GitHub here:

The Create AMI Role

The first new role we are going to add to our playbook will allow us to create an AMI by snapshotting our running instance.  We do this after our previous workstation role has fully configured the instance, so we are capturing the golden master state that we want to preserve immutably.  Create new folders under your playbook folder called roles/create-ami/tasks, and place the following main.yml file in it:

The Create Launch Config Role

The second new role we are going to add to our playbook will allow us to create a launch configuration that points at the AMI we just created.  This is a necessary next step before we can create the autoscaling group.

Create new folders under your playbook directory called roles/create-launch-config/tasks, and create a file in that folder called main.yml, with the following content in it:

Update the PowerUsers IAM Group

Before we can successfully create the launch configuration, our PowerUsers group needs permission to perform the iam:PassRole action for our “noaccess” policy.  Without this permission, creating the launch configuration will fail.  Go into the Identity and Access Management screen from the AWS console, and click on Groups on the left-hand side.  Select the PowerUsers group we created in part 1, then click the Permissions tab, then click Attach Policy.  Type “iam” in the filter box, and check the box next to IAMFullAccess:
[Screenshot: the Attach Policy screen filtered by “iam”, with IAMFullAccess checked]

Your IAM group’s policies should look like this after you’re done:
[Screenshot: the policies attached to the PowerUsers group]
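IAMFullAccess grants far more than the one action the launch configuration needs.  If you would rather grant only iam:PassRole, a narrower policy document might look like the sketch below.  It assumes that “noaccess” is the name of an IAM role, and the account ID 123456789012 is a placeholder; substitute your own values.

```python
import json


def pass_role_policy(account_id: str, role_name: str) -> dict:
    """Build an IAM policy document allowing iam:PassRole on a single role.

    Both arguments are illustrative placeholders, not values from this series.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": f"arn:aws:iam::{account_id}:role/{role_name}",
            }
        ],
    }


# Print the JSON so it can be pasted into a custom policy in the console.
print(json.dumps(pass_role_policy("123456789012", "noaccess"), indent=2))
```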

The Autoscaling Role

The third new role we are going to create is the role that actually creates the autoscaling group.  This role first checks to see if an autoscaling group with the same name already exists, and if so, it just updates it.  By updating the autoscaling group to point at the new launch configuration, with a new AMI, the autoscaling group will automatically do a rolling upgrade, where it starts a new instance, waits until the OS is loaded and healthy, then terminates an old instance.  It repeats this process until all instances in the autoscaling group are running the new AMI.  You can control how many instances are replaced at once by changing replace_batch_size; we’ve set a sensible default of the size of the group divided by 4.  For example, if you had an autoscaling group with 8 running instances, the rolling upgrade would deploy 2 new instances at a time, to speed up the process.
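To make the batch arithmetic concrete, here is a small Python sketch of the “group size divided by 4” default and the batches it produces.  The rounding (integer division, never less than 1) is an assumption of this sketch, and the function names and instance IDs are made up for illustration.

```python
from typing import List


def replace_batch_size(group_size: int, divisor: int = 4) -> int:
    # Assumption: round down, but always replace at least one instance.
    return max(1, group_size // divisor)


def rolling_batches(old_instances: List[str], batch_size: int) -> List[List[str]]:
    # Split the old instances into consecutive batches of batch_size.
    return [
        old_instances[i:i + batch_size]
        for i in range(0, len(old_instances), batch_size)
    ]


old = [f"i-old{n}" for n in range(8)]   # hypothetical instance IDs
size = replace_batch_size(len(old))     # 8 // 4 == 2
for batch in rolling_batches(old, size):
    print(f"launch {len(batch)} new, wait until healthy, then terminate {batch}")
```

For a group of 8 this yields four rounds of 2 instances each; a group smaller than 4 falls back to replacing one instance at a time.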

If it’s creating a new autoscaling group, it also sets some CloudWatch metric alarms based on CPU utilization, and links the metric alarms to the scaling policies.  The way we set these alarms, if average CPU utilization is greater than 50% for 5 minutes, the group will scale up by adding another instance.  If average CPU utilization is less than 20% for 5 minutes, the group will scale down by terminating an instance.  There are also some cooldown times set so that this doesn’t happen too often; scaling up can only happen every 3 minutes, and scaling down can only happen every 5 minutes.
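The alarm and cooldown rules can be summarised in a short Python sketch.  It is a simplified model rather than the actual CloudWatch evaluation logic: it treats the thresholds as inclusive, which matches the “>=” and “<=” comparisons in the variable dump a reader posted in the comments, and it tracks one cooldown timer per policy.  All names are illustrative.

```python
SCALE_UP_CPU = 50.0          # percent, averaged over 5 minutes
SCALE_DOWN_CPU = 20.0        # percent, averaged over 5 minutes
SCALE_UP_COOLDOWN = 3 * 60   # seconds between scale-up actions
SCALE_DOWN_COOLDOWN = 5 * 60 # seconds between scale-down actions


def scaling_action(avg_cpu_5min: float,
                   secs_since_scale_up: float,
                   secs_since_scale_down: float) -> int:
    """Return +1 to add an instance, -1 to remove one, 0 to do nothing."""
    if avg_cpu_5min >= SCALE_UP_CPU and secs_since_scale_up >= SCALE_UP_COOLDOWN:
        return 1
    if avg_cpu_5min <= SCALE_DOWN_CPU and secs_since_scale_down >= SCALE_DOWN_COOLDOWN:
        return -1
    return 0


print(scaling_action(75.0, 600, 600))  # busy, cooldown elapsed
print(scaling_action(75.0, 60, 600))   # busy, but still cooling down
print(scaling_action(10.0, 600, 600))  # idle, cooldown elapsed
```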

Create new folders roles/auto-scaling/tasks under your playbook folder, and create a file named main.yml in this folder, with the following content in it:

Cleaning up after ourselves

We’re also going to add three more new roles that are designed to purge all but the last 5 AMIs and launch configurations, as well as terminate our amibuild instance (the instance that we just used to configure our golden AMI).  Keeping the last 5 AMIs and launch configurations around is extremely useful: if you deploy a breaking change to your infrastructure, you can simply point your autoscaling group at the most recent working launch configuration, and your application will be back up and running quickly.  You can easily configure this to keep more than the 5 most recent launch configurations and AMIs if you like.
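The retention rule itself is simple: sort by creation date, keep the newest 5, and purge the rest.  The Python sketch below shows that selection on made-up data.  It is not the script used by the roles that follow; the function name, the dictionary keys, and the AMI IDs are assumptions for illustration.

```python
from datetime import datetime
from typing import Dict, List

DATE_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # e.g. 2016-01-25T04:30:18.000Z


def images_to_purge(images: List[Dict[str, str]], keep: int = 5) -> List[str]:
    """Return the IDs of every image except the `keep` most recent ones."""
    newest_first = sorted(
        images,
        key=lambda img: datetime.strptime(img["creation_date"], DATE_FORMAT),
        reverse=True,
    )
    return [img["id"] for img in newest_first[keep:]]


# Seven hypothetical AMIs, one per day from 19 to 25 January 2016.
images = [
    {"id": f"ami-{day:04d}", "creation_date": f"2016-01-{day:02d}T04:30:18.000Z"}
    for day in range(19, 26)
]
print(images_to_purge(images))  # the two oldest images
```

Raising the keep argument is all it takes to retain more history for rollbacks.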

The Delete Old Launch Configurations Role

Create a new folder called roles/delete-old-launch-configurations/tasks and create a file named main.yml in it, with the following content:

This role also requires a Python script to be placed in its library folder.  Create a folder called roles/delete-old-launch-configurations/library and create the script file in it, with the following content:

The Delete Old AMIs Role

This role simply deletes any AMIs other than the 5 most recently created ones for the particular autoscaling group we are deploying.  Create a folder called roles/delete-old-amis/tasks, and create a file named main.yml in it, with the following content:

The Terminate Role

This role is very simple.  Now that we’ve captured the AMI snapshot of our fully configured system, created a launch config, and created an autoscaling group based on it, we no longer need our temporary amibuild system.  This role will terminate it.

Create new folders named roles/terminate/tasks under your playbook folder, and create a file named main.yml in it, with the following content:

Putting it all together

In order to put all of the new roles we’ve created together, we need to update our deployworkstation.yml play located in the root of our playbook folder.  The new deployworkstation.yml play should have the following content in it:

Execute your play by typing the following command:

ansible-playbook -vv -e group_name=test deployworkstation.yml

After your playbook has run, you should see output like the following if everything was successful:
[Screenshot: output of a successful playbook run]


Congratulations!  You’ve now successfully provisioned an immutable autoscaling group in Amazon Web Services!  If you run the playbook again, it will create a new AMI, and perform a rolling deploy/upgrade to the new image.  One of the beautiful things about immutable infrastructure is that, when you need to patch or upgrade your system, you don’t have to touch the existing server – you simply run the automation that created it, and you get a brand new immutable image, updated to the latest security patches and versions.

In future articles, we’ll continue to expand our playbook with more functionality beyond simply provisioning immutable workstations in AWS.

11 Responses

  1. Sid March 30, 2016 / 12:01 am

    I also use ansible to replace instances in the ASG in a rolling deployment fashion. I notice it fails when the ASG size is 0.

    • VCDXpert March 30, 2016 / 11:28 am

      I’m not sure how the ASG size could be 0. Typically, you have a min_instances and max_instances, which can both be 1, or some number larger than 1.

  2. VM May 15, 2016 / 12:55 am

    Thank you for sharing!

  3. fiko November 29, 2016 / 3:07 am

    Hello there,

    thanks for this article. This statement is NOT valid for my AWS environment. What would be the problem?

    “By updating the autoscaling group to point at the new launch configuration, with a new AMI, the autoscaling group will automatically do a rolling upgrade, where it starts a new instance, waits until the OS is loaded and healthy, then terminates an old instance.”

    • VCDXpert November 29, 2016 / 10:45 am

      Can you please be more specific about what is going wrong? Did you receive an error message during the playbook execution? Please post the error message and I’ll try to help out.

      • fiko December 14, 2016 / 4:08 pm

        Nope, there is no error or anything. Everything works properly but Autoscaling group does not do what you say by default. I tried it manually as well.

        – I create new launch configuration
        – Update my current AutoScaling group to use new launch config
        – but it does not do rolling update.

        • fiko December 14, 2016 / 4:30 pm

          oh sorry, problem was related to my code deployment methodology. It was suspending the auto-scaling group during deployment.
          Thank you very much.

          • VCDXpert December 15, 2016 / 4:42 pm

            Glad you were able to figure it out.

  4. Gesias April 5, 2017 / 8:19 am

    Hi and thanks for a great post! Using it now in our environment and I have it 90% figured out. One thing though, the playbook fails when I create the alarm metrics and couple them to the auto scaling policies.

    This is the dump, will be looking at it now and see if I can figure it out but was gonna check in and see if this is something you have seen before?

    The field ‘args’ has an invalid value, which appears to include a variable that is undefined. The error was: ‘ansible.vars.unsafe_proxy.AnsibleUnsafeText object’ has no attribute ‘comparison’

    The error appears to have been in ‘/Users/someuser/PycharmProjects/clooset/project/devops/autoscale/roles/auto-scaling/tasks/main.yml’: line 78, column 3, but may be elsewhere in the file depending on the exact syntax problem.

    The offending line appears to be:
    - name: Configure Metric Alarms and link to Scaling Policies
      ^ here

    • Gesias April 5, 2017 / 8:23 am

      I did do a variable dump just before to check what is fed to the function and got this which looks like it is legit.

      ok: [localhost] => {
          "alarm_metrics": [
              {
                  "alarm_actions": [
                      "arn:aws:autoscaling:sa-east-1:X:scalingPolicy:X-4cf9-b48d-X:autoScalingGroupName/project:policyName/Increase Group Size"
                  ],
                  "comparison": ">=",
                  "name": "project-ScaleUp",
                  "threshold": 50.0
              },
              {
                  "alarm_actions": [
                      "arn:aws:autoscaling:sa-east-1:X:scalingPolicy:X-462a-89a2-X:autoScalingGroupName/project:policyName/Decrease Group Size"
                  ],
                  "comparison": "<=",
                  "name": "project-ScaleDown",
                  "threshold": 20.0
              }
          ]
      }

      • Gesias April 5, 2017 / 2:37 pm

        I solved it by duplicating the tasks and not doing the “with_items” routine. It works now. Maybe it has something to do with the ansible version? I am running 2.2.1. Sorry about the spamming.
