Immutable Infrastructure with AWS and Ansible – Part 3 – Autoscaling

Introduction

In part 1 and part 2 of this series, we set up our workstation so that we could provision infrastructure with AWS and Ansible, and then created a simple play that provisioned an Ubuntu workstation in EC2.  That’s a great first step, but it’s not immutable.  The workstation we provisioned in EC2 is just like any other: it’s not resilient to failure, and if it gets terminated for any reason, it’s simply gone.  In this part, we’re going to extend our Deploy Workstation play to create an AMI (Amazon Machine Image) from our workstation after it’s configured, then create a launch configuration pointing at that AMI, then create an autoscaling group that points at the launch configuration.

Even if we’re just running a single instance, autoscaling groups are beneficial because they capture the desired state of your system.  You can tell AWS you want a minimum of 1 instance and a maximum of 1 instance, and it will ensure you always have 1 running.  This makes your instance resilient to all types of failures: if it gets terminated for any reason, a replacement will be launched automatically, and if you need to do upgrades, AWS is smart enough to do them in a rolling manner, so you always have at least 1 healthy running instance.

Update:  The source code for this playbook is on Github here:  https://github.com/lyoungblood/immutable

The Create AMI Role

The first new role we are going to add to our playbook will allow us to create an AMI by snapshotting our running instance.  We do this after our previous workstation role has fully configured the instance, so we are capturing the golden master state that we want to preserve immutably.  Create new folders under your playbook folder called roles/create-ami/tasks, and place the following main.yml file in it:
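
Here is a minimal sketch of what this role’s main.yml can look like, using the ec2_ami module (the complete version is in the Github repo linked above).  The ec2 variable is assumed to be registered by the launch role, and group_name and region are assumed to come from group_vars:

    ---
    # roles/create-ami/tasks/main.yml (sketch)
    # Snapshot the configured amibuild instance into a new, timestamped AMI.
    - name: Create AMI from the configured instance
      ec2_ami:
        instance_id: "{{ ec2.instances[0].id }}"
        name: "{{ group_name }}-{{ ansible_date_time.epoch }}"
        region: "{{ region }}"
        wait: yes
      register: ami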

The Create Launch Config Role

The second new role we are going to add to our playbook will allow us to create a launch configuration that points at the AMI we just created.  This is a necessary next step before we can create the autoscaling group.

Create new folders under your playbook directory called roles/create-launch-config/tasks, and create a file in that folder called main.yml, with the following content in it:
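
A minimal sketch of this role’s main.yml, using the ec2_lc module, follows (again, the complete version is in the Github repo).  The ami variable is assumed to be registered by the create-ami role, and keypair, security_group, and instance_type are assumed to be defined in group_vars:

    ---
    # roles/create-launch-config/tasks/main.yml (sketch)
    # Create a launch configuration that boots instances from the new AMI.
    - name: Create launch configuration
      ec2_lc:
        name: "{{ group_name }}-{{ ansible_date_time.epoch }}"
        image_id: "{{ ami.image_id }}"
        key_name: "{{ keypair }}"
        security_groups: "{{ security_group }}"
        instance_type: "{{ instance_type }}"
        instance_profile_name: noaccess
        assign_public_ip: yes
        region: "{{ region }}"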

Update the PowerUsers IAM Group

Before we can successfully create the launch configuration, our PowerUsers group needs permission to perform iam:PassRole for our “noaccess” role.  If we don’t have permission to do this, creating the launch configuration will fail.  Go into the Identity and Access Management screen from the AWS console, and click on Groups on the left-hand side.  Select the PowerUsers group we created in part 1, then click the Permissions tab, then click Attach Policy.  Type “iam” in the filter box, and check the box next to IAMFullAccess:
Screen Shot 2016-01-25 at 4.04.46 AM

Your IAM group’s policies should look like this after you’re done:
Screen Shot 2016-01-25 at 4.04.56 AM

The Autoscaling Role

The third new role we are going to create is the role that actually creates the autoscaling group.  This role first checks to see if an autoscaling group with the same name already exists, and if so, it just updates it.  By updating the autoscaling group to point at the new launch configuration, with a new AMI, the autoscaling group will automatically do a rolling upgrade: it starts a new instance, waits until the OS is loaded and healthy, then terminates an old instance.  It repeats this process until all instances in the autoscaling group are running the new AMI.  You can tune this behavior by changing replace_batch_size; we’ve set a sensible default of the group size divided by 4.  For example, if you had an autoscaling group with 8 running instances, autoscaling would deploy 2 new instances at once, to speed up the rolling upgrade process.

If it’s creating a new autoscaling group, it also sets some CloudWatch metric alarms based on CPU utilization, and links the metric alarms to the scaling policies.  The way we set these alarms, if average CPU utilization is greater than 50% for 5 minutes, the group will scale up by adding another instance.  If average CPU utilization is less than 20% for 5 minutes, the group will scale down by terminating an instance.  There are also some cooldown times set so that this doesn’t happen too often; scaling up can only happen every 3 minutes, and scaling down can only happen every 5 minutes.

Create new folders roles/auto-scaling/tasks under your playbook folder, and create a file named main.yml in this folder, with the following content in it:
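
A sketch of this role’s main.yml follows, using the ec2_asg, ec2_scaling_policy, and ec2_metric_alarm modules (the complete version is in the Github repo).  The launch configuration name must match the one created by the previous role; because both roles run in the same play, ansible_date_time resolves to the same value.  The sizing numbers and zone/region variables here are illustrative assumptions:

    ---
    # roles/auto-scaling/tasks/main.yml (sketch)
    - name: Create or update the autoscaling group
      ec2_asg:
        name: "{{ group_name }}"
        launch_config_name: "{{ group_name }}-{{ ansible_date_time.epoch }}"
        availability_zones: ["{{ zone }}"]
        min_size: 1
        max_size: 4
        desired_capacity: 1
        health_check_type: EC2
        health_check_period: 300
        replace_all_instances: yes
        wait_for_instances: yes
        region: "{{ region }}"

    - name: Create scale-up policy
      ec2_scaling_policy:
        name: "{{ group_name }}-scale-up"
        asg_name: "{{ group_name }}"
        adjustment_type: ChangeInCapacity
        scaling_adjustment: 1
        cooldown: 180
        region: "{{ region }}"
      register: scale_up_policy

    - name: Create scale-down policy
      ec2_scaling_policy:
        name: "{{ group_name }}-scale-down"
        asg_name: "{{ group_name }}"
        adjustment_type: ChangeInCapacity
        scaling_adjustment: -1
        cooldown: 300
        region: "{{ region }}"
      register: scale_down_policy

    - name: Alarm - scale up when average CPU is above 50% for 5 minutes
      ec2_metric_alarm:
        name: "{{ group_name }}-cpu-high"
        namespace: AWS/EC2
        metric: CPUUtilization
        statistic: Average
        comparison: ">="
        threshold: 50.0
        period: 300
        evaluation_periods: 1
        unit: Percent
        dimensions:
          AutoScalingGroupName: "{{ group_name }}"
        alarm_actions: ["{{ scale_up_policy.arn }}"]
        region: "{{ region }}"

    - name: Alarm - scale down when average CPU is below 20% for 5 minutes
      ec2_metric_alarm:
        name: "{{ group_name }}-cpu-low"
        namespace: AWS/EC2
        metric: CPUUtilization
        statistic: Average
        comparison: "<="
        threshold: 20.0
        period: 300
        evaluation_periods: 1
        unit: Percent
        dimensions:
          AutoScalingGroupName: "{{ group_name }}"
        alarm_actions: ["{{ scale_down_policy.arn }}"]
        region: "{{ region }}"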

Cleaning up after ourselves

We’re also going to add three more new roles that purge all but the last 5 AMIs and launch configurations, and terminate our amibuild instance (the instance we just used to configure our golden AMI).  Keeping the last 5 AMIs and launch configurations around is extremely useful: if you deploy a breaking change to your infrastructure, you can simply point your autoscaling group at the most recent working launch configuration, and your application will be back up and running rapidly.  You can easily configure this to keep more than the 5 most recent launch configurations and AMIs if you like.

The Delete Old Launch Configurations Role

Create a new folder called roles/delete-old-launch-configurations/tasks and create a file named main.yml in it, with the following content:
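
A sketch of this role’s main.yml can look like the following, where lc_find is the small custom module described below, assumed to return the launch configuration names older than the ones we want to keep:

    ---
    # roles/delete-old-launch-configurations/tasks/main.yml (sketch)
    - name: Find launch configurations older than the 5 most recent
      lc_find:
        region: "{{ region }}"
        name_regex: "{{ group_name }}-.*"
        keep: 5
      register: old_lcs

    - name: Delete the old launch configurations
      ec2_lc:
        name: "{{ item }}"
        instance_type: "{{ instance_type }}"   # some versions of ec2_lc require this even on delete
        region: "{{ region }}"
        state: absent
      with_items: "{{ old_lcs.old_launch_configs }}"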

This role also requires a Python script called lc_find.py to be placed in its library folder.  Create a folder called roles/delete-old-launch-configurations/library and create a file named lc_find.py in it, with the following content:
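
The exact implementation is in the Github repository linked above; as a rough sketch, a module like this can use boto to list all launch configurations, sort them by creation time, and return the names of everything older than the newest five (the parameter names region, name_regex, and keep are assumptions chosen to match the task sketch above):

    #!/usr/bin/env python
    # roles/delete-old-launch-configurations/library/lc_find.py (sketch)
    # Return launch configuration names matching a regex, excluding the
    # "keep" most recently created ones.
    import re

    import boto.ec2.autoscale
    from ansible.module_utils.basic import AnsibleModule


    def main():
        module = AnsibleModule(argument_spec=dict(
            region=dict(required=True),
            name_regex=dict(required=True),
            keep=dict(required=False, type='int', default=5),
        ))
        conn = boto.ec2.autoscale.connect_to_region(module.params['region'])
        pattern = re.compile(module.params['name_regex'])
        configs = [lc for lc in conn.get_all_launch_configurations()
                   if pattern.match(lc.name)]
        # Oldest first, so everything before the last "keep" entries is stale.
        configs.sort(key=lambda lc: lc.created_time)
        keep = module.params['keep']
        old = [lc.name for lc in configs[:-keep]] if len(configs) > keep else []
        module.exit_json(changed=False, old_launch_configs=old)


    if __name__ == '__main__':
        main()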

The Delete Old AMIs Role

This role simply deletes any AMIs other than the 5 most recently created ones for the particular autoscaling group we are deploying.  Create a folder called roles/delete-old-amis/tasks, and create a file named main.yml in it, with the following content:
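
A sketch of this role’s main.yml, using ec2_ami_find to sort our AMIs by creation date and ec2_ami to deregister the older ones (along with their snapshots); the AMI naming convention is assumed to match the create-ami sketch above:

    ---
    # roles/delete-old-amis/tasks/main.yml (sketch)
    - name: Find all AMIs for this group, newest first
      ec2_ami_find:
        owner: self
        name: "{{ group_name }}-*"
        sort: creationDate
        sort_order: descending
        no_result_action: success
        region: "{{ region }}"
      register: amis

    - name: Deregister everything except the 5 most recent AMIs
      ec2_ami:
        image_id: "{{ item.ami_id }}"
        delete_snapshot: yes
        state: absent
        region: "{{ region }}"
      with_items: "{{ amis.results[5:] }}"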

The Terminate Role

This role is very simple.  Now that we’ve captured the AMI snapshot of our fully configured system, created a launch config, and created an autoscaling group based on it, we no longer need our temporary amibuild system.  This role will terminate it.

Create new folders named roles/terminate/tasks under your playbook folder, and create a file named main.yml in it, with the following content:
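
A sketch of the terminate role’s main.yml, assuming the launch role registered its result on localhost as ec2, so the instance IDs are still available later in the run:

    ---
    # roles/terminate/tasks/main.yml (sketch)
    - name: Terminate the temporary amibuild instance
      ec2:
        instance_ids: "{{ ec2.instances | map(attribute='id') | list }}"
        region: "{{ region }}"
        state: absent
        wait: yes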

Putting it all together

In order to put all of the new roles we’ve created together, we need to update our deployworkstation.yml play located in the root of our playbook folder.  The new deployworkstation.yml play should have the following content in it:
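
As a sketch, the updated play runs the launch role on localhost, configures the new instance over ssh, then runs the AMI, launch configuration, autoscaling, and cleanup roles back on localhost.  The “launched” group is assumed to be populated by an add_host task in the launch role:

    ---
    # deployworkstation.yml (sketch)
    - hosts: localhost
      connection: local
      gather_facts: yes
      roles:
        - launch

    - hosts: launched
      remote_user: ubuntu
      become: yes
      roles:
        - workstation

    - hosts: localhost
      connection: local
      gather_facts: yes
      roles:
        - create-ami
        - create-launch-config
        - auto-scaling
        - delete-old-launch-configurations
        - delete-old-amis
        - terminate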

Execute your play by typing the following command:

ansible-playbook -vv -e group_name=test deployworkstation.yml

After your playbook has run, you should see output like the following if everything was successful:
Screen Shot 2016-01-25 at 4.30.18 AM

Conclusion

Congratulations!  You’ve now successfully provisioned an immutable autoscaling group in Amazon Web Services!  If you run the playbook again, it will create a new AMI, and perform a rolling deploy/upgrade to the new image.  One of the beautiful things about immutable infrastructure is that, when you need to patch or upgrade your system, you don’t have to touch the existing server – you simply run the automation that created it, and you get a brand new immutable image, updated to the latest security patches and versions.

In future articles, we’ll continue to expand our playbook with more functionality beyond simply provisioning immutable workstations in AWS.

Immutable Infrastructure with AWS and Ansible – Part 2 – Workstation

Introduction

In the first part of this series, we set up our workstation so that it could communicate with the Amazon Web Services APIs, and set up our AWS account so that it was ready to provision EC2 compute infrastructure.  In this section, we’ll start building our Ansible playbook that will provision immutable infrastructure.

Update:  The source code for this playbook is on Github here:  https://github.com/lyoungblood/immutable

Ansible Dynamic Inventory

When working with cloud resources, Ansible can use a dynamic inventory system to find and configure all of your instances within AWS, or any other cloud provider.  In order for this to work properly, we need to set up the EC2 external inventory script in our playbook.

  1. First, create the playbook folder (I named mine ~/immutable) and the inventory folder within it:
    mkdir -p ~/immutable/inventory;cd ~/immutable/inventory
  2. Next, download the EC2 external inventory script from Ansible:
    wget https://raw.github.com/ansible/ansible/devel/contrib/inventory/ec2.py
  3. Make the script executable by typing:
    chmod +x ec2.py
  4. Configure the EC2 external inventory script by creating a new file called ec2.ini in the inventory folder alongside the ec2.py script.  If you specify the region you are working with, you will significantly decrease the execution time, because the script will not need to scan every EC2 region for instances.  My ec2.ini is configured to use the us-east-1 region; a minimal example appears after this list.
  5. Next, create the default Ansible configuration for this playbook by editing a file in the root of the playbook directory (in our example, ~/immutable), named ansible.cfg.  This file should contain the text shown in the sketch after this list.
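
Minimal sketches of both files follow.  For ec2.ini, the safest starting point is the sample file that ships alongside ec2.py; the important change is restricting regions (the other values shown here mirror the sample defaults).  For ansible.cfg, we point Ansible at the inventory folder and at the ssh key pair we created in part 1:

    # inventory/ec2.ini (sketch -- start from the sample file that ships with ec2.py)
    [ec2]
    regions = us-east-1
    regions_exclude = us-gov-west-1, cn-north-1
    destination_variable = public_dns_name
    vpc_destination_variable = public_dns_name
    cache_path = ~/.ansible/tmp
    cache_max_age = 300

    # ansible.cfg (sketch)
    [defaults]
    inventory = ./inventory
    remote_user = ubuntu
    private_key_file = ~/.ssh/immutable.pem
    host_key_checking = False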

To test and ensure that this script is working properly (and that your boto credentials are set up properly), execute the following command:
./ec2.py --list
You should see the following output:
Screen Shot 2016-01-22 at 12.06.19 PM

Ansible Roles

Roles are a way to automatically load certain variables and tasks into an Ansible playbook, and allow you to reuse your tasks in a modular way.  We will heavily (ab)use roles to make the tasks in our playbook reusable, since many of our infrastructure provisioning operations will use the same tasks repeatedly.

Group Variables

The following group variables will apply to any tasks configured in our playbook, unless we override them at the task level.  This allows us to specify a set of sensible defaults that will work for most provisioning use cases, while keeping the flexibility to change them when we need to.  Change into your playbook folder (I called mine immutable, but you can call it whatever you like), then create a group_vars folder underneath it:
cd ~/immutable; mkdir group_vars
Now, we can edit the file all.yml in that folder we just created, group_vars, to contain the following text. Please note that indentation is important in YAML syntax:
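
A sketch of group_vars/all.yml with the kind of defaults the rest of this series assumes (the variable names themselves are conventions of this playbook, so adjust them to whatever your roles reference):

    ---
    # group_vars/all.yml (sketch)
    region: us-east-1        # AWS region to provision in
    zone: us-east-1a         # comment out to launch in a random zone
    instance_type: t2.micro  # the temporary amibuild instance stays small
    keypair: immutable       # EC2 key pair created in part 1
    security_group: default  # must allow ssh from your workstation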

Now that our group_vars are set up, we can move on to creating our first role.

The Launch Role

The launch role performs an important first step: it searches for the latest Ubuntu 14.04 LTS (long term support) AMI (Amazon Machine Image) published by Canonical, the creators of Ubuntu, then launches a new EC2 compute instance in the region and availability zone specified in our group_vars file.  Note that the launch role uses a very small compute instance (t2.micro), because this instance only lives for a short time while it is configured by subsequent tasks, then gets baked into a golden master AMI snapshot that lives in S3 object storage.

A quick note about Availability Zones: if you comment out the zone variable in our group_vars file, your instances will be launched in a random zone within the specified region.  This can be useful if you want to ensure that an outage in a single AZ doesn’t take down every instance in your auto-scaling group, but there is a trade-off: data transfer between zones incurs a charge, so if your database, for example, is in another zone, you’ll pay a small network bandwidth fee to access it.

Create a new folder under your playbook directory called roles, and create a launch folder within it, then create a tasks folder under that, then edit a file called main.yml in this tasks folder:
mkdir -p roles/launch/tasks
Now, put the following contents in the main.yml file:
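
Here is a sketch of that file, using ec2_ami_find to locate Canonical’s newest Ubuntu 14.04 image, the ec2 module to launch it, add_host to make the new instance available to later plays, and wait_for to pause until ssh is up.  The variable names and the “launched” group name are assumptions that the other sketches in this series follow:

    ---
    # roles/launch/tasks/main.yml (sketch)
    - name: Find the latest Ubuntu 14.04 LTS AMI published by Canonical
      ec2_ami_find:
        owner: "099720109477"   # Canonical's AWS account ID
        name: "ubuntu/images/hvm-ssd/ubuntu-trusty-14.04-amd64-server-*"
        sort: creationDate
        sort_order: descending
        sort_end: 1
        region: "{{ region }}"
      register: ubuntu_ami

    - name: Launch a temporary t2.micro instance
      ec2:
        image: "{{ ubuntu_ami.results[0].ami_id }}"
        instance_type: "{{ instance_type }}"
        key_name: "{{ keypair }}"
        group: "{{ security_group }}"
        region: "{{ region }}"
        zone: "{{ zone }}"
        instance_profile_name: noaccess
        instance_tags:
          Name: "{{ group_name }}-amibuild"
        wait: yes
      register: ec2

    - name: Add the new instance to an in-memory "launched" group
      add_host:
        name: "{{ ec2.instances[0].public_dns_name }}"
        groups: launched

    - name: Wait for ssh to become available
      wait_for:
        host: "{{ ec2.instances[0].public_dns_name }}"
        port: 22
        delay: 10
        timeout: 320
        state: started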

You’ll notice that this launch role also waits for the instance to boot by waiting for port 22 (ssh) to be available on the host. This is useful because subsequent tasks will use an ssh connection to configure the system, so we want to ensure the system is completely booted before we proceed.

The Workstation Role

Now that we have a role that can launch a brand new t2.micro instance, our next role will allow us to configure this instance to be used as a workstation.  This workstation configuration will be fairly simplistic; however, you can easily customize it as much as you want later.  It is mainly meant to illustrate how you would configure the golden image.

We need to create two directories for this role: the tasks directory, and the files directory, which will hold an init script we want to place on the workstation to create a swap file on first boot:
mkdir -p roles/workstation/tasks;mkdir roles/workstation/files
Next, we’ll create the task:
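
A sketch of roles/workstation/tasks/main.yml follows.  The package list is just an example of the kind of customization you might do; the interesting part is installing the aws-swap-init script shown in the next section:

    ---
    # roles/workstation/tasks/main.yml (sketch)
    - name: Update apt cache and upgrade packages
      apt:
        update_cache: yes
        upgrade: dist

    - name: Install a few base packages
      apt:
        name: "{{ item }}"
        state: present
      with_items:
        - git
        - htop
        - tmux

    - name: Install the aws-swap-init script
      copy:
        src: aws-swap-init
        dest: /etc/init.d/aws-swap-init
        mode: "0755"

    - name: Enable aws-swap-init at boot
      command: update-rc.d aws-swap-init defaults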

Initializing a swap file automatically

When you provision the Ubuntu 14.04 LTS instance, it won’t have a swap file by default.  This is a bit risky, because if you run out of memory your system could become unstable.  The following init script should be placed in roles/workstation/files/aws-swap-init; the task above copies it to your workstation during the configuration process, so that a swap file is created when the system boots for the first time.
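
A sketch of what such an init script can look like (the 1 GB size and /var/swapfile path are arbitrary choices; adjust to taste):

    #!/bin/sh
    ### BEGIN INIT INFO
    # Provides:          aws-swap-init
    # Required-Start:    $local_fs
    # Required-Stop:
    # Default-Start:     2 3 4 5
    # Default-Stop:
    # Short-Description: Create and enable a swap file on first boot
    ### END INIT INFO
    # Sketch: create a 1 GB swap file the first time the instance boots,
    # then enable it on every subsequent boot.

    SWAPFILE=/var/swapfile

    case "$1" in
      start)
        if [ ! -f "$SWAPFILE" ]; then
          dd if=/dev/zero of="$SWAPFILE" bs=1M count=1024
          chmod 600 "$SWAPFILE"
          mkswap "$SWAPFILE"
          echo "$SWAPFILE none swap sw 0 0" >> /etc/fstab
        fi
        swapon "$SWAPFILE"
        ;;
      stop)
        swapoff "$SWAPFILE" || true
        ;;
    esac

    exit 0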

The DeployWorkstation Play

Now, we’ll create a play that calls these tasks in the right order to provision our workstation and configure it.  This file will be created in the root of your playbook, and I named it deployworkstation.yml.
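
A sketch of deployworkstation.yml at this stage of the series: one play launches the instance from localhost, and a second play configures it over ssh (the “launched” group is assumed to be populated by the add_host task in the launch role):

    ---
    # deployworkstation.yml (sketch)
    - hosts: localhost
      connection: local
      gather_facts: yes
      roles:
        - launch

    - hosts: launched
      remote_user: ubuntu
      become: yes
      roles:
        - workstation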

Testing our Playbook

To test our work so far, we simply need to execute it with Ansible and see if we are successful:
ansible-playbook -vv -e group_name=test deployworkstation.yml
You should see some output at the end of the playbook run like this:
Screen Shot 2016-01-22 at 10.16.05 PM

Next, connect to your instance with ssh by typing the following command, substituting the public DNS name from your own playbook output:
ssh ubuntu@ec2-52-90-220-60.compute-1.amazonaws.com

You should see something like the following after you connect with ssh:
Screen Shot 2016-01-22 at 10.17.01 PM

That’s it!  You’ve now created a workstation in the Amazon public cloud.  Be sure to terminate the instance you’ve created so that you don’t incur any unexpected fees.  You can do this by navigating to the EC2 (top left) dashboard from the AWS console, then selecting any running instances and choosing to terminate them:
Screen Shot 2016-01-22 at 10.18.32 PM

After selecting to Terminate them from the Instance State menu, you’ll need to confirm it:
Screen Shot 2016-01-22 at 10.18.52 PM

Now that you’ve terminated any running instances, in the next part, we’ll learn how to create snapshots, launch configurations, and auto-scaling groups from our immutable golden master images.

Immutable Infrastructure with AWS and Ansible – Part 1 – Setup

Introduction

Immutable infrastructure is a very powerful concept that brings stability, efficiency, and fidelity to your applications through automation and the use of successful patterns from programming.  The general idea is that you never make changes to running infrastructure.  Instead, you ensure that all infrastructure is created through automation, and to make a change, you simply create a new version of the infrastructure, and destroy the old one.  Chad Fowler was one of the first to mention this concept on his blog, and I believe it resonates with anyone who has spent a significant amount of time doing system administration:

“Why? Because an old system inevitably grows warts…”

They start as one-time hacks during outages. A quick edit to a config file saves the day. “We’ll put it back into Chef later,” we say, as we finally head off to sleep after a marathon fire fighting session.

Cron jobs spring up in unexpected places, running obscure but critical functions that only one person knows about. Application code is deployed outside of the normal straight-from-source-control process.

The system becomes finicky. It only accepts deploys in a certain manual way. The init scripts no longer work unless you do something special and unexpected.

And, of course the operating system has been patched again and again (in the best case) in line with the standard operating procedures, and the inevitable entropy sets in. Or, worse, it has never been patched and now you’re too afraid of what would happen if you try.

The system becomes a house of cards. You fear any change and you fear replacing it since you don’t know everything about how it works.  — Chad Fowler – Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components

Requirements

To begin performing immutable infrastructure provisioning, you’ll need a few things first.  You need some type of “cloud” infrastructure.  This doesn’t necessarily mean you need a virtual server somewhere in the cloud; what you really need is the ability to provision cloud infrastructure with an API.  A permanent virtual server running in the cloud is the very opposite of immutable, as it will inevitably grow the warts Chad mentions above.

Amazon Web Services

For this series, we’ll use Amazon Web Services as our cloud provider.  Their APIs and services are frankly light years ahead of the competition.  I’m sure you could provision immutable infrastructure on other public cloud providers, but it wouldn’t be as easy, and you might not have access to the wealth of features and services available that can make your infrastructure provisioning non-disruptive with zero downtime.  If you’ve never used AWS before, the good news is that you can get access to a “free” tier that gives you limited amounts of compute resources per month for 12 months.  750 hours a month of t2.micro instance usage should be plenty if you are just learning AWS in your free time, but please be aware that if you aren’t careful, you can incur additional charges that aren’t covered in your “free” tier.

Ansible

The second thing we’ll need is an automation framework that allows us to treat infrastructure as code.  Ansible has taken the world by storm due to its simplicity and rich ecosystem of modules that are available to talk directly to infrastructure.  There is a huge library of Ansible modules for provisioning cloud infrastructure.  The AWS specific modules cover almost every AWS service imaginable, and far exceed those available from other infrastructure as code tools like Chef and Puppet.

OS X or Linux

The third thing we’ll need is an OS X or Linux workstation to do the provisioning from.  As we get into the more advanced sections, I’ll demonstrate how to provision a dedicated orchestrator that can perform provisioning operations on your behalf, but in the short-term, you’ll need a UNIX-like operating system to run things from.  If you’re running Windows, you can download VirtualBox from Oracle, and Ubuntu Linux from Canonical, then install Ubuntu Linux in a VM.  The following steps will get your workstation setup properly to begin provisioning infrastructure in AWS:

Mac OS X Setup

  1. Install Homebrew by executing the following command:
    ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
    You should see output like the following:
    Screen Shot 2016-01-08 at 11.14.05 AM
  2. Install Ansible with Homebrew by executing the following command:
    brew install ansible
    You should see output like the following: (note, I’m actually running Ansible 2.0.0.2 now, but this output was for an older version; use Ansible 2.0+ as it’s the future 🙂 )
    Screen Shot 2016-01-08 at 11.14.32 AM
  3. Install the AWS Command Line Interface (CLI) with Homebrew by executing the following command:
    brew install awscli
    You should see output like the following:
    Screen Shot 2016-01-08 at 2.15.32 PM
  4. Install wget through homebrew by executing the following command:
    brew install wget
    You should see output like the following:
    Screen Shot 2016-01-22 at 11.56.37 AM

Linux Setup

  1. Install Ansible by executing the following command:
    sudo pip install ansible
  2. Install the AWS Command Line Interface (CLI) by executing the following command:
    sudo pip install awscli
  3. Install wget using your package manager.

Generic Workstation Setup

These steps need to be followed whether you’re running a Mac or Linux for your workstation.

  1. Install a good text editor.  My favorite is Sublime Text 2, but you can use whatever you want.
  2. Install the yaegashi.blockinfile Ansible role from Ansible galaxy.  This is a very useful role that will allow us to add blocks of text to configuration files, rather than simply changing single lines.  Type the following command to install it:
    sudo ansible-galaxy install yaegashi.blockinfile
    You should see output like the following:
    Screen Shot 2016-01-08 at 11.24.55 AM

Amazon Setup

There are a few things you’ll need to begin provisioning infrastructure in your AWS account.  First, you’ll need to make sure the default security group in your VPC allows traffic from your workstation.  This is necessary because Ansible will configure your EC2 compute instances over SSH, and needs network connectivity to them from your workstation.

  1. Login to your AWS Console and select VPC from the bottom left of the dashboard.
  2. Click on Security Groups on the bottom left hand side under Security.
  3. Select/highlight the security group named “default”, and select the Inbound Rules tab.  Click the Edit button, then click Add another rule, and for the rule type, select “ALL Traffic”, and insert your workstation’s Internet IP address, with a /32 at the end to indicate the CIDR netmask.  If you don’t know your workstation’s true Internet IP address, you can find it with any “what is my IP” website.
    Screen Shot 2016-01-08 at 3.34.08 PM
    Note: I blanked my IP address in the image above.
  4. Click Save to Save the Inbound Rules.
  5. Go back to the AWS Console dashboard, and click “Identity & Access Management.”  It is located towards the middle of the second column, under Security & Identity.
  6. Click on Users on the left, then click “Create New Users.”  Enter a username for yourself, and leave the checkbox selected to Generate an access key for each user.  Click the Create button:
    Screen Shot 2016-01-09 at 5.49.04 PM
  7. Your AWS credentials will be shown on the next screen.  It’s important to save these credentials, as they will not be shown again:
    Screen Shot 2016-01-09 at 5.49.31 PM
  8. Using your text editor, edit a file named ~/.boto, which should include the credentials you were just given, in the standard boto format:
    [Credentials]
    aws_access_key_id = YOUR_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
  9. At the command line, execute the following command, and input the same AWS credentials, along with the AWS region you are using:
    aws configure
    For most of you, this will be either “us-east-1” or “us-west-1”.  If you’re not in the US, check the AWS Regions and Endpoints documentation to determine what your EC2 region is.
  10. Click on Groups, then click “Create New Group”:
    Screen Shot 2016-01-09 at 5.59.18 PM
  11. Name the group PowerUsers, then click Next:
    Screen Shot 2016-01-09 at 5.59.35 PM
  12. In the Attach Policy step, search for “PowerUser” in the filter field, and check the box next to “PowerUserAccess”, then click “Attach Policy”:
    Screen Shot 2016-01-09 at 6.00.09 PM
  13. Click Next to Review, and save your group.
  14. Select/Highlight the PowerUsers group you’ve just created, and click Actions, then “Add Users to Group”:
    Screen Shot 2016-01-09 at 6.00.41 PM
  15. Select the user account you just created, and add that user to the group:
    Screen Shot 2016-01-09 at 6.00.59 PM
  16. Now, we’ll need to create an IAM policy that gives zero access to any of our resources.  The reason for this is that we’ll be provisioning EC2 instances with an IAM policy attached, and if those instances get compromised, we don’t want them to have permission to make any changes to our AWS account.  Click Policies on the left hand side (still under Identity & Access Management), then click Get Started:
    Screen Shot 2016-01-09 at 6.06.26 PM
  17. Click Create Policy:
    Screen Shot 2016-01-09 at 6.06.39 PM
  18. Select “Create Your Own Policy” from the list:
    Screen Shot 2016-01-09 at 6.07.11 PM
  19. Give the policy a name, “noaccess”, and a description, then paste a deny-all policy document into the policy editor (an example appears after this list).
  20. Click Validate Policy at the bottom.  It should show “This policy is valid,” as you see below:
    Screen Shot 2016-01-10 at 7.27.10 AM
  21. Click Create Policy, then click Roles on the left-hand side of the screen.
    Screen Shot 2016-01-12 at 9.01.41 AM
  22. Click Create New Role, then type in a role name, “noaccess”:
    Screen Shot 2016-01-12 at 9.01.54 AM
  23. Under the Select Role Type screen, select “Amazon EC2”:
    Screen Shot 2016-01-12 at 9.02.06 AM
  24. On the Attach Policy screen, filter for the “noaccess” policy we just created, and check the box next to it to select it:
    Screen Shot 2016-01-12 at 9.02.22 AM
  25. On the Review screen, click the Create Role button at the bottom right:
    Screen Shot 2016-01-12 at 9.02.33 AM
  26. Now, go back to the main screen of the AWS console, and click EC2 in the top left.
  27. Click “Key Pairs” under the Security section on the left:
    Screen Shot 2016-01-12 at 1.38.35 PM
  28. Click “Create Key Pair”, then give the Key Pair a name:
    Screen Shot 2016-01-12 at 1.38.56 PM
  29. The private key will now be downloaded by your browser.  Save this key in a safe place, like your ~/.ssh folder, and make sure it can’t be read by other users by changing the mode on it:
    mv immutable.pem ~/.ssh
    chmod 600 ~/.ssh/immutable.pem
  30. Run ssh-agent, and add the private key to it, by executing the following commands:
    eval "$(ssh-agent -s)"
    ssh-add ~/.ssh/immutable.pem

    You should see output like the following:
    Screen Shot 2016-01-12 at 1.49.16 PM
  31. Next, install pip using the following command:
    sudo easy_install pip
    You should see output like the following:
    Screen Shot 2016-01-22 at 11.52.01 AM
  32. Then, install boto using the following command:
    sudo pip install boto
    You should see output like the following:
    Screen Shot 2016-01-22 at 11.54.04 AM
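
For step 19 above, a deny-all policy document can be written like this (the statement simply denies every action on every resource, which is the intent of the “noaccess” role):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Deny",
          "Action": "*",
          "Resource": "*"
        }
      ]
    }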

The setup of your environment is now complete.  To test and ensure you can communicate with the AWS EC2 API, execute the following command:
aws ec2 describe-instances

You should see output like the following:
Screen Shot 2016-01-12 at 10.00.17 AM

In the next article, we’ll begin setting up our Ansible playbook and provisioning a test system.