Ivan-Site.com

Auto-scaling on Amazon EC2 with Opscode Chef

There are lots of ways to set up auto-scaling for EC2 nowadays. Amazon has its own products, such as the recently announced AWS OpsWorks and CloudFormation, and the benefit of using these tools is integration with other AWS services. There are also downsides: OpsWorks cannot currently integrate with ELB, and using CloudFormation will probably involve writing funky JSON templates.

There are also third-party solutions, such as the open-source Asgard from Netflix and RightScale, an enterprise cloud management service.

These services can also be used for some basic configuration management, though I feel that is not their primary purpose. We chose to go with a separate solution for that: Opscode Chef.

There are lots of guides on how to set up EC2 auto-scaling, as well as guides on integrating Chef with CloudFormation (Amazon's own docs, for example). However, there isn't much information on how to do this without CloudFormation. If you just want auto-scaling without the extra complexity of CloudFormation and still want to use Chef for configuration management, here's what you need to do.

Before you continue you'll need to install and configure the auto scaling tools, which you can get from your favorite package manager or directly from Amazon.
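
For reference, the legacy Auto Scaling command line tools are configured through environment variables. Here is a minimal sketch, assuming the tools were unpacked to /opt/aws/AutoScaling and your credentials live in ~/.aws-credentials (both paths are placeholders I picked for illustration, as is the JAVA_HOME fallback):

```shell
# Paths below are assumptions; adjust them to where you unpacked the tools.
export AWS_AUTO_SCALING_HOME="/opt/aws/AutoScaling"
export PATH="$PATH:$AWS_AUTO_SCALING_HOME/bin"

# The tools are Java-based, so JAVA_HOME must point at a JRE.
# The fallback path is an assumption that matches Ubuntu's default-jre package.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/default-java}"

# Credential file with two lines:
#   AWSAccessKeyId=...
#   AWSSecretKey=...
export AWS_CREDENTIAL_FILE="$HOME/.aws-credentials"
```

If the setup works, a read-only command such as "as-describe-auto-scaling-groups" should return without a credentials error.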

Adding nodes to Chef

First you need to create a launch config for AWS to use when launching new instances. The difference here compared to something like "knife ec2 server create" is that you need to manually bootstrap Chef. Replace parameters with ones for your application as needed.

as-create-launch-config EXAMPLE --image-id ami-b8d147d1 --instance-type m1.large \
    --group EC2_SECURITY_GROUP --monitoring-disabled --user-data-file chef-user-data.sh

The most important part here is the "--user-data-file" parameter, which lets us provide a script that will bootstrap Chef for us. Here's a version that works with Ubuntu 12.10 and Chef 11.4:

#!/bin/bash -v
# install pre-requisites
apt-get update
apt-get upgrade -y
apt-get install -y ruby1.9.1-dev ruby1.9.1 rubygems s3cmd
gem install ohai chef --no-rdoc --no-ri --verbose
mkdir -p /etc/chef
# write first-boot.json
(
cat << 'EOP'
{"run_list": ["role[YOUR_SERVER_ROLE]"]}
EOP
) > /etc/chef/first-boot.json
# write .s3cfg
(
cat << 'EOP'
[default]
access_key = ***
secret_key = ******
use_https = True
EOP
) > /home/ubuntu/.s3cfg
# get chef validation key from S3
s3cmd -c /home/ubuntu/.s3cfg get s3://YOUR_BUCKET/validation.pem /etc/chef/validation.pem
# write client.rb
(
cat << 'EOP'
log_level :info
log_location STDOUT
chef_server_url 'YOUR_CHEF_URL'
validation_client_name 'YOUR_PROJECT-validator'
EOP
) > /etc/chef/client.rb
# Bootstrap chef
chef-client -j /etc/chef/first-boot.json

This script fetches your Chef validation key from S3 and registers the instance with the Chef server. Next, we create the auto-scaling group (once again, change parameters as needed).

as-create-auto-scaling-group EXAMPLE --availability-zones us-east-1a,us-east-1b \
    --launch-configuration EXAMPLE --desired-capacity 2 --min-size 2 --max-size 10 \
    --load-balancers YOUR_ELB_NAME --health-check-type ELB --grace-period 300

You can then set up triggers for scaling the group up and down using the "as-create-or-update-trigger" script, or create policies with the "as-put-scaling-policy" script and add alarms in the CloudWatch web interface.
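
If you would rather script the alarm than click through the web interface, the policy route might look roughly like the sketch below. It assumes the separate CloudWatch command line tools (which provide "mon-put-metric-alarm") are installed next to the Auto Scaling tools. The policy and alarm names, the 80% CPU threshold, the 600 second period and the DRY_RUN switch are all my own choices for illustration.

```shell
#!/bin/bash
# Sketch: one scale-up policy plus a CloudWatch alarm that triggers it.
# Set DRY_RUN=1 to print the commands to stderr instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*" >&2
    else
        "$@"
    fi
}

create_scale_up() {
    local group="$1" threshold="$2" policy_arn

    # On success as-put-scaling-policy prints the ARN of the new policy
    policy_arn=$(run as-put-scaling-policy "${group}-scale-up" \
        --auto-scaling-group "$group" \
        --adjustment=1 --type ChangeInCapacity --cooldown 300) || return 1

    # A dry run produces no ARN, so substitute a visible placeholder
    if [ -n "${DRY_RUN:-}" ]; then
        policy_arn="POLICY_ARN"
    fi

    # Fire the policy when average CPU across the group exceeds the threshold
    run mon-put-metric-alarm "${group}-high-cpu" \
        --comparison-operator GreaterThanThreshold \
        --evaluation-periods 1 \
        --metric-name CPUUtilization --namespace "AWS/EC2" \
        --period 600 --statistic Average --threshold "$threshold" \
        --alarm-actions "$policy_arn" \
        --dimensions "AutoScalingGroupName=$group"
}
```

Running "DRY_RUN=1 create_scale_up EXAMPLE 80" prints both commands so you can review them first. A matching scale-down policy would use a negative adjustment and a LessThanThreshold alarm.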

You should see servers booting up as soon as you create the auto-scaling group.

Deleting nodes from Chef

Now you should have your servers provisioned and correctly registered with Chef. The only missing part is removing them from Chef when they're shut down. The simplest way to do this is to place a script in /etc/rc0.d/ that does it for you. You can install it with the user-data script or with a Chef recipe:

/usr/local/bin/knife node delete -y -c /root/.chef/knife.rb <%=node['fqdn'] %>
/usr/local/bin/knife client delete -y -c /root/.chef/knife.rb <%=node['fqdn'] %>

It will also require you to write a knife.rb config on the server (into /root/.chef/knife.rb in this case).
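
If you go the user-data route, the installation could be sketched as below. Treat it as an illustration under a few assumptions: that knife can reuse the /etc/chef/client.pem key that chef-client creates at registration, that this key is allowed to delete its own node and client on your Chef server, and that "K01chef-deregister" is an acceptable link name. The function name and the scratch-root argument are mine.

```shell
#!/bin/bash
# Sketch: install a shutdown hook that removes this instance from Chef.
# The first argument is a root prefix: pass "" on a real host, or a scratch
# directory to inspect the generated files without touching the system.

install_chef_deregister() {
    local root="$1" node_name="$2" chef_url="$3"

    mkdir -p "$root/root/.chef" "$root/etc/init.d" "$root/etc/rc0.d"

    # knife.rb that reuses the key chef-client created at registration (assumption)
    cat > "$root/root/.chef/knife.rb" <<EOF
node_name '$node_name'
client_key '/etc/chef/client.pem'
chef_server_url '$chef_url'
EOF

    # The hook itself: the same two knife calls, with the node name filled in
    cat > "$root/etc/init.d/chef-deregister" <<EOF
#!/bin/sh
/usr/local/bin/knife node delete -y -c /root/.chef/knife.rb $node_name
/usr/local/bin/knife client delete -y -c /root/.chef/knife.rb $node_name
EOF
    chmod 755 "$root/etc/init.d/chef-deregister"

    # K-prefixed links in rc0.d run on halt; "K01" is an arbitrary early slot
    ln -sf ../init.d/chef-deregister "$root/etc/rc0.d/K01chef-deregister"
}
```

On an instance you would call it at the end of the user-data script, for example as install_chef_deregister "" "$(hostname -f)" "YOUR_CHEF_URL", on the assumption that the node registered under its FQDN.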

This solution is not ideal, as it won't delete your servers from Chef if a server goes completely unresponsive (which does happen with EC2), but it is good enough. You could also set up something more advanced that sends an SQS notification when scaling down and has a listener remove nodes for you.

Conclusion and Notes

There you have it: a fully automated auto-scaling solution that you can still manage with Chef.

The user data that you provide to the launch config is included with each instance that is launched and can be read in plain text, so it's probably not a good idea to include your global AWS credentials in there.

Unfortunately, if something goes wrong, the only way to debug is checking the system log of the instance. But once chef-client starts running, you're pretty much done, assuming your recipes don't produce any errors and your application boots up and is able to respond to the ELB healthcheck. You should probably test this beforehand with a simple "knife ec2 server create" and make sure that you can add the instance to the ELB afterwards.
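
One thing that can make debugging less painful is keeping a copy of the bootstrap output on disk, so you can read it over SSH as well as in the system log. Here is a minimal sketch, where the helper name and the /var/log/user-data.log path are my own assumptions:

```shell
#!/bin/bash
# Sketch: run a command, show its stdout and stderr as usual, and append
# a copy of both to a log file.

run_logged() {
    local log="$1"
    shift
    # The pipeline's exit status is tee's, not the command's
    "$@" 2>&1 | tee -a "$log"
}

# Possible use in chef-user-data.sh:
#   run_logged /var/log/user-data.log chef-client -j /etc/chef/first-boot.json
```

Because tee passes everything through, the output should still reach the console and therefore the system log.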

I also recommend setting a high grace period for the auto-scaling group, as Chef and all of its dependencies can take a while to install (more than 2 minutes).

Posted Thu 28 February 2013 by Ivan Dyedov in AWS (Amazon Web Services, Ubuntu, Opscode Chef, autoscale)

3 comments

  • Forest Handford 2015-02-06T19:27:53

    Ivan, great article. I'm finding with CentOS a script in /etc/rc0.d/ won't execute when a node is scaled in. Any ideas if there is a way to run code on CentOS before termination?

  • Matt 2013-03-24T13:54:15

    You can avoid having to configure the AWS credentials in the user data file by taking advantage of IAM roles for instances, which allow you to use something like s3cmd or boto without having to provide any AWS credentials. See http://docs.aws.amazon.com/... for some more details.

    Since s3cmd in the apt repo is v1.0 it will not take advantage of the IAM instance role (that was just added recently within the past month or two). Below is a snippet of some modifications that I have made to my user-data file.


    # Install git onto the machine
    apt-get install -y git

    # Install s3cmd
    git clone https://github.com/s3tools/s3cmd.git /tmp/s3cmd
    cd /tmp/s3cmd
    git checkout v1.5.0-alpha3
    python setup.py install

    # Install the chef client
    curl -L https://www.opscode.com/chef/install.sh | sudo bash

    # get chef validation key from S3 (assumes the correct IAM role applied to the instance)
    cd /etc/chef
    sudo s3cmd get s3://{BUCKET-NAME}/validator.pem
    • Ivan Dyedov 2013-04-01T15:37:56

      Hi Matt, thanks for the reply. IAM roles definitely seem like the way to go.

      It seems the knife-ec2 plugin still doesn't have the ability to specify IAM roles, based on http://tickets.opscode.com/... though it's not technically needed for autoscaling in any way.

      Also, you can install the latest s3cmd alpha with

      pip install git+https://github.com/s3tools/s3cmd.git@v1.5.0-alpha3