Ansible is a great tool for configuration management, but a common complaint is that it’s not as fast as tools like Salt, Chef or Puppet. That’s a consequence of its design: Ansible doesn’t rely on an agent listening on each host (although it can) and instead pushes changes over SSH. This post isn’t about the pros and cons of each tool, but about ways to improve on Ansible’s default configuration. Ansible ships with very conservative defaults, which is smart in my opinion because it offers greater compatibility out of the box. Here I highlight some safe adjustments to the default configuration that improve performance (speed!).
Real World Playbook Test
For this test I’m using a real-world playbook that I use in my homelab when provisioning a new CentOS VM. It configures some basic things (hostname, ssh keys, etc), installs common packages/utilities and tunes some OS configurations.
I’m running the playbook from a CentOS 7 VM on an ESXi 6.5 host. The playbook runs against 12 target VMs, all on the same VMNetwork. The Ansible VM has 4 vCPUs and 8GB of RAM.
Before tuning Ansible, we’ll need to gather some metrics about how each playbook run performs. Fortunately, Ansible v2.0 and higher includes two built-in callbacks that can be enabled: timer and profile_tasks. Timer outputs the total playbook run time, similar to running the time command before an ansible-playbook command. The second, and more interesting of the two IMO, is profile_tasks. This callback displays a nice summary of each TASK and how long it took to execute.
To enable these settings, edit (or create) an ansible.cfg file. You can check whether you already have an Ansible config file by running ansible --version.
This tells you the location of the configuration file that Ansible uses and the version. If you don’t see a config file listed you can create one in the directory where your playbooks will be run.
We’re going to add the following line to the config file under the [defaults] subsection:
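A minimal sketch of that line, assuming the Ansible 2.x option name callback_whitelist (recent releases rename it to callbacks_enabled):

```ini
[defaults]
# Enable the two built-in timing callbacks described above.
# Assumption: Ansible 2.x naming; newer releases call this option callbacks_enabled.
callback_whitelist = timer, profile_tasks
```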
In the output of the baseline run, the important line is the last one: Playbook run took … 2 minutes, 4 seconds. That’s 124 seconds. Not terrible, but if you’re deploying to a large number of machines (say 50 or 100) those minutes can quickly add up.
Let’s start making some configuration tweaks and see if we can speed things up.
Enable SSH Pipelining
To enable SSH pipelining, add this to your ansible.cfg file under the [defaults] heading:
pipelining=True
From the Ansible manual: Enabling pipelining reduces the number of SSH operations required to execute a module on the remote server, by executing many ansible modules without actual file transfer.
Let’s run the same playbook again but with this configuration option set and see what happens:
Friday 08 June 2018 16:07:19 -0400 (0:00:23.585) 0:01:56.055 ***
=================================================================
TASK | Start filebeat and enable service ----------------- 23.58s
TASK | Install packages ---------------------------------- 16.75s
TASK | Install filebeat ----------------------------------- 6.17s
Gathering Facts ------------------------------------------- 5.50s
TASK | Install rpms for Spacewalk / RHN ------------------- 4.61s
checkmk : TASK | Copy Checkmk Agent Listener -------------- 2.33s
checkmk : TASK | Copy Checkmk Agent ----------------------- 2.26s
TASK | Set /etc/hostname ---------------------------------- 1.91s
TASK | Copy Influxdata repo (for Telegraf) ---------------- 1.90s
TASK | Copy ssh/config for user --------------------------- 1.88s
TASK | Copy ssh keys for user ----------------------------- 1.87s
TASK | Copy .bash_logout for user ------------------------- 1.83s
TASK | Copy Telegraf config ------------------------------- 1.82s
TASK | Update /etc/services file -------------------------- 1.82s
TASK | Copy .bashrc for user ----------------------------- 1.82s
TASK | Copy Telegraf environment default ------------------ 1.81s
TASK | Copy .bashrc for root ------------------------------ 1.80s
TASK | Install prowl -------------------------------------- 1.79s
TASK | Install prowl API key ------------------------------ 1.77s
TASK | Copy .bash_logout for root ------------------------- 1.77s
Playbook run took 0 days, 0 hours, 1 minutes, 55 seconds
Here we can see that the run completed 9 seconds faster (115 seconds versus 124). Not bad. Let’s see if we can tweak it some more.
Reduce poll interval to 5s
The default poll interval is set to 15 seconds. This is how often Ansible will check on a task that’s running and decide if it can proceed (per the Ansible documentation, it applies to asynchronous tasks that don’t set their own poll value). Let’s set it to 5 seconds and see what happens. Add or edit this line in the ansible.cfg file, again under the [defaults] heading:
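As a sketch, the entry would look like this:

```ini
[defaults]
# Check on running tasks every 5 seconds instead of the default 15.
poll_interval = 5
```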
It took 106 seconds to run the playbook that time. That’s 18 seconds faster than what we started with. Nice.
Let’s try another tweak and see if we can’t do even better.
Increase forks to 25
For my use case I’m increasing the number of simultaneous forks to 25 from the default value of 5. Again, Ansible ships with pretty sane defaults. We don’t want sane, we want fast. Let’s see how this does:
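A minimal sketch of the corresponding entry, placed under [defaults] like the earlier settings:

```ini
[defaults]
# Run against up to 25 hosts in parallel (the default is 5).
forks = 25
```

With only 12 target VMs in this test, any value of 12 or more should behave the same; 25 simply leaves headroom for a larger inventory.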
Very nice. Now we’re at 85 seconds. Remember, I’m running the exact same playbook just with new configuration values (options). This is very good but I think there’s more we can do.
Enable fact_caching
By enabling this we’re telling Ansible to keep the facts it gathers in a local file. You can also use a Redis cache; see the documentation for details. Facts are what Ansible collects when it says “Gathering Facts” about your target hosts, and fact caching stores them so later runs can reuse them instead of gathering again. If we don’t change our targets’ hardware (or virtual hardware) very often, this can be very helpful. Enable it by adding this to your ansible.cfg file:
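A minimal sketch of a file-based fact cache. The option names are standard ansible.cfg settings, but the cache path and timeout below are assumptions chosen for illustration, not the values used in this test:

```ini
[defaults]
# Only gather facts when there is no valid cached copy for the host.
gathering = smart
# Store facts as JSON files on the control machine.
fact_caching = jsonfile
# Hypothetical cache directory; use any path writable by the Ansible user.
fact_caching_connection = /tmp/ansible_fact_cache
# Hypothetical expiry: treat cached facts as stale after 24 hours (value in seconds).
fact_caching_timeout = 86400
```

A shorter timeout is the safer choice if the targets change often, since stale facts can lead a playbook to act on outdated information.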
75 seconds. Very nice. These tweaks have made a huge difference.
Let’s recap
We’ve reduced our playbook run time from 2 minutes and 4 seconds down to 1 minute and 15 seconds (124 seconds -> 75 seconds). That’s about 40% less time to run the exact same playbook with just a few configuration tweaks.
By adding / editing these configuration values we cut 49 seconds off our playbook run. Now, these results aren’t going to be the same for everyone, every playbook or every environment. Many factors affect Ansible performance.
It’s clear, however, that modifying the defaults as we did here results in significant performance gains and can save you time on deployments.
( I’ll add a pretty table with summary here someday. )