Here at Imaginary Landscape, we are frequently tasked with spinning up new cloud servers for our clients. Once a new server is online, we usually follow up by provisioning it with useful software. To automate that process, we began using Ansible a few years ago, and have had a lot of success with it. Provisioning is complicated by nature, however, and no provisioning attempt has been completely free of difficulty.
This post examines a recent provisioning run and uses it to highlight the issues that caused us pain. Some of this is undoubtedly a case of the day-to-day getting in the way of tying up loose ends, but other aspects represent more generic problems. It is in the latter category where we hope this post might hold some wider value for the reader.
Ansible offers a lot of flexibility in terms of how its configuration is laid out. A very pared-down version of ours looks like this:
ansible-django-stack/
├── extra-vars.yml
└── ansible/
    ├── inventory.ini
    ├── roles/
    │   └── django/
    │       └── tasks/
    │           └── base.yml
    └── playbooks/
        ├── playbook-all.yml
        └── playbook-all.retry
Now let's look at what some of those files do.
STEP 1: The Inventory File
Our Ansible uses an inventory.ini file to obtain the credentials and options it needs to SSH into the target server. Ansible configuration files are usually .yml files, but inventory.ini is different. A very basic example might be:
[all]
<target-ip-address> ansible_ssh_user=<linux-user-on-target> ansible_ssh_pass=<password-for-that-user>
For me these Ansible scripts have worked best when I set ansible_ssh_user to "root." A brand-new server only has a "root" user account anyway, and it will have all the permissions needed.
An inventory file can be much more complicated than the example above, but other options are documented elsewhere and are beyond the scope of this post, so let's move on.
STEP 2: The "Extra Variables" File
This file contains values for variables that will probably change from one provisioning to the next: the name of the website that will be served, the path to a GitHub repository that contains the codebase for the website, and True/False switches that indicate which programs should be installed or left uninstalled for that site. For example, although our scripts can install either MySQL or PostgreSQL, chances are that a given client will not need both.
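As a rough sketch, such a file might look like the following. The variable names here are illustrative placeholders, not the exact ones our roles use:

```yaml
# Hypothetical extra-vars.yml; variable names are examples only.
project_name: example_site
site_domain: www.example.com
repo_url: git@github.com:example/example-site.git
install_postgresql: true
install_mysql: false
```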
Most of our clients have websites that use third-party software downloaded from GitHub, and sometimes those packages include software that we have developed here in house. Our own GitHub repository is private and requires a key for access. For this reason, the extra-vars file can also be given a local path to this key. However, I have never seen this work successfully. I have tried providing a path to my personal private key and also to the Imaginary Landscape private key, but the Ansible script has always failed at this step with the message, "Please make sure you have the correct access rights and the repository exists." We currently work around this by logging into the server directly and installing the requirements from there.
STEP 3: Roles and Playbooks
Ansible uses roles to represent different operations. Like most of Ansible's configuration, a role is encapsulated in a .yml file. For example, we have a role called "supervisor" that contains instructions to install Supervisor on a Debian-style OS.
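Our actual role does more than this, but as a sketch, the tasks file for such a role could boil down to a single apt task:

```yaml
# Hypothetical roles/supervisor/tasks/main.yml: install Supervisor from apt
# on a Debian-style OS. update_apt_cache is supplied by the playbook's vars.
- name: Install Supervisor
  apt:
    name: supervisor
    state: present
    update_cache: "{{ update_apt_cache }}"
```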
Roles are assembled into larger-scale configurations called playbooks (also .yml files). For example, we use a playbook called playbook-all.yml to fire off our entire provisioning run, even though the details of what Ansible is doing will be laid out in lower-level configuration files. Our playbook-all.yml looks like this:
- name: Base states
  hosts: all
  vars:
    - update_apt_cache: yes
  roles:
    - role: base
    - role: unattended
    - role: timezone
    - role: ntp
    - role: supervisor
    - role: nginx
    - role: letsencrypt
    - role: postfix
    - role: postgres
    #- role: postgis
    #- role: mysql
    #- role: redis
    #- role: elasticsearch
    - role: django
    #- role: docker
    - role: nrpe
Some roles have been commented out, as we did not need them for this particular deployment.
STEP 4: Running the Script (Finally)
OK, with the inventory, extra-vars, and playbook files ready, change directory to the top of the Ansible Django stack repository (if you're not already there) and run Ansible:
$ ansible-playbook -vvvv -i ansible/inventory.ini --extra-vars "@extra-vars.yml" ansible/playbooks/playbook-all.yml
Hopefully your terminal will display several minutes' worth of streamed success messages. If Ansible hits an error and is forced to stop before it gets to the end of the playbook, you will see the word FAILED! near the bottom of the output. In that case, look for the "task path" line, which will tell you where it got stuck.
Sometimes Ansible will fail right out of the gate with the error UNREACHABLE! and the message "SSH Error: data could not be sent to remote host '<target-ip>'. Make sure this host can be reached over SSH." You check and re-check your configuration, but everything is correct, and SSHing to the target server from the command line works just fine. What is going on?
The reason you may see this error is that Python is not installed on the target server. Yes, the error about the server being unreachable is totally misleading. Why isn't Python installed? This depends on how the target server is spun up.
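A quick way to confirm the diagnosis, once logged into the target by hand, is to ask for the interpreters by name:

```shell
# Check which Python interpreter (if any) is present on the target.
# Ansible needs one of these to run its modules.
command -v python || command -v python2 || command -v python3 || echo "No Python found"
```

If the last message is "No Python found", Ansible's modules have nothing to run on, whatever the UNREACHABLE! error claims.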
To be safe, before attempting to provision with Ansible, log into the newly spun-up server and check whether Python is there. If it isn't, run this Python-specific playbook first:
ansible-playbook -vvvv -i ansible/inventory.ini ansible/playbooks/playbook-install-python2.yml
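We won't reproduce our playbook-install-python2.yml verbatim here, but the core trick is Ansible's raw module, which runs a command over bare SSH without needing Python on the remote side. A minimal sketch, assuming a Debian/Ubuntu target, might look like:

```yaml
# Hypothetical minimal version of playbook-install-python2.yml.
# gather_facts is disabled because fact gathering itself requires Python.
- hosts: all
  gather_facts: no
  tasks:
    - name: Bootstrap Python so normal Ansible modules can run
      raw: test -e /usr/bin/python || (apt-get update && apt-get install -y python-minimal)
```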
We then retry the command with playbook-all.yml as above.
STEP 5: Debugging
If Ansible is unable to complete a task, the word FAILED! will appear in the output. Sometimes Ansible will stop everything at this point. Other times it will decide to ignore the failure and keep going. In these cases it may still report failed=0 at the end, which is misleading to say the least. So be sure to scan your output for the FAILED! message, even if it looks like all went well.
If everything does come crashing to a halt, our scripts helpfully tell you how to pick up where you left off when you're ready to try again. That is, after you've fixed whatever went wrong:
to retry, use: "--limit @/path/to/playbooks/directory/playbook-all.retry"
As for fixing whatever problem caused the FAILED! message, that depends on the nature of the error. For example, with the first error that I hit, this clue was in the output:
"cmd": "apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 ...",
On that occasion I logged into the new server, ran the above command by hand, and it seemed to work. I went back to my local machine and restarted the process in the suggested way:
$ ansible-playbook -vvvv -i ansible/inventory.ini --extra-vars "@extra-vars.yml" ansible/playbooks/playbook-all.yml --limit @/path/to/playbooks/directory/playbook-all.retry
Sure enough, Ansible got a little farther this time. However, execution again ground to a halt with a FAILED! message. This time the output didn't provide a cmd, but I could see it had something to do with encrypting passwords when creating a Postgres user. I eventually added encrypted=yes to the definition of postgresql_user in our file ansible/roles/postgres/tasks/base.yml.
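For illustration, the fixed task looked roughly like this; the variable names are placeholders, not our actual ones:

```yaml
# Hypothetical sketch of the task in ansible/roles/postgres/tasks/base.yml
# after the fix; "encrypted: yes" is the addition that resolved the failure.
- name: Create the application database user
  postgresql_user:
    name: "{{ db_user }}"
    password: "{{ db_password }}"
    encrypted: yes
  become: yes
  become_user: postgres
```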
Working in this iterative way, eventually the script was able to run to its conclusion. Some of the errors, such as the problem with encrypted passwords above, arose due to changes in requirements between Ubuntu 16.04, which was current at the time the scripts were created, and 18.04, which is current as I write this. We do our best to commit changes to the scripts where appropriate so that we can keep them relatively up to date, but of course when 20.04 rolls around there will be new problems to solve.
NOTE: What Ansible Does Not Do
At this time, the Ansible script does not create firewall rules or Ubuntu user accounts for you. For now, you just have to log in and do those things by hand, just as our grandparents did.
And so ends Part 1 of the post. Stay tuned for Part 2 which shows how I got uWSGI working with NGINX and Django.