Ansible 101 - Standards

Collection of Ansible standards based on my experience.

Ansible

As with most languages, there should be some recommended standards to live by. This blog provides a list of common standards and best practices to follow when writing Ansible code. This is an unofficial list based on my personal experience with hundreds of customers and the development practices I have used over the years. I have a master’s degree in Computation from Oxford University, working with programming theory, and in the last 20 years in the technology industry I have seen my fair share of code that lacks any standards at all.

In other words, these standards are tried and trusted - they should work well as a starting point. Feel free to adapt them to your own situation. Standards are commonly adapted to specific situations.

General

Complexity Kills Productivity

Ansible tools have always been built around this concept. The tools have simplified complex operations by hiding the complexity (underlying Python code) from the end user! Strive for a similar approach with your Ansible Playbooks and Roles by simplifying what you automate.

Follow the UNIX principle of do one thing, and one thing well.

Optimize for Readability

If done properly, your Ansible content can be the documentation of your workflow automation.

Think Declaratively

Ansible is a desired state engine by design. If you’re trying to “write code” in your plays and roles, you’re setting yourself up for failure. Playbooks were never meant to be for programming.

Version Control Everything

Always use a version control system (VCS) with your Ansible content.

Start Simple and Build as Needed

  • Start with a basic playbook and static inventory
  • Refactor and modularize (using Roles, etc) later
  • Just because you can create a complex structure doesn’t mean you should
  • Focus on testing a simple concept first

Forget YAML Document Markers

You might have heard that YAML documents should begin with --- and end with .... No, they don’t! Just forget these document markers and move on. Your YAML files will work perfectly without them. More than likely they will only cause your repositories to have some YAML files with the start marker ---, some with both start and end, and some with neither because multiple developers have worked on that repository.

These document markers are part of the YAML 1.2 specification and are only needed when you either want to define directives (not needed with Ansible ever!) or you want to define multiple YAML documents within the same file (also never needed by Ansible ever!).

For more information, refer to the YAML 1.2 specification.

Clean up your debugging tasks

Make them optional with the verbosity parameter so they’re only displayed when they are wanted.

- name: Show information about the current state
  debug:
    msg: "This always displays"

- name: Debug statement that conditionally shows
  debug:
    msg: "This only displays with ansible-playbook -vv+"
    verbosity: 2

Define testing strategies to increase reliability and reduce costs

Testing increases reliability and reduces costs as it can catch problems early. Consider implementing various testing strategies but do not attempt to build too much too soon - simply introduce tests as needed. The following are some of the common testing strategies:

  • smoke testing
  • sanity testing
  • unit testing
  • integration testing
  • syntax testing
  • runtime testing

Do something today that gets you closer to your long term goal

There are two common mistakes people make when automating:

  • Attempt to do everything at once. Always focus on creating smaller and manageable pieces of work that produce perhaps a portion of the total functionality or feature you want to implement. This allows something to still be delivered to the end user but without the long time commitment. The smaller the delivery, the easier it is to manage it, reduce errors, and produce constant stability and trust in the pipeline and end product.
  • Expect that it needs to be perfect. Everyone gets caught saying “We don’t have time to do that right now” but really they mean “We don’t have time to do [all of] that right now”. Often you have time to do something small right now that gets you a little closer to “all of that”.

Use smoke tests when starting services

Don’t just start the services - test them to ensure they are ready and available. Smoke tests can be performed using various methods such as with the uri module.

- name: check for proper response
  uri:
    url: http://localhost/myapp
    return_content: yes
  register: result
  until: '"Hello World" in result.content'
  retries: 10
  delay: 1

Refrain from creating arbitrary delays between tasks

The pause and wait_for modules are commonly used to create an arbitrary delay between tasks. It is always better to wait for a specific condition (a file exists, a file or folder no longer exists, a registry entry is created, etc.) than to wait for an arbitrary amount of time. The time an operation needs is rarely fixed, because processes depend on many things (CPU, memory, dependent processes, etc.). Something that completed in 30 seconds today may require 60 seconds or more in the Production environment, or simply next year on the same server.

As a result, developers often select an extraordinarily large amount of time to compensate for the possible variance. This creates a different problem: we are now delaying the execution, and therefore the completion, of tasks by a much larger amount of time than necessary. Additionally, this is only against a single target server. If we scale this up and run our Playbook against 1000 servers, the impact is much greater. Of course, using more Ansible forks can reduce that problem, but each fork demands more memory. It is simply not a great solution and is considered a “lazy” workaround.

Examples of arbitrary delays:

- name: Pause for 5 minutes to build app cache
  pause:
    minutes: 5

- name: Sleep for 300 seconds and continue with play
  wait_for:
    timeout: 300
  delegate_to: localhost
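
By contrast, waiting for a specific condition could look like the following sketch. The port number and the marker file path are hypothetical and only illustrate the idea; the timeout is an upper bound, and the task returns as soon as the condition is met.

```yaml
# Hypothetical port and file path; adjust to your application.
- name: Wait for the application port to accept connections
  wait_for:
    host: localhost
    port: 8080
    state: started
    timeout: 300

- name: Wait for the app cache marker file to appear
  wait_for:
    path: /opt/myapp/cache/ready
    state: present
    timeout: 300
```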

Use command modules sparingly

  • Always seek out a module first
  • Use the run command modules like shell and command as a last resort
  • The command module is generally safer
  • The shell module should only be used for I/O redirection

For example, here we could be lazy and use the command module:

- name: add user
  command: useradd appuser

- name: install apache
  command: yum install -y httpd

- name: start apache
  shell: |
    service httpd start && chkconfig httpd on

However, a better approach is using the available modules:

- name: add user
  user:
    name: appuser
    state: present

- name: install apache
  yum:
    name: httpd
    state: latest

- name: start apache
  service:
    name: httpd
    state: started
    enabled: yes

Create internal landing site for Ansible Automation

  • Create a style guide for developers
  • Use mkdocs for a super simple way to spin up a landing site
  • Include some of the following concepts:
    • Ansible standards
    • Onboarding instructions for other teams
    • Release process
    • Vision, goals, responsibilities
  • Produces consistency in things like
    • Naming of Tasks, Plays, Variables and Roles
    • Directory Layouts
    • Whitespace
    • Tagging

Enforce the style using tools

Style guides are great but they don’t ensure that your developers actually follow these rules. In fact, it’s guaranteed they won’t! So use linting tools such as ansible-lint, a command-line static analysis tool that checks for behaviour that could be improved.

Naming Standards

Consistent naming standards

Have consistent naming standards, independent of the object. This establishes a clear single rule that can easily be followed by multiple developers and teams. The following standards apply to files, folders, inventory group names, variable names, role names, repository names, and objects defined within Ansible Tower.

  • *.yml as the file extension for YAML files
  • *.j2 as the file extension for Jinja files
  • Use lowercase a-z
  • Use underscore _ as a separator, not dashes
  • Do not use whitespace as a separator
  • Terse, one word if possible, using underscores if necessary
  • Human-meaningful

An exception with regard to the repository name might occur when the names of these repositories are managed by a different set of standards that cannot be changed. When creating an Ansible Role repository in this case, you can use the role_name parameter in meta/main.yml so that the repository name is not used as the role name. This ensures the role name still conforms to the Ansible naming standards.
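
As a rough sketch, the override sits under galaxy_info in the Role's meta/main.yml. The author and description values below are placeholders, not taken from any real Role.

```yaml
# meta/main.yml (placeholder values, for illustration only)
galaxy_info:
  role_name: apache
  author: your_name
  description: Install and manage Apache
```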

Prefix all Role variables with the name of the role

For example:

apache_port: 80
apache_path: /opt/apache

Use appropriate naming standards for files

Files are typically sorted alphabetically in IDE tools and in the terminal shell, so a good naming standard helps developers read and organize them.

For example, name your files using the noun as the prefix and action as the suffix:
<noun>_<action>.<extension>

This ensures the files are listed and organized by product/application first and then by the action that is supported. Keeping your files organized in this manner makes it much easier to scan a Project/Role/etc to determine the product and all supported actions. For example:

apache_download.yml
apache_install.yml
apache_manage.yml
nginx_install.yml
nginx_manage.yml

Name repositories based on their content

The name of the version control repository should be short but should answer the following two high-level questions:

  • Does it contain an Ansible Project, an Ansible Role, or an Ansible Collection?
  • What is the general purpose of the content?

For example, an Ansible Role repository for managing Apache software installations could be named ansible_role_apache. Notice that we still only use lowercase and underscores in the name of the repository, but we also try to answer the two questions. This helps developers search for and use the appropriate repository for their work. See the Ansible Galaxy website for hundreds of examples.

Similarly, an Ansible Project repository that handles infrastructure provisioning on Azure platform could be named ansible_project_azure_provisioning.

Do NOT name your repositories with a mix of underscore and dashes such as ansible-project_azure_provisioning, as this is just simply confusing and frustrating when developers start to clone and reference repositories.

Always name Tasks

Tasks should always be named using the name: parameter, no exceptions. In its output, Ansible shows you the name of each task it runs. The goal of writing Ansible is that it’s readable, not only for the developers but for those reading the output of a specific run.

When naming a Task it should be an English sentence describing the purpose of the Task. It should be terse, descriptive, with spaces allowed. Do not use comments as a replacement for the actual name of the task!

An acceptable example using comments and naming:

# This is a temporary workaround and should be replaced soon.
- name: Ensure file exists
  file:
    path: /opt/test
    state: touch

Always mention the state

For many modules, the state parameter is optional. Different modules have different default settings for state, and some modules support several state settings. Explicitly setting state: present or state: absent brings consistency and clarity to your Playbooks and Roles.
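
A minimal sketch of what this looks like in practice. The package and the file path are only examples.

```yaml
- name: Ensure Apache is installed
  yum:
    name: httpd
    state: present  # stated explicitly instead of relying on the module default

- name: Ensure the legacy config file is removed
  file:
    path: /etc/httpd/conf.d/legacy.conf  # hypothetical path
    state: absent
```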

Projects

Standard Project Structure

ansible_project_myapp
├── collections
│   └── requirements.yml
├── roles
│   ├── requirements.yml
│   ├── myapp
│   │   ├── tasks
│   │   │   └── main.yml
│   │   └── ...
│   ├── nginx
│   │   └── ...
│   └── proxy
│       └── ...
├── inventory
│   ├── group_vars
│   │   └── web.yml
│   ├── host_vars
│   │   └── db1.yml
│   └── hosts
├── myapp_configure.yml
├── myapp_provision.yml
└── site.yml

Separate provisioning from deployment and configuration

Build separate Playbooks to handle each of these major operations and string them together using the site.yml Playbook.

$ cat site.yml
- import_playbook: myapp_provision.yml
- import_playbook: myapp_configure.yml

Inventory

Use dynamic inventory with clouds

With cloud providers and other systems that maintain canonical lists of your infrastructure, use dynamic inventory to retrieve those lists instead of manually updating static inventory files. With cloud resources, you can use tags to differentiate production and staging environments.
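
As an illustration, an inventory plugin configuration for AWS might look like the sketch below. It assumes the amazon.aws collection is installed and that instances carry a tag named environment; the file name, the region, and the tag name are assumptions for this example.

```yaml
# inventory/aws_ec2.yml (the file name must end in aws_ec2.yml or aws_ec2.yaml)
plugin: amazon.aws.aws_ec2
regions:
  - us-west-2
keyed_groups:
  # Builds groups from the hypothetical "environment" tag,
  # for example env_production and env_staging
  - key: tags.environment
    prefix: env
```

A Playbook can then target a group such as env_staging without anyone maintaining a static host list.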

Group hosts for easier inventory selection and fewer conditional tasks – the more groups the better.

Groups are powerful in the Ansible inventory and a Playbook can easily target a group of hosts using the group name which can be freely defined. Consider using naming standards when creating group names so there is some level of consistency across your entire inventory. Try to use human-meaningful group names. Use the group_vars folder to define the variables specific to each of the groups. This allows for host names to change, but group names can stay consistent which means your Playbooks do not need to be changed. Reducing code changes means more stability.
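
A small static inventory in YAML format might be grouped like this. All host and group names are made up for the example.

```yaml
# inventory/hosts.yml (hypothetical hosts and groups)
all:
  children:
    web:
      hosts:
        web1.example.com:
        web2.example.com:
    db:
      hosts:
        db1.example.com:
    production:
      children:
        web:
        db:
```

A Play with hosts: web keeps working when web1.example.com is replaced, and the variables for that group live in group_vars/web.yml.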

Build single source of truth

Use a single source of truth if you have it – even if you have multiple sources, Ansible can unify them. The power of the Inventory structure is that it’s simply a folder structure that manages multiple platforms, environments, host groups, etc. Additionally, with Dynamic Inventory and Inventory Plugins for various platforms (VMware, AWS, Azure, Google, etc.), you can build a single source of truth across your entire hybrid cloud landscape.

Create separate repository for your Inventory

Similar to the advantages of creating an Ansible Role in a separate repository (instead of inside the Project repository in the roles/ folder) consider moving your Inventory to a separate repository!

It’s not required to keep the Inventory in the same repository as the Project that contains your Playbooks.

Advantages:

  • Manage your inventory separate from your Playbooks, Roles, etc.
  • Inventory structure can have its own life cycle in a version control system
    • Changes to a Playbook or Role do not affect your Inventory since it sits in a different repository
  • Single source of truth for your inventory in one repository
  • Control access to the repository content separately from other Ansible content
  • Easily load the inventory within Ansible Tower however you want

Only use vault encrypted strings

Never encrypt an entire file within the inventory. It makes it difficult to understand or search the contents of the file using typical text editors or development tools. To use encrypted strings in your inventory file, it needs to be in YAML format instead of the traditional INI format.
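
For illustration, a variables file with a single encrypted string could look like this. The variable names are hypothetical, and the ciphertext line is a placeholder for the output of ansible-vault encrypt_string, not real data.

```yaml
# inventory/group_vars/web.yml (hypothetical variable names)
db_user: myapp  # plain text stays readable and searchable
db_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  (ciphertext generated by ansible-vault encrypt_string goes here)
```

Only the secret value is opaque, so the rest of the file can still be searched and reviewed in a diff.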

Consider using symbolic links to manage your shared variables

Read through this article from Digital Ocean that talks about why symbolic links are useful as an organizational tool for handling shared variables across platforms, environments, and so on. Do not be afraid to use symlinks within your repository.

Advanced Tip: Also consider using anchors/aliases as yet another tool for managing your inventory variables.
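
A minimal sketch of anchors and aliases in a YAML inventory, assuming both groups live in the same file (anchors do not work across files). The group and variable names are hypothetical.

```yaml
all:
  children:
    web_staging:
      vars: &web_vars       # anchor: define the shared values once
        http_port: 8080
        app_user: myapp
    web_production:
      vars:
        <<: *web_vars       # alias with merge key: reuse the shared values
        http_port: 80       # and override only what differs
```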

Remember, files can be folders too.

Ansible reads the group_vars/ folder contents and allows files to also be folders. This means the web.yml group file can also be a folder with the same name, web. In this case Ansible loads all of the files in that group folder. This is useful for more advanced inventory management techniques.

Avoid defining host variables

Always define variables for a group. There may be exceptions that highlight the fact that the host is more of a unicorn host and not aligned to any standard operating environment.

Development

Reconsider Developing Plugins and Modules

  • Remember, complexity kills productivity!
  • Just because you can, doesn’t mean you should
  • Ansible modules such as uri help reduce the need for developing custom modules
  • It will always be easier to maintain only a YAML solution
  • Adding a Python module adds complexity, increases maintenance costs, increases technical debt
  • Consider building a focused Ansible Role instead of a custom module

Good modules are user-centric

If you must create a custom module, please use these guidelines.

  • Modules are how Ansible balances simple and powerful
  • They implement common automation tasks for a user
  • They make easy tasks easier and complicated tasks possible
  • They abstract users from having to know the details to get things done
  • They are not one-to-one mapping of an API or command line tool interface. This is why you should not auto-generate your modules
  • They are not monolithic “does everything” modules that are hard to understand and complicated to use correctly

Playbooks

Plays should do nothing more than include a list of roles

Plays attempt to answer the Where? and the What?. We answer the Where? question by defining the managed nodes that we are targeting using the hosts:. We answer the What? question by providing the workflow of roles which state what we are trying to do. It should be simple and readable, so just a workflow of Roles in a specific order.

Avoid writing Tasks directly in your Plays, other than include_role tasks. Any other logic or functionality should be moved to an appropriate Role.

Use Playbooks instead of Workflow Templates when possible

Since a Playbook can contain one or more Plays, it’s literally defining a workflow. A Playbook represents a workflow in code form and allows full flexibility (conditionals, tasks, tags, pre-tasks, etc). However an Ansible Tower Workflow Job Template is a graphical form of the workflow with only success/fail paths from any single node. If you are focused on “Everything as Code”, you should try to always write Playbooks. Of course one can write code that creates the Ansible Tower Job Templates for each node and then the Workflow Job Template with all required information, but why do all of this when it can be a nice beautiful Playbook that is readable and easily usable.

There is a good exception to this rule when you need the graphical representation for demonstration purposes or other visual needs. Some have argued to use the Workflow Job Template because they already have the Job Templates created and tested and they only want to create an orchestration of these Job Templates using a Workflow Job Template.

Use include_role instead of the traditional roles: section

Traditionally a Playbook contains a roles: section to list the dependent roles and explicitly load them in that same order. Using the tasks: section and defining include_role tasks allows for more control with the order, ability to add when clauses, etc.
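
As a sketch, a Play built this way could look like the following. The nginx and proxy role names come from the project structure shown earlier, while the web group and the proxy_enabled variable are assumptions made for the example.

```yaml
- name: Configure the web tier
  hosts: web
  tasks:
    - name: Install and configure nginx
      include_role:
        name: nginx

    - name: Configure the proxy only where it is enabled
      include_role:
        name: proxy
      when: proxy_enabled | default(false) | bool
```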

Consider using module defaults groups when heavily using cloud modules

Module defaults groups is a feature that was added in Ansible 2.7 that groups together modules that share common sets of parameters. This makes it easier to author playbooks making heavy use of API-based modules such as cloud modules.

For example, in a playbook you can set module defaults for whole groups of modules, such as setting a common AWS region.

- hosts: localhost
  module_defaults:
    group/aws:
      region: us-west-2

  tasks:

  # Required parameter 'region' is defaulted
  - name: Get S3 bucket info
    aws_s3_bucket_info:

  # Required parameter 'region' is defaulted
  - name: Get AMI info for RHEL 7.5
    ec2_ami_info:
      filters:
        name: 'RHEL*7.5*'

Roles

Roles should answer the question How?

Recall that a Playbook answers the questions of Where? and What?, but it’s the Role that answers the question of How?. How are we implementing this specific functionality? The answer is in the code written in the Task files, Variable files, Template files, and so on. Normally a new user of the Role should not ever need to care about how the Role was implemented, but rather they want to know how to use it. Make the Role easy to use and consume, by answering how to use it, and ensuring the complex bits are somewhere else in the Role (available to those that are curious but not in the way).

Keep roles self contained and focused

  • Think about the full life-cycle of a service, microservice or container — not a whole stack or environment
  • Roles should be loosely-coupled — limit hard dependencies on other roles or external variables
  • Roles should avoid including tasks from other roles when possible
  • Roles should NOT perform include_role to include another external role
  • Variables needed by the Role should all be defined within the Role (using defaults/main.yml or vars/main.yml)
  • Pass variables to the role when calling include_role:
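
For example, a caller might override a default like this. The apache role and its apache_port variable follow the naming examples above, and the value is arbitrary.

```yaml
- name: Install and configure Apache on a custom port
  include_role:
    name: apache
  vars:
    apache_port: 8080
```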

Roles should be environment independent

Do not place environment specific variables or values inside an Ansible Role, as the purpose of a Role is to be usable against any environment or even platform or possibly distribution. Flexibility is the key to a successful Role that can be easily used. Remember, environment specific values are loaded from your Inventory structure and these values are passed into the Role to set or override the default values.

Roles should be more than a single task

It’s common to find an implementation where there are over 20 Roles defined in separate repositories and each Role has only the tasks/main.yml defined and often containing only 1 task. This is an overly complex solution that could be simplified.

Before creating a Role, think about the overall design and purpose. Try to group similar functionality (install, configure, etc.) into a single Role when it makes sense to do so. For example, decide whether you should create one Role to handle the install and configuration or two Roles (apache_install and apache_configure). When Roles have to manage similar variables, they should be considered as one single Role using multiple Task files (tasks/apache_install.yml and tasks/apache_configure.yml).

Automate the testing of your roles

Use Molecule, a testing framework designed to aid in the development and testing of Ansible Roles. Note that ansible-lint can be run as part of your Molecule test runs.

Roles should never contain vaulted information

Roles should be environment-independent, whereas vaulted information tends to be environment-specific. Often Roles are built and then tested against target machines, which can result in the vaulted information being used as default values for variables. Instead, set variable defaults to cleartext such as ‘password’ and allow the true vaulted secret to override these defaults when passed from the command line or using Ansible Tower credentials.

Roles should define external (default) variables using defaults/main.yml

Variables that can or should be overridden by the user of the Role should be defined with default values in the defaults/main.yml file. This is generally where new users should look in order to quickly understand what can be overridden or customized with a Role.

When variables are defined in here, they have the lowest precedence and are easily overridden by other variables defined elsewhere - that is the intention as they are only defining the default value.

Do NOT define all variables in this file.

Roles should define internal variables using vars/main.yml

Any variables that are needed by the Role to perform the tasks and are not needed by anyone outside the Role should be defined using vars/main.yml. Remember, we are trying to hide the details of how to do the tasks. The typical user of the Role does not need to know or understand these details and variables so we place them in a different location.
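
A rough sketch of the split for an apache Role. The two files are shown in one block for brevity, and the internal variable names are hypothetical.

```yaml
# defaults/main.yml: external variables that users of the Role may override
apache_port: 80
apache_path: /opt/apache

# vars/main.yml: internal variables, implementation details of the Role
apache_package_name: httpd
apache_service_name: httpd
```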

Roles should use variables instead of hardcoded strings

Even if the string is referenced once, it is still better to define a variable and have that variable added to the vars/main.yml or defaults/main.yml as it’s much easier to scan and maintain these variables in a central location. More often one forgets about the hardcoded strings and it causes a runtime error or misconfiguration.

Roles do not always need the tasks/main.yml tasks file

A default Role file structure contains the tasks/main.yml as the common entry point for a Role. This is used when referencing a Role using roles: section in a Playbook or by simply including a Role using include_role: or import_role: tasks. However in some cases it might not make sense to have a “default” behavior for a Role. For example, the Role might support “install” and “configure”, in which case you can create the Role this way:

defaults/
   main.yml
vars/
   main.yml
tasks/
   configure.yml
   install.yml

And then call the Role using a specific task:

- name: Configure Apache
  include_role:
    name: apache
    tasks_from: configure.yml

This is perfectly acceptable and in fact it forces the user of the Role to select the function (using the task file) instead of guessing or checking what behavior will happen by default. In many cases, I prefer this style for Roles. It’s good to know you have the option to design it this way as well.

Consider using module defaults when using a specific module heavily

Using module defaults can greatly simplify your code by eliminating duplication of module parameters. A great example is when building a Role that heavily uses the uri module to make REST API calls to an external service. Instead of defining the same parameters for each Task, set the defaults and then override or set whatever you need in each specific Task.
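
One way this might look inside a Role is a block with module_defaults. The myservice_url variable and the endpoints are assumptions made for the example.

```yaml
- name: Talk to the external service
  module_defaults:
    uri:
      validate_certs: yes
      return_content: yes
      status_code: 200
      headers:
        Accept: application/json
  block:
    - name: Get service health
      uri:
        url: "{{ myservice_url }}/health"

    - name: Create an item
      uri:
        url: "{{ myservice_url }}/items"
        method: POST
        body_format: json
        body:
          name: example
        status_code: 201  # overrides the default for this task only
```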

Variables

Separate logic (tasks) from variables to reduce repetitive patterns and provide flexibility.

The code here shows embedded parameter values and a repetitive home directory value pattern in multiple places. This works but could be more flexible and maintainable.

- name: Clone student lesson app for a user
  hosts: nodes
  tasks:
    - name: Create ssh dir
      file:
        state: directory
        path: /home/{{ username }}/.ssh

    - name: Set Deployment Key
      copy:
        src: files/deploy_key
        dest: /home/{{ username }}/.ssh/id_rsa

    - name: Clone repo
      git:
        accept_hostkey: yes
        clone: yes
        dest: /home/{{ username }}/exampleapp
        key_file: /home/{{ username }}/.ssh/id_rsa
        repo: git@github.com:example/apprepo.git

Here we see the improved version, where parameter values are set separately from the tasks and can be easily overridden. Human meaningful variables “document” what’s getting plugged into the task parameter. This can now be more easily refactored into a Role.

- name: Clone student lesson app for a user
  hosts: nodes
  vars:
    user_home_dir: "/home/{{ username }}"
    user_ssh_dir: "{{ user_home_dir }}/.ssh"
    deploy_key: "{{ user_ssh_dir }}/id_rsa"
    app_dir: "{{ user_home_dir }}/exampleapp"
  tasks:
    - name: Create ssh dir
      file:
        state: directory
        path: "{{ user_ssh_dir }}"

    - name: Set Deployment Key
      copy:
        src: files/deploy_key
        dest: "{{ deploy_key }}"

    - name: Clone repo
      git:
        dest: "{{ app_dir }}"
        key_file: "{{ deploy_key }}"
        repo: git@github.com:example/apprepo.git
        accept_hostkey: yes
        clone: yes

Define temporary variables using a unique prefix

Registered variables and facts set using set_fact module are often defined for temporary usage. In order to avoid variable collision and make it clear to the reader that these variables are only for temporary usage, it is good practice to use a unique prefix. For example, you could use r_ for registered variables and f_ for facts. Another possibility is simply using _ underscore as the prefix for any temporary variables: _results or _myfact.

- name: Collect information from external system
  uri:
    url: http://www.example.com
    return_content: yes
  register: r_results
  failed_when: "'AWESOME' not in r_results.content"

- name: Set some facts for temporary usage
  set_fact:
    # Assumes the endpoint returns a JSON body that contains a username key
    f_username: "{{ r_results.json.username }}"

Define paths without trailing slash

Variables that define paths should never have a trailing slash. Also when concatenating paths, follow the same convention. For example:

# yes
app_root: /foo
app_bar: "{{ app_root }}/bar"

# no
app_root: /foo/
app_bar: "{{ app_root }}bar"

Keep dictionaries simple

Name the dictionary using the standard naming conventions described above, and name the keys defined inside the dictionary without repeating the same prefix. For example:

apache_options:
  port: 80
  path: /opt/apache
  version: 1

Conditionals and Return Status

Place the when clause on Tasks after the name clause (always start with name:)

Keep your Ansible readable by maintaining a consistent style. By always using name: as the first clause in a Task it helps the reader understand what you are trying to achieve in this task and makes it easier to scan a set of tasks defined in a task file.

Some examples of good and bad style:

# yes
- name: Print the dictionary
  when: my_dict is defined
  debug:
    msg: "My dictionary values: {{ my_dict }}"

# no
- when: my_dict is defined
  name: Print the dictionary
  debug:
    msg: "My dictionary values: {{ my_dict }}"

Ensure variables are defined before you reference them

You can use the assert module or a when clause to ensure they are defined. Some example tasks:

- name: Ensure variable is defined
  assert:
    that:
      - my_var is defined
      - my_var >= 0
      - my_var <= 100
    fail_msg: "'my_var' must be between 0 and 100"
    success_msg: "'my_var' is between 0 and 100"

- name: Output my custom message
  debug:
    msg: "The value is: {{ my_var }}"
  when: my_var is defined

- name: Ensure variable is defined
  fail:
    msg: "Variable is not defined"
  when: my_var is not defined

Verify return status using the failed Jinja test

For example:

- name: Check for something
  command: /bin/false
  register: my_result
  ignore_errors: yes

- name: Failed to do something
  debug:
    msg: "task failed"
  when: my_result is failed

Formatting

Use spaces around Jinja variable names

For example, use {{ var }} and not {{var}}.

Use native YAML syntax to maximize readability

Vertical reading is easier and it supports complex parameter values. Additionally this format works better with syntax highlighting in editors. It also works much better with a version control system, since a change to a specific line of code often means one parameter change, but when it’s all on one line then it’s more difficult to determine what exactly changed. For example:

# yes
- name: Ensure text file exists
  copy:
    src: "./foo.txt"
    dest: "{{ test }}"
    mode: "0770"
    owner: "root"
    group: "wheel"

# no
- name: Ensure text file exists
  copy: >
    src=./foo.txt dest={{ test }} mode=0770
    owner=root group=wheel

Break long lines using YAML line continuation

One often defines key/value pairs in Ansible where the value is quite a long string. This is common when using the shell or command modules to execute something that cannot be done using an Ansible module. In these cases, use the available line continuation mechanisms to improve readability. Again, this also helps a lot with version control systems, since a change is easier to spot when the value is broken into multiple lines.

There are many types of multiline string mechanisms - learn about them here.

- name: Run a very long command
  shell: >
    python a very long command --with=very --long-options=foo
    --and-even=more_options --like-these

Collections

As of Ansible 2.10, most modules and plugins are distributed as collections, so here are some style guidelines.

TODO