Fixing Ansible playbooks


I’ve been using Ansible for several years to perform a variety of tasks on my many physical and virtual hosts. It’s an excellent tool for defining the desired state of many computers.

I have several playbooks I use frequently, one of which performs post-installation personalization and configuration of physical and virtual Linux hosts. This PostInstall.yml playbook worked well for several years, until it recently began throwing errors.

The problem

The first time I ran it on a host, it worked as it should. But if I ran the playbook again after a few weeks of testing and research on that host, to restore the configuration items it manages to their desired state, it failed with this error.

TASK [Install command line tools] ****************************************************
fatal: [testvm1]: FAILED! => {"changed": false, "failures": ["Packages for argument 'task' available, but not installed."], "msg": "Failed to install some of the specified packages", "rc": 1}

I inspected the failing task, reproduced here in part, but could find no problems with the structure or syntax.

# Install command line tools
- name: Install command line tools
  dnf:
    name:
      - apcupsd
      - atop
      - bc
      - clamav
      - ddrescue
<SNIP>
      - whois
    state: latest

I tried a number of different things to resolve the problem, and none of them worked. Thinking that one of the packages was causing the problem, I even tried installing all of them using DNF on the command line. They all installed correctly with no problems.

While searching the Internet for clues, I found one suggestion that this problem is due to an issue with DNF5: the python3-libdnf5 package should be installed by default, but was not on the new installations. Installing it did not resolve the problem.

Further research led me to a new (to me) tool.

ansible-lint

The ansible-lint tool, like most lint-style tools, is designed to promote best practices in syntax, tool usage, and structure. It was easy to install because it’s in the Fedora repository.

# dnf -y install python3-ansible-lint

I immediately ran this tool against my playbook, and it reported over 200 errors. Some were simple formatting errors that didn’t seem to affect the results but were still egregious offenses against strict YAML formatting, such as improper indentation, trailing spaces, and more. Other failures turned out to be the ones causing my problem.

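I’ve included only a small part of this command’s output here, but it gives you a good idea of what you’ll see. I used the --nocolor option to eliminate the escape characters that would otherwise contaminate the log file, and the -p (--parseable) option to make the output easier to read. I’m using the original version of my file here, with all its errors; the revised version doesn’t have the “X-” prefix. When this output was captured, the command had a stray 2 in front of &>. Bash passes that 2 to ansible-lint as a second file name, which is why the summary reports a load-failure[not-found] and “2 files.” The command below redirects both output streams to the log as intended.

$ ansible-lint --nocolor -p X-PostInstall.yml > ansible.log 2>&1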
<SNIP>
X-PostInstall.yml:150: fqcn[action-core]: Use FQCN for builtin module actions (copy).
X-PostInstall.yml:154: yaml[octal-values]: Forbidden implicit octal value "0774"
X-PostInstall.yml:158: yaml[indentation]: Wrong indentation: expected 8 but found 10
X-PostInstall.yml:168: fqcn[action-core]: Use FQCN for builtin module actions (systemd).
X-PostInstall.yml:172: yaml[trailing-spaces]: Trailing spaces
X-PostInstall.yml:172: yaml[truthy]: Truthy value should be one of
X-PostInstall.yml:173: yaml[truthy]: Truthy value should be one of
X-PostInstall.yml:175: fqcn[action-core]: Use FQCN for builtin module actions (systemd).
X-PostInstall.yml:179: yaml[trailing-spaces]: Trailing spaces
X-PostInstall.yml:179: yaml[truthy]: Truthy value should be one of
X-PostInstall.yml:180: yaml[truthy]: Truthy value should be one of
X-PostInstall.yml:183: fqcn[action-core]: Use FQCN for builtin module actions (systemd).
X-PostInstall.yml:187: yaml[truthy]: Truthy value should be one of
X-PostInstall.yml:188: yaml[truthy]: Truthy value should be one of
X-PostInstall.yml:203: fqcn[action-core]: Use FQCN for builtin module actions (copy).
X-PostInstall.yml:203: risky-octal: `mode: 644` should have a string value with leading zero `mode: "01204"` or use symbolic mode.
X-PostInstall.yml:207: yaml[colons]: Too many spaces after colon
X-PostInstall.yml:207: yaml[trailing-spaces]: Trailing spaces
X-PostInstall.yml:212: fqcn[action-core]: Use FQCN for builtin module actions (command).
X-PostInstall.yml:212: no-changed-when: Commands should not change things if nothing needs doing.
X-PostInstall.yml:217: fqcn[action-core]: Use FQCN for builtin module actions (systemd).
X-PostInstall.yml:221: yaml[truthy]: Truthy value should be one of
X-PostInstall.yml:228: fqcn[action-core]: Use FQCN for builtin module actions (systemd).
X-PostInstall.yml:232: yaml[truthy]: Truthy value should be one of
X-PostInstall.yml:233: yaml[truthy]: Truthy value should be one of
X-PostInstall.yml:235: fqcn[action-core]: Use FQCN for builtin module actions (systemd).
X-PostInstall.yml:239: yaml[truthy]: Truthy value should be one of
X-PostInstall.yml:240: yaml[truthy]: Truthy value should be one of
X-PostInstall.yml:249: fqcn[action-core]: Use FQCN for builtin module actions (copy).
X-PostInstall.yml:249: risky-octal: `mode: 644` should have a string value with leading zero `mode: "01204"` or use symbolic mode.
X-PostInstall.yml:253: yaml[colons]: Too many spaces after colon
X-PostInstall.yml:257: fqcn[action-core]: Use FQCN for builtin module actions (copy).
X-PostInstall.yml:257: risky-octal: `mode: 754` should have a string value with leading zero `mode: "01362"` or use symbolic mode.
X-PostInstall.yml:261: yaml[colons]: Too many spaces after colon
<SNIP>
X-PostInstall.yml:1066: fqcn[action-core]: Use FQCN for builtin module actions (reboot).
X-PostInstall.yml:1066: yaml[trailing-spaces]: Trailing spaces
Read documentation for instructions on how to ignore specific rule violations.

Rule Violation Summary
count tag                     profile    rule associated tags
    1 load-failure[not-found] min        core, unskippable
    1 no-free-form            basic      syntax, risk
    1 name[missing]           basic      idiom
    5 var-naming[pattern]     basic      idiom
    6 yaml[colons]            basic      formatting, yaml
    1 yaml[comments]          basic      formatting, yaml
    7 yaml[indentation]       basic      formatting, yaml
    1 yaml[octal-values]      basic      formatting, yaml
   39 yaml[trailing-spaces]   basic      formatting, yaml
   28 yaml[truthy]            basic      formatting, yaml
   12 name[casing]            moderate   idiom
   16 package-latest          safety     idempotency
    5 risky-octal             safety     formatting
    7 no-changed-when         shared     command-shell, idempotency
   67 fqcn[action-core]       production formatting

Failed: 217 failure(s), 0 warning(s) on 2 files.
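The risky-octal suggestions in that output look strange at first: mode: 644 supposedly “should” be "01204". The reason is that YAML reads an unquoted 644 as the decimal integer 644, and the permission bits are then taken from that number. The short sketch below (mode_as_octal is a hypothetical helper name) simply prints those decimal values in octal to show the mode that actually gets applied.

```shell
#!/usr/bin/env bash
# Show what an unquoted YAML mode such as 644 really means as a file mode.
# YAML parses 644 as the decimal integer 644, so printing that number in
# octal reveals the permission bits that end up being used.
mode_as_octal() {
  # bash printf treats an argument without a leading zero as decimal
  printf '%o\n' "$1"
}

# The two modes flagged in the lint output above
for m in 644 754; do
  printf 'mode: %s  ->  actual mode 0%s\n' "$m" "$(mode_as_octal "$m")"
done
```

Decimal 644 is 01204 in octal (the sticky bit plus -w----r--), not the intended rw-r--r--. The fix is to write the mode as a quoted string such as mode: "0644", or to use a symbolic mode such as u=rw,g=r,o=r. The related yaml[octal-values] complaint about 0774 is also cleared by quoting it as "0774".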

The acronym FQCN stands for “fully qualified collection name.” These messages tell us to use full module names such as ansible.builtin.copy instead of just copy. Without the -p option, we get more explicit descriptions that even indicate what we need to do to fix each error, as in the excerpt below. The line numbers, 180, 183, and 187, allowed me to zero in on the errors very quickly.

yaml[truthy]: Truthy value should be one of [false, true]
X-PostInstall.yml:180

fqcn[action-core]: Use FQCN for builtin module actions (systemd).
X-PostInstall.yml:183 Use `ansible.builtin.systemd` or `ansible.legacy.systemd` instead.

yaml[truthy]: Truthy value should be one of [false, true]
X-PostInstall.yml:187
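Because the -p output puts the file name, line number, and rule tag in fixed positions, it is easy to pull out every flagged line number for a single rule and visit them in turn. This is a minimal sketch assuming a log written with -p as shown earlier, where each violation looks like FILE:LINE: RULE: MESSAGE; rule_lines is a hypothetical helper, not something ansible-lint provides.

```shell
#!/usr/bin/env bash
# List the playbook line numbers that ansible-lint flagged for one rule.
# Assumes a parseable (-p) log in which each violation line looks like
#   FILE:LINE: RULE: MESSAGE
# rule_lines is a hypothetical helper name, not part of ansible-lint.
rule_lines() {
  local rule="$1" log="$2"
  # Split on ": " so field 1 is FILE:LINE and field 2 is the rule tag.
  # The string comparison is literal, so brackets in tags are safe.
  awk -F': ' -v r="$rule" '$2 == r { split($1, a, ":"); print a[2] }' "$log"
}
```

For example, rule_lines 'yaml[truthy]' ansible.log lists every line with a yes/no value to fix, and rule_lines package-latest ansible.log would list the tasks still using state: latest.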

The “Truthy” errors are due to my use of “yes” or “no” instead of “true” or “false” for some Boolean values and comparison tests. Because there are many similar errors of each type, I can use Vim or sed regular expressions to fix all the like errors at once rather than dealing with them one at a time.
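Here is a minimal sketch of that kind of bulk edit, assuming GNU sed as shipped with Fedora. The helper name fix_playbook is hypothetical, and the patterns are deliberately narrow: they strip trailing spaces, turn unquoted yes and no values into true and false, and prefix the short module names that appear in the lint output above (copy, systemd, command, dnf, reboot) with ansible.builtin. Quoted "yes" strings and other modules are left alone.

```shell
#!/usr/bin/env bash
# Bulk-fix some common ansible-lint complaints in a playbook with sed.
# fix_playbook is a hypothetical helper. It edits the file in place and
# keeps a .bak copy so every change can be reviewed with diff.
fix_playbook() {
  local file="$1"
  # 1. yaml[trailing-spaces]: drop whitespace at the end of each line.
  # 2. yaml[truthy]: "key: yes" -> "key: true", "key: no" -> "key: false"
  #    (unquoted values only; quoted strings are untouched).
  # 3. fqcn[action-core]: "copy:" -> "ansible.builtin.copy:" for the
  #    listed modules, with or without a leading "- ".
  sed -i.bak -E \
    -e 's/[[:space:]]+$//' \
    -e 's/^([[:space:]]*(- )?[A-Za-z0-9_]+:)[[:space:]]+yes$/\1 true/' \
    -e 's/^([[:space:]]*(- )?[A-Za-z0-9_]+:)[[:space:]]+no$/\1 false/' \
    -e 's/^([[:space:]]*(- )?)(copy|systemd|command|dnf|reboot):/\1ansible.builtin.\3:/' \
    "$file"
}
```

Running fix_playbook X-PostInstall.yml leaves the original in X-PostInstall.yml.bak, so diff X-PostInstall.yml.bak X-PostInstall.yml shows every change for review, and a second ansible-lint run shows which violations remain.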

The error that caused my problem was my use of state: latest when installing packages, which ansible-lint flags as package-latest. I switched to state: present, added skip_broken: true to each task, and used the FQCN, ansible.builtin.dnf. The playbook now works perfectly, both for new installs and for older ones that needed a reset to the initial state. The task now looks like this.

# Install command line tools
- name: Install command line tools
  ansible.builtin.dnf:
    name:
      - apcupsd
      - atop
      - bc
      - clamav
      - ddrescue
<SNIP>
      - whois
    state: present
    skip_broken: true

Final comments

After fixing the problem, along with many of the less critical errors, I still had plenty of errors left in my playbook. I’m still working my way through them to ensure they’re all fixed and can’t cause future issues.

The ansible-lint tool is easy to use and can find those pesky issues that can’t easily be seen by the Mark I Eyeball. It also taught me to write my playbooks using FQCNs for modules rather than short names, and to use the present state rather than latest. Those small details I didn’t know about helped the most.

Be sure to read the ansible-lint documentation, because it describes some interesting options, including one that can allegedly fix many of the errors it finds. I say “allegedly” because I haven’t tried that option yet.


Resources

I wrote the following articles here on Both.org to help you get started with Ansible if you haven’t already.

  1. Ansible #1: My first day using Ansible
  2. Ansible #2: How to create an Ansible Playbook
  3. Ansible #3: Finishing our Ansible playbook to manage workstation and server updates

Here are some articles to get you started with regular expressions (REGEX).

  1. Regular Expressions #1: Introduction
  2. Regular Expressions #2: An example
  3. Regular Expressions #3: grep — Data flow and building blocks
  4. Regular Expressions #4: Pulling it all together
