Ansible map() not available on el6

If you are running Ansible playbooks on an el6 machine and you run across an error like this:

2017-05-24 10:21:47,585 p=8525 u=root |  fatal: [localhost]: FAILED! =>
{"failed": true, "msg": "The conditional check '( myvsds is defined and 
( myvsds | map(attribute='target_server_type') | list | issuperset([\"kvm\"]) 
or myvsds | map(attribute='target_server_type') | list | issuperset([\"heat\"])
) ) or ( myvcins is defined and ( myvcins | map(attribute='target_server_type')
| list | issuperset([\"kvm\"]) or myvcins | map(attribute='target_server_type')
| list | issuperset([\"heat\"]) ) )' failed. The error was: template
error while templating string: no filter named 'map'. String: {% if (
myvsds is defined and ( myvsds | map(attribute='target_server_type') | list |
issuperset([\"kvm\"]) or myvsds | map(attribute='target_server_type') | list |
issuperset([\"heat\"]) ) ) or ( myvcins is defined and ( myvcins |
map(attribute='target_server_type') | list | issuperset([\"kvm\"]) or myvcins |
map(attribute='target_server_type') | list | issuperset([\"heat\"]) ) ) %} True
{% else %} False {% endif %}\n\nThe error appears to have been in 
'/metro-2.1.1/roles/build/tasks/get_paths.yml': line 8, column 7, but may\nbe
elsewhere in the file depending on the exact syntax problem.\n\nThe offending
line appears to be:\n\n  - block: # QCOW2\n    - name: Find name of VSD QCOW2
File\n      ^ here\n"}

Note the key text in the error: no filter named 'map'. The problem is that Ansible, as of version 2.1, depends on the map() filter implementation from the python-jinja2 package. map() was introduced in python-jinja2 version 2.7, but the base python-jinja2 version for el6 is 2.2, hence the error above.

This means that a playbook using map() must be run from an el7 Ansible host, or from any host whose python-jinja2 is version 2.7 or later.
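You can check the installed version on the control host with python -c 'import jinja2; print(jinja2.__version__)'. If upgrading isn't an option, you can avoid map() entirely. Here's a hypothetical sketch of one arm of the original conditional rewritten with json_query, which was added in Ansible 2.2 and requires the jmespath Python package on the control host:

- name: Check for kvm targets without using map()
  debug:
    msg: "At least one VSD targets kvm"
  when:
    - myvsds is defined
    # json_query extracts the attribute list that map() would have produced
    - "'kvm' in (myvsds | json_query('[].target_server_type'))"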

Ansible [WARNING]: The loop variable ‘item’ is already in use.

I made a simple change to an existing Ansible playbook. I used the include_role command to invoke another role. Since I was calling the role on a list of hosts that I had dynamically discovered at runtime, I used with_items to make the call iterate over the list.

Not good. I saw the following warning and error:

[WARNING]: The loop variable 'item' is already in use. You should set
the `loop_var` value in the `loop_control` option for the task to 
something else to avoid variable collisions and unexpected behavior.

fatal: [localhost]: FAILED! => {
 "failed": true,
 "msg": "The conditional check ''u16' not in item|json_query('Name')
failed. The error was: error while evaluating conditional ('u16' not 
in item|json_query('Name')): 'item' is undefined... }

After a bit of searching and reading docs, I figured out how to fix it. But the docs and examples were not straightforward. I hope you will find a better explanation herein.

First, Ansible (I’m using 2.2.1) doesn’t handle nested with_items loops properly. There’s something special about the way item is handled such that, in nested loops, the inner loop’s item overwrites the outer loop’s value.

My outer loop:

- name: Use ci-destroy to clean unused VMs
  include_role:
    name: ci-destroy
  with_items:
    - "{{ my_vm_list }}"

ci-destroy is a role that we use to garbage collect VMs from test failures in our CI environment. Before this task, the code gathers a list of the orphan VMs in the environment. The ci-destroy role is called on each one.

The ci-destroy role is my inner loop. It contains, among other things:

- name: Remove several entries from /etc/hosts file
  lineinfile:
    dest: /etc/hosts
    line: "{{ item }}"
    state: absent
  with_items: "{{ line_list }}"

With both the outer loop and the inner loop using {{ item }}, Ansible had a problem: the WARNING and the ERROR shown above.

The fix? Use loop_control to give the inner loop a non-default variable name. In my case:

- name: Remove several entries from /etc/hosts file
  lineinfile:
    dest: /etc/hosts
    line: "{{ line_item }}"
    state: absent
  with_items: "{{ line_list }}"
  loop_control:
    loop_var: line_item

The changes are the line: value and the loop_control block. Basically, I changed the inner loop such that it used line_item instead of item. Worked like a charm.
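Depending on which loop you'd rather leave untouched, the same trick should work on the outer include_role task instead; here's a sketch (the variable name vm_item is my invention):

- name: Use ci-destroy to clean unused VMs
  include_role:
    name: ci-destroy
  with_items:
    - "{{ my_vm_list }}"
  loop_control:
    loop_var: vm_item

Either way, the point is the same: don't let two nested loops both use item.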

Ansible dependencies via meta

I ran across an interesting feature of Ansible this week. A co-worker said that an upstream change to an open-source project he had been working on broke our installation code. The upstream author had moved the invocation of a role from the main playbook into roles/role_name/meta/main.yml. This broke the installation. Here’s why.

According to http://docs.ansible.com/ansible/playbooks_roles.html, dependencies listed in meta/main.yml are loaded and executed *before* the rest of the role. This is perfect for executing roles that are, in fact, dependencies. Dependencies get taken care of first. In my friend’s case, the upstream contributor didn’t understand that our role is *not* a dependency. When he moved the role invocation to meta/main.yml, he caused it to execute before its own dependencies had been satisfied. The fix was simple: Move our role back to the main playbook.
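In other words, a role that isn't a true dependency belongs back in the playbook's roles: list, where it runs in normal order after whatever precedes it. Something like this hypothetical snippet:

---
- hosts: all
  roles:
    - role: install_prereqs   # hypothetical prerequisite role
    - role: our_role          # hypothetical name; runs after the roles above it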

By the way, here’s what the dependencies look like in meta/main.yml:

---
dependencies:
  - { role: common, some_parameter: 3 }
  - { role: apache, apache_port: 80 }
  - { role: postgres, dbname: blarg, other_parameter: 12 }

The take-away is that meta dependencies are yet another way Ansible lets you write clean playbooks that aren’t cluttered with dependency bookkeeping. Just make sure the roles you list there really are dependencies.

Running Ansible as remote_user Requires Inventory

Maybe I just missed it. I was running a Jenkins job that triggered an Ansible role that pulled tar.gz files for several versions of my company’s software from a build server and deposited them in an NFS-shared directory on my Jenkins slave. The Jenkins slave was pulling double duty as my local NFS server for nightly builds. Before the nightly builds ran, this Jenkins job would ensure that my NFS server had the proper files staged and ready to go. Sounds easy, right?

Nope. We kept having permissions issues. The Ansible role we created had two tasks:

  1. Clean out any unnecessary directories–for versions we were no longer supporting
  2. Create and populate directories for the versions we were supporting

We were experiencing permissions errors doing both tasks. I’ll save you the gory details, but we tried everything. We deleted everything. We used chmod, chown, and chgrp to set directory and file modes and ownership. We changed the Jenkins user. I tried running the playbook with become: yes. I tried sprinkling the tasks with remote_user: root. Nothing worked. I ran the job dozens of times, tweaking one thing at a time. Yuk.

Then I noticed something in the 4x verbose (-vvvv) output of the job:

20:09:33  ESTABLISH SSH CONNECTION FOR USER: fred

I had set remote_user: root. Hmm. I checked another job that wasn’t having this problem. Sure enough, it was user root.

Here’s the difference: Playbook A, the one that was failing, didn’t use an inventory file because it was always executing on localhost. Playbook B, by contrast, used an inventory file. When I switched Playbook A to use an inventory file, everything worked. Bottom line: use an inventory file when you want to run as remote_user.
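For reference, here's a minimal sketch of the working arrangement (file contents and names are hypothetical). One plausible explanation: the implicit localhost Ansible supplies when no inventory matches uses the local connection and effectively ignores remote_user, whereas a localhost defined in an inventory file is reached over SSH by default, so remote_user: root actually takes effect.

# inventory (hypothetical)
[nfs_servers]
localhost

# playbook header (hypothetical)
- hosts: nfs_servers
  remote_user: root
  roles:
    - stage_builds   # hypothetical role that stages the tar.gz files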

I suspect there may be a more elegant way to fix this, but in the fast-paced environment in which I work I am happy to have this solution.

Time-stamped Directory Name

One of my co-workers wrote an Ansible playbook that gathered and processed data from a number of nodes in our lab. There could be as many as 250 nodes in play. Here’s a high-level overview of the steps the playbook took:

  • Created a local temporary directory via local_action
  • Wrote intermediate files for each node to the local temp directory
  • Read collected intermediate files from the local temp directory
  • Deleted the local temp directory

Do you see the mistake? By default, Ansible will attempt to parallelize the operation across as many nodes as possible. The first one that finishes will–you guessed it–delete the temporary directory. Oops.

Initial testing was done against a single node. When I added a second, BOOM. After looking at it, I decided I had the following viable options for a fix:

  1. Use serial: 1 in the playbook to prevent concurrent execution. This is undesirable because it would make runs against 250 nodes take *much* longer.
  2. Restructure the playbooks such that temp directory creation and deletion took place outside of the data gathering. This would have been a lot of work *and* introduced dependencies between playbooks that I don’t like. Using the same temp directory name in more than one playbook is one example of such coupling.
  3. Use a unique temp directory for each node.

Not very elegant, but the last option was simple and practical. A quick search yielded a code snippet similar to the following:

- name: Create a temporary directory name using timestamp
  set_fact:
    tmp_scripts_dir: "{{ playbook_dir }}/scripts/{{ lookup('pipe', 'date +%Y%m%d%H%M%S.%5N') }}/tmp"

This creates a temp directory name whose timestamp is precise to tens of microseconds (%5N keeps the first five digits of the nanoseconds field), fine enough to differentiate between multiple nodes kicked off within the same second. I then used tmp_scripts_dir to satisfy the process steps.
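Note that set_fact only computes the name; each node still has to create and remove its own directory. Roughly, as a sketch:

- name: Create the per-node temporary directory
  file:
    path: "{{ tmp_scripts_dir }}"
    state: directory
  delegate_to: localhost

# ... write and read intermediate files under tmp_scripts_dir ...

- name: Remove the per-node temporary directory
  file:
    path: "{{ tmp_scripts_dir }}"
    state: absent
  delegate_to: localhost

Because each node's directory name embeds its own timestamp, the first node to finish no longer deletes anyone else's files.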


Custom Ansible filters: Easy solution to difficult problems

I have recently been using Ansible to automate health checks for some of our software-defined network (SDN) infrastructure. One of the devices my code must query for health is a soft router running the SROS operating system. Ansible 2.2 recently introduced support for an sros_command module that simplifies my task somewhat, but I’m still left to do screen-scraping of the command output.

Screen scraping is nasty work: lots of string processing with split(), strip(), and friends. The resulting code is heavily dependent on the exact format of the command output; if the format changes, the code breaks.

I initially implemented the screen-scraping using Jinja2 code in my playbooks. That put some pretty ugly, complex code right in the playbook. I found a better answer: Create a custom filter or two. Now things are *so much cleaner* in the playbooks themselves, the format-dependent code is separated from the main code, and Python made the parsing much easier to write.

The best part: Ansible filters are very easy to create. The Ansible docs aren’t very helpful here, perhaps because creation is so simple they thought it didn’t need explanation! The best way to figure out how to create your own filters is to use an existing filter as a pattern; a good one to start from is Ansible’s own json_query. Here’s a stripped and simplified version of that code for the purpose of illustration. It implements two trivial filters, my_to_upper and my_to_lower:

from ansible.errors import AnsibleError


def my_to_upper(string):
    ''' Given a string, return an all-uppercase version of the string.
    '''
    if string is None:
        raise AnsibleError('String not found')
    return string.upper()


def my_to_lower(string):
    ''' Given a string, return an all-lowercase version of the string.
    '''
    if string is None:
        raise AnsibleError('String not found')
    return string.lower()

class FilterModule(object):
    ''' Custom string-case filters '''

    def filters(self):
        return {
            'my_to_upper': my_to_upper,
            'my_to_lower': my_to_lower,
        }

Developing this code is as simple as creating the FilterModule class, adding an entry to the filters() dictionary for each custom filter you need, and providing a function for each one. Drop the file into a filter_plugins/ directory next to your playbook (or a path named by filter_plugins in ansible.cfg) and Ansible loads it automatically. The example is trivial, but you can make the filter functions as complex as your application requires.

Note that I have included AnsibleError in the example for illustration purposes because it is an extremely useful way to get errors all the way to the console. If I were *really* implementing these filters, a missing string wouldn’t be an error; I’d just return an empty string.

Here are a couple of simple examples of how to call the filters, along with the resulting output:

- name: Create a mixed-case string
  shell: echo "A Mixed String"
  register: mixed_string
  delegate_to: localhost

- name: Print the UPPERCASE string
  debug: msg="{{ mixed_string.stdout|my_to_upper }}"

- name: Print the LOWERCASE string
  debug: msg="{{ mixed_string.stdout|my_to_lower }}"

<snip...>

TASK [my_task : Create a mixed-case string] *********************************
changed: [host.example.com -> localhost]

TASK [my_task : Print the UPPERCASE string] *********************************
ok: [host.example.com] => {
 "msg": "A MIXED STRING"
}

TASK [my_task : Print the LOWERCASE string] *********************************
ok: [host.example.com] => {
 "msg": "a mixed string"
}

In my case, instead of my_to_upper and my_to_lower, I created *command*_to_json filters that convert the SROS command output into JSON that is easily parsed in the playbook. This keeps my playbooks generic and isolates my filters as the place where the nasty code lives.
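To give a flavor of what those look like, here's a hypothetical, stripped-down version of such a filter. It assumes the output is simple "name : value" lines, which real SROS output generally is not; the real filters are longer and uglier:

def show_info_to_json(output):
    ''' Hypothetical filter: parse "name : value" lines of command
        output into a dict that the playbook can query by key.
    '''
    result = {}
    for line in output.splitlines():
        if ':' not in line:
            continue  # skip banners, separators, and blank lines
        name, _, value = line.partition(':')
        result[name.strip()] = value.strip()
    return result

Registered in FilterModule just like the examples above, it lets a playbook write {{ sros_result.stdout | show_info_to_json }} and then pick out fields by name instead of scraping strings inline.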

Verbose Output from git

Here’s a simple trick that provides more verbose text when using git:

GIT_CURL_VERBOSE=1 git clone https://github.com/repo/project.git

The GIT_CURL_VERBOSE=1 prefix is the key: it sets the environment variable for just that one git invocation, telling git’s HTTP layer (libcurl) to report everything it does. This change provided the difference I needed to debug.

Before:

Cloning into 'project'...
fatal: unable to access 'https://github.com/repo/project.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none

After:

Cloning into 'project'...
* Couldn't find host github.com in the .netrc file; using defaults
* Hostname was NOT found in DNS cache
* Trying 192.30.253.113...
* Connected to github.com (192.30.253.113) port 443 (#0)
* found 173 certificates in /etc/ssl/certs/ca-certificates.crt
* server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none
* Closing connection 0
fatal: unable to access 'https://github.com/repo/project.git/': server certificate verification failed. CAfile: /etc/ssl/certs/ca-certificates.crt CRLfile: none

Interesting!