Time-stamped Directory Name

One of my co-workers wrote an Ansible playbook that gathered and processed data from a number of nodes in our lab. There could be as many as 250 nodes in play. Here’s a high-level overview of the steps the playbook took:

  • Created a local temporary directory via local_action
  • Wrote intermediate files for each node to the local temp directory
  • Read collected intermediate files from the local temp directory
  • Deleted the local temp directory

Do you see the mistake? By default, Ansible runs tasks against multiple nodes in parallel (up to the configured forks limit). Every node shares the same temporary directory, so the first one that finishes will, you guessed it, delete that directory out from under the others. Oops.

Initial testing was done against a single node. When I added a second, BOOM. After looking at it, I decided that I had the following viable options for fixing it:

  1. Use serial: 1 in the playbook to prevent concurrent execution. This is undesirable because it would make a run against 250 nodes take *much* longer.
  2. Restructure the playbooks such that temp directory creation and deletion took place outside of the data gathering. This would have been a lot of work *and* introduced dependencies between playbooks that I don’t like. Using the same temp directory name in more than one playbook is one example of such coupling.
  3. Use a unique temp directory for each node.
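For reference, option 1 is a one-line change at the play level. The following is only a minimal sketch: the play name and the lab_nodes inventory group are hypothetical, and the debug task is a placeholder standing in for the real gather and process steps.

```yaml
- name: Gather and process node data   # hypothetical play name
  hosts: lab_nodes                     # hypothetical inventory group
  serial: 1                            # run the whole play against one host at a time
  tasks:
    - name: Placeholder for the gather/process steps
      debug:
        msg: "processing {{ inventory_hostname }}"
```

With serial: 1, each host completes every task in the play before the next host starts, which is why the shared directory stops being a problem and also why the run time grows with the number of nodes.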

It's not very elegant, but option 3 was simple and practical. A quick search yielded a code snippet similar to the following:

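- name: Create a temporary directory name using timestamp
  set_fact:
    # ">-" folds the block into one line and strips the trailing newline.
    # The value is left unquoted so that no literal quote characters or
    # stray whitespace end up in the path.
    tmp_scripts_dir: >-
      {{ playbook_dir }}/scripts/{{ lookup('pipe', 'date +%Y%m%d%H%M%S.%5N') }}/tmp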

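This creates a temp directory name that includes a timestamp with fractional seconds. With GNU date, %5N keeps the first five digits of the nanosecond field, so the resolution is 10 microseconds rather than full nanoseconds. In practice, that is fine enough detail to differentiate between multiple nodes that are kicked off within the same second. I then used tmp_scripts_dir in each of the process steps in place of the shared directory.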
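As a rough illustration of how tmp_scripts_dir can slot into those steps, here is a minimal sketch. It is not the original playbook: the file name and its content are placeholders, and delegate_to: localhost is used as the equivalent of local_action so that each task runs on the control node.

```yaml
- name: Create this node's temp directory on the control node
  file:
    path: "{{ tmp_scripts_dir }}"
    state: directory        # also creates missing parent directories
    mode: "0700"
  delegate_to: localhost

- name: Write an intermediate file for this node (placeholder content)
  copy:
    content: "{{ inventory_hostname }}\n"
    dest: "{{ tmp_scripts_dir }}/{{ inventory_hostname }}.txt"
  delegate_to: localhost

- name: Remove this node's timestamped directory
  file:
    # dirname strips the trailing /tmp so the timestamped parent goes too
    path: "{{ tmp_scripts_dir | dirname }}"
    state: absent
  delegate_to: localhost
```

Because set_fact is evaluated once per node, each node keeps its own value of tmp_scripts_dir for the rest of the play. If two nodes ever did land on the same timestamp, appending {{ inventory_hostname }} to the path would be one way to rule out a collision; that is a suggestion, not part of the snippet I found.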