Developer Ansible Guide

Warning

The documentation is not up to date.

Todo

TODO [nku] update documentation for developers -> can delete a lot and refer to the documentation within the role.

TODO [nku] mention the logic of the ec2 dynamic inventory with the prj_id and suite filter => can use all, exp_name, check_status_yes, check_status_no, is_controller_yes, is_controller_no

TODO [nku] remove mention of host_type all

This document describes some of the internals of this Ansible project, e.g., how to update or extend certain parts.

Adding A Variable to the Experiment State

To add a variable to the experiment state:

  1. Add it to the experiment-state template.

  2. Add it to exp_facts in the task called “Set experiment variables (facts) based on loaded state” in the experiment-state tasks.

  3. To integrate it more generally into the project, add it to the example designs and to the expdesign script (the latter is needed to translate experiment table specifications).

Important Data Structures

  • exp_facts: This data structure is a dictionary with the experiment name as key and information about the status of the experiment as values.

    Example:

    exp_facts: {
      'experiment_1': {
        'exp_id': '5',
        'exp_runs_ext': [...],
        'exp_job_ids': ['5_0_0', ...],
        'exp_job_ids_unfinished': ['5_0_0', ...],
        'exp_job_ids_pending': ['5_0_0', ...],
        'exp_job_ids_running': [],
        'n_repetitions': '2',
        'common_roles': ['setup-common']
      },
      'experiment_2': {...}
    }
    

    Naming:

    • exp_facts[exp_name] -> exp_fact

  • host_types: This data structure is a dictionary with the host type as key. For each host type, it stores a dictionary with the experiment name as key and the host configuration (for this host type and experiment) as value. all is a special entry that holds the total number of instances per host type, summed over all experiments (note: its second-level key is the host type, not the experiment name). The ... in the example below is an ellipsis used for brevity and does not represent actual values.

    Example:

    host_types: {
      'client': {
        'experiment_1': {
          'init_roles': 'setup-client',
          'n': 1,
          'check_status': false
        },
        'experiment_2': {
          'init_roles': 'setup-client',
          'n': 1,
          'check_status': false
        }
      },
      'server': {
        'experiment_1': {...},
        'experiment_2': {...}
      },
      'all': {
        'client': {
          'n': 2
        },
        'server': {
          'n': ...
        }
      }
    }
    

    Naming:

    • host_types[group] -> host_type

    • host_types[group][exp_name] -> host_facts

  • host_type_names: A list of all host types, without the special key all (i.e., not equal to host_types.keys()). This is convenient to loop through host types.

  • exp_hosts: A variable set on “controller” hosts that contains the list of hosts involved in the current experiment.

  • host_group_name and host_group_name_long:

    • The first is the “short” group name used in the experiment suite config file, e.g. server.

    • The second is the group name used in the ansible inventory, defined by the EC2 plugin. It’s of the form tag_Name_<prj_id>_<host_group_name>_SEP_<exp_name>.

      • _SEP_ is a separator defined in group_vars/all in the variable separator. This is used to split host_group_name and exp_name when we recover those two variables from the group name.

  • suite_all: Group of EC2 hosts belonging to this project ID.
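The relationship between host_group_name, host_group_name_long, and host_type_names can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper functions are hypothetical and not part of the project, and the separator value "_SEP_" is assumed to match the separator variable in group_vars/all.

```python
# Hypothetical helpers for illustration; they are not part of the project.
# Assumption: the `separator` variable in group_vars/all has the value "_SEP_".
SEPARATOR = "_SEP_"


def build_long_group_name(prj_id, host_group_name, exp_name, separator=SEPARATOR):
    """Compose tag_Name_<prj_id>_<host_group_name>_SEP_<exp_name>."""
    return f"tag_Name_{prj_id}_{host_group_name}{separator}{exp_name}"


def split_long_group_name(long_name, prj_id, separator=SEPARATOR):
    """Recover (host_group_name, exp_name) from the inventory group name."""
    prefix = f"tag_Name_{prj_id}_"
    if not long_name.startswith(prefix):
        raise ValueError(f"{long_name!r} does not belong to project {prj_id!r}")
    # Split at the first occurrence of the separator.
    host_group_name, sep, exp_name = long_name[len(prefix):].partition(separator)
    if not sep:
        raise ValueError(f"separator {separator!r} not found in {long_name!r}")
    return host_group_name, exp_name


def host_type_names(host_types):
    """List the host types, skipping the special 'all' entry."""
    return [name for name in host_types if name != "all"]
```

Splitting at the first occurrence of the separator only works if the separator string cannot appear inside host_group_name, which is presumably why a distinctive token is used.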

Roles

An (incomplete) list of roles and their purpose:

  • experiment-aws: General role to handle creating AWS EC2 instances.

  • experiment-aws-ec2-create: Create EC2 instances for a specific host type and experiment.

  • experiment-aws-ec2-manage: This is an optimization. experiment-aws first launches all instances using experiment-aws-ec2 and only then waits for SSH to come up. Thus, instances can boot concurrently, and launching the next instance does not have to wait until the current one is completely initialized.

  • experiment-vpc*: Roles to create or remove a VPC (Virtual Private Cloud) for this project. We use a single VPC for the entire project.

  • experiment-clear: Role to terminate all running EC2 instances from the current project. It prints the IDs of the instances to be removed and gives the operator 10 s to double-check them. Pressing CTRL+C stops the countdown. Pressing A afterwards aborts without deleting the instances; pressing C instead continues the play.

  • experiment-job*: Roles to manage the initial setup, start, scheduling, and status checking of jobs. There is a job for every run and repetition.

  • experiment-load: This role initializes the jobs by loading them from the experiment config.

  • experiment-parse-config: Parse the experiment suite, check assertions, and set default values. The dict_default filter is in this folder.

  • experiment-set-vars: This role sets variables for a host. It is somewhat of a hack: it derives the experiment name and host group from the inventory group name. This workaround was necessary because, as far as we know, the EC2 plugin does not allow us to set host variables when launching new EC2 instances.

  • experiment-state: This role handles storing and loading the experiment state. There is one JSON file for every experiment in the current suite.

  • setup-*: Example roles that show how (individual or all) hosts can be set up.
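The launch-then-wait optimization described for experiment-aws-ec2-manage can be sketched in Python as follows. This is a minimal illustration of the two-phase idea only, not the role's actual implementation; launch and wait_for_ssh are hypothetical callables supplied by the caller, not AWS or Ansible APIs.

```python
def provision(specs, launch, wait_for_ssh):
    """Launch every instance first, then wait for SSH on each one."""
    # Phase 1: issue all launch requests without blocking on the boot process.
    handles = [launch(spec) for spec in specs]
    # Phase 2: wait only now; all instances have been booting in the meantime.
    for handle in handles:
        wait_for_ssh(handle)
    return handles


# Usage with stand-in callables that only record the order of calls.
events = []
provision(
    ["client", "server"],
    launch=lambda spec: events.append(("launch", spec)) or spec,
    wait_for_ssh=lambda handle: events.append(("wait", handle)),
)
```

With a single combined loop, the total time would be roughly the sum of all boot times; with two phases it is closer to the slowest single boot.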

Custom JINJA2 Filters

  • dict_default: Set a default value for a dictionary at the specified query.

    The query supports the wildcard character '*' and expects dot notation (i.e., d.plants and not d['plants']). The wildcard tolerates lists.

    Example usage:

    Data:

    data
      {
          "animals": {
              "cats": 10,
              "dogs": 1
          },
          "plants": {
              "bushes": 2,
              "pot plants": 3
          }
      }
    

    Examples:

    • Set a default for the plants “cacti”: {{ data | dict_default("plants", "cacti", 0) }}

      Results in the new dictionary:

      data
          {
              "animals": {
                  "cats": 10,
                  "dogs": 1
              },
              "plants": {
                  "bushes": 2,
                  "pot plants": 3,
                  "cacti": 0
              }
          }
      
    • Set a default category “other” for all entries if it is not present: {{ data | dict_default("*", "other", 0) }}

      Results in the new dictionary:

      data
          {
              "animals": {
                  "cats": 10,
                  "dogs": 1,
                  "other": 0
              },
              "plants": {
                  "bushes": 2,
                  "pot plants": 3,
                  "other": 0
              }
          }
      

    Remarks:

    • Note that this filter can only add key/value pairs to existing dictionaries:

      • WRONG: {{ data | dict_default("plants.house", "cacti", 0) }} because data["plants"] does not contain a dictionary for key house.

      • CAREFUL: {{ data | dict_default("plants", "house.cacti", 0) }} adds the entry house.cacti: 0. It does not add a dictionary under key house with the entry cacti: 0.
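The behaviour described above can be approximated in plain Python. The following is a sketch based only on the examples and remarks in this section, not the filter's actual source. In particular, returning a copy rather than modifying the input, and raising KeyError in the WRONG case, are assumptions.

```python
import copy


def dict_default(data, query, key, default):
    """Return a copy of `data` in which `key` defaults to `default` at `query`."""
    result = copy.deepcopy(data)  # assumption: the input is left untouched
    targets = [result]
    for part in query.split("."):
        next_targets = []
        for node in targets:
            if part == "*":
                # The wildcard fans out over dict values or list elements.
                children = node.values() if isinstance(node, dict) else node
                next_targets.extend(children)
            else:
                # A named step must already exist; nothing is created on the way.
                next_targets.append(node[part])
        targets = next_targets
    for target in targets:
        # Only set the value if the key is missing; the key is used verbatim.
        target.setdefault(key, default)
    return result


data = {
    "animals": {"cats": 10, "dogs": 1},
    "plants": {"bushes": 2, "pot plants": 3},
}
with_cacti = dict_default(data, "plants", "cacti", 0)
with_other = dict_default(data, "*", "other", 0)
```

Because the key is used verbatim, a key such as "house.cacti" becomes a single entry, which matches the CAREFUL remark.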

Job scheduling

There is a group of hosts for every experiment. Those are created dynamically by adding the tag tag:Exp with the experiment name as value to every EC2 instance.

The first host in the group is the “controller”, which is used to monitor the job status of the experiment and to advance to the next job when the current one is done. At every job-switch, the controller pulls all results and stores them on the localhost. It also updates the state YAML files and distributes the updated exp_facts variable to all hosts involved in this experiment (exp_hosts and localhost).
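The controller's bookkeeping at a job switch can be sketched in Python as follows. This is a simplified illustration, not the project's logic: it assumes that exp_job_ids_unfinished is the union of the pending and running lists, it leaves out pulling results and writing the state files, and the function names and host names are hypothetical.

```python
def pick_controller(exp_hosts):
    """The first host of the experiment's group acts as the controller."""
    return exp_hosts[0]


def advance_job(exp_fact, finished_job_id):
    """Mark a running job as done and start the next pending one, if any."""
    # Assumption: unfinished jobs are exactly the pending plus running jobs.
    exp_fact["exp_job_ids_running"].remove(finished_job_id)
    exp_fact["exp_job_ids_unfinished"].remove(finished_job_id)
    if exp_fact["exp_job_ids_pending"]:
        next_job = exp_fact["exp_job_ids_pending"].pop(0)
        exp_fact["exp_job_ids_running"].append(next_job)
        return next_job
    return None  # nothing left to schedule for this experiment


# Illustrative state; the job ID "5_0_1" and the host names are made up.
exp_fact = {
    "exp_job_ids_unfinished": ["5_0_0", "5_0_1"],
    "exp_job_ids_pending": ["5_0_1"],
    "exp_job_ids_running": ["5_0_0"],
}
controller = pick_controller(["host-a", "host-b"])
next_job = advance_job(exp_fact, "5_0_0")
```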

Inventory Host Groups

Important: do not use all. The EC2 plugin adds other EC2 instances to the inventory even if they do not belong to the current project. When we use all, Ansible tries to connect to them, which can cause SSH connection errors (if no public key for those hosts is defined) or even unintentionally overwrite the configuration of other instances!

Use suite_all instead to specify a play for all EC2 instances belonging to the current project.