Quickstart

Follow Installation to set up a project and install the DoE-Suite first.

The DoE-Suite provides a demo_project in the root of the repository that shows the structure required to integrate DoES into an existing project. After completing the Installation section, you should be able to run the demo project's example designs, which are located under demo_project/doe-suite-config/designs.

Afterward, you can change the environment variable DOES_PROJECT_DIR to point to your own project (instead of the demo project) and continue from there as described in the Tutorial.

A Minimal Example

We start with the minimal example of a suite design:

doe-suite-config/designs/example01-minimal.yml
---

# The suite `example01-minimal` contains a single experiment called `minimal`.
# We run this experiment on a single instance `n=1` of host type `small` and we only use a single repetition.
# The experiment consists of four runs, i.e., configurations:
# - echo "hello world."
# - echo "hello world!"
# - echo "hello universe."
# - echo "hello universe!"
#
# For the experiment configuration, we use the `cross` format:
# The different levels for each factor are listed in `base_experiment` and
# we create the runs by taking a cross product of all factor levels.
# (e.g., [world, universe] x [".", "!"] results in 4 runs)

minimal: # experiment name
  n_repetitions: 1
  host_types:
    small: # one instance of type `small`
      n: 1
      $CMD$: "echo \"[% my_run.arg1 %] [% my_run.arg2 %][% my_run.arg3 %] \"" # command to start experiment run
  base_experiment:
    arg1: hello # fix parameter between runs (constant)
    arg2:
      $FACTOR$: [world, universe] # varied parameter between runs (factor)
    arg3:
      $FACTOR$: [".", "!"] # varied parameter between runs (factor)

$ETL$: # ensures that stderr.log is empty everywhere and that no files are generated except stdout.log
  check_error:
    experiments: "*"
    extractors: {ErrorExtractor: {}, IgnoreExtractor: {} }

What does this design do?

  1. It runs on a single instance of type small.

  2. The design contains two factors (marked with $FACTOR$), arg2 and arg3, with two levels each. In total, there are 2 x 2 = 4 runs, i.e., configurations:

    • echo "hello world."

    • echo "hello world!"

    • echo "hello universe."

    • echo "hello universe!"
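The cross product behind these four runs can be sketched in a few lines of Python. This is only an illustration of the idea, not the DoE-Suite's implementation: the function name expand_cross is hypothetical, and the command string is built with a plain f-string instead of the [% ... %] template from the design.

```python
from itertools import product

# Constants and factors copied from the `base_experiment` section of the design.
constants = {"arg1": "hello"}
factors = {"arg2": ["world", "universe"], "arg3": [".", "!"]}


def expand_cross(constants, factors):
    """Return one config dict per combination of factor levels (hypothetical helper)."""
    names = list(factors)
    runs = []
    for levels in product(*(factors[name] for name in names)):
        run = dict(constants)            # constants stay fixed in every run
        run.update(zip(names, levels))   # one level per factor
        runs.append(run)
    return runs


runs = expand_cross(constants, factors)

# Stand-in for the $CMD$ template: fill the three arguments into an echo command.
commands = [f'echo "{r["arg1"]} {r["arg2"]}{r["arg3"]}"' for r in runs]

for cmd in commands:
    print(cmd)
```

Adding a third factor with three levels to the factors dictionary would yield 2 x 2 x 3 = 12 configurations, which is why the number of runs grows quickly in the cross format.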

Show Resulting Commands
$ make design suite=example01-minimal
This lists the four run commands shown above. The command needs the environment variable DOES_PROJECT_ID_SUFFIX; if it is not set, it fails with:

ValueError: env variable:DOES_PROJECT_ID_SUFFIX not set
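Because make design depends on environment variables, a small pre-flight check can catch a missing variable before any make target is invoked. The sketch below is an assumption-laden helper, not part of the DoE-Suite: the function name missing_env_vars is hypothetical, and the tuple contains only the two variables named on this page, which may not be the complete set the suite requires.

```python
import os

# Variable names taken from this page; the full list required by the
# DoE-Suite is not stated here, so treat this tuple as an assumption.
REQUIRED_VARS = ("DOES_PROJECT_DIR", "DOES_PROJECT_ID_SUFFIX")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("missing environment variables: " + ", ".join(missing))
    else:
        print("environment looks complete")
```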

Save the design as example01-minimal.yml (or under a similar name) in doe-suite-config/designs. Afterward, you can run the experiment suite with:

Start the experiment
make run suite=example01-minimal id=new cloud=aws

This starts the experiment suite on AWS. First, it creates a VPC and an EC2 instance for the host type small. The file doe-suite-config/group_vars/small/main.yml contains the configuration for this instance.

Show Source
doe-suite-config/group_vars/small/main.yml
---

# AWS EC2
instance_type: t2.medium
ec2_volume_size: 16
ec2_image_id: ami-08481eff064f39a84
ec2_volume_snapshot: snap-0b8d7894c93b6df7a

# ETH Euler
euler_job_minutes: 10
euler_cpu_cores: 1
euler_cpu_mem_per_core_mb: 3072
euler_gpu_number: 0
euler_gpu_min_mem_per_gpu_mb: 0
euler_gpu_model: ~
euler_env: "gcc/8.2.0 python/3.9.9"
euler_scratch_dir: "/cluster/scratch/{{ euler_user }}"

# Docker
docker_image_id: "doe-ubuntu20"
docker_image_tag: "latest"

After creating the instance, the DoE-Suite runs the four shell commands sequentially on the instance. Whenever a command finishes, its stdout and stderr, together with any result files, are fetched and saved under doe-suite-results on your local machine.
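The $ETL$ section of the minimal design describes a check on these fetched files: stderr.log must be empty and no files other than stdout.log may be generated. The following sketch mimics that check on a single directory. It is not the code of ErrorExtractor or IgnoreExtractor; the function check_run_dir, the set of allowed file names, and the temporary directory standing in for one run's results are assumptions made for illustration.

```python
from pathlib import Path
import tempfile

# Assumption: a clean run leaves only these two log files behind.
ALLOWED_FILES = {"stdout.log", "stderr.log"}


def check_run_dir(run_dir):
    """Return a list of problems found in one run's result directory."""
    problems = []
    for path in sorted(Path(run_dir).iterdir()):
        if path.name not in ALLOWED_FILES:
            problems.append(f"unexpected file: {path.name}")
        elif path.name == "stderr.log" and path.read_text().strip():
            problems.append("stderr.log is not empty")
    return problems


# Usage with a throwaway directory standing in for one fetched run.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "stdout.log").write_text("hello world.\n")
    (run_dir / "stderr.log").write_text("")
    clean = check_run_dir(run_dir)

    # Simulate a failing run: error output plus an unexpected result file.
    (run_dir / "stderr.log").write_text("boom\n")
    (run_dir / "extra.csv").write_text("a,b\n")
    dirty = check_run_dir(run_dir)

print(clean)
print(dirty)
```

Returning a list of problems instead of raising on the first one makes it easy to report every issue of a run at once.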

Quick Command Reference

These are the most important commands to get started with the DoE-Suite.

Start a new experiment suite run
make run suite=example01-minimal id=new
Continue the last experiment suite
make run suite=example01-minimal id=last
Terminate all remote resources (e.g., all EC2 instances) and clean up locally (e.g., pycache)
make clean

To get an overview of the functionality, use make or make help:

make help
Running Experiments
  make run suite=<SUITE> id=new                       - run the experiments in the suite
  make run suite=<SUITE> id=<ID>                      - continue with the experiments in the suite with <ID> (often id=last)
  make run suite=<SUITE> id=<ID> cloud=<CLOUD>        - run suite on non-default cloud ([aws], euler)
  make run suite=<SUITE> id=<ID> expfilter=<REGEX>    - run only subset of experiments in suite where name matches the <REGEX> (suite must be valid)
  make run-keep suite=<SUITE> id=new                  - does not terminate instances at the end, otherwise works the same as run target
Clean
  make clean                                          - terminate running cloud instances belonging to the project and local cleanup
  make clean-result                                   - delete all incomplete results in doe-suite-results
Running ETL Locally
  make etl suite=<SUITE> id=<ID>                      - run the etl pipeline of the suite (locally) to process results (often id=last)
  make etl-design suite=<SUITE> id=<ID>               - same as `make etl ...` but uses the pipeline from the suite design instead of results
  make etl-all                                        - run etl pipelines of all results
  make etl-super config=<CONFIG> out=<PATH>           - run the super etl to combine results of multiple suites  (for <CONFIG> e.g., demo_plots)
  make etl-super ... pipelines="<P1> <P2>"            - run only a subset of pipelines in the super etl
Clean ETL
  make etl-clean suite=<SUITE> id=<ID>                - delete etl results from specific suite (can be regenerated with make etl ...)
  make etl-clean-all                                  - delete etl results from all suites (can be regenerated with make etl-all)
Gather Information
  make info                                           - list available suite designs
  make status suite=<SUITE> id=<ID>                   - show the status of a specific suite run (often id=last)
Design of Experiment Suites
  make design suite=<SUITE>                           - list all the run commands defined by the suite
  make design-validate suite=<SUITE>                  - validate suite design and show with default values
Setting up a Suite
  make new                                            - initialize doe-suite-config from a template
Running Tests
  make test                                           - running all suites (seq) and comparing results to expected (on aws)
  make euler-test cloud=euler                         - running all single instance suites on euler and compare results to expected
  make etl-test-all                                   - re-run all etl pipelines and compare results to current state (useful after update of etl step)
