Suite Design By Example¶
In this section, we show a series of examples that demonstrate the features of suite designs. The examples start with a minimal suite design and become progressively more complex.
More information about the design format and defaults can be found in Experiment Design.
example01-minimal¶
The suite=example01-minimal
shows a minimal suite design with a single experiment on a single host; the only ETL pipeline is a basic error check.
---
# The suite `example01-minimal` contains a single experiment called `minimal`.
# We run this experiment on a single instance `n=1` of host type `small` and we only use a single repetition.
# The experiment consists of four runs, i.e., configurations:
# - echo "hello world."
# - echo "hello world!"
# - echo "hello universe."
# - echo "hello universe!"
#
# For the experiment configuration, we use the `cross` format:
# The different levels for each factor are listed in `base_experiment` and
# we create the runs by taking a cross product of all factor levels.
# (e.g., [world, universe] x [".", "!"] results in 4 runs)
minimal: # experiment name
  n_repetitions: 1
  host_types:
    small: # one instance of type `small`
      n: 1
      $CMD$: "echo \"[% my_run.arg1 %] [% my_run.arg2 %][% my_run.arg3 %] \"" # command to start experiment run
  base_experiment:
    arg1: hello # fixed parameter between runs (constant)
    arg2:
      $FACTOR$: [world, universe] # varied parameter between runs (factor)
    arg3:
      $FACTOR$: [".", "!"] # varied parameter between runs (factor)

$ETL$: # ensures that stderr.log is empty everywhere and that no files are generated except stdout.log
  check_error:
    experiments: "*"
    extractors: {ErrorExtractor: {}, IgnoreExtractor: {} }
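To make the `cross` format concrete, the following standalone Python sketch (illustrative only; the actual expansion logic lives in doespy) builds the same four runs with `itertools.product`:

from itertools import product

# the parameters of the `minimal` experiment (a constant is just a factor with one level)
base_experiment = {
    "arg1": ["hello"],
    "arg2": ["world", "universe"],
    "arg3": [".", "!"],
}

# the cross format takes the cross product of all factor levels: 1 x 2 x 2 = 4 runs
runs = [dict(zip(base_experiment, levels))
        for levels in product(*base_experiment.values())]

for run in runs:
    print('echo "{arg1} {arg2}{arg3}"'.format(**run))
# echo "hello world."
# echo "hello world!"
# echo "hello universe."
# echo "hello universe!"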
Show Resulting Commands
$ make design suite=example01-minimal
example02-single¶
The suite=example02-single
demonstrates that:

- a suite can consist of multiple experiments that are executed in parallel (here 2 experiments)
- `init_roles` in host_type defines an ansible role to install packages etc. on a host
- a `config.json` file with the run config is within the working directory of the `$CMD$` -> it can be used to pass config parameters
- we can repeat each run configuration multiple times (`n_repetitions`)
- an `$ETL$` pipeline can automatically process result files (e.g., extract from result structure, transform into a suited df, load a summary)
---
# Example Goal - Demonstrate that:
# (1) a suite can consist of multiple experiments that are executed in parallel (here 2 experiments)
# (2) `init_roles` in host_type defines an ansible role to install packages etc. on a host
# (3) a `config.json` file with the run config is within the working directory of the `$CMD$`
#     -> can be used to pass config parameters
# (4) we can repeat each run configuration multiple times (`n_repetitions`)
# (5) an `$ETL$` pipeline can automatically process result files (e.g., extract from result structure, transform into a suited df, load a summary)
# (6) we can use the Python range syntax to define $FACTOR$s via jinja
# (7) some variables can only be known at runtime (e.g., cloud dependent); for these we can use the `at_runtime` filter to read the variable out of `exp_host_lst`

# experiment with two factors: one with 3 levels and one with 2 levels (cross-product). We repeat each run config twice.
experiment_1:
  n_repetitions: 2 # (4) each of the 6 run configurations is repeated 2x -> 12 jobs
  host_types:
    small:
      n: 1
      init_roles: setup-small
      # runs the python script demo_latency.py within the code directory fetched from the code repository; the demo script creates a result file in csv or json
      # (3) passes arguments on the command line
      $CMD$: "[% my_run.python_run %]/demo_project/demo_latency.py --opt [% my_run.opt %] --size [% my_run.payload_size_mb %] --out [% my_run.out %]"
  base_experiment:
    # (7) uses `| at_runtime(exp_host_lst)` to access a hostvar (info: at_runtime w/o additional arguments requires that the variable has the same value across all host types)
    python_run: "[% 'demo_project_python' | at_runtime(exp_host_lst) %] [% 'exp_code_dir' | at_runtime(exp_host_lst) %]"
    out: json
    payload_size_mb:
      $FACTOR$: [10, 20, 30]
    opt:
      $FACTOR$: [True, False]

# experiment with one factor with 3 levels and three repetitions (each)
experiment_2:
  n_repetitions: 3 # (4) each of the 3 run configurations is repeated 3x -> 9 jobs
  host_types:
    small:
      n: 1
      init_roles: setup-small
      $CMD$: "[% my_run.python_run %]/demo_project/demo.py --config config.json" # (3) uses the config.json with the run config in the working dir of the command
  base_experiment:
    python_run: "[% 'demo_project_python' | at_runtime(exp_host_lst) %] [% 'exp_code_dir' | at_runtime(exp_host_lst) %]"
    out: csv
    problem:
      opt: False
      # (6) Python range syntax to define factors via jinja (also possible for non-factors)
      size:
        $FACTOR$: {{ range(10, 25, 5) | list }} # [10, 15, 20]
      # other range syntax: {{ range(3) | list }} -> [0, 1, 2] | {{ range(1,3) | list }} -> [1, 2] | {{ range(20, 40, 10) | list }} -> [20, 30]
      other: "{{ range(2) | list }}"

# (5) the suite has two etl pipelines to process results:
# - each pipeline starts with a set of extractors; each produced result file is assigned to exactly one extractor (using a `file_regex`)
#   (e.g., the JsonExtractor default regex matches all files ending with `.json`)
# - we combine the results from the different extractors into a dataframe and transform it with a chain of transformers
#   (e.g., the RepAggTransformer aggregates over all repetitions of an experiment run and calculates `mean`, `std` etc.)
# - finally, all loaders are executed on the dataframe resulting from the chain of transformers
$ETL$:
  pipeline1:
    experiments: [experiment_1]
    extractors:
      JsonExtractor: {} # with default file_regex
      ErrorExtractor: {} # if a non-empty file matching the default regex exists -> we throw an error using the ErrorExtractor
      IgnoreExtractor: {} # since every file must be processed by an extractor, we provide the IgnoreExtractor to ignore certain files (e.g., stdout)
    transformers:
      - name: RepAggTransformer # aggregate over all repetitions of a run and calc `mean`, `std`, etc.
        data_columns: [latency] # the names of the columns in the dataframe that contain the measurements
    loaders:
      CsvSummaryLoader: {skip_empty: True} # write the transformed dataframe across the whole experiment as a csv file
      DemoLatencyPlotLoader: {} # create a plot based on a project-specific plot loader
  pipeline2:
    experiments: [experiment_2]
    extractors: # with overwritten default file_regex
      CsvExtractor:
        file_regex: '.*\.csv$'
      ErrorExtractor:
        file_regex: '^stderr.log$'
      IgnoreExtractor:
        file_regex: '^stdout.log$'
    transformers:
      # sort values by [run, rep] to ensure that the comparison of the test matches
      - df.sort_values: { by: [run, rep], ignore_index: yes }
    loaders:
      CsvSummaryLoader: {skip_empty: True}
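For intuition about the transformer stage, the following standalone pandas sketch mimics what a `RepAggTransformer`-style aggregation does conceptually (made-up numbers; the actual implementation lives in doespy):

import pandas as pd

# one latency measurement per (run, repetition), roughly as an extractor would produce it
df = pd.DataFrame({
    "run": [0, 0, 1, 1],
    "rep": [0, 1, 0, 1],
    "latency": [12.0, 14.0, 30.0, 34.0],
})

# aggregate over all repetitions of a run and compute mean/std for each data column
agg = df.groupby("run")["latency"].agg(["mean", "std"]).reset_index()
print(agg)
#    run  mean       std
# 0    0  13.0  1.414214
# 1    1  32.0  2.828427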
Show Resulting Commands
$ make design suite=example02-single
example03-format¶
The suite=example03-format
demonstrates the use of the two (three) formats for expressing factors (varying parameters).
---
# Example Goal - Demonstrate the use of the two (three) formats for expressing factors (varying parameters).
#
# The 2(+1) different formats:
# - the `cross` format is the concise form for a cross product of all factors
# - the `level-list` format allows specifying a list with a concrete level for each factor (i.e., not the full cross-product)
# - a mix between the `cross` and `level-list` formats that combines the advantages of both
#
# The `cross` format uses the keyword `$FACTOR$` as a YAML key,
# while the `level-list` format uses `$FACTOR$` as a YAML value and expects a corresponding level in the `factor_levels` list.
#
# The `except_filters` construct can be used to ignore specific combinations of configurations (e.g., a specific combination out of the cross product of two factors should be skipped)
# experiment in the pure `cross` format
format_cross:
  n_repetitions: 1 # no repetition
  host_types:
    small:
      n: 1
      $CMD$: "echo \"run app=[% my_run.app.name %] with vec=[% my_run.vector_size %] seed=[% my_run.seed %]\"" # use echo as example
  base_experiment:
    seed: 1234 # constant
    vector_size:
      $FACTOR$: [10, 20, 30, 40] # varied parameter between runs (factor)
    app:
      name:
        $FACTOR$: [app1, app2, app3] # varied parameter between runs (factor)
      # hyperparam: X -> not used in this experiment
  except_filters:
    # we ignore the combination of vector_size 40 with app2 and app3 and only run it with app1
    - vector_size: 40
      app:
        name: app2
    - vector_size: 40
      app:
        name: app3
#
# The experiment `format_cross` results in 10 runs:
# - {"vector_size": 10, "app.name": app1, "seed": 1234}
# - {"vector_size": 10, "app.name": app2, "seed": 1234}
# - {"vector_size": 10, "app.name": app3, "seed": 1234}
# - {"vector_size": 20, "app.name": app1, "seed": 1234}
# - {"vector_size": 20, "app.name": app2, "seed": 1234}
# - {"vector_size": 20, "app.name": app3, "seed": 1234}
# - {"vector_size": 30, "app.name": app1, "seed": 1234}
# - {"vector_size": 30, "app.name": app2, "seed": 1234}
# - {"vector_size": 30, "app.name": app3, "seed": 1234}
# - {"vector_size": 40, "app.name": app1, "seed": 1234}
# - {"vector_size": 40, "app.name": app2, "seed": 1234} -> Ignored by except_filters
# - {"vector_size": 40, "app.name": app3, "seed": 1234} -> Ignored by except_filters
# experiment in the pure `level-list` format
format_levellist:
  n_repetitions: 1 # no repetition
  host_types:
    small: # use one instance
      n: 1
      $CMD$: "echo \"run app=[% my_run.app.name %] with hyperparam=[% my_run.app.hyperparam %] seed=[% my_run.seed %]\"" # use echo as example
  base_experiment:
    seed: 1234
    # vector_size: X -> not used in this experiment
    app:
      name: $FACTOR$ # varied parameter between runs (factor)
      hyperparam: $FACTOR$ # varied parameter between runs (factor)
  factor_levels:
    - app:
        name: app1
        hyperparam: 0.1
    - app:
        name: app2
        hyperparam: 10
    - app:
        name: app3
        hyperparam: 5
# The `level-list` format has the advantage that we don't need to create a run for the full cross product of factors.
# (e.g., here each app has a specific hyperparam, hence we don't want to run the full cross product because the hyperparam is app specific)
#
# The experiment `format_levellist` results in 3 runs:
# - {"app.name": app1, "app.hyperparam": 0.1, "seed": 1234}
# - {"app.name": app2, "app.hyperparam": 10 , "seed": 1234}
# - {"app.name": app3, "app.hyperparam": 5 , "seed": 1234}
# experiment in a `mixed` format of the `cross` and `level-list` formats
format_mixed:
  n_repetitions: 1 # no repetition
  host_types:
    small:
      n: 1
      $CMD$: "echo \"run app=[% my_run.app.name %] with hyperparam=[% my_run.app.hyperparam %] vec=[% my_run.vector_size %] seed=[% my_run.seed %]\"" # use echo as example
  base_experiment:
    seed: 1234 # constant
    vector_size:
      $FACTOR$: [10, 20, 30, 40] # varied parameter between runs (factor)
    app:
      name: $FACTOR$ # varied parameter between runs (factor)
      hyperparam: $FACTOR$ # varied parameter between runs (factor)
  factor_levels:
    - app:
        name: app1
        hyperparam: 0.1
    - app:
        name: app2
        hyperparam: 10
    - app:
        name: app3
        hyperparam: 5
  except_filters:
    # we ignore the combination of vector_size 40 with app2 and app3 and only run it with app1
    - vector_size: 40
      app:
        name: app2
    - vector_size: 40
      app:
        name: app3
# The mix between `cross` and `level-list` is the most flexible because it allows defining $FACTOR$s
# for which we want to create the cross product (e.g., `vector_size`) and
# other factors that form the level list (e.g., app.name, app.hyperparam).
#
# In this example, the `hyperparam` is app specific and hence it does not make sense to create a
# cross product between app name and hyperparam. In the `cross` format this is not possible,
# while in the `level-list` format we would have to list all 12 runs under `factor_levels`.
#
# The experiment `format_mixed` results in 12 runs:
# - {"vector_size": 10, "app.name": app1, "app.hyperparam": 0.1, "seed": 1234}
# - {"vector_size": 10, "app.name": app2, "app.hyperparam": 10 , "seed": 1234}
# - {"vector_size": 10, "app.name": app3, "app.hyperparam": 5 , "seed": 1234}
# - {"vector_size": 20, "app.name": app1, "app.hyperparam": 0.1, "seed": 1234}
# - {"vector_size": 20, "app.name": app2, "app.hyperparam": 10 , "seed": 1234}
# - {"vector_size": 20, "app.name": app3, "app.hyperparam": 5 , "seed": 1234}
# - {"vector_size": 30, "app.name": app1, "app.hyperparam": 0.1, "seed": 1234}
# - {"vector_size": 30, "app.name": app2, "app.hyperparam": 10 , "seed": 1234}
# - {"vector_size": 30, "app.name": app3, "app.hyperparam": 5 , "seed": 1234}
# - {"vector_size": 40, "app.name": app1, "app.hyperparam": 0.1, "seed": 1234}
# - {"vector_size": 40, "app.name": app2, "app.hyperparam": 10 , "seed": 1234} -> ignored by except_filters
# - {"vector_size": 40, "app.name": app3, "app.hyperparam": 5 , "seed": 1234} -> ignored by except_filters
$ETL$:
  check_error: # ensures that stderr.log is empty everywhere and that no files are generated except stdout.log
    experiments: "*"
    extractors: {ErrorExtractor: {}, IgnoreExtractor: {} }
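The semantics of `except_filters` can be summarized as: a run is dropped if some filter is a "sub-config" of the run. The following plain-Python sketch (illustrative only; flattened keys like `app.name` are used for brevity, whereas the design uses nested YAML) reproduces the 12 -> 10 runs of `format_cross`:

from itertools import product

# cross product of the two factors of `format_cross` (seed is constant)
runs = [
    {"vector_size": v, "app.name": a, "seed": 1234}
    for v, a in product([10, 20, 30, 40], ["app1", "app2", "app3"])
]

# a filter matches a run if all of its key/value pairs appear in the run config
except_filters = [
    {"vector_size": 40, "app.name": "app2"},
    {"vector_size": 40, "app.name": "app3"},
]

def is_excluded(run):
    return any(all(run.get(k) == v for k, v in f.items()) for f in except_filters)

kept = [r for r in runs if not is_excluded(r)]
print(len(runs), "->", len(kept))  # prints: 12 -> 10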
Show Resulting Commands
$ make design suite=example03-format
example04-multi¶
The suite=example04-multi
demonstrates:

- an experiment involving multiple instances (e.g., client-server)
- that `common_roles` lists roles executed on all host_types, while `init_roles` is a host_type specific role
- the use of the variable `exp_host_lst` in `$CMD$` to get the dns name of other instances (e.g., get the dns name of the server)
- the use of `check_status` to control when an experiment job is considered to be over: if set to `True`, then the experiment job waits until the `$CMD$` stops. default(True)
---
# Example Goal - Demonstrate:
# (1) an experiment involving multiple instances (e.g., client-server)
# (2) that `common_roles` lists roles executed on all host_types, while `init_roles` is a host_type specific role.
# (3) the use of the variable `exp_host_lst` in $CMD$ to get the dns name of other instances (e.g., get the dns name of the server)
# (4) the use of `check_status` to control when an experiment job is considered to be over.
#     If set to `True`, then the experiment job waits until the $CMD$ stops. default(True)

# client server experiment where each client sends the server a message
exp_client_server:
  n_repetitions: 3
  common_roles:
    - setup-common # (2) role executed on all host types
  host_types:
    client: # (1)
      n: 2
      check_status: True # (4) when each client has sent a message, the experiment job is complete
      init_roles: setup-client # (2) role executed only on clients
      $CMD$:
        # send messages to the server with nc
        # (3) use the exp_host_lst variable to extract the private dns name of the server; the delay `sleep 5` ensures that the server is running when the client sends the message
        #     to_private_dns_name is a custom filter function defined in designs/filter_plugins
        - sleep 5 && echo '[% my_run.host_vars.client.msg %] from client 1 ([% my_run.info %])' | netcat -q 1 [% exp_host_lst | to_private_dns_name('server') | default('<UNDEFINED-DNS>') %] [% my_run.port %]
        # have two commands to distinguish messages from client 1 and 2
        - sleep 5 && echo '[% my_run.host_vars.client.msg %] from client 2 ([% my_run.info %])' | netcat -q 1 [% exp_host_lst | to_private_dns_name('server') | default('<UNDEFINED-DNS>') %] [% my_run.port %]
    server: # (1)
      n: 1
      check_status: False # (4) the server does not stop (it does not know when the experiment job is over)
      init_roles: setup-server # (2) role executed only on the server
      # run a single ncat server -> writes all incoming messages to stdout
      $CMD$: ncat -l [% my_run.port %] --keep-open
  base_experiment:
    port: 2807
    info: $FACTOR$
    host_vars:
      client:
        msg: $FACTOR$
      server:
        greeting: ignore
  factor_levels:
    - info: run 0
      host_vars:
        client:
          msg: hello server
    - info: run 1
      host_vars:
        client:
          msg: hi server
    - info: run 2
      host_vars:
        client:
          msg: good day server

$ETL$:
  check_error: # ensures that stderr.log is empty everywhere and that no files are generated except stdout.log
    experiments: "*"
    extractors: {ErrorExtractor: {}, IgnoreExtractor: {} }
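`to_private_dns_name` is a project-defined jinja filter (see `designs/filter_plugins`). As a rough sketch of what such a filter could look like, assuming purely for illustration that each entry of `exp_host_lst` carries `host_type` and `private_dns_name` keys (not the authoritative doe-suite schema):

def to_private_dns_name(exp_host_lst, host_type):
    """Return the private dns name of the first host with the given host_type."""
    for host in exp_host_lst:
        if host["host_type"] == host_type:
            return host["private_dns_name"]
    raise ValueError(f"no host of type {host_type!r} in exp_host_lst")

# usage, mirroring [% exp_host_lst | to_private_dns_name('server') %]
hosts = [{"host_type": "server", "private_dns_name": "ip-10-0-0-1.ec2.internal"}]
print(to_private_dns_name(hosts, "server"))  # ip-10-0-0-1.ec2.internal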
Show Resulting Commands
$ make design suite=example04-multi
example05-complex¶
The suite=example05-complex
shows complex experiments with:

- a mix of formats
- multiple experiments
- running different commands on different instances
---
# Example Goal - Show a more complex set of experiments that demonstrates:
# (1) a mix of single instance experiments and multi instance experiments
# (2) a mix of design formats (`cross` and `level-list`)
# (3) running different commands on different instances of the same host_type
# (4) running a main command and one or more background commands on the same host
# (5) writing a results file (csv)
#
# Suite with 3 experiments:
# - exp_single:  experiment on a single instance
# - exp_multi_1: experiment on multiple instances (client server)
# - exp_multi_2: experiment on multiple instances

# experiment on a single instance that writes an output file
exp_single:
  n_repetitions: 2
  common_roles:
    - setup-common
  host_types:
    small:
      n: 1
      init_roles: setup-small
      $CMD$: "echo \"[% my_run.info %];[% my_run.seed %];[% my_run.opt %]\" >> results/out.csv" # (5) write a results file
  base_experiment:
    info: $FACTOR$
    seed: 1234
    opt: $FACTOR$
  factor_levels:
    - info: run1 - with optimization
      opt: True
    - info: run2 - without optimization
      opt: False

# client server experiment (clients send a msg to the server)
exp_multi_1:
  n_repetitions: 3
  common_roles:
    - setup-common
  host_types:
    client:
      n: 3
      # (3) run a different command on each client instance
      $CMD$:
        # send messages to the server with nc
        # use the exp_host_lst variable to extract the private dns name of the server; the delay `sleep 5` ensures that the server is running when the client sends the message
        - main: sleep 5 && echo '[% my_run.host_vars.client.msg %] from client 1' | netcat -q 1 [% my_run.server_dns %] [% my_run.port %]
        # have three commands to distinguish messages from client 1, 2, and 3
        - main: sleep 5 && echo '[% my_run.host_vars.client.msg %] from client 2' | netcat -q 1 [% my_run.server_dns %] [% my_run.port %]
          aux1: echo 'aux1 start' && sleep 3 && echo 'aux1 end' # (4) auxiliary cmds run on client 2
          aux2: echo 'aux2 start' && sleep 50 && cat nonexistent.txt # -> they are bound to the lifetime of the main cmd (i.e., `nonexistent.txt` is never accessed, which would raise an error)
        - main: sleep 5 && echo '[% my_run.host_vars.client.msg %] from client 3' | netcat -q 1 [% my_run.server_dns %] [% my_run.port %]
    server:
      n: 1
      check_status: False
      init_roles: setup-server
      # run a single ncat server -> writes all incoming messages to stdout
      $CMD$:
        main: ncat -l [% my_run.port %] --keep-open
        bg1: echo 'bg1 start' && sleep 3 && echo 'bg1 end' # (4) auxiliary (i.e., background) cmds run on the server
        bg2: echo 'bg2 start' && sleep 600 && cat nonexistent.txt
  base_experiment:
    server_dns: "[% exp_host_lst | to_private_dns_name('server') | default('<UNDEFINED-DNS>') %]"
    port: 2807
    host_vars:
      client:
        msg:
          $FACTOR$: [hello server, hi server, good day server]

# experiment that runs the same command on two instances
exp_multi_2:
  n_repetitions: 2
  common_roles:
    - setup-common
  host_types:
    small:
      n: 2
      $CMD$: "echo \"[% my_run.prefix %] [% my_run.n_parties %] [% my_run.postfix %]\" "
  base_experiment:
    n_parties:
      $FACTOR$: [100, 200, 300, 500]
    prefix: $FACTOR$
    postfix: $FACTOR$
  factor_levels:
    - prefix: hi
      postfix: parties
    - prefix: hello
      postfix: people
# results in 8 runs * 2 reps = 16 jobs:
# - echo hi 100 parties
# - echo hi 200 parties
# - echo hi 300 parties
# - echo hi 500 parties
# - echo hello 100 people
# - echo hello 200 people
# - echo hello 300 people
# - echo hello 500 people

$ETL$:
  check_error: # ensures that stderr.log is empty everywhere and that no files are generated except stdout.log and out.csv
    experiments: "*"
    extractors: {ErrorExtractor: {}, IgnoreExtractor: {file_regex: ["stdout.log", "out.csv"]} }
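Point (4) relies on the auxiliary/background commands being bound to the lifetime of the main command. Doe-suite realizes this with Ansible; the following Python sketch only illustrates the intended semantics, it is not the actual mechanism:

import subprocess

def run_with_background(main_cmd, bg_cmds):
    # start the auxiliary/background commands first
    bg_procs = [subprocess.Popen(cmd, shell=True) for cmd in bg_cmds]
    try:
        # the job is over when the main command finishes ...
        return subprocess.run(main_cmd, shell=True).returncode
    finally:
        # ... at which point still-running background commands are terminated,
        # so a command like `sleep 600 && cat nonexistent.txt` never reaches the `cat`
        for proc in bg_procs:
            if proc.poll() is None:
                proc.terminate()

run_with_background(
    "echo 'main' && sleep 2",
    ["echo 'bg1 start' && sleep 1 && echo 'bg1 end'",
     "sleep 600 && cat nonexistent.txt"],
)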
Show Resulting Commands
$ make design suite=example05-complex
example06-vars¶
The suite=example06-vars
demonstrates the re-usability of variables in the design.
---
# Example Goal - Demonstrate:
# (1) using $SUITE_VARS$ for defining variable defaults that are included into each `base_experiment` unless a variable is already defined.
#     (contents can be $FACTOR$s or constants and can include references to other run config, see (3))
# (2) using $INCLUDE_VARS$ to include variable defaults defined in an external file (unless present in $SUITE_VARS$ or in the `base_experiment`)
#     (contents can be $FACTOR$s or constants and can include references to other run config, see (3))
# (3) self-referencing other variables of the config: with `[% my_run.X %]` we can reference another variable of the run config.
#     (referenced variables can be $FACTOR$s or constants but CANNOT include references themselves, i.e., no transitive references)

$SUITE_VARS$: # (1) variables that each experiment of the suite has by default (can be overwritten)
  # nested and non-nested variables work
  hello:
    world: o1

  # suite vars are just defaults, the base_experiment has higher precedence
  existing: o2
  existing_factor: o3

  # defining factors
  define_factor: $FACTOR$
  define_factor_cross:
    $FACTOR$: [v1, v2]

  # (3) self referencing
  #     with [% my_run. %] you can use other variables from the run config including factors,
  #     the only exception is other variables that also use [% %] tags.
  base_arg: "--version [% my_run.define_factor_cross %] --option [% my_run.existing %] abc"

shared_vars: # experiment name
  n_repetitions: 1
  host_types:
    small: # one instance of type `small`
      n: 1
      $CMD$: "echo \"[% my_run.base_arg %] --[% my_run.argument %] --[% my_run.define_factor %] --[% my_run.existing_factor %] \"" # command to start experiment run
  base_experiment:
    # all variables in $SUITE_VARS$ are part of the base experiment
    # (2) you can further include variables from other files under `does_config/designs/design_vars`
    $INCLUDE_VARS$: test.yml
    # note: $SUITE_VARS$ have higher precedence compared to external $INCLUDE_VARS$

    # factors coming from the $SUITE_VARS$ (commented out because they are not necessary here)
    # define_factor: $FACTOR$
    # define_factor_cross:
    #   $FACTOR$: [v1, v2]

    # variables defined in the $SUITE_VARS$ can be overwritten in the base experiment (with non-factors and factors)
    existing: overwrite suite_vars default
    existing_factor:
      $FACTOR$: [1, 2]
    argument: hello [% my_run.design_va_nested.arg1 %] # fixed parameter between runs (constant)
  factor_levels:
    - define_factor: f1
    - define_factor: f2
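The precedence rules (`base_experiment` over `$SUITE_VARS$` over `$INCLUDE_VARS$`) amount to a "fill in only what is missing" merge. A simplified, shallow Python sketch of this resolution (the real logic in doespy additionally handles nested keys and `$FACTOR$` definitions):

def merge_defaults(config, defaults):
    """Add each default only if the key is not already defined."""
    out = dict(config)
    for key, value in defaults.items():
        out.setdefault(key, value)
    return out

include_vars = {"existing": "from include", "only_in_include": 1}  # lowest precedence
suite_vars = {"existing": "o2", "hello": {"world": "o1"}}          # middle precedence
base_experiment = {"existing": "overwrite suite_vars default"}     # highest precedence

resolved = merge_defaults(merge_defaults(base_experiment, suite_vars), include_vars)
print(resolved["existing"])         # overwrite suite_vars default
print(resolved["hello"])            # {'world': 'o1'}
print(resolved["only_in_include"])  # 1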
Show Resulting Commands
$ make design suite=example06-vars
example07-etl¶
The suite=example07-etl
demonstrates advanced usage of ETL results processing.
# The goal of this design is to showcase (advanced) features of ETL pipelines.
# Unlike in the other examples, we list the ETL pipelines before the experiment definitions.
# There are four experiment designs in this suite that together define three shapes
# (the triangle is split over two experiments). Each experiment outputs x,y coordinates in a file `coordinates.yaml`,
# and the ETL pipelines visualize these shapes in different scatter plots.
# ETL FUNCTIONALITY:
# (1) use `experiments: "*"` to avoid listing all experiments
# (2) use df.X syntax to directly use pandas df transformations: https://pandas.pydata.org/docs/reference/frame.html
# (3) including another complete pipeline
# (3.1) defined in a suite (current or another)
# (3.2) defined in a template under `designs/etl_templates`
# (4) including a "stage" of another pipeline (from suite / template)
# (4.1) include extractor stage
# (4.2) include transformer stage
# (4.3) include loader stage
# (5) use etl variables in pipeline/stage inclusion
# Visualization of the ETL Pipeline `coord_square` (use `make design-validate suite=example07-etl` to see all):
#
#    --------------------------------------------------
#   | YamlExtractor   ErrorExtractor   IgnoreExtractor |   Extractors: result files to pandas df
#    --------------------------------------------------
#                          |
#                          V
#                    df.sort_values                        Transformers: transform df
#                          |
#                          V
#                      df.filter
#                          |
#                          V
#                       df.eval
#                          |
#                          V
#    --------------------------------------
#   | CsvSummaryLoader   CoordinateLoader |                Loaders: create results, e.g., plots
#    --------------------------------------
$ETL$:
  # coordinate pipelines to generate plots
  coord_square: # visualize the square shape
    experiments: [square]
    extractors:
      YamlExtractor: {}   # by default loads all .yaml files
      ErrorExtractor: {}  # raise an error if stderr.log is non-empty
      IgnoreExtractor: {} # ignore stdout.log
    transformers:
      - df.sort_values:
          by: [ exp_name, run, rep ]
          ignore_index: True
      - df.filter: {items: ["exp_name", "x", "y"]}
      - df.eval: {expr: "color = 'black'"}
    loaders:
      CsvSummaryLoader: {skip_empty: True} # writing a csv
      CoordinateLoader: {}                 # plotting a scatter plot
  coord_plus: # visualize the plus shape
    experiments: [plus]
    # (3) including another complete pipeline
    # (3.1) defined in a suite (current or another)
    #       for (3.2), defined in a template under `designs/etl_templates`,
    #       we would have to replace `suite` with `template`
    $INCLUDE_PIPELINE$: {suite: example07-etl, pipeline: coord_square} # show self include

  coord_triangle: # visualize the triangle shape (combining two experiments)
    experiments: [triangle1, triangle2]
    $INCLUDE_PIPELINE$: {suite: example07-etl, pipeline: coord_square}

  # In this pipeline we want to use the same pipeline as above (same extractors + same loaders)
  # but provide a custom transformer stage. Hence, we use the $INCLUDE_STEPS$ functionality.
  coord_all: # visualize all shapes together
    experiments: "*" # (1) use `experiments: "*"` to avoid listing all experiments
    extractors:
      # (4) including a "stage" of another pipeline (from suite / template)
      # (4.1) include extractor stage
      #       instead of including a complete pipeline, it's also possible to include a stage, here the extractor stage
      $INCLUDE_STEPS$: [{suite: example07-etl, pipeline: coord_square}]
    transformers:
      # by including all experiments we have a more complex color assignment -> we only include extractors + loaders and provide a custom transformer stage
      # (2) use the df.X syntax to directly use pandas df transformations: https://pandas.pydata.org/docs/reference/frame.html
      #     (note there are a few limitations: some functions on df require indexes which cannot be defined here -> e.g., see the ConditionalTransformer)
      - df.sort_values: { by: [ exp_name, run, rep ], ignore_index: True }
      - df.filter: {items: ["exp_name", "x", "y"]}
      - {name: ConditionalTransformer, col: "exp_name", dest: "color", value: {plus: black, square: green, triangle1: blue, triangle2: blue}}
    loaders:
      # (4) including a "stage" of another pipeline (from suite / template)
      # (4.3) include loader stage (can replace `suite` with `template` to choose the location)
      $INCLUDE_STEPS$: [{suite: example07-etl, pipeline: coord_square}]

  commands: # a pipeline to write a csv with the commands
    $ETL_VARS$:
      skip_empty: False
    experiments: "*"
    $INCLUDE_PIPELINE$: {template: config_template, pipeline: config}

  # The idea is that the `commands_stage` pipeline is the same as the `commands` pipeline above.
  # The difference is that here we show how to import each stage individually.
  # (4) including a "stage" of another pipeline (from suite / template)
  commands_stage: # a pipeline to write a csv with the commands
    # (5) use etl variables in pipeline/stage inclusion
    #     It's possible to use variables with the `[% %]` syntax.
    #     In combination with including pipelines or stages, this allows specifying some custom behavior.
    #     (The included pipeline defines a variable with [% %] and then the "including" pipeline provides
    #     an `$ETL_VARS$` section to set the values.)
    $ETL_VARS$:
      skip_empty: True
    experiments: "*"
    extractors:
      # (4.1) include extractor stage
      $INCLUDE_STEPS$: # include a step from another pipeline (can also be from an etl_template)
        - {template: config_template, pipeline: config}
    transformers:
      # (4.2) include transformer stage
      # NOTE THE SMALL SYNTAX DIFFERENCE between including transformers compared to loaders and extractors
      # could have custom steps before
      - $INCLUDE_STEPS$: {template: config_template, pipeline: config} # include all steps in transformers
      # could have custom steps after
    loaders:
      # (4.3) include loader stage
      $INCLUDE_STEPS$:
        - {template: config_template, pipeline: config}
# Experiment Designs
###################################################################################
square:
  n_repetitions: 1
  host_types:
    small:
      n: 1
      init_roles: setup-small
      $CMD$: "printf 'x: [% my_run.x %]\\ny: [% my_run.y %]' > results/coordinates.yaml"
  base_experiment:
    x:
      $FACTOR$: [0, 1, 2]
    y:
      $FACTOR$: [0, 1, 2]

plus:
  n_repetitions: 1
  host_types:
    small:
      n: 1
      init_roles: setup-small
      $CMD$: >-
        printf 'x: [% my_run.x if my_run.orient in ['N', 'S'] else my_run.x + my_run.dist if my_run.orient == 'E' else my_run.x - my_run.dist %]
        \ny: [% my_run.y if my_run.orient in ['W', 'E'] else my_run.y + my_run.dist if my_run.orient == 'N' else my_run.y - my_run.dist %]' > results/coordinates.yaml
  base_experiment:
    x: 8
    y: 5
    dist:
      $FACTOR$: [1, 2]
    orient:
      $FACTOR$: ["N", "E", "S", "W"]

triangle1:
  # drawing a square of points
  n_repetitions: 1
  host_types:
    small_v2:
      n: 1
      init_roles: setup-small
      $CMD$: "printf 'x: [% my_run.x %]\\ny: [% my_run.y %]' > results/coordinates.yaml"
  base_experiment:
    x:
      $FACTOR$: [0, 1, 2]
    y:
      $FACTOR$: [3, 4]

triangle2:
  # drawing individual points to complete the square from triangle1 into a triangle
  n_repetitions: 1
  host_types:
    small_v2:
      n: 1
      init_roles: setup-small
      $CMD$: "printf 'x: [% my_run.x %]\\ny: [% my_run.y %]' > results/coordinates.yaml"
  base_experiment:
    x: $FACTOR$
    y: $FACTOR$
  factor_levels:
    - x: -1
      y: 3
    - x: 3
      y: 3
    - x: 1
      y: 5
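The `ConditionalTransformer` step used in the `coord_all` pipeline is essentially a column mapping. In plain pandas, an approximation (not the transformer's actual implementation) looks like this:

import pandas as pd

# a tiny stand-in for the combined dataframe of all shape experiments
df = pd.DataFrame({
    "exp_name": ["plus", "square", "triangle1", "triangle2"],
    "x": [8, 0, -1, 1],
    "y": [5, 0, 3, 5],
})

# mirrors: {name: ConditionalTransformer, col: "exp_name", dest: "color", value: {...}}
value = {"plus": "black", "square": "green", "triangle1": "blue", "triangle2": "blue"}
df["color"] = df["exp_name"].map(value)
print(df)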
Show Resulting Commands
$ make design suite=example07-etl
example08-superetl¶
The suite=example08-superetl
generates dummy data that is used to showcase advanced (super) ETL processing.
# The goal of this design is to generate test data to showcase features of the super etl
# (see `demo_project/doe-suite-config/super_etl/demo_plots.yml`).
#
# Demonstrate:
# (1) using a custom jinja filter_plugin `generate_fake_data` to improve the readability of the design
#     (see `demo_project/doe-suite-config/designs/filter_plugins/data.py`)
dummydata:
  n_repetitions: 1
  host_types:
    small:
      n: 1
      init_roles: setup-small
      # (1) The filter_plugin `generate_fake_data` allows formatting part of the command using python code
      #     (see `demo_project/doe-suite-config/designs/filter_plugins/data.py`)
      $CMD$:
        main: sleep 3 && printf '[% my_run.n_measurements | generate_fake_data(my_run.system, my_run.system_config, my_run.workload) %]' > results/performance.yaml
        bg1: echo 'bg1 start' && sleep 1 && echo 'bg1 end' # background cmds run alongside the main cmd
        bg2: echo 'bg2 start' && sleep 50 && cat nonexistent.txt # -> they are bound to the lifetime of the main cmd (i.e., `nonexistent.txt` is not accessed)
  base_experiment:
    n_measurements: 5
    system:
      $FACTOR$: [system1, system2, system3]
    system_config:
      $FACTOR$: [v1, v2]
    workload:
      $FACTOR$: [workload1, workload2]

$ETL$: # ensures that stderr.log is empty everywhere and that no files are generated except stdout.log and performance.yaml
  check_error:
    experiments: "*"
    extractors: {ErrorExtractor: {}, IgnoreExtractor: {file_regex: ["stdout.log", "performance.yaml"]} }
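Custom jinja filters like `generate_fake_data` follow Ansible's filter-plugin convention: a module with a `FilterModule` class whose `filters()` method maps filter names to callables. A hedged sketch of what `data.py` could contain (the fake-data format below is invented for illustration; the real plugin defines its own):

# designs/filter_plugins/data.py -- illustrative sketch, not the actual plugin
import random

def generate_fake_data(n_measurements, system, system_config, workload):
    """Render a small yaml snippet with fake performance measurements."""
    random.seed(f"{system}-{system_config}-{workload}")  # deterministic per run config
    values = [round(random.uniform(1.0, 10.0), 2) for _ in range(n_measurements)]
    return "latency: [" + ", ".join(map(str, values)) + "]"

class FilterModule:
    """Ansible exposes the callables returned by `filters()` as jinja filters."""
    def filters(self):
        return {"generate_fake_data": generate_fake_data}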
Show Resulting Commands
$ make design suite=example08-superetl