Transformers

The Transformer stage manipulates the raw Pandas results data frame created by the Extractor stage. Two syntaxes are available:

The stage can directly invoke functions defined on the data frame; see Pandas.DataFrameFunction.
The stage can invoke custom Transformer classes, e.g., doespy.transformers.ConditionalTransformer.

Pandas DF Transformers

class Pandas.DataFrameFunction

Directly calls any function defined on pandas data frames (see https://pandas.pydata.org/docs/reference/frame.html). The syntax differs from that of regular transformers: use df.* and replace * with the function name. The dictionary under df.* passes named arguments to the selected function.

Parameters: **args – Arguments passed to the function selected with df.*

Example ETL Pipeline Design

 $ETL$:
     transformers:
         # keep only the listed columns
         - df.filter: {items: ["exp_name", "x", "y"]}
         # add column to df
         - df.eval: {expr: "color = 'black'"}
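
To illustrate how a df.* step maps onto a pandas call, here is a minimal sketch. It assumes each step is a one-entry dictionary whose key names the data frame function and whose value holds the named arguments. The helper apply_df_step and the sample data are hypothetical and not part of doespy, and the eval expression is an illustrative numeric variant of the one above.

```python
import pandas as pd


def apply_df_step(df: pd.DataFrame, step: dict) -> pd.DataFrame:
    """Apply one `df.<function>: {kwargs}` step (hypothetical helper, not doespy code)."""
    key, kwargs = next(iter(step.items()))
    prefix, _, func_name = key.partition(".")
    if prefix != "df" or not func_name:
        raise ValueError(f"expected a 'df.<function>' key, got {key!r}")
    # Look up the pandas method by name and pass the dictionary as named arguments.
    return getattr(df, func_name)(**kwargs)


# Made-up results data frame with one column that the filter step should drop.
raw = pd.DataFrame(
    {"exp_name": ["e1", "e1"], "x": [1, 2], "y": [10, 20], "host": ["a", "b"]}
)

steps = [
    {"df.filter": {"items": ["exp_name", "x", "y"]}},  # keep only these columns
    {"df.eval": {"expr": "total = x + y"}},  # add a derived column
]

result = raw
for step in steps:
    result = apply_df_step(result, step)

print(result)
```

Each step returns a new data frame, so the steps chain in the order they are listed and the input frame is left untouched.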

Conditional Replacement

pydantic model doespy.etl.steps.transformers.ConditionalTransformer

For each row, the ConditionalTransformer looks up the value of the col column among the keys of the value dict. On a match, it replaces the value in the dest column with the corresponding dict value; rows without a matching key are left unchanged.

Example ETL Pipeline Design

 $ETL$:
     transformers:
       - name: ConditionalTransformer
         col: Country
         dest: Code
         value:
             Switzerland: CH
             Germany: DE

Example

Country	Code
Germany
Switzerland
France

➡️

Country	Code
Germany	DE
Switzerland	CH
France

field col: str [Required]: Name of the condition column in the data frame.

field dest: str [Required]: Name of the destination column in the data frame.

field value: Dict[Any, Any] [Required]: Dictionary of replacement rules. Each key is an entry to match in the condition column col, and the associated value is the replacement written to the dest column.
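
The replacement rule can be sketched in plain pandas as follows. This is only an illustration of the documented behaviour using the Country/Code example, not the doespy implementation; the variable names mirror the fields col, dest, and value.

```python
import pandas as pd

# Input table from the example: the Code column starts out empty.
df = pd.DataFrame(
    {
        "Country": ["Germany", "Switzerland", "France"],
        "Code": [None, None, None],
    }
)

col, dest = "Country", "Code"
value = {"Switzerland": "CH", "Germany": "DE"}

# For every rule, overwrite dest in the rows where col equals the rule's key.
for key, replacement in value.items():
    df.loc[df[col] == key, dest] = replacement

codes = df[dest].tolist()
print(df)
```

France has no entry in the value dict, so its Code stays empty.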

Group By Aggregates

pydantic model doespy.etl.steps.transformers.GroupByAggTransformer

The GroupByAggTransformer groups the data frame by groupby_columns and then applies each of the agg_functions to the data_columns.

Example ETL Pipeline Design

 $ETL$:
     transformers:
       - name: GroupByAggTransformer
         groupby_columns: [Run, $FACTORS$]
         data_columns: [Lat]
         agg_functions: [mean]

Example

Run	Rep	$CMD$	Lat
0	0	xyz	0.1
0	1	xyz	0.3
1	0	xyz	0.5
1	1	xyz	0.5

➡️

Run	…	Lat_mean
0		0.2
1		0.5

field data_columns: List[str] [Required]: The columns that contain the data to aggregate; see agg_functions.

field groupby_columns: List[str] [Required]: The columns to group by. The list can contain the magic entry $FACTORS$, which expands to all factors of the experiment. For example, [exp_name, host_type, host_idx, $FACTORS$] would group the data by run.

field agg_functions: List[str] = ['mean', 'min', 'max', 'std', 'count']: List of aggregate functions to apply to data_columns.

field custom_tail_length: int = 5: “custom_tail” is a custom aggregation function that calculates the mean over the last custom_tail_length entries of a column.
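
A rough pandas equivalent of the example is sketched below. It is not the doespy implementation: it assumes $FACTORS$ expands to nothing for this tiny data set, it builds the column_function naming (e.g. Lat_mean) by hand, and it sets custom_tail_length to 1 purely so that custom_tail differs visibly from the mean.

```python
import pandas as pd

# Input table from the example above.
df = pd.DataFrame(
    {
        "Run": [0, 0, 1, 1],
        "Rep": [0, 1, 0, 1],
        "$CMD$": ["xyz"] * 4,
        "Lat": [0.1, 0.3, 0.5, 0.5],
    }
)

groupby_columns = ["Run"]  # assumption: no additional factors to expand
data_columns = ["Lat"]
custom_tail_length = 1  # illustrative value; the documented default is 5


def custom_tail(series: pd.Series) -> float:
    """Mean over the last custom_tail_length entries of a group."""
    return series.tail(custom_tail_length).mean()


# Group, then apply every aggregate function to every data column.
agg = df.groupby(groupby_columns)[data_columns].agg(["mean", custom_tail])

# Flatten the (column, function) pairs into names such as Lat_mean.
agg.columns = ["_".join(parts) for parts in agg.columns]
agg = agg.reset_index()

print(agg)
```

With two rows per run, a tail length of 1 picks only the last repetition, whereas the default of 5 would cover both rows and coincide with the mean.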
