Transformers
The Transformer stage manipulates the raw Pandas results data frame created by the Extractor stage. Two syntaxes are available:
- The stage can directly invoke functions defined on the data frame, see Pandas.DataFrameFunction.
- The stage can invoke custom Transformer classes, e.g., doespy.transformers.ConditionalTransformer.
Pandas DF Transformers
- class Pandas.DataFrameFunction
Can directly call all functions defined on pandas data frames: https://pandas.pydata.org/docs/reference/frame.html
The syntax differs from regular transformers: use df.* and replace * with the function name. The dictionary under df.* can be used to pass named arguments to the selected function.
- Parameters:
**args – Passes arguments to the function selected with df.*

$ETL$:
  transformers:
    # remove all cols except
    - df.filter: {items: ["exp_name", "x", "y"]}
    # add column to df
    - df.eval: {expr: "color = 'black'"}
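One way to read the df.* syntax is as a method lookup on the data frame followed by a call with the dictionary as keyword arguments. The sketch below illustrates that reading only; the helper `apply_df_step`, the sample data, and the `total = x + y` expression are hypothetical and not taken from doespy, whose actual dispatch may differ.

```python
import pandas as pd


def apply_df_step(df, step):
    """Apply one step of the form {"df.<function>": {<named arguments>}}.

    Hypothetical helper: illustrates the df.* idea, not the doespy implementation.
    """
    (key, kwargs), = step.items()
    if not key.startswith("df."):
        raise ValueError(f"not a df.* step: {key}")
    func = getattr(df, key[len("df."):])  # e.g. df.filter or df.eval
    return func(**(kwargs or {}))


# Made-up results frame
df = pd.DataFrame({"exp_name": ["a", "b"], "x": [1, 2], "y": [3, 4], "tmp": [0, 0]})

steps = [
    # remove all columns except the listed ones
    {"df.filter": {"items": ["exp_name", "x", "y"]}},
    # add a derived column (made-up expression)
    {"df.eval": {"expr": "total = x + y"}},
]
for step in steps:
    df = apply_df_step(df, step)

print(df)
```

Each step returns a new data frame that feeds the next one, which is why the steps are applied in list order.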
Conditional Replacement
- pydantic model doespy.etl.steps.transformers.ConditionalTransformer
The ConditionalTransformer replaces the value in the dest column with a value from the value dict if the value in the col column is equal to the dict key.

$ETL$:
  transformers:
    - name: ConditionalTransformer
      col: Country
      dest: Code
      value:
        Switzerland: CH
        Germany: DE
Example

Country     | Code
Germany     |
Switzerland |
France      |

➡️

Country     | Code
Germany     | DE
Switzerland | CH
France      |
- field col: str [Required]
Name of condition column in data frame.
- field dest: str [Required]
Name of destination column in data frame.
- field value: Dict[Any, Any] [Required]
Dictionary of replacement rules: the dict key is the entry in the condition column col, and the dict value is the replacement used in the dest column.
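The replacement rule can be sketched in plain pandas as follows. This is a minimal illustration of the documented behaviour, not the doespy source; the function name `conditional_replace` is hypothetical, and the empty strings stand in for the empty Code cells of the example.

```python
import pandas as pd


def conditional_replace(df, col, dest, value):
    """Where df[col] equals a key of `value`, write the mapped value into df[dest].

    Hypothetical helper mirroring the col / dest / value fields.
    """
    df = df.copy()
    for key, replacement in value.items():
        df.loc[df[col] == key, dest] = replacement
    return df


df = pd.DataFrame({
    "Country": ["Germany", "Switzerland", "France"],
    "Code": ["", "", ""],
})

out = conditional_replace(
    df, col="Country", dest="Code", value={"Switzerland": "CH", "Germany": "DE"}
)
print(out)
```

Rows whose col entry has no matching key, such as France here, keep their existing dest value.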
Group By Aggregates
- pydantic model doespy.etl.steps.transformers.GroupByAggTransformer
The GroupByAggTransformer performs a group by, followed by a set of aggregate functions applied to the data_columns.

$ETL$:
  transformers:
    - name: GroupByAggTransformer
      groupby_columns: [Run, $FACTORS$]
      data_columns: [Lat]
      agg_functions: [mean]
Example

Run | … | Rep | $CMD$ | Lat
0   |   | 0   | xyz   | 0.1
0   |   | 1   | xyz   | 0.3
1   |   | 0   | xyz   | 0.5
1   |   | 1   | xyz   | 0.5

➡️

Run | … | Lat_mean
0   |   | 0.2
1   |   | 0.5
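The example above can be approximated with a plain pandas group by. The sketch assumes that aggregated columns are named `<column>_<function>`, as the Lat_mean column suggests, and it groups by Run only; the $FACTORS$ expansion is specific to doespy and is left out.

```python
import pandas as pd

# The four input rows from the example (the elided "…" columns are omitted)
df = pd.DataFrame({
    "Run": [0, 0, 1, 1],
    "Rep": [0, 1, 0, 1],
    "$CMD$": ["xyz"] * 4,
    "Lat": [0.1, 0.3, 0.5, 0.5],
})

groupby_columns = ["Run"]
data_columns = ["Lat"]
agg_functions = ["mean"]

agg = df.groupby(groupby_columns)[data_columns].agg(agg_functions)
# Flatten the (column, function) MultiIndex into names such as "Lat_mean"
agg.columns = ["_".join(col) for col in agg.columns]
agg = agg.reset_index()

print(agg)
```

Columns that are neither grouped on nor listed in data_columns (Rep and $CMD$ here) do not appear in the result.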
- field data_columns: List[str] [Required]
The columns that contain the data to aggregate, see agg_functions.
- field groupby_columns: List[str] [Required]
The columns to group by. The list can contain the magic entry $FACTORS$, which expands to all factors of the experiment. For example, [exp_name, host_type, host_idx, $FACTORS$] would group by each run.
- field agg_functions: List[str] = ['mean', 'min', 'max', 'std', 'count']
List of aggregate functions to apply to data_columns.
- field custom_tail_length: int = 5
“custom_tail” is a custom aggregation function that calculates the mean over the last custom_tail_length entries of a column.
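As an illustration of the custom_tail description, the following sketch computes the mean over the last custom_tail_length entries of each group with plain pandas. How doespy defines and registers this function internally is an assumption here; the data is made up.

```python
import pandas as pd

custom_tail_length = 5  # default value of the field


def custom_tail(series):
    """Mean over the last `custom_tail_length` entries of a column (illustrative)."""
    return series.tail(custom_tail_length).mean()


# Made-up data: seven measurements in a single run
df = pd.DataFrame({
    "Run": [0] * 7,
    "Lat": [9.0, 9.0, 1.0, 2.0, 3.0, 4.0, 5.0],
})

# Only the last five entries (1.0 .. 5.0) contribute to the result
out = df.groupby("Run")["Lat"].agg(custom_tail)
print(out)
```

An aggregate of this kind is useful when the first entries of a column come from a warm-up phase and should not influence the reported value.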