pasteur.extras.metrics.distr.DistributionMetric

class pasteur.extras.metrics.distr.DistributionMetric(*args, _from_factory=False, **kwargs)

Attributes

Methods

fit(meta, data)

Fit is used to capture information about the table or column the metric will process.

get_factory(*args, **kwargs)

Returns a factory that registers this module to the system.

preprocess(wrk, ref)

Preprocess is called to cache the summaries for the wrk and ref sets during ingest.

process(wrk, ref, syn, pre)

Process is called with each set of data from the view (reference, work, synthetic).

summarize(data)

Summarize is called for dicts of runs that are not necessarily from the same view.

unique_name()

Provides a unique name for the metric which will be used for the system.

visualise(data)

Visualise is called for dicts of runs that run within the same view.

encodings: str | list[str] = 'idx'

fit(meta, data)

Fit is used to capture information about the table or column the metric will process. It should be used to store information that is common across different executions of the view, such as column value names.

classmethod get_factory(*args, **kwargs)

Returns a factory that registers this module to the system.

Any *args and **kwargs passed to this function will be saved and passed to the module’s __init__() method when calling build().
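The deferred-construction pattern described above can be sketched as follows. This is a minimal, self-contained illustration and not pasteur's implementation: `SketchFactory`, `SketchMetric`, and the `bins` parameter are hypothetical names introduced only for this example.

```python
from typing import Any


class SketchFactory:
    """Hypothetical stand-in for a factory object (not pasteur's class)."""

    def __init__(self, cls: type, *args: Any, **kwargs: Any) -> None:
        # Only record the class and its arguments; nothing is built yet.
        self.cls = cls
        self.args = args
        self.kwargs = kwargs

    def build(self) -> Any:
        # The saved arguments reach __init__() only when build() is called.
        return self.cls(*self.args, **self.kwargs)


class SketchMetric:
    """Hypothetical module exposing a get_factory() classmethod."""

    def __init__(self, bins: int = 10) -> None:
        self.bins = bins

    @classmethod
    def get_factory(cls, *args: Any, **kwargs: Any) -> SketchFactory:
        return SketchFactory(cls, *args, **kwargs)


factory = SketchMetric.get_factory(bins=20)  # arguments are saved, no instance yet
metric = factory.build()                     # equivalent to SketchMetric(bins=20)
print(metric.bins)  # prints 20
```

Deferring construction this way lets a registry hold lightweight factories and instantiate a module only when it is actually needed.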

name: str = 'distr'

preprocess(wrk, ref)

Preprocess is called to cache the summaries for the wrk and ref sets during ingest. Implementation is optional.

Return type:

Summaries[dict[str, tuple[dict[str, ndarray], dict[tuple[str, str], ndarray]]]]

process(wrk, ref, syn, pre)

Process is called with each set of data from the view (reference, work, synthetic). It should capture the data relevant to each metric in a synopsis or compressed form that can be used to compute the metric for different algorithm/split combinations.

If preprocess() is implemented, pre will contain the results of the function.

Return type:

Summaries[dict[str, tuple[dict[str, ndarray], dict[tuple[str, str] | tuple[str, str | int, str], ndarray]]]]
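The return annotations above suggest that each table maps to a pair of dictionaries: one keyed by a single column name and one keyed by a pair of column names. Assuming these hold one-column and two-column histograms of index-encoded values (an interpretation, not something this page confirms), a compressed synopsis of that shape could be computed roughly as below. `one_way`, `two_way`, `summarize_table`, and the sample columns are hypothetical.

```python
import numpy as np


def one_way(col: np.ndarray, domain: int) -> np.ndarray:
    """Count how often each index value occurs in one column."""
    return np.bincount(col, minlength=domain)


def two_way(a: np.ndarray, b: np.ndarray, dom_a: int, dom_b: int) -> np.ndarray:
    """Joint counts for a pair of index-encoded columns, shape (dom_a, dom_b)."""
    # Flatten each (a, b) pair into one index, count, then restore the 2-D shape.
    flat = np.bincount(a * dom_b + b, minlength=dom_a * dom_b)
    return flat.reshape(dom_a, dom_b)


def summarize_table(
    cols: dict[str, np.ndarray], domains: dict[str, int]
) -> tuple[dict[str, np.ndarray], dict[tuple[str, str], np.ndarray]]:
    """Build (per-column histograms, per-column-pair histograms) for one table."""
    names = list(cols)
    single = {n: one_way(cols[n], domains[n]) for n in names}
    pairs = {
        (x, y): two_way(cols[x], cols[y], domains[x], domains[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }
    return single, pairs


# Hypothetical index-encoded columns and their domain sizes.
cols = {"age": np.array([0, 1, 1, 2]), "sex": np.array([0, 1, 0, 1])}
domains = {"age": 3, "sex": 2}
single, pairs = summarize_table(cols, domains)
print(single["age"])          # [1 2 1]
print(pairs[("age", "sex")])  # rows are age values, columns are sex values
```

Because the histograms for the wrk and ref sets do not depend on the synthetic data, computing them once and reusing them is the kind of work preprocess() could cache.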

summarize(data)

Summarize is called for dicts of runs that are not necessarily from the same view.

It is expected to create detailed summary metrics for the run which are dataset structure independent (such as avg KL, etc).

comparison is set to False when the method runs as part of executing a run, and to True when it runs to compare multiple runs. It can be used to provide different summaries in each case.
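As an illustration of a summary metric that does not depend on the dataset structure, such as the average KL mentioned above, the sketch below averages a per-column KL divergence over histograms. It is an assumption-laden example rather than the library's computation: `kl_divergence`, `average_kl`, the `eps` floor, and the choice of direction KL(ref || syn) are all choices made here for illustration.

```python
import math
from typing import Mapping, Sequence


def kl_divergence(
    p_counts: Sequence[float], q_counts: Sequence[float], eps: float = 1e-12
) -> float:
    """KL(P || Q) in nats for two histograms over the same bins."""
    p_total = float(sum(p_counts))
    q_total = float(sum(q_counts))
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = pc / p_total
        if p == 0.0:
            continue  # bins with zero mass in P contribute nothing
        q = max(qc / q_total, eps)  # floor Q so empty bins do not divide by zero
        kl += p * math.log(p / q)
    return kl


def average_kl(
    ref: Mapping[str, Sequence[float]], syn: Mapping[str, Sequence[float]]
) -> float:
    """Mean KL(ref || syn) over the columns present in both mappings."""
    shared = [k for k in ref if k in syn]
    if not shared:
        raise ValueError("no shared columns to compare")
    return sum(kl_divergence(ref[k], syn[k]) for k in shared) / len(shared)


# Hypothetical histograms for two columns.
ref_hists = {"age": [1, 1], "sex": [2, 2]}
syn_hists = {"age": [1, 3], "sex": [1, 1]}
avg = average_kl(ref_hists, syn_hists)
print(avg)
```

A single scalar like this can be compared across runs that come from different views, which is what makes it suitable for summarize() rather than visualise().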

unique_name()

Provides a unique name for the metric, which the system uses to identify it (currently when saving artifacts).

Return type:

str

visualise(data)

Visualise is called for dicts of runs that run within the same view.

It is expected to create detailed visualizations (such as tables, figures) which utilize the structure of the view (columns etc.).

comparison is set to False when the method runs as part of executing a run, and to True when it runs to compare multiple runs. It can be used to provide different summaries in each case.

If required by the visualization, wrk_set and ref_set provide the names of the synthesis source data (wrk) and reference data (ref) which can be used as a reference.