pasteur.extras.synth.extern.ExternalPythonSynth#

class pasteur.extras.synth.extern.ExternalPythonSynth(cmd, venv=None, dir=None, rebalance_columns=True, **_)[source]#

Abstract class for calling synthesis algorithms written in Python from other projects.

It can use a custom virtual environment, a custom project directory, and a custom command, with parameters defined per execution.
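As a rough illustration of what calling an external Python project can involve, the sketch below selects an interpreter from a virtual environment, substitutes per-execution parameters into a command, and runs it from a project directory. The helpers build_argv and run_external and the {placeholder} template convention are assumptions made for this example; they are not part of pasteur and may differ from how this class works internally.

```python
from __future__ import annotations

import os
import shlex
import subprocess
import sys
from pathlib import Path


def build_argv(cmd: str, venv: str | None = None, **params) -> list[str]:
    """Hypothetical helper: turn a command template into an argv list."""
    if venv is None:
        # No virtual environment given: fall back to the current interpreter.
        python = sys.executable
    else:
        # Virtual environments keep their interpreter in bin/ (Scripts/ on Windows).
        bin_dir = "Scripts" if os.name == "nt" else "bin"
        python = str(Path(venv) / bin_dir / "python")
    # Assumed convention: per-execution parameters fill {placeholders} in cmd.
    return [python, *shlex.split(cmd.format(**params))]


def run_external(cmd: str, venv: str | None = None, dir: str | None = None, **params):
    """Hypothetical helper: run the command from the project directory."""
    return subprocess.run(
        build_argv(cmd, venv, **params),
        cwd=dir,  # None keeps the current working directory
        check=True,  # raise CalledProcessError if the external code fails
        capture_output=True,
        text=True,
    )


# Usage: run a one-line "algorithm" with the current interpreter.
result = run_external("-c 'print({n} * 2)'", n=21)
print(result.stdout.strip())
```

Passing the command as an argument list rather than through a shell avoids quoting problems when parameters contain spaces.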

Attributes

Methods

bake(data, ids)

Bakes the model based on the data provided (such as creating and modeling a Bayesian network on the data).

fit(data, ids)

Fits the model based on the provided data.

get_factory(*args, **kwargs)

Returns a factory that registers this module to the system.

preprocess(attrs, data, ids)

Runs any preprocessing required, such as domain reduction.

sample([n])

Samples n samples across partitions.

sample_partition(*, n[, i])

Returns synthetic data in the same format in which it was provided.

bake(data, ids)[source]#

Bakes the model based on the data provided (such as creating and modeling a Bayesian network on the data).

Attributes provide context about the data columns, including hierarchical relationships, NA values, etc.

fit(data, ids)[source]#

Fits the model based on the provided data.

data and ids are dictionaries containing the DataFrames with the data.

classmethod get_factory(*args, **kwargs)#

Returns a factory that registers this module to the system.

Any *args and **kwargs passed to this function will be saved and passed to the module’s __init__() method when calling build().
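The save-then-build behaviour described above can be sketched with a small stand-in. The Factory, Module, and EchoSynth classes below are hypothetical and written only for this example; pasteur's real factory type may look different. Only the idea of storing *args and **kwargs and replaying them into __init__() on build() comes from the description.

```python
class Factory:
    """Hypothetical stand-in for the object returned by get_factory()."""

    def __init__(self, cls, *args, **kwargs):
        self._cls = cls
        self._args = args  # saved positional arguments
        self._kwargs = kwargs  # saved keyword arguments
        self.name = cls.name  # lets a registry look the module up by name

    def build(self):
        # Replay the saved arguments into the module's __init__().
        return self._cls(*self._args, **self._kwargs)


class Module:
    """Hypothetical base class exposing get_factory()."""

    name = "base"

    @classmethod
    def get_factory(cls, *args, **kwargs):
        return Factory(cls, *args, **kwargs)


class EchoSynth(Module):
    """Toy module whose constructor mirrors the (cmd, venv, **_) shape."""

    name = "echo"

    def __init__(self, cmd, venv=None, **_):
        self.cmd = cmd
        self.venv = venv


factory = EchoSynth.get_factory("run.py", venv=".venv")
synth = factory.build()  # the instance is only created here
```

Deferring construction this way means a module can be registered with its configuration up front without being instantiated until it is needed.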

gpu = True#
in_sample: bool = False#
in_types: list[str] | None = None#
multimodal = False#
name: str#
preprocess(attrs, data, ids)[source]#

Runs any preprocessing required, such as domain reduction.

sample(n=None)[source]#

Samples n samples across partitions.

The return value should be finalized to dict[str, Any], which matches the format of data provided to the fitting function. Since this

A default implementation is provided that packages sample_partition() in such a way that pasteur can sample and save partitions in parallel.

Return type:

tuple[dict[str, DataFrame], dict[str, DataFrame]]
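One way to picture how a default sample() could package sample_partition() for parallel execution is to return one deferred call per partition and let the caller decide how to run them. The ToySynth class, its partitions attribute, the per-partition meaning of n, and the fallback row count are all assumptions made for this sketch; pasteur's own implementation and return type may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class ToySynth:
    """Hypothetical synthesizer with a fixed number of partitions."""

    partitions = 3

    def sample_partition(self, *, n, i=0):
        # Return n placeholder rows labelled with the partition number.
        return {"table": [f"p{i}-row{j}" for j in range(n)]}

    def sample(self, n=None):
        # Assumption: n is the row count per partition, defaulting to 2 here.
        n = 2 if n is None else n
        # Package one deferred call per partition instead of sampling eagerly.
        return {
            f"part_{i}": partial(self.sample_partition, n=n, i=i)
            for i in range(self.partitions)
        }


tasks = ToySynth().sample(n=2)

# The caller can now evaluate (and save) each partition independently.
with ThreadPoolExecutor() as pool:
    futures = {key: pool.submit(task) for key, task in tasks.items()}
    results = {key: future.result() for key, future in futures.items()}
```

Because each entry is a self-contained call, the partitions can be evaluated in any order or on separate workers.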

sample_partition(*, n, i=0)#

Returns synthetic data in the same format in which it was provided.

n sets how many rows should be sampled. Warning: not setting n technically violates differential privacy (DP) for DP-aware algorithms.

i is the partition number, which can be used to modify the random state for sampling, since deterministic sampling will always return the same data.

Return type:

dict[str, Any]
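The role of i can be illustrated with a seeded generator: if every partition used the same seed, each would return identical rows, so the partition number is mixed into the seed. This is a minimal sketch using the standard library's random module; the base_seed parameter and the "table" key are assumptions for the example, not part of the documented signature.

```python
import random


def sample_partition(*, n, i=0, base_seed=42):
    """Hypothetical sampler: n rows, seeded per partition."""
    # Offsetting the seed by i gives each partition its own random stream
    # while keeping every individual partition reproducible.
    rng = random.Random(base_seed + i)
    return {"table": [rng.randint(0, 9) for _ in range(n)]}


first = sample_partition(n=50, i=0)
again = sample_partition(n=50, i=0)  # same partition: identical rows
other = sample_partition(n=50, i=1)  # different partition: different rows
```

Re-running a partition reproduces its data exactly, which is useful when a single partition has to be regenerated after a failure.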

tabular = True#
timeseries = False#
type = 'idx'#