pasteur.extras.synth.extern.PrivMrfSynth#

class pasteur.extras.synth.extern.PrivMrfSynth(e=1.12, seed=None, rebalance_columns=None, deterministic_upsample=False, **kwargs)#

Runs the PrivMrf algorithm externally.

Place the following snippet in <priv-mrf>/pasteur.py:

```
from sys import argv

import numpy as np
import PrivMRF
import PrivMRF.utils.tools as tools
from PrivMRF.domain import Domain

if __name__ == '__main__':
    # argv[0] is the script path; the four values after it are the inputs.
    fn_data, fn_domain, fn_synth, e = argv[1:]
    e = float(e)

    # Load the encoded dataset as an integer array.
    data, _ = tools.read_csv(fn_data)
    data = np.array(data, dtype=int)

    # Build the domain over all columns of the dataset.
    json_domain = tools.read_json_domain(fn_domain)
    domain = Domain(json_domain, list(range(data.shape[1])))

    model = PrivMRF.run(data, domain, attr_hierarchy=None, exp_name='exp', epsilon=e, p_config={})

    # Write the synthetic data to the requested output file.
    syn_data = model.synthetic_data(fn_synth)
```
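The snippet expects four positional arguments: the input CSV path, the domain JSON path, the output path, and epsilon. As a minimal sketch of how such a script could be launched from Python, assuming it is run as a standalone script with the current interpreter (the helper `build_privmrf_command` and all file names are hypothetical, and this is not necessarily how pasteur invokes it):

```python
import sys
from pathlib import Path


def build_privmrf_command(privmrf_dir, fn_data, fn_domain, fn_synth, e=1.12):
    """Build the argument list for <priv-mrf>/pasteur.py (hypothetical helper)."""
    script = Path(privmrf_dir) / "pasteur.py"
    # Positional order mirrors the snippet: data, domain, output, epsilon.
    return [
        sys.executable,
        str(script),
        str(fn_data),
        str(fn_domain),
        str(fn_synth),
        str(e),
    ]


# All paths below are placeholders.
cmd = build_privmrf_command("priv-mrf", "data.csv", "domain.json", "synth.csv")
```

The resulting list could then be handed to `subprocess.run(cmd, check=True)`; whether a specific working directory or environment is needed depends on how PrivMRF is installed.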

Attributes

`gpu`
`in_sample`
`in_types`
`multimodal`
`name`
`tabular`
`timeseries`
`type`

Methods

`bake`(data, ids)	Bakes the model based on the data provided (such as creating and modeling a Bayesian network on the data).
`fit`(data, ids)	Fits the model based on the provided data.
`get_factory`(*args, **kwargs)	Returns a factory that registers this module to the system.
`preprocess`(attrs, data, ids)	Runs any preprocessing required, such as domain reduction.
`sample`([n])	Samples n samples across partitions.
`sample_partition`(*, n[, i])	Returns synthetic data in the same format they were provided.

bake(data, ids)#

Bakes the model based on the data provided (such as creating and modeling a Bayesian network on the data).

Attributes provide context about the data columns, including hierarchical relationships, NA values, etc.

fit(data, ids)#

Fits the model based on the provided data.

data and ids are dictionaries containing the DataFrames with the data.

classmethod get_factory(*args, **kwargs)#

Returns a factory that registers this module to the system.

Any *args and **kwargs passed to this function will be saved and passed to the module’s __init__() method when calling build().
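The save-then-build behaviour described above can be illustrated with a small, self-contained sketch. `ModuleFactory` and `DemoSynth` are hypothetical stand-ins written for this example; they are not pasteur's actual classes, and only the `e` and `seed` parameter names are borrowed from the signature of PrivMrfSynth.

```python
class ModuleFactory:
    """Hypothetical factory that stores constructor arguments for later use."""

    def __init__(self, cls, *args, **kwargs):
        self.cls = cls
        self.args = args
        self.kwargs = kwargs

    def build(self):
        # The saved arguments are forwarded to __init__() only at build time.
        return self.cls(*self.args, **self.kwargs)


class DemoSynth:
    """Hypothetical module; not the real PrivMrfSynth."""

    def __init__(self, e=1.12, seed=None):
        self.e = e
        self.seed = seed

    @classmethod
    def get_factory(cls, *args, **kwargs):
        return ModuleFactory(cls, *args, **kwargs)


factory = DemoSynth.get_factory(e=2.0, seed=42)
synth = factory.build()
```

Deferring construction this way lets a system register a configured module without instantiating it until it is needed.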

gpu = True#

in_sample: bool = False#

in_types: list[str] | None = None#

multimodal = False#

name: str = 'mrf'#

preprocess(attrs, data, ids)#

Runs any preprocessing required, such as domain reduction.

sample(n=None)#

Samples n samples across partitions.

The return value should be finalized to dict[str, Any], which matches the format of the data provided to the fitting function.

A default implementation is provided that packages sample_partition() so that pasteur can sample and save partitions in parallel.

Return type: tuple[dict[str, DataFrame], dict[str, DataFrame]]
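One way to picture "packaging" sample_partition() for parallel execution is to wrap each partition in a deferred call. The sketch below is an assumption about the general pattern, not pasteur's default implementation: `make_partition_tasks` is a hypothetical helper, and the stand-in `sample_partition` returns placeholder rows rather than DataFrames.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def sample_partition(*, n, i=0):
    """Hypothetical stand-in that returns placeholder rows for partition i."""
    return {"table": [f"partition {i}, row {j}" for j in range(n)]}


def make_partition_tasks(n_total, n_partitions):
    """Split n_total rows into one deferred sample_partition call per partition."""
    base, extra = divmod(n_total, n_partitions)
    return [
        # The first `extra` partitions receive one additional row each.
        partial(sample_partition, n=base + (1 if i < extra else 0), i=i)
        for i in range(n_partitions)
    ]


tasks = make_partition_tasks(n_total=10, n_partitions=3)
with ThreadPoolExecutor() as pool:
    # The tasks are independent of each other, so they can run concurrently.
    results = list(pool.map(lambda task: task(), tasks))
```

Because each task only closes over its own n and i, the caller is free to run the tasks in any order or on separate workers.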

sample_partition(*, n, i=0)#

Returns synthetic data in the same format they were provided.

n sets how many rows should be sampled. Warning: not setting n technically violates differential privacy (DP) for DP-aware algorithms.

i is the partition number. It can be used to vary the random state between partitions, since deterministic sampling would otherwise always return the same data.

Return type: dict[str, Any]
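The role of i can be sketched with the standard library alone. This is an illustrative assumption about how a partition number might be folded into a random state; the `seed` parameter, its default value, and the stand-in function below are hypothetical and do not reflect how PrivMrfSynth samples internally.

```python
import random


def sample_partition(*, n, i=0, seed=1234):
    """Hypothetical sampler that draws n integers with a per-partition state."""
    # Offsetting the seed by the partition number keeps each partition
    # reproducible while giving different partitions different streams.
    rng = random.Random(seed + i)
    return {"values": [rng.randrange(100) for _ in range(n)]}


first = sample_partition(n=20, i=0)
again = sample_partition(n=20, i=0)
other = sample_partition(n=20, i=1)
```

Here `first` and `again` are identical because they share a partition number, while `other` is drawn from a different stream. Without the offset, every partition would repeat the same rows.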

tabular = True#

timeseries = False#

type = 'idx'#
