pasteur.extras.synth.extern.PrivMrfSynth
- class pasteur.extras.synth.extern.PrivMrfSynth(e=1.12, seed=None, rebalance_columns=None, deterministic_upsample=False, **kwargs)
Runs the PrivMrf algorithm externally.
Place the following snippet in <priv-mrf>/pasteur.py:
```
from sys import argv

import numpy as np
import PrivMRF
import PrivMRF.utils.tools as tools
from PrivMRF.domain import Domain

if __name__ == '__main__':
    # argv[0] is the script path; the four values that follow are the inputs.
    _, fn_data, fn_domain, fn_synth, e = argv
    e = float(e)

    # Load the CSV data as an integer array.
    data, _ = tools.read_csv(fn_data)
    data = np.array(data, dtype=int)

    # Build the domain over every column of the data.
    json_domain = tools.read_json_domain(fn_domain)
    domain = Domain(json_domain, list(range(data.shape[1])))

    # Fit PrivMRF with privacy budget e and write the synthetic data to fn_synth.
    model = PrivMRF.run(data, domain, attr_hierarchy=None, exp_name='exp', epsilon=e, p_config={})
    syn_data = model.synthetic_data(fn_synth)
```
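Because the algorithm runs externally, the script has to be launched as a separate process with its inputs passed as positional arguments. The following is a minimal sketch of how such a command line could be assembled. The helper `build_privmrf_command` and the file names are hypothetical; the argument order (data, domain, output, epsilon) is taken from the snippet, and this is not necessarily how pasteur invokes the script internally.

```python
import sys
from pathlib import Path


def build_privmrf_command(repo_dir, fn_data, fn_domain, fn_synth, e):
    """Build the argument list for running <priv-mrf>/pasteur.py.

    Hypothetical helper: the argument order mirrors the snippet,
    not pasteur's own internals.
    """
    script = Path(repo_dir) / "pasteur.py"
    return [
        sys.executable,      # interpreter that runs the script
        str(script),         # becomes argv[0] inside the script
        str(fn_data),        # CSV with the encoded data
        str(fn_domain),      # JSON description of the domain
        str(fn_synth),       # where the synthetic data is written
        str(float(e)),       # privacy budget, parsed with float() by the script
    ]


cmd = build_privmrf_command("priv-mrf", "data.csv", "domain.json", "synth.csv", 1.12)
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would then start the script, assuming the PrivMRF repository and its dependencies are installed in that interpreter's environment.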
Attributes
Methods
bake(data, ids): Bakes the model based on the data provided (such as creating and modeling a Bayesian network on the data).
fit(data, ids): Fits the model based on the provided data.
get_factory(*args, **kwargs): Returns a factory that registers this module to the system.
preprocess(attrs, data, ids): Runs any preprocessing required, such as domain reduction.
sample([n]): Samples n samples across partitions.
sample_partition(*, n[, i]): Returns synthetic data in the same format in which it was provided.
- bake(data, ids)
Bakes the model based on the data provided (such as creating and modeling a Bayesian network on the data).
Attributes provide context about the data columns, including hierarchical relationships, NA values, etc.
- fit(data, ids)
Fits the model based on the provided data.
data and ids are dictionaries containing the dataframes with the data.
- classmethod get_factory(*args, **kwargs)
Returns a factory that registers this module to the system.
Any *args and **kwargs passed to this function will be saved and passed to the module’s __init__() method when calling build().
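The deferred-construction pattern described here can be sketched in a few lines. `ModuleFactory` and `DemoSynth` below are hypothetical stand-ins written for illustration; they are not pasteur's actual classes, and the real factory also handles registration, which this sketch omits.

```python
class ModuleFactory:
    """Hypothetical stand-in for the object returned by get_factory()."""

    def __init__(self, cls, *args, **kwargs):
        # Only store the arguments; the module is not constructed yet.
        self.cls = cls
        self.args = args
        self.kwargs = kwargs

    def build(self):
        # The saved arguments are forwarded to __init__() at build time.
        return self.cls(*self.args, **self.kwargs)


class DemoSynth:
    """Hypothetical module with the same keyword style as PrivMrfSynth."""

    name = "demo"

    def __init__(self, e=1.12, seed=None):
        self.e = e
        self.seed = seed

    @classmethod
    def get_factory(cls, *args, **kwargs):
        return ModuleFactory(cls, *args, **kwargs)


factory = DemoSynth.get_factory(e=2.0, seed=7)
synth = factory.build()
```

Deferring construction this way lets a configuration be declared once and instantiated later, for example once per run.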
- gpu = True
- in_sample: bool = False
- in_types: list[str] | None = None
- multimodal = False
- name: str = 'mrf'
- preprocess(attrs, data, ids)
Runs any preprocessing required, such as domain reduction.
- sample(n=None)
Samples n samples across partitions.
The return value should be finalized to dict[str, Any], which matches the format of data provided to the fitting function.
A default implementation is provided that packages sample_partition() in such a way that pasteur can sample and save partitions in parallel.
- Return type:
tuple[dict[str,DataFrame],dict[str,DataFrame]]
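One way to picture a default `sample()` that wraps `sample_partition()` for parallel execution is to return one deferred call per partition and let an executor run them. This is only a sketch under stated assumptions: `DemoSynth`, its `partitions` attribute, the fallback row count, and the choice to pass the same `n` to every partition are all hypothetical and do not reflect pasteur's real implementation or return type.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


class DemoSynth:
    partitions = 3  # hypothetical: number of partitions to produce

    def sample_partition(self, *, n, i=0):
        # Stand-in for real sampling: returns n row labels for partition i.
        return {"table": [f"p{i}_row{j}" for j in range(n)]}

    def sample(self, n=None):
        n = 4 if n is None else n  # hypothetical fallback row count
        # One deferred call per partition; nothing is sampled yet.
        return [
            partial(self.sample_partition, n=n, i=i)
            for i in range(self.partitions)
        ]


tasks = DemoSynth().sample(n=2)

# Each task is independent, so an executor can run them concurrently.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda task: task(), tasks))
```

Because each deferred call carries its own partition number, the caller is free to run and save the partitions in any order.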
- sample_partition(*, n, i=0)
Returns synthetic data in the same format in which it was provided.
n sets how many rows should be sampled. Warning: not setting n technically violates DP for DP-aware algorithms.
i is the partition number. It can be used to modify the random state used for sampling, since deterministic sampling would otherwise return the same data for every partition.
- Return type:
dict[str,Any]
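The role of `i` can be illustrated with a small sketch that derives a per-partition random state from a base seed. The standalone function, the `seed` parameter, and the `seed + i` scheme are assumptions made for this example; the actual synthesizer may combine them differently.

```python
import random


def sample_partition(*, n, i=0, seed=42):
    # Offsetting the seed by the partition number keeps runs reproducible
    # while giving each partition its own random stream.
    rng = random.Random(seed + i)
    return {"table": [rng.randrange(10**9) for _ in range(n)]}


first = sample_partition(n=5, i=0)
again = sample_partition(n=5, i=0)   # same partition: identical rows
other = sample_partition(n=5, i=1)   # different partition: separate stream
```

With a fixed seed and no use of `i`, every partition would contain the same rows, which is the situation the parameter is meant to avoid.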
- tabular = True
- timeseries = False
- type = 'idx'