pasteur.extras.synth.privbayes.PrivBayesSynth
- class pasteur.extras.synth.privbayes.PrivBayesSynth(ep=None, e1=0.3, e2=0.7, etotal=None, theta=4, use_r=True, seed=None, rebalance=False, unbounded_dp=False, random_init=False, marginal_mode='out_of_core', marginal_worker_mult=1, marginal_min_chunk=100, skip_zero_counts=True, minimum_cutoff=3, **kwargs)
Attributes
Methods
bake(data): Bakes the model based on the data provided (such as creating and modeling a bayesian network on the data).
fit(data): Fits the model based on the provided data.
get_factory(*args, **kwargs): Returns a factory that registers this module to the system.
preprocess(meta, data): Runs any preprocessing required, such as domain reduction.
sample(*[, n, partitions, data]): Samples n samples across partitions partitions.
sample_partition(*, n[, i]): Returns synthetic data in the same format they were provided.
- bake(data)
Bakes the model based on the data provided (such as creating and modeling a bayesian network on the data).
Attributes provide context about the data columns, including hierarchical relationships, NA values, and so on.
- fit(data)
Fits the model based on the provided data.
Data and Ids are dictionaries containing the dataframes with the data.
- classmethod get_factory(*args, **kwargs)
Returns a factory that registers this module to the system.
Any *args and **kwargs passed to this function will be saved and passed to the module’s __init__() method when calling build().
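The deferred-construction behaviour described for get_factory() can be illustrated with a minimal, self-contained sketch. The classes ToyFactory, ToyModule, and ToySynth below are hypothetical stand-ins written for this example; they are not pasteur's classes and only assume what the description states: arguments are saved by get_factory() and handed to __init__() when build() is called.

```python
from typing import Any, Optional


class ToyFactory:
    """Hypothetical factory: remembers a class and its constructor arguments."""

    def __init__(self, cls: type, *args: Any, **kwargs: Any) -> None:
        self.cls = cls
        self.name = cls.name  # taken from the module's `name` attribute
        self.args = args
        self.kwargs = kwargs

    def build(self) -> Any:
        # The saved arguments reach __init__() only now, not at registration time.
        return self.cls(*self.args, **self.kwargs)


class ToyModule:
    """Hypothetical base class exposing a get_factory() classmethod."""

    name = "module"

    @classmethod
    def get_factory(cls, *args: Any, **kwargs: Any) -> ToyFactory:
        return ToyFactory(cls, *args, **kwargs)


class ToySynth(ToyModule):
    """Hypothetical synthesizer with two constructor parameters."""

    name = "toy"

    def __init__(self, theta: float = 4, seed: Optional[int] = None) -> None:
        self.theta = theta
        self.seed = seed


# Nothing is instantiated here; the factory just stores theta and seed.
factory = ToySynth.get_factory(theta=8, seed=42)

# The instance is created later, with the stored arguments.
synth = factory.build()
```

In this sketch the factory can be registered and passed around cheaply, and the comparatively expensive object is only built when it is needed.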
- in_sample: bool = False
- in_types: list[str] | None = None
- multimodal = False
- name: str = 'privbayes'
- parallel = True
- sample(*, n=None, partitions=None, data=None)
Draws n samples, spread across the number of partitions given by partitions.
The return value should be finalized to dict[str, Any], which matches the format of the data provided to the fitting function.
A default implementation is provided that packages sample_partition() so that pasteur can sample and save partitions in parallel.
- sample_partition(*, n, i=0)
Returns synthetic data in the same format they were provided.
n sets how many rows should be sampled. Warning: not setting n technically violates DP (differential privacy) for DP-aware algorithms.
i is the partition number that can be used for modifying the random state sampling, since deterministic sampling will always return the same data.
- Return type: dict[str, Any]
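The role of the partition number i can be shown with a small sketch: if every partition used the same random state, each one would return identical rows, so the partition index is mixed into the seed. ToySampler, its seed-mixing rule, and its sequential sample() helper are assumptions made for illustration only; they mirror the sample_partition(*, n, i=0) signature but do not reproduce PrivBayesSynth's sampling logic or pasteur's parallel packaging.

```python
from __future__ import annotations

import random
from typing import Any


class ToySampler:
    """Hypothetical model that mirrors the sample_partition(*, n, i=0) shape."""

    def __init__(self, seed: int = 0) -> None:
        self.seed = seed

    def sample_partition(self, *, n: int, i: int = 0) -> dict[str, Any]:
        # Mix the partition number into the seed: partitions differ from
        # each other, yet each one is reproducible on its own.
        rng = random.Random(self.seed * 1_000_003 + i)
        return {"table": [rng.random() for _ in range(n)]}

    def sample(self, *, n: int, partitions: int) -> list[dict[str, Any]]:
        # Split n rows as evenly as possible and sample each partition in turn
        # (sequentially here, purely to keep the sketch short).
        base, extra = divmod(n, partitions)
        sizes = [base + (1 if i < extra else 0) for i in range(partitions)]
        return [self.sample_partition(n=size, i=i) for i, size in enumerate(sizes)]


model = ToySampler(seed=7)

first = model.sample_partition(n=50, i=0)
first_again = model.sample_partition(n=50, i=0)
second = model.sample_partition(n=50, i=1)

parts = model.sample(n=10, partitions=3)
rows_per_partition = [len(p["table"]) for p in parts]
```

Here first and first_again hold the same rows because they share a partition number, while second is drawn from a different random state.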
- tabular = True
- timeseries = False
- type = 'idx'