pasteur.synth.IdentSynthJson
- class pasteur.synth.IdentSynthJson(*args, _from_factory=False, **kwargs)
Samples the data it was provided.
Attributes
Methods
bake(data): Bakes the model based on the data provided (such as creating and modeling a Bayesian network on the data).
fit(data): Fits the model based on the provided data.
get_factory(*args, **kwargs): Returns a factory that registers this module to the system.
preprocess(meta, data): Runs any preprocessing required, such as domain reduction.
sample([n]): Samples n samples across the number of partitions given by the partitions attribute.
sample_partition(*, n[, i]): Returns synthetic data in the same format it was provided.
- bake(data)
Bakes the model based on the data provided (such as creating and modeling a Bayesian network on the data).
Attributes provide context about the data columns, including hierarchical relationships, NA values, etc.
- fit(data)
Fits the model based on the provided data.
Data and Ids are dictionaries containing the dataframes with the data.
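As a rough illustration of the identity behaviour this class is described as having, the sketch below uses a hypothetical stand-in class (ToyIdentSynth, not part of pasteur) whose fit() keeps the dictionary it receives and whose sample_partition() hands it back. Plain Python lists stand in for the dataframes; the real class may copy, validate, or reshape the data differently.

```python
from typing import Any, Optional


class ToyIdentSynth:
    """Hypothetical stand-in for an identity synthesizer (not pasteur code)."""

    def __init__(self) -> None:
        self._data: Optional[dict[str, Any]] = None

    def fit(self, data: dict[str, Any]) -> None:
        # Assumption: "fitting" an identity model only remembers the input.
        self._data = dict(data)

    def sample_partition(self, *, n: Optional[int] = None, i: int = 0) -> dict[str, Any]:
        if self._data is None:
            raise RuntimeError("fit() must be called before sampling")
        # Return the data in the same format it was provided.
        return dict(self._data)


# Toy input: table name -> rows (lists stand in for dataframes).
tables = {"patients": [{"id": 1, "age": 40}, {"id": 2, "age": 52}]}

synth = ToyIdentSynth()
synth.fit(tables)
sampled = synth.sample_partition(n=2)
```

In this sketch n and i are accepted but ignored, because an identity model has nothing to draw from beyond the stored data.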
- classmethod get_factory(*args, **kwargs)
Returns a factory that registers this module to the system.
Any *args and **kwargs passed to this function will be saved and passed to the module’s __init__() method when calling build().
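The deferred-construction idea described above can be sketched without pasteur itself. ToyFactory and ToyModule below are hypothetical names, and the registration step is not modelled; the sketch only shows arguments being saved by get_factory() and forwarded to __init__() when build() is called.

```python
from typing import Any


class ToyFactory:
    """Hypothetical factory: remembers constructor arguments for later."""

    def __init__(self, cls: type, *args: Any, **kwargs: Any) -> None:
        self.cls = cls
        self.args = args
        self.kwargs = kwargs

    def build(self) -> Any:
        # The saved arguments reach __init__ only when build() is called.
        return self.cls(*self.args, **self.kwargs)


class ToyModule:
    """Hypothetical module exposing a get_factory() classmethod."""

    def __init__(self, seed: int = 0, label: str = "default") -> None:
        self.seed = seed
        self.label = label

    @classmethod
    def get_factory(cls, *args: Any, **kwargs: Any) -> ToyFactory:
        # Nothing is instantiated yet; the arguments are only stored.
        return ToyFactory(cls, *args, **kwargs)


factory = ToyModule.get_factory(7, label="ident")
module = factory.build()
```

Deferring construction this way lets a framework collect module definitions first and instantiate them later, possibly several times.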
- in_sample: bool = False
- in_types: list[str] | None = ['json', 'idx']
- name: str = 'ident_json'
- partitions = 1
- sample(n=None)
Samples n samples across the number of partitions given by the partitions attribute.
The return value should be finalized to dict[str, Any], which matches the format of data provided to the fitting function. Since this
A default implementation is provided that wraps sample_partition() so that pasteur can sample and save partitions in parallel.
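One way to picture this packaging is a list of independent callables, one per partition. The sketch below is an assumption about the general shape, not pasteur's actual implementation: ToyPartitionedSynth is hypothetical, and treating n as a per-partition row count is a choice made for the example.

```python
from functools import partial
from typing import Any, Callable


class ToyPartitionedSynth:
    """Hypothetical synthesizer, used only to illustrate the packaging idea."""

    partitions = 3

    def sample_partition(self, *, n: int, i: int = 0) -> dict[str, Any]:
        # Placeholder rows; a real model would draw synthetic data here.
        return {"table": [f"p{i}-row{r}" for r in range(n)]}

    def sample(self, n: int) -> list[Callable[[], dict[str, Any]]]:
        # Assumption: n is the number of rows requested per partition.
        # Each callable is self-contained, so a scheduler could run them
        # in any order or in parallel and save each result separately.
        return [
            partial(self.sample_partition, n=n, i=i)
            for i in range(self.partitions)
        ]


tasks = ToyPartitionedSynth().sample(n=2)
results = [task() for task in tasks]  # executed sequentially here
```

Because each task carries its own partition number, no state is shared between them while sampling.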
- sample_partition(*, n, i=0)
Returns synthetic data in the same format it was provided.
n sets how many rows should be sampled. Warning: not setting n technically violates differential privacy (DP) for DP-aware algorithms.
i is the partition number, which can be used to vary the random state between partitions, since deterministic sampling would otherwise always return the same data.
- Return type:
dict[str, Any]
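The role of i can be illustrated with a small, self-contained sketch. The standalone function and BASE_SEED below are hypothetical; the point is only that mixing the partition number into the seed keeps each partition reproducible while making partitions differ from one another.

```python
import random
from typing import Any

BASE_SEED = 1234  # hypothetical fixed seed, chosen for the example


def sample_partition(*, n: int, i: int = 0) -> dict[str, Any]:
    # Mix the partition number into the seed: the same i always yields the
    # same rows, while different partitions yield different rows.
    rng = random.Random(BASE_SEED + i)
    return {"table": [rng.randint(0, 10**9) for _ in range(n)]}


first = sample_partition(n=5, i=0)
again = sample_partition(n=5, i=0)
other = sample_partition(n=5, i=1)
```

Here first and again are identical, whereas other differs; without the i term every partition would return the same rows.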
- type = 'json'