pasteur.amalgam.synth.AmalgamSynth
- class pasteur.amalgam.synth.AmalgamSynth(pgm_cls, pgm={'etotal': 2.0}, marginal={'min_chunk': 100, 'mode': 'out_of_core', 'worker_mult': 1}, prompt='', model={'filename': 'Qwen3-8B-Q4_K_M.gguf', 'n_ctx': 40960, 'n_gpu_layers': -1, 'repo_id': 'Qwen/Qwen3-8B-GGUF', 'type': 'hf', 'workers': 1}, rebalance={'fixed': [4, 9, 18, 32], 'u': 7.0, 'unbounded_dp': True}, samples=None, **kwargs)
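Several constructor arguments default to dictionaries. The signature does not say whether a partially specified dictionary is merged with the defaults, so the minimal sketch below assumes it is not and builds a complete `model` dictionary with a single key changed. The value `8192` and the name `my_pgm_cls` are illustrative placeholders, not recommendations from the library.

```python
# Default `model` settings, copied from the constructor signature.
DEFAULT_MODEL = {
    "filename": "Qwen3-8B-Q4_K_M.gguf",
    "n_ctx": 40960,
    "n_gpu_layers": -1,
    "repo_id": "Qwen/Qwen3-8B-GGUF",
    "type": "hf",
    "workers": 1,
}

# Build a complete dict with one key changed; DEFAULT_MODEL itself is left untouched.
model = {**DEFAULT_MODEL, "n_ctx": 8192}

# Hypothetical call; `my_pgm_cls` stands in for whatever `pgm_cls` expects.
# synth = AmalgamSynth(my_pgm_cls, model=model)
print(model["n_ctx"], DEFAULT_MODEL["n_ctx"])
```

Passing the full dictionary avoids relying on merge behavior that is not documented here.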
Attributes
Methods
bake(data): Bakes the model based on the data provided (such as creating and modeling a Bayesian network on the data).
fit(data): Fits the model based on the provided data.
get_factory(*args, **kwargs): Returns a factory that registers this module to the system.
preprocess(meta, data): Runs any preprocessing required, such as domain reduction.
sample([n, data, _llm]): Samples n samples across the partitions set by the partitions attribute.
sample_partition(*, n[, i]): Returns synthetic data in the same format they were provided.
- bake(data)
Bakes the model based on the data provided (such as creating and modeling a Bayesian network on the data).
Attributes provide context about the data columns, including hierarchical relationships, NA values, etc.
- fit(data)
Fits the model based on the provided data.
Data and Ids are dictionaries containing the dataframes with the data.
- classmethod get_factory(*args, **kwargs)
Returns a factory that registers this module to the system.
Any *args and **kwargs passed to this function will be saved and passed to the module’s __init__() method when calling build().
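The deferred-construction pattern described above can be sketched as follows. This is a minimal illustration, not pasteur's implementation: `ModuleFactory` and `ToySynth` are hypothetical names, and the real factory may also handle registration and other details not shown here.

```python
from typing import Any, Optional


class ModuleFactory:
    """Hypothetical factory: stores a class together with its constructor arguments."""

    def __init__(self, cls: type, *args: Any, **kwargs: Any) -> None:
        self._cls = cls
        self._args = args
        self._kwargs = kwargs

    def build(self) -> Any:
        # The arguments saved by get_factory() are passed to __init__() here.
        return self._cls(*self._args, **self._kwargs)


class ToySynth:
    """Hypothetical module (not AmalgamSynth) that only records its arguments."""

    name = "toy"

    def __init__(self, pgm_cls: str, samples: Optional[int] = None) -> None:
        self.pgm_cls = pgm_cls
        self.samples = samples

    @classmethod
    def get_factory(cls, *args: Any, **kwargs: Any) -> ModuleFactory:
        # Nothing is instantiated yet; the arguments are only captured.
        return ModuleFactory(cls, *args, **kwargs)


factory = ToySynth.get_factory("some_pgm", samples=100)
synth = factory.build()
print(synth.pgm_cls, synth.samples)
```

Capturing the arguments instead of an instance lets the system decide when, and how many times, the module is constructed.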
- in_sample: bool = True
- in_types: list[str] | None = ['json', 'flat']
- name: str = 'amalgam'
- partitions = 1
- sample(n=None, data=None, _llm=None)
Samples n samples across the partitions set by the partitions attribute.
The return value should be finalized to dict[str, Any], which matches the format of data provided to the fitting function. Since this
A default implementation is provided that wraps sample_partition() so that pasteur can sample and save partitions in parallel.
- Return type:
- sample_partition(*, n, i=0)
Returns synthetic data in the same format they were provided.
n sets how many rows should be sampled. Warning: not setting n technically violates DP (differential privacy) for DP-aware algorithms.
i is the partition number, which can be used to vary the random state, since deterministic sampling would otherwise return the same data for every partition.
- Return type:
dict[str,Any]
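The relationship between sample() and sample_partition() can be illustrated with a small, hypothetical sampler. It is a sketch under stated assumptions rather than pasteur code: `ToyPartitionedSampler`, `base_seed`, the `"table"` key, and the even split of n across partitions are all invented for illustration. The source only states that i may be used to modify the random state.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Any


class ToyPartitionedSampler:
    """Hypothetical sampler showing per-partition seeding; not pasteur code."""

    partitions = 3  # illustrative value; AmalgamSynth documents partitions = 1

    def __init__(self, base_seed: int = 0) -> None:
        self.base_seed = base_seed

    def sample_partition(self, *, n: int, i: int = 0) -> dict[str, Any]:
        # Offset the seed by the partition number so partitions do not
        # all replay the same deterministic random stream.
        rng = random.Random(self.base_seed + i)
        return {"table": [rng.randint(0, 9) for _ in range(n)]}

    def sample(self, n: int) -> list[dict[str, Any]]:
        # Assumption: n is split evenly and any remainder is dropped.
        per_partition = n // self.partitions
        with ThreadPoolExecutor() as pool:
            futures = [
                pool.submit(self.sample_partition, n=per_partition, i=i)
                for i in range(self.partitions)
            ]
            # Results are collected in partition order.
            return [f.result() for f in futures]


sampler = ToyPartitionedSampler(base_seed=42)
parts = sampler.sample(6)
print(len(parts), [len(p["table"]) for p in parts])
```

Calling sample_partition() twice with the same i returns the same rows, which is why a distinct i per partition is needed to obtain different data.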
- type = 'json'