pasteur

Description

Pasteur is a library for end-to-end data synthesis. Gather your raw data, then preprocess, synthesize, and evaluate it within a single project. Use the tools you’re familiar with (numpy, pandas, scikit-learn, scipy), and when your dataset grows, scale to out-of-core data with Pasteur’s parallelization primitives, without changing your code or switching libraries.

Functions

load_ipython_extension(ipython)

Hook that IPython calls to load Pasteur's IPython functionality when you run %load_ext pasteur.
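
As a rough illustration of the hook mechanism that %load_ext relies on, the sketch below defines a hypothetical extension and drives it with a stand-in shell object. It is not Pasteur's implementation: hello_magic and FakeShell are invented for this example, and only the load_ipython_extension(ipython) hook name and the register_magic_function method name mirror IPython's own conventions.

```python
# Hypothetical sketch of the IPython extension protocol; not Pasteur's code.


def hello_magic(line):
    """Illustrative line magic (assumed name): echoes its argument."""
    return f"hello {line}".strip()


def load_ipython_extension(ipython):
    # IPython looks up and calls this hook when `%load_ext <module>` runs,
    # passing the active shell so the extension can register its features.
    ipython.register_magic_function(
        hello_magic, magic_kind="line", magic_name="hello"
    )


class FakeShell:
    """Minimal stand-in for IPython's shell, used only so the sketch runs
    without IPython installed."""

    def __init__(self):
        self.magics = {}

    def register_magic_function(self, func, magic_kind="line", magic_name=None):
        # Store the function under its magic name, as a real shell would.
        self.magics[magic_name or func.__name__] = func


shell = FakeShell()
load_ipython_extension(shell)
```

In a real session the shell object is supplied by IPython itself, so only the hook function needs to live in the extension module.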

Module-System Modules

pasteur.module

Contains the module definitions in Pasteur, the base classes all Pasteur modules extend from.

pasteur.dataset

This module holds the definitions for the Dataset module, the initial entrypoint for data in Pasteur.

pasteur.view

This module holds the definitions for the View module, which appropriately preprocesses Datasets in Pasteur.

pasteur.transform

Contains the definition for Transformer and ReferenceTransformer modules.

pasteur.encode

Provides the base definition for Encoder modules.

pasteur.synth

Contains the base definition for Synth(esizer) modules.

pasteur.metric

This module provides the definitions for Metric Modules.

Transformation-Related Modules

pasteur.attribute

This module implements the base abstractions of Attribute and Value, which are used to encapsulate the information of complex types.

pasteur.hierarchy

Highly experimental and unpublished class for rebalancing Stratified Values with Differential Privacy.

pasteur.table

Contains the logic for handling multiple tables, and holding transformers and encoders.

Other Modules

amalgam

attribute

This module implements the base abstractions of Attribute and Value, which are used to encapsulate the information of complex types.

cli

Provides the CLI entrypoint for kedro.

dataset

This module holds the definitions for the Dataset module, the initial entrypoint for data in Pasteur.

encode

Provides the base definition for Encoder modules.

extras

This package contains reference implementations for Pasteur modules, which may be extracted to a separate package in the future.

graph

hierarchy

Highly experimental and unpublished class for rebalancing Stratified Values with Differential Privacy.

kedro

This module contains all kedro-related logic.

mare

marginal

This module provides a system for marginal calculation named MarginalOracle.

metadata

This module contains a base class Metadata which is used to wrap, type, and check all View parameters provided to kedro.

metric

This module provides the definitions for Metric Modules.

module

Contains the module definitions in Pasteur, the base classes all Pasteur modules extend from.

synth

Contains the base definition for Synth(esizer) modules.

table

Contains the logic for handling multiple tables, and holding transformers and encoders.

transform

Contains the definition for Transformer and ReferenceTransformer modules.

utils

Base utility module for Pasteur.

view

This module holds the definitions for the View module, which appropriately preprocesses Datasets in Pasteur.

Miscellaneous Modules

pasteur.kedro

This module contains all kedro-related logic.

pasteur.utils

Base utility module for Pasteur.

pasteur.extras

This package contains reference implementations for Pasteur modules, which may be extracted to a separate package in the future.

pasteur.cli

Provides the CLI entrypoint for kedro.