pasteur.encode.AttributeEncoder#

class pasteur.encode.AttributeEncoder(*args, _from_factory=False, **kwargs)[source]#

Encapsulates a special way to encode an Attribute.

One encoder is instantiated per attribute, and its fit method is called to adjust it to the base layer data.

For partitioned datasets, the fit method is called once per partition, each time on a different instance of AttributeEncoder, and reduce is then called on those instances to combine their fitted state.

The data value may contain more columns than the encoder uses. It is up to the encoder to select the columns it needs before processing. data should not be mutated.
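The per-partition fit and the subsequent reduce can be sketched roughly as follows. This is a minimal illustration, not pasteur's implementation: CategoryEncoderSketch is a hypothetical stand-in that does not subclass AttributeEncoder, a plain dict of column lists stands in for a DataFrame, and None is passed as a placeholder for the attr argument.

```python
# Hypothetical stand-in: mirrors the documented method names (fit, reduce)
# but is not pasteur's AttributeEncoder. A dict of lists replaces a DataFrame.
class CategoryEncoderSketch:
    def __init__(self, column):
        self.column = column
        self.values = set()

    def fit(self, attr, data):
        # `data` may carry extra columns; read only the owned column and
        # leave the caller's object untouched.
        self.values = set(data[self.column])

    def reduce(self, other):
        # Merge the state fitted on another partition into this instance.
        self.values |= other.values


partitions = [
    {"color": ["red", "blue"], "size": [1, 2]},
    {"color": ["green", "red"], "size": [3, 4]},
]

# One instance per partition, each fitted independently.
encoders = []
for part in partitions:
    encoder = CategoryEncoderSketch("color")
    encoder.fit(None, part)  # None is a placeholder for the Attribute
    encoders.append(encoder)

# Fold every other instance into the first one.
merged = encoders[0]
for encoder in encoders[1:]:
    merged.reduce(encoder)

categories = sorted(merged.values)
```

In this sketch the "size" column is ignored by the encoder and the input partitions are never modified, which matches the two constraints stated above.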

If the input data of the synthesis algorithm is in DataFrame form and referencing other tables is not required, it is natural to use an AttributeEncoder to handle encoding per attribute.

Warning: after fitting, the module may be serialized and deserialized, and its encode and decode methods may then be called in any order from different processes to encode and decode sets of columns.
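One way to stay safe under the warning above is to keep all fitted state in ordinary instance attributes, so that a serialized copy behaves the same as the original. The following is only a sketch under assumptions: ScaleEncoderSketch is a hypothetical class rather than a pasteur one, pickle is assumed as the serialization mechanism, and dicts of lists stand in for DataFrames.

```python
import pickle


# Hypothetical stand-in, not pasteur's AttributeEncoder. All fitted state
# lives in plain attributes so the default pickle protocol can carry it.
class ScaleEncoderSketch:
    def __init__(self, column):
        self.column = column
        self.lo = None
        self.hi = None

    def fit(self, attr, data):
        col = data[self.column]
        self.lo, self.hi = min(col), max(col)

    def encode(self, data):
        span = (self.hi - self.lo) or 1  # avoid dividing by zero
        return {self.column: [(v - self.lo) / span for v in data[self.column]]}

    def decode(self, enc):
        span = (self.hi - self.lo) or 1
        return {self.column: [v * span + self.lo for v in enc[self.column]]}


fitted = ScaleEncoderSketch("age")
fitted.fit(None, {"age": [20, 30, 40]})  # None is a placeholder for the Attribute

payload = pickle.dumps(fitted)        # what would travel to another process
worker_copy = pickle.loads(payload)   # what that process would reconstruct

encoded = worker_copy.encode({"age": [20, 40]})
decoded = worker_copy.decode(encoded)
```

Because encode and decode depend only on the attributes set during fit, the reconstructed copy can serve either call without any shared state.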

Attributes

Methods

decode(enc)

encode(data)

fit(attr, data)

get_factory(*args, **kwargs)

Returns a factory that registers this module to the system.

get_metadata()

reduce(other)

decode(enc)[source]#
Return type:

DataFrame

encode(data)[source]#
Return type:

DataFrame

fit(attr, data)[source]#
classmethod get_factory(*args, **kwargs)#

Returns a factory that registers this module to the system.

Any *args and **kwargs passed to this function will be saved and passed to the module’s __init__() method when calling build().
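The "save the arguments now, construct later" behavior described above can be illustrated with a small sketch. ModuleFactorySketch and ModuleSketch are hypothetical names introduced for illustration only; pasteur's actual factory class and its registration logic are not shown here and may differ.

```python
# Hypothetical sketch of a factory that stores constructor arguments and
# replays them on build(). These are not pasteur classes.
class ModuleFactorySketch:
    def __init__(self, cls, *args, **kwargs):
        self.cls = cls
        self.args = args
        self.kwargs = kwargs

    def build(self):
        # Replay the saved arguments into the module's __init__().
        return self.cls(*self.args, **self.kwargs)


class ModuleSketch:
    name = "sketch"

    def __init__(self, bins=10, *, log_scale=False):
        self.bins = bins
        self.log_scale = log_scale

    @classmethod
    def get_factory(cls, *args, **kwargs):
        # Nothing is instantiated yet; the arguments are only recorded.
        return ModuleFactorySketch(cls, *args, **kwargs)


factory = ModuleSketch.get_factory(32, log_scale=True)
first = factory.build()
second = factory.build()
```

Each build() call in this sketch yields a fresh instance configured with the same saved arguments, which is what allows one instance per attribute or per partition.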

get_metadata()[source]#
Return type:

dict[str | tuple[str, ...], TypeVar(META)]

name: str = ''#
reduce(other)[source]#