pasteur.transform.RefTransformer

class pasteur.transform.RefTransformer(**_)

Reference Transformers use a reference column as an input to create their embeddings.

They can be used to integrate constraints (and domain knowledge) into embeddings, in such a way that all embeddings produce valid solutions and learning is easier.

For example, consider an end-date embedding that references a start date. Encoded relative to the start date, the end date becomes a period length, whose histogram is stable and has much lower entropy than that of the raw dates. In addition, provided that the embedding is constrained to be positive, any value it takes decodes to a valid solution: an end date that falls after the start date.
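The end-date example above can be sketched with plain pandas. This is only an illustration of the idea, not Pasteur's implementation: the helper names `encode_end` and `decode_end`, the toy data, and the choice of whole days as the unit are all assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical toy data: each row has a start date and an end date.
df = pd.DataFrame(
    {
        "start": pd.to_datetime(["2020-01-01", "2021-06-15", "2022-03-10"]),
        "end": pd.to_datetime(["2020-01-04", "2021-06-16", "2022-03-17"]),
    }
)


def encode_end(end: pd.Series, start: pd.Series) -> pd.Series:
    """Encode the end date as the number of whole days since the reference."""
    return (end - start).dt.days


def decode_end(days: pd.Series, start: pd.Series) -> pd.Series:
    """Rebuild the end date from the encoded period and the reference."""
    return start + pd.to_timedelta(days, unit="D")


# The encoded column holds short period lengths instead of raw dates.
period = encode_end(df["end"], df["start"])

# Decoding with the original reference restores the end dates.
restored = decode_end(period, df["start"])

# Any positive period, e.g. one drawn by a synthesizer, decodes to an
# end date that falls after its start date.
sampled = decode_end(pd.Series([5, 10, 2]), df["start"])
```

The period lengths cluster in a narrow range regardless of the calendar year, which is why their histogram is easier to model than that of the raw dates.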

Attributes

deterministic

For a given input, the output is the same.

lossless

The decoded output equals the input.

stateful

Whether the transformer fits variables to the provided data.

name

Methods

fit(data[, ref])

Fits the transformer to the provided data.

fit_transform(data[, ref])

get_attributes()

get_factory(*args, **kwargs)

Returns a factory that registers this module to the system.

reduce(other)

reverse(data[, ref])

When reversing, the data column contains encoded data, whereas the ref column contains decoded/original data.

transform(data[, ref])

deterministic = True

For a given input, the output is the same.

fit(data, ref=None)

Fits the transformer to the provided data.

fit_transform(data, ref=None)

Return type:

DataFrame

get_attributes()

Return type:

Mapping[str | tuple[str, ...], Attribute]

classmethod get_factory(*args, **kwargs)

Returns a factory that registers this module to the system.

Any *args and **kwargs passed to this function will be saved and passed to the module’s __init__() method when calling build().
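The deferred-construction behaviour described above can be sketched as follows. This is a minimal sketch, not Pasteur's code: `ToyFactory` and `ToyModule` are hypothetical names, placing `build()` on the factory object is an assumption, and the step that registers the module with the system is left out.

```python
from typing import Any


class ToyFactory:
    """Hypothetical factory: stores constructor arguments for later use."""

    def __init__(self, cls: type, *args: Any, **kwargs: Any) -> None:
        self.cls = cls
        self.args = args
        self.kwargs = kwargs

    def build(self) -> Any:
        # The saved arguments only reach __init__() at this point.
        return self.cls(*self.args, **self.kwargs)


class ToyModule:
    """Hypothetical module with one illustrative parameter, `bins`."""

    def __init__(self, bins: int = 10, **_: Any) -> None:
        self.bins = bins

    @classmethod
    def get_factory(cls, *args: Any, **kwargs: Any) -> ToyFactory:
        # No instance is created here; the arguments are only recorded.
        return ToyFactory(cls, *args, **kwargs)


factory = ToyModule.get_factory(bins=32)
module = factory.build()
```

Each call to `build()` in this sketch creates a fresh instance from the same saved arguments, so one factory can configure several independent modules.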

lossless = True

The decoded output equals the input.

name: str

reduce(other)

reverse(data, ref=None)

When reversing, the data column contains encoded data, whereas the ref column contains decoded/original data. Therefore, the referenced columns must be decoded first.

Return type:

DataFrame
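The decoding order required by reverse() can be illustrated with a toy transformer. This is a sketch under stated assumptions, not Pasteur's implementation: `ToyOffsetTransformer`, the integer day columns, the `BASE` shift used to encode the reference column, and the use of a Series for `data` in transform() are all hypothetical.

```python
from typing import Optional

import pandas as pd


class ToyOffsetTransformer:
    """Hypothetical reference transformer: encodes a column as its offset from ref."""

    def __init__(self, col: str) -> None:
        self.col = col

    def transform(self, data: pd.Series, ref: Optional[pd.Series] = None) -> pd.DataFrame:
        assert ref is not None, "this toy transformer always needs a reference"
        # Encode using the original reference values.
        return pd.DataFrame({f"{self.col}_offset": data - ref})

    def reverse(self, data: pd.DataFrame, ref: Optional[pd.Series] = None) -> pd.DataFrame:
        assert ref is not None, "this toy transformer always needs a reference"
        # `data` is encoded, while `ref` must already be decoded.
        return pd.DataFrame({self.col: data[f"{self.col}_offset"] + ref})


table = pd.DataFrame({"start": [100, 250, 400], "end": [103, 251, 407]})
BASE = 100  # hypothetical encoding of the reference column: shift by a constant

t = ToyOffsetTransformer("end")
encoded_start = table["start"] - BASE
encoded_end = t.transform(table["end"], table["start"])

# Correct order: decode the reference column first, then the dependent one.
decoded_start = encoded_start + BASE
decoded_end = t.reverse(encoded_end, decoded_start)

# Passing the still-encoded reference yields wrong values.
wrong_end = t.reverse(encoded_end, encoded_start)
```

In this sketch `decoded_end` matches the original `end` column, while `wrong_end` is shifted by `BASE` because the reference had not been decoded yet.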

stateful = False

Whether the transformer fits variables to the provided data.

transform(data, ref=None)

Return type:

DataFrame