pasteur.attribute

Description

This module implements the base abstractions Attribute and Value, which encapsulate the information of complex types.

Values are separated into CatValues (Categorical; defined by an index) and NumValues (Numerical). In addition, we define StratifiedValue as a special CatValue that holds a Stratification.

The Stratification represents a tree built from Grouping nodes. Groupings are essentially lists with additional helper methods. We distinguish groupings by whether they are categorical (child order is irrelevant) or ordinal (adjacent children are more similar to each other).

The leaves of the tree represent the values the Value can take. Each leaf is a string that describes its value but is essentially a placeholder; the Value itself is an integer. To map leaves to integers, the tree is traversed with depth-first search (DFS), respecting child order so that the mapping is deterministic.
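A minimal sketch of the leaf-to-integer mapping, assuming a tree of nested nodes whose leaves are unique strings. `Node`, `dfs_leaves` and `leaf_indices` are hypothetical names used for illustration; they are not part of pasteur's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Node:
    """Hypothetical stand-in for a Grouping: a type plus ordered children."""
    type: str  # "cat" (categorical) or "ord" (ordinal)
    children: list[Union[Node, str]] = field(default_factory=list)


def dfs_leaves(node: Union[Node, str]) -> list[str]:
    """Collect leaves depth-first, visiting children in their stored order."""
    if isinstance(node, str):
        return [node]
    leaves: list[str] = []
    for child in node.children:
        leaves.extend(dfs_leaves(child))
    return leaves


def leaf_indices(head: Node) -> dict[str, int]:
    """Give each leaf the integer of its position in DFS order.

    Assumes leaf names are unique; duplicates would overwrite each other.
    """
    return {leaf: i for i, leaf in enumerate(dfs_leaves(head))}


tree = Node("cat", [Node("ord", ["low", "mid", "high"]), "unknown"])
print(leaf_indices(tree))  # {'low': 0, 'mid': 1, 'high': 2, 'unknown': 3}
```

Because children are always visited in their stored order, building the same tree twice yields the same integers, which is what makes the encoding deterministic.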

An Attribute holds multiple values and a set of common conditions. When a common condition is active, all of the Attribute’s Values are expected to have the same value.
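The common-condition invariant can be checked as in the sketch below, assuming each Value is stored as an integer NumPy array and the common condition is given as a boolean mask over rows. `check_common`, the date-like value names and the use of 0 as the shared "missing" code are all hypothetical.

```python
from __future__ import annotations

import numpy as np


def check_common(values: dict[str, np.ndarray], active: np.ndarray) -> bool:
    """Return True if every row where the common condition is active
    has the same integer in all of the attribute's values."""
    stacked = np.stack(list(values.values()), axis=1)  # shape (rows, n_values)
    rows = stacked[active]  # keep only rows where the condition holds
    # Compare every column with the first one; an empty selection passes.
    return bool(np.all(rows == rows[:, :1]))


# Hypothetical date-like attribute: 0 encodes "missing" in every value.
vals = {
    "year": np.array([0, 3, 5]),
    "month": np.array([0, 2, 11]),
    "day": np.array([0, 14, 1]),
}
missing = np.array([True, False, False])
print(check_common(vals, missing))  # True
```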

Functions

CatAttribute(name, vals[, na, ukn_val, ...])

Returns an Attribute holding a single StratifiedValue whose children are categorical, based on the provided data.

CommonValue(name[, na, ukn_val, ...])

GenAttribute(name, max_len)

Returns an Attribute holding a single GenerationValue with the provided data.

NumAttribute(name, bins, min, max[, nullable])

Returns an Attribute holding a single NumValue with the provided data.

OrdAttribute(name, vals[, na, ukn_val, ...])

Returns an Attribute holding a single StratifiedValue whose children are ordinal, based on the provided data.

OrdValue(name, vals[, na, ukn_val, ignore_nan])

SeqAttribute(name[, table, order, max])

Returns an Attribute holding a single SeqValue with the provided data.

get_dtype(domain)

Returns the smallest NumPy unsigned integer dtype that will fit integers up to domain - 1.
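One way to implement the documented behaviour of get_dtype with NumPy is sketched below; pasteur's actual implementation may differ, for example in how it handles domains too large for uint64.

```python
import numpy as np


def get_dtype(domain: int) -> np.dtype:
    """Smallest unsigned NumPy dtype that can hold integers 0 .. domain - 1."""
    for candidate in (np.uint8, np.uint16, np.uint32, np.uint64):
        # np.iinfo(...).max is the largest value the dtype can represent.
        if domain - 1 <= np.iinfo(candidate).max:
            return np.dtype(candidate)
    raise ValueError(f"domain {domain} does not fit in uint64")


print(get_dtype(256))  # uint8  (largest value needed is 255)
print(get_dtype(257))  # uint16 (largest value needed is 256)
```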

Classes

Attribute(name, vals[, common, unroll, ...])

Attribute class which holds multiple values in a dictionary.

CatValue()

Class for a Categorical Value.

GenerationValue(name, max_len)

Grouping(type, arr[, title])

An enhanced form of list that holds the type of grouping (categorical or ordinal) and implements helper functions and an enhanced string representation.
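A minimal sketch of a list subclass in the spirit of Grouping, assuming children are either strings (leaves) or nested groupings and that the type is spelled "cat" or "ord". The `n_leaves` helper and the indented string format are hypothetical, not pasteur's actual methods.

```python
from __future__ import annotations


class Grouping(list):
    """Hypothetical sketch: a list of children tagged with a grouping type."""

    def __init__(self, type: str, arr, title: str | None = None):
        if type not in ("cat", "ord"):
            raise ValueError("type must be 'cat' (categorical) or 'ord' (ordinal)")
        super().__init__(arr)
        self.type = type
        self.title = title

    def n_leaves(self) -> int:
        """Count the leaves (non-Grouping children) under this node."""
        return sum(c.n_leaves() if isinstance(c, Grouping) else 1 for c in self)

    def _lines(self, depth: int) -> list[str]:
        pad = "  " * depth
        header = pad + self.type + (f" '{self.title}'" if self.title else "")
        lines = [header]
        for child in self:
            if isinstance(child, Grouping):
                lines.extend(child._lines(depth + 1))
            else:
                lines.append(f"{pad}  {child}")
        return lines

    def __str__(self) -> str:
        # Indented tree view instead of the flat list repr.
        return "\n".join(self._lines(0))


g = Grouping("cat", [Grouping("ord", ["low", "mid", "high"], title="level"), "unknown"])
print(g.n_leaves())  # 4
print(g)
# cat
#   ord 'level'
#     low
#     mid
#     high
#   unknown
```

Subclassing list keeps indexing, iteration and len() for free, so only the grouping metadata and the helpers need to be added.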

IdxValue

alias of CatValue

NumValue(name, bins[, nullable, min, max, ...])

Numerical Value: its value can be represented with a number, which might be NaN.

SeqAttributes(order, seq, attrs, hist)

SeqValue(name, table[, order, max])

StratifiedNumValue(name, name_cnt, head[, ...])

StratifiedValue(name, head[, common, ignore_nan])

A version of CatValue that uses a Stratification to represent the domain knowledge of the Value.

Value()

Base Value class.