pasteur.utils.data.LazyDataset#

class pasteur.utils.data.LazyDataset(merged_load, partitions=None)[source]#

Methods

are_partitioned(*positional, **keyword)

Returns whether the provided datasets are partitioned.

cache(*positional, **keyword)

items()

keys()

separate()

Splits the datasets into non-partitioned and partitioned groups and returns both.

values()

wrap(*positional, **keyword)

Converts provided arguments to lazy.

zip(*positional, **keyword)

Aligns and returns a dictionary of partition ids to partitions.

zip_values(*positional, **keyword)

Same as zip, but doesn't return partition names and works even if the datasets are not partitioned, by returning a single partition.

static are_partitioned(*positional, **keyword)[source]#

Returns whether the provided datasets are partitioned. If they are, checks they have the same partitions.
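
The check described above can be illustrated with plain dictionaries. This is a minimal sketch and not pasteur's implementation: `are_partitioned_sketch` is a hypothetical name, a partitioned dataset is modeled as a `dict` of partition id to data, and anything else counts as not partitioned. How the real method reports mismatched partitions is not stated here, so raising `ValueError` is an assumption.

```python
def are_partitioned_sketch(*positional, **keyword):
    """Hypothetical stand-in: dict values model partitioned datasets."""
    datasets = [*positional, *keyword.values()]
    partitioned = [d for d in datasets if isinstance(d, dict)]
    if not partitioned:
        # Nothing is partitioned, so there is nothing to compare.
        return False
    expected = set(partitioned[0])
    for d in partitioned[1:]:
        if set(d) != expected:
            # Assumption: a mismatch is treated as an error.
            raise ValueError("partitioned datasets have different partition ids")
    return True
```

With this sketch, `are_partitioned_sketch({"p0": 1}, {"p0": 2})` is true, while passing only non-dict values gives false.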

classmethod cache(*positional, **keyword)[source]#

Return type:

Any

items()[source]#
keys()[source]#
property partitioned#
property sample#
separate()[source]#

Splits the datasets into non-partitioned and partitioned groups and returns them in that order. Usage:

non_partitioned, partitioned = separate_partitioned(datasets)

Return type:

tuple[dict[str, LazyDataset[A]], dict[str, LazyDataset[A]]]
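
The split can be pictured with a small stand-alone sketch. It does not use pasteur: `separate_sketch` is a hypothetical helper that takes a dictionary of named datasets, and a `dict` value is assumed to model a partitioned dataset while any other value models a non-partitioned one. The return order follows the usage line above.

```python
def separate_sketch(datasets):
    """Hypothetical stand-in: split named datasets into two dictionaries."""
    non_partitioned = {}
    partitioned = {}
    for name, ds in datasets.items():
        # Assumption: dict values model partitioned datasets.
        if isinstance(ds, dict):
            partitioned[name] = ds
        else:
            non_partitioned[name] = ds
    return non_partitioned, partitioned


non_partitioned, partitioned = separate_sketch(
    {"meta": "small table", "events": {"p0": "rows 0", "p1": "rows 1"}}
)
```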

property shape#
values()[source]#
classmethod wrap(*positional, **keyword)[source]#

Converts provided arguments to lazy. Tuples, dicts, and lists are traversed, and every object found in them is wrapped in a LazyDataset.

Return type:

Any
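
The traversal described for wrap can be sketched as a recursive function. This is an illustration only, not the library's code: `Lazy` is a hypothetical stand-in for a lazy wrapper (a zero-argument callable that returns the object), and `wrap_sketch` takes a single object rather than `*positional, **keyword`.

```python
class Lazy:
    """Hypothetical lazy wrapper: calling it returns the wrapped object."""

    def __init__(self, load):
        self._load = load

    def __call__(self):
        return self._load()


def wrap_sketch(obj):
    # Containers are traversed and rebuilt with the same type.
    if isinstance(obj, dict):
        return {key: wrap_sketch(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [wrap_sketch(value) for value in obj]
    if isinstance(obj, tuple):
        return tuple(wrap_sketch(value) for value in obj)
    # Every other object is wrapped so that it is only produced on demand.
    return Lazy(lambda: obj)


wrapped = wrap_sketch({"a": [1, (2, 3)], "b": 4})
```

The container shape is preserved, so `wrapped["a"][1]` is still a tuple, and each leaf must be called to obtain its value.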

static zip(*positional, **keyword)[source]#

Aligns and returns a dictionary of partition ids to partitions.

Each partition is a list if positional arguments were provided, or a dictionary if keyword arguments were provided.

Warning: all partitioned datasets should have the same partition keys.
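
The alignment can be illustrated with plain dictionaries. This is a sketch under stated assumptions, not pasteur's implementation: `zip_sketch` is a hypothetical name, every input is assumed to be partitioned and is modeled as a `dict` of partition id to data, and mixing positional and keyword inputs is rejected because that case is not described here.

```python
def zip_sketch(*positional, **keyword):
    """Hypothetical stand-in: align dict-based datasets by partition id."""
    if positional and keyword:
        # Assumption: the mixed case is left out of this sketch.
        raise ValueError("pass datasets positionally or by keyword, not both")
    datasets = list(positional) if positional else list(keyword.values())
    if not datasets:
        return {}
    ids = list(datasets[0])
    for ds in datasets[1:]:
        if set(ds) != set(ids):
            raise ValueError("datasets have different partition ids")
    if positional:
        # Positional inputs: each partition is a list.
        return {pid: [ds[pid] for ds in positional] for pid in ids}
    # Keyword inputs: each partition is a dictionary keyed by argument name.
    return {pid: {name: ds[pid] for name, ds in keyword.items()} for pid in ids}


a = {"p0": "a0", "p1": "a1"}
b = {"p0": "b0", "p1": "b1"}
by_position = zip_sketch(a, b)
by_name = zip_sketch(x=a, y=b)
```

Here `by_position["p0"]` is `["a0", "b0"]` and `by_name["p0"]` is `{"x": "a0", "y": "b0"}`.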

static zip_values(*positional, **keyword)[source]#

Same as zip, but doesn’t return partition names and works even if the datasets are not partitioned, by returning a single partition.

Return type:

list
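
The difference from zip can be shown with a short sketch. It is illustrative only: `zip_values_sketch` is a hypothetical name, partitioned datasets are modeled as `dict` objects, and inputs are assumed to be either all partitioned or all non-partitioned (mixed inputs are not handled in this sketch).

```python
def zip_values_sketch(*datasets):
    """Hypothetical stand-in: return a list of partitions without their ids."""
    if not any(isinstance(d, dict) for d in datasets):
        # No partitioned input: everything forms one single partition.
        return [list(datasets)]
    ids = list(datasets[0])
    return [[d[pid] for d in datasets] for pid in ids]


partitions = zip_values_sketch({"p0": 1, "p1": 2}, {"p0": 3, "p1": 4})
single = zip_values_sketch("x", "y")
```

In this sketch `partitions` is `[[1, 3], [2, 4]]` and `single` is `[["x", "y"]]`.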