pasteur.utils.data

Description

Pasteur’s data utilities. The main functionality provided by this module is LazyPartition and LazyDataset, along with their specializations for pandas: LazyFrame and LazyChunk.

These data types load dataset partitions on demand; once the data is no longer needed, it can be released from RAM using the del keyword.
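The load-on-demand and release pattern described above can be illustrated with a minimal sketch. It does not use Pasteur's actual classes: `LazySketch` and `load_rows` are hypothetical stand-ins invented for this example, and the real LazyPartition signature and behavior may differ.

```python
class LazySketch:
    """Hypothetical stand-in for a lazily loaded partition (not Pasteur's API)."""

    def __init__(self, loader):
        self._loader = loader  # callable that produces the data when asked
        self.loads = 0         # counts how many times the data was materialized

    def __call__(self):
        # The data is only produced when the object is called.
        self.loads += 1
        return self._loader()


def load_rows():
    # Assumed loader: in practice this would read a partition from disk.
    return list(range(1_000))


partition = LazySketch(load_rows)  # nothing has been loaded yet
data = partition()                 # load on demand
total = sum(data)
del data                           # drop the only reference so the memory can be reclaimed
```

Because the wrapper holds only the loader and not the loaded data, deleting `data` removes the last reference to the list, which lets Python free it.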

Functions

apply_fun(obj, *args, _fun, **kwargs)

Calls the function named _fun on object obj with the provided arguments.

data_to_tables(data)

data_to_tables_ctx(data)

get_relationships(ids)

get_relative_fn(fn)

Returns the directory of a file relative to the script calling this function.

lazy_load_tables(tables)

Lazily loads partitions and keeps them in memory in a closure.

list_unique(*args)

tables_to_data(ids, tables[, ctx])

to_chunked(func, /)

Makes the wrapped function evaluate lazily.
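As an illustration of the apply_fun description above, here is a minimal sketch of calling a method by name. `apply_fun_sketch` is a hypothetical re-implementation based only on the one-line summary; Pasteur's own function may handle additional cases.

```python
def apply_fun_sketch(obj, *args, _fun, **kwargs):
    """Look up the attribute named `_fun` on `obj` and call it.

    Assumption: this mirrors the documented summary only, not the real source.
    """
    return getattr(obj, _fun)(*args, **kwargs)


# Equivalent to "a,b,c".split(",")
result = apply_fun_sketch("a,b,c", ",", _fun="split")

# Equivalent to "7".zfill(3)
padded = apply_fun_sketch("7", 3, _fun="zfill")
```

Making `_fun` keyword-only, as in the listed signature, keeps it from colliding with the positional arguments that are forwarded to the method.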

Classes

LazyDataset(merged_load[, partitions])

LazyPartition(fun, shape_fun, /, *args, **kwargs)

RawSource(files[, save_name, credentials, desc])

Represents a raw data source that can be downloaded.

gen_closure(func, /, *args[, _fn, _eat, _return])

Creates a closure for function func: the positional arguments provided here are passed to func before the ones given when the closure is called, and the named arguments given in both places are combined.
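The argument handling described for gen_closure resembles `functools.partial`, and can be sketched as follows. `gen_closure_sketch` and `describe` are hypothetical names for this example. The sketch omits the `_fn`, `_eat`, and `_return` parameters, whose meaning is not documented here, and it assumes that call-time keyword arguments take precedence when a name is given in both places.

```python
def gen_closure_sketch(func, /, *args, **kwargs):
    """Bind leading positional and keyword arguments to `func` (illustrative only)."""

    def closure(*call_args, **call_kwargs):
        # Bound positionals go first, then the ones supplied at call time.
        # Keyword arguments are merged; call-time values win (an assumption).
        merged = {**kwargs, **call_kwargs}
        return func(*args, *call_args, **merged)

    return closure


def describe(a, b, *, sep="-", suffix=""):
    return f"{a}{sep}{b}{suffix}"


bound = gen_closure_sketch(describe, "x", sep="+")
out = bound("y", suffix="!")
```

Here `"x"` is bound ahead of `"y"`, and `sep` and `suffix` come from the two separate calls, so `describe` receives all four values in a single invocation.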