pasteur.view.View
- class pasteur.view.View(**_)
A View named <name>, based on the Dataset <dataset>, that creates a set of tables from the provided dependencies. Here, the dependencies are tables of that Dataset.
The View produces the tables in deps.keys(). They are derived from the Dataset tables listed in deps.values(), that is, the union of those lists.
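The relationship between the tables a View produces and the Dataset tables it needs can be sketched as follows, using the deps example from the deps attribute. The helpers view_tables and view_dataset_tables are hypothetical illustrations, not part of pasteur.

```python
def view_tables(deps: dict[str, list[str]]) -> set[str]:
    # The tables a View produces are the keys of deps.
    return set(deps)


def view_dataset_tables(deps: dict[str, list[str]]) -> set[str]:
    # The Dataset tables a View needs are the union of the dependency lists.
    # Note: set(deps.values()) would raise TypeError, since lists are unhashable.
    return {table for dep_list in deps.values() for table in dep_list}


deps = {"table1": ["master_table1", "master_table2"], "table2": ["master_table3"]}
print(sorted(view_tables(deps)))          # ['table1', 'table2']
print(sorted(view_dataset_tables(deps)))  # ['master_table1', 'master_table2', 'master_table3']
```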
When used with Kedro, the pipeline looks for the Dataset tables named <dataset>@<table>.
It then produces tables named <name>.<table>.
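The naming scheme above can be illustrated with a short sketch. The helper functions and the names "mydata" and "myview" are hypothetical; only the <dataset>@<table> and <name>.<table> formats come from this page.

```python
def input_dataset_name(dataset: str, table: str) -> str:
    # Input tables are looked up as <dataset>@<table>.
    return f"{dataset}@{table}"


def output_dataset_name(view_name: str, table: str) -> str:
    # Output tables are produced as <name>.<table>.
    return f"{view_name}.{table}"


deps = {"table1": ["master_table1", "master_table2"], "table2": ["master_table3"]}
inputs = sorted({input_dataset_name("mydata", t) for ts in deps.values() for t in ts})
outputs = sorted(output_dataset_name("myview", t) for t in deps)
print(inputs)   # ['mydata@master_table1', 'mydata@master_table2', 'mydata@master_table3']
print(outputs)  # ['myview.table1', 'myview.table2']
```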
Decoding a View may require decoding its tables in a particular order, which trn_deps defines. Because this order must be static, it can’t be placed in parameters.yml.
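One way to turn such dependencies into a decoding order is a topological sort. This sketch assumes that trn_deps[t] lists the tables that must be decoded before t; that interpretation, the decode_order helper, and the table names are hypothetical, not pasteur's implementation.

```python
from graphlib import TopologicalSorter


def decode_order(tables, trn_deps):
    # Assumption: trn_deps[t] lists the tables that must be decoded before t.
    # Tables without an entry have no prerequisites.
    graph = {table: trn_deps.get(table, []) for table in tables}
    # static_order() yields every node after all of its predecessors
    # and raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


trn_deps = {"visits": ["patients"], "labs": ["visits", "patients"]}
order = decode_order(["labs", "visits", "patients"], trn_deps)
print(order)  # ['patients', 'visits', 'labs']
```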
If provided, parameters_fn is used to load a parameters file with defaults for the View (such as metadata), which is useful for packaging. Use utils.get_relative_fn() from datasets.
Attributes
dataset: The name of the View's Dataset.
dataset_tables: Returns the dataset tables required by the View.
deps: Defines the Tables of the View and their Dataset dependencies.
fit_global: If True, transformers and encoders for this view will be fit on the global dataset.
tables: Returns the table names of the view.
Methods
filter_table(name, keys, **tables): Filters the table using the keys provided.
query(name, **tables): Equivalent to ingest in Dataset.
split_keys(keys, req_splits, splits, ...): Takes the key frame and splits it into the portions specified by splits.
- dataset: str = ''
The name of the View’s Dataset. If the Dataset is not loaded, the View is disabled.
- property dataset_tables
Returns the dataset tables required by the View.
- deps: dict[str, list[str]] = {}
Defines the Tables of the View and their Dataset dependencies, e.g.: `{"table1": ["master_table1", "master_table2"], "table2": ["master_table3"]}`
- fit_global: bool = False
If True, transformers and encoders for this view will be fit on the global dataset. This resolves encoding errors that stem from sampling the partial view. When True, subsampling the view is not possible during transformation and encoding, which may add significant overhead.
- name: str = ''
The name of the View.
- parameters: dict[str, Any] | str | None = None
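The encoding errors that fitting on the global dataset resolves can be illustrated without pasteur: an encoder fitted on a partial sample cannot encode a category that only appears outside that sample. The toy ordinal encoder below is a hypothetical stand-in, not one of pasteur's transformers or encoders.

```python
def fit_ordinal_encoder(values):
    # Map each category seen during fitting to an integer code.
    return {cat: code for code, cat in enumerate(sorted(set(values)))}


def encode(mapping, values):
    # A category missing from the fitted mapping cannot be encoded.
    return [mapping[v] for v in values]


full_column = ["A", "B", "C", "A", "B", "C", "D"]
partial_column = full_column[:6]  # a subsample that happens to miss "D"

partial_enc = fit_ordinal_encoder(partial_column)
try:
    encode(partial_enc, full_column)
except KeyError as err:
    print("unseen category:", err)  # unseen category: 'D'

global_enc = fit_ordinal_encoder(full_column)
print(encode(global_enc, full_column))  # [0, 1, 2, 0, 1, 2, 3]
```

The trade-off matches the text: fitting on the full column always works, but it requires the whole dataset to be available during fitting.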
- split_keys(keys, req_splits, splits, random_state)
Takes the key frame and splits it into the portions specified by splits, then returns the splits named in req_splits.
It should produce the same results on every run regardless of which splits are requested, because it is run once per split.
- Return type:
dict[str, Union[DataFrame, LazyDataset[DataFrame]]]
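A minimal sketch of a split that satisfies this determinism requirement, assuming keys is a plain list rather than a key frame and splits maps split names to fractions. The function name and the split names "train", "val" and "test" are hypothetical; this is not pasteur's implementation.

```python
import random


def split_keys_sketch(keys, req_splits, splits, random_state):
    """Hypothetical stand-in for View.split_keys on a plain list of keys."""
    # Shuffle a sorted copy with a seeded RNG, so the result depends only on
    # the set of keys and random_state, not on input order.
    rng = random.Random(random_state)
    shuffled = sorted(keys)
    rng.shuffle(shuffled)

    # Carve contiguous portions for *all* splits, in declaration order.
    # Rounding may leave a few keys unassigned; a real implementation would
    # need a policy for that remainder.
    portions = {}
    start = 0
    for split_name, fraction in splits.items():
        end = start + round(fraction * len(shuffled))
        portions[split_name] = shuffled[start:end]
        start = end

    # Only now select the requested splits; the partition itself never changes.
    return {name: portions[name] for name in req_splits}


splits = {"train": 0.5, "val": 0.3, "test": 0.2}
a = split_keys_sketch(list(range(10)), ["train"], splits, random_state=0)
b = split_keys_sketch(list(range(10)), ["test", "train"], splits, random_state=0)
print(a["train"] == b["train"])  # True
```

Computing the full partition on every call, and only then selecting req_splits, is what keeps each split identical across the separate per-split runs.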
- property tables
Returns the table names of the view.
- trn_deps: dict[str, list[str]] = {}
Defines the order in which the tables of the View must be decoded.
-
dataset: