pasteur.kedro.runner.parallel.SimpleParallelRunner

pasteur.kedro.runner.parallel.SimpleParallelRunner#

class pasteur.kedro.runner.parallel.SimpleParallelRunner(pipe_name=None, params_str=None, max_workers=None, refresh_processes=None, resume_node=None)[source]#

Methods

run(pipeline, catalog[, hook_manager, ...])

Run the Pipeline using the datasets provided by catalog and save results back to the same objects.

run_only_missing(pipeline, catalog, hook_manager)

Run only the missing outputs from the Pipeline using the datasets provided by catalog, and save results back to the same objects.

run(pipeline, catalog, hook_manager=None, session_id=None)#

Run the Pipeline using the datasets provided by catalog and save results back to the same objects.

Parameters:
  • pipeline (Pipeline) – The Pipeline to run.

  • catalog (CatalogProtocol) – An implemented instance of CatalogProtocol from which to fetch data.

  • hook_manager (Optional[PluginManager]) – The PluginManager to activate hooks.

  • session_id (Optional[str]) – The id of the session.

Raises:

ValueError – Raised when Pipeline inputs cannot be satisfied.

Return type:

dict[str, Any]

Returns:

Any node outputs that cannot be processed by the catalog. These are returned in a dictionary, where the keys are defined by the node outputs.

run_only_missing(pipeline, catalog, hook_manager)#

Run only the missing outputs from the Pipeline using the datasets provided by catalog, and save results back to the same objects.

Parameters:
  • pipeline (Pipeline) – The Pipeline to run.

  • catalog (CatalogProtocol) – An implemented instance of CatalogProtocol from which to fetch data.

  • hook_manager (PluginManager) – The PluginManager to activate hooks.

Raises:

ValueError – Raised when Pipeline inputs cannot be satisfied.

Return type:

dict[str, Any]

Returns:

Any node outputs that cannot be processed by the catalog. These are returned in a dictionary, where the keys are defined by the node outputs.