pasteur.utils.progress.init_pool

class pasteur.utils.progress.init_pool(max_workers=None, refresh_processes=None)

Methods

__init__(max_workers=None, refresh_processes=None)

Creates a shared process pool for all threads in this process.

max_workers should be chosen based on either the number of available cores or the amount of RAM (in GB) that each process will require, whichever allows fewer processes.
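As a rough illustration of that sizing rule, the sketch below picks the smaller of the core count and the number of processes that fit in memory. The helper suggest_max_workers and its parameters are hypothetical and not part of pasteur; the RAM figures are assumed to be estimated by the caller.

```python
import os


def suggest_max_workers(total_ram_gb, ram_per_process_gb, cores=None):
    """Return the smaller of the core count and the number of processes that fit in RAM.

    Hypothetical helper for illustration only; it is not part of pasteur.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
    # How many processes of the given size fit into the available memory.
    fits_in_ram = int(total_ram_gb // ram_per_process_gb)
    # Always allow at least one worker, even if a single process exceeds the budget.
    return max(1, min(cores, fits_in_ram))


print(suggest_max_workers(total_ram_gb=32, ram_per_process_gb=8, cores=16))  # 4
```

The result could then be passed as max_workers.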

log_queue connects the subprocesses to the main process logger; see pasteur.kedro.runner.parallel.py.

refresh_processes sets maxtasksperchild for the pool, which restarts each worker after that many tasks and so prevents memory leaks from snowballing from node to node. However, every restarted worker has to repeat its imports, so this is slower.
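The effect of maxtasksperchild can be seen with the standard library alone. The sketch below does not use pasteur; it assumes a plain multiprocessing.Pool with a single worker and the POSIX-only "fork" start method, and the helper worker_pids is hypothetical. It records which process ID served each task, with and without worker refresh.

```python
import multiprocessing as mp
import os


def worker_pids(n_tasks, maxtasksperchild=None):
    """Run n_tasks trivial tasks on a one-worker pool; return the PID that served each."""
    # "fork" is an assumption (POSIX only) so the sketch runs without a __main__ guard.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=1, maxtasksperchild=maxtasksperchild) as pool:
        # os.getpid is executed inside the worker, so it reports the worker's PID.
        pending = [pool.apply_async(os.getpid) for _ in range(n_tasks)]
        return [p.get(timeout=60) for p in pending]


reused = worker_pids(3)                         # one long-lived worker
refreshed = worker_pids(3, maxtasksperchild=1)  # worker replaced after every task
print(len(set(reused)), len(set(refreshed)))
```

Without refresh, all tasks share one process, so anything it leaks accumulates; with maxtasksperchild=1 each task starts in a new process, at the cost of starting it up again.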