Memory and Performance
This page is about choosing between serial and parallel mode, and how to avoid running out of memory when processing big batches. If you’re not sure what those words mean, the short answer is:
Leave parallel mode OFF. It’s the default. Read on if you want to know when it’s safe to turn it on, or if you’ve hit memory problems.
What “serial” and “parallel” mean here
exo2micro processes one (sample, dye) combination at a time as
a “task”. When you have many samples and many dyes, you have many
tasks. There are two ways to run them:
Serial mode (parallel=False, the default): exo2micro processes one task, then the next, then the next. Each task fully finishes before the next one starts.
Parallel mode (parallel=True): exo2micro launches multiple worker processes that each process one task at a time. If you set n_workers=4, four tasks run at the same time.
Parallel mode can be a lot faster when you have many tasks. But it comes with a real memory cost.
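To see how quickly tasks multiply, the sketch below enumerates the (sample, dye) combinations for a small batch. It assumes every dye is run for every sample; list_tasks is a hypothetical helper for illustration, not part of exo2micro, and the sample and dye names are just placeholders.

```python
from itertools import product

def list_tasks(samples, dyes):
    """Return every (sample, dye) pair; each pair is one task."""
    return list(product(samples, dyes))

tasks = list_tasks(['CD070', 'CD063'], ['SybrGld', 'DAPI'])
print(len(tasks))  # 2 samples x 2 dyes = 4 tasks
```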
Why parallel mode uses more memory
Each parallel worker is a separate Python process. Each one holds its own copy of the current sample’s images, padded canvases, alignment buffers, and so on. If a single sample uses 12 GB of RAM at peak, four parallel workers use 48 GB.
Worse: when you exceed your computer’s physical RAM, the operating system starts using disk as “virtual memory” (called swapping on Linux/Mac, paging on Windows). Disk is hundreds of times slower than RAM, so a run that’s swapping will be vastly slower than the same run in serial mode, and the Python process may crash outright if memory runs out completely.
When to leave parallel mode OFF
Leave it off if any of these are true:
You have 16 GB of RAM or less.
You’re not sure how much RAM you have.
Your images are large (typical exo2micro images are 30,000 × 25,000 pixels; one of those is about 2.3 GB on disk and even larger in memory).
You’re going to use the computer for other things while the batch runs.
You only have a handful of tasks (3 or fewer). The overhead of starting worker processes usually makes serial just as fast.
A specific anti-pattern: don’t set parallel=True,
n_workers=1 to “try parallel mode safely”. That gives you the
worst of both worlds — the spawn overhead of parallel mode without
any of the speed benefit, and serial mode’s explicit memory
cleanup doesn’t run between tasks the way it does in actual serial
mode. If you have RAM for only one worker, just use
parallel=False.
When to turn parallel mode ON
Turn it on when all of these are true:
You have 5 or more tasks to process.
You have enough RAM to hold multiple full-resolution images at once (see the table below).
You’re not planning to use the computer for anything else during the run.
How many workers
Start conservative. Rough rule of thumb based on your total RAM:
| Total RAM | Recommended max workers |
|---|---|
| 8 GB or less | 1 (use parallel=False) |
| 16 GB | 1 (use parallel=False) |
| 32 GB | 2 |
| 64 GB | 4 |
| 128 GB | 8 |
Also never exceed (your CPU core count) − 1, so your computer stays responsive for system tasks.
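The table and the core-count rule above can be combined into one small helper. This is a sketch, not part of exo2micro: recommended_max_workers is a hypothetical name, RAM sizes between two rows are assumed to round down to the lower row, and machines above 128 GB are assumed to stay at 8 workers because the table stops there.

```python
import os

# (minimum total RAM in GB, max workers), taken from the rule-of-thumb table
RAM_TO_WORKERS = [(128, 8), (64, 4), (32, 2)]

def recommended_max_workers(total_ram_gb, cpu_cores=None):
    """Upper bound on n_workers; a result of 1 means use parallel=False."""
    if cpu_cores is None:
        cpu_cores = os.cpu_count() or 1
    by_ram = 1  # 16 GB or less: one worker only
    for min_ram_gb, workers in RAM_TO_WORKERS:
        if total_ram_gb >= min_ram_gb:
            by_ram = workers
            break
    by_cpu = max(1, cpu_cores - 1)  # leave one core for the system
    return min(by_ram, by_cpu)

print(recommended_max_workers(32, cpu_cores=8))   # RAM is the limit: 2
print(recommended_max_workers(128, cpu_cores=4))  # cores are the limit: 3
```

Treat the result as a ceiling to start from, then watch memory during the first run before trusting it.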
Checking your RAM
macOS: Apple menu → About This Mac → Memory shows installed RAM. Applications → Utilities → Activity Monitor → Memory tab shows live usage.
Windows: Settings → System → About → Device specifications → Installed RAM. Ctrl+Shift+Esc → Task Manager → Performance tab → Memory shows live usage.
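If you would rather check from Python, for example inside a notebook, the following best-effort sketch reports installed RAM. It is not an exo2micro function: total_ram_gb is a hypothetical helper that uses psutil when it is installed and otherwise falls back to os.sysconf, which only exists on POSIX systems.

```python
import os

def total_ram_gb():
    """Best-effort installed RAM in GB, or None if it cannot be determined."""
    try:
        import psutil  # optional dependency
        return psutil.virtual_memory().total / 1e9
    except ImportError:
        pass
    try:
        # POSIX fallback; os.sysconf does not exist on Windows
        return os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES') / 1e9
    except (AttributeError, ValueError, OSError):
        return None

ram = total_ram_gb()
print('unknown' if ram is None else f'{ram:.1f} GB')
```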
Watching memory during a run
The first time you run a large batch in parallel, open your system’s task/activity monitor and watch memory usage. If RAM usage climbs past 90%, or if your computer becomes sluggish:
Click the Abort button in the GUI (or interrupt the kernel in JupyterLab: Kernel → Interrupt Kernel).
Reduce n_workers and try again.
If even n_workers=2 runs out of memory, switch to parallel=False.
Already-completed tasks are preserved when you abort. The pipeline saves checkpoints after each stage of each task, so re-running will pick up where you left off rather than starting over.
Pre-flight resource checks (new in 2.4)
Starting in 2.4, both run_batch() and
SampleDye.run() run a quick resource check before any task
starts. The check reads only the raw TIFF headers (no pixel
data, fast even on networked drives), estimates the peak RAM and
total disk output your batch will produce, and compares each to
the available headroom on the machine.
Three severity levels per resource:
≤ 80% of available: silent. Run proceeds normally.
80%-100%: warning, run proceeds. A “⚠️ HIGH” line is printed; you should consider closing other applications or reducing n_workers, but the run continues.
> 100%: hard fail. MemoryError (for RAM) or OSError (for disk) is raised before any task runs. The error message includes a remediation list with concrete suggestions (reduce n_workers, reduce pad, switch checkpoint_format to one format only, free disk, etc.) with your current values inline.
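The three severity levels amount to a simple threshold test on the ratio of estimated to available resource. The sketch below shows that logic only; classify is a hypothetical name, and the real check's internals may differ.

```python
def classify(estimated, available):
    """Return 'ok', 'warning', or 'fail' for one resource (same units for both)."""
    ratio = estimated / available
    if ratio <= 0.80:
        return 'ok'        # silent, run proceeds
    if ratio <= 1.00:
        return 'warning'   # HIGH line printed, run proceeds
    return 'fail'          # MemoryError or OSError before any task runs

print(classify(2.8, 16.0))    # about 17% of available -> ok
print(classify(14.0, 16.0))   # 87.5% -> warning
print(classify(48.0, 32.0))   # 150% -> fail
```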
A typical successful check looks like this:
=== Pre-flight resource check ===
RAM: estimated peak 2.8 GB vs 16.0 GB available (17%) ✓
Disk: estimated total 4.1 GB vs 412 GB free (1%) ✓
=================================
This catches the case that previously caused most “kernel dies
mid-batch” reports: starting an 8-worker batch that needs 6 GB
per task (48 GB in total) on a 32 GB machine. Before 2.4, you’d
see the kernel die with no useful diagnostic. In 2.4, the same
configuration raises MemoryError immediately with a message
telling you exactly how many workers your machine can handle and
why.
Overriding the check
If you know the estimate is conservative for your specific data —
for example you’ve cleared other applications since the estimate
was computed, or your samples are unusually compressible — pass
force_run=True to downgrade the hard fail to a warning:
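import exo2micro as e2m  # assumed import; the examples on this page use the e2m alias

results = e2m.run_batch(
    samples=['CD070', 'CD063'],
    dyes=['SybrGld'],
    n_workers=8,
    force_run=True,  # downgrade the pre-flight hard fail to a warning
)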
This is not recommended for normal use. If a run flagged
❌ EXCEEDS AVAILABLE does get OOM-killed mid-batch, you may end
up with corrupted checkpoint files (a half-written TIFF that the
next run can’t read), so the default behavior is to refuse the
run rather than risk that.
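As an alternative to forcing the run, a script can react to the hard fail by retrying with fewer workers. The sketch below relies only on what this page states: the check raises MemoryError, and run_batch accepts parallel and n_workers. The wrapper run_with_fewer_workers and the stand-in fake_run_batch are hypothetical, so the example runs without exo2micro installed.

```python
def run_with_fewer_workers(run_batch, n_workers, **kwargs):
    """Call run_batch, halving n_workers each time MemoryError is raised."""
    while n_workers >= 2:
        try:
            return run_batch(parallel=True, n_workers=n_workers, **kwargs)
        except MemoryError:
            n_workers //= 2
    # Only one worker would fit: fall back to serial mode
    return run_batch(parallel=False, **kwargs)

def fake_run_batch(parallel, n_workers=1, **kwargs):
    # Stand-in for e2m.run_batch: pretends more than 2 workers do not fit
    if parallel and n_workers > 2:
        raise MemoryError('estimated peak exceeds available RAM')
    return {'parallel': parallel, 'n_workers': n_workers}

result = run_with_fewer_workers(fake_run_batch, n_workers=8)
print(result)  # 8 and 4 are refused, 2 is accepted
```

In real use you would pass e2m.run_batch in place of the stand-in, along with samples and dyes as keyword arguments.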
The 6× factor
The RAM estimate is:
peak per task ≈ (H + 2·pad) × (W + 2·pad) × 4 bytes × 6
The 6× factor reflects how many full-resolution float32 image
copies coexist at the worst point of a single task (stage 2 or
stage 3, where padded post + padded pre + downsampled working
copies + warp output buffer + SIFT internals all live in memory
simultaneously). It’s a conservative estimate. If you find the
check is consistently refusing batches that actually fit on your
machine, the constant PEAK_FACTOR_PER_TASK at the top of the
memory-diagnostics section in exo2micro/utils.py can be
tuned. We expect most users won’t need to touch it.
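As a worked example of the estimate, the sketch below applies the formula to the typical 30,000 × 25,000 pixel image mentioned earlier. The function name estimate_peak_bytes is hypothetical, the pad values are chosen only for illustration, and results are shown in decimal GB (the unit the real check uses is not stated here).

```python
PEAK_FACTOR_PER_TASK = 6   # the 6x factor described above
BYTES_PER_PIXEL = 4        # float32

def estimate_peak_bytes(height, width, pad):
    """Peak RAM for one task: padded canvas area x 4 bytes x 6 copies."""
    padded_h = height + 2 * pad
    padded_w = width + 2 * pad
    return padded_h * padded_w * BYTES_PER_PIXEL * PEAK_FACTOR_PER_TASK

no_pad = estimate_peak_bytes(30_000, 25_000, pad=0)
with_pad = estimate_peak_bytes(30_000, 25_000, pad=1_000)
print(f'{no_pad / 1e9:.1f} GB')    # 18.0 GB
print(f'{with_pad / 1e9:.1f} GB')  # 20.7 GB
```

Note how padding grows the estimate: it is added on both sides of both dimensions, which is why reducing pad appears in the remediation list.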
Subprocess mode for low-RAM machines (new in 2.4)
Even in serial mode, some memory can accumulate across tasks
that gc.collect() between tasks can’t fully reclaim:
matplotlib figure state held by the pyplot module, Jupyter
Out[] cell references, cv2/tifffile internal caches. If
you’re seeing the kernel die partway through a
serial batch even though the pre-flight check passed, the cause
is likely one of these slowly accumulating leaks.
The fix is subprocess mode: run each task in a fresh Python subprocess, exited and reclaimed by the OS between tasks.
results = e2m.run_batch(
    samples=['CD070', 'CD063'],
    dyes=['SybrGld', 'DAPI'],
    parallel='subprocess',
)
This is a third value for the parallel argument, alongside
False (serial in-process, the default) and True
(multiprocessing pool). Each task runs in a fresh process. Tasks
run one at a time (not concurrently — for that, use
parallel=True).
When to use subprocess mode:
Your pre-flight check passes (per-task RAM fits) but the kernel still dies after a few tasks complete successfully.
The MemoryTracker summary (below) shows RSS climbing monotonically across tasks.
You want overnight unattended batches to be robust to wedged tasks (see timeout_per_task below).
Important: subprocess mode is not the same as
parallel=True, n_workers=1. That uses
multiprocessing.Pool, which keeps a single worker
process alive across every task, so leaks accumulate in it just
as they do in serial mode. Subprocess mode spawns a new process
per task and tears it down after.
Subprocess mode adds about 1-2 seconds of process-spawn overhead per task. For typical exo2micro tasks that take minutes to align, this is negligible.
Timeouts and OOM detection
In subprocess mode you can also set timeout_per_task to
abort any task that runs too long:
results = e2m.run_batch(
    samples=samples,
    dyes=dyes,
    parallel='subprocess',
    timeout_per_task=1800,  # 30 minutes per task
)
Recommended for unattended overnight batches so a wedged task doesn’t block the rest.
If a subprocess gets killed by the OS (most often SIGKILL from
the kernel’s OOM killer), the parent detects this and records
the task as 'error: subprocess killed (likely OOM)' rather
than crashing the batch. The remaining tasks continue normally.
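The mechanics described above (a fresh process per task, a per-task timeout, and detecting a kill by the OS) can be sketched with the standard library alone. This is an illustration, not exo2micro's implementation: run_task_in_subprocess and its status strings are hypothetical, and the task is passed as a Python source string to keep the example short.

```python
import subprocess
import sys

def run_task_in_subprocess(code, timeout=None):
    """Run one task (Python source) in a fresh interpreter and report its fate."""
    try:
        proc = subprocess.run(
            [sys.executable, '-c', code],
            timeout=timeout,       # seconds; the child is killed when exceeded
            capture_output=True,
            text=True,
        )
    except subprocess.TimeoutExpired:
        return 'error: timeout'
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a signal,
        # e.g. -9 for SIGKILL from the kernel's OOM killer
        return f'error: subprocess killed by signal {-proc.returncode}'
    if proc.returncode != 0:
        return f'error: exit code {proc.returncode}'
    return 'ok'

status = run_task_in_subprocess("print('done')")
print(status)  # ok
```

Because each call returns a status string instead of raising, a loop over tasks can record the failure and move on to the next task, which is the behavior described for the batch.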
Diagnosing memory issues (new in 2.4)
If you’ve hit a memory problem and you’re not sure whether it’s
a per-task peak overrun or an accumulating leak, the
MemoryTracker class can tell you. Pass
memory_debug=True to run_batch():
results = e2m.run_batch(
    samples=['CD070', 'CD063'],
    dyes=['SybrGld', 'DAPI'],
    memory_debug=True,
)
This prints RSS (resident set size) snapshots before and after
each task, with an explicit gc.collect() pass in between:
[mem] 2.34 GB batch start
[mem] 2.34 GB before CD070/SybrGld
[mem] 8.91 GB after gc CD070/SybrGld
[mem] 8.91 GB before CD070/DAPI
[mem] 14.22 GB after gc CD070/DAPI
[mem] 14.22 GB before CD063/SybrGld
...
[mem] === memory summary ===
[mem] baseline: 2.34 GB
[mem] peak: 14.22 GB (+11.88 GB)
[mem] final: 14.22 GB (+11.88 GB)
[mem] WARNING: final RSS is >0.5 GB above baseline. ...
The pattern of those numbers tells you which problem you have:
RSS climbs monotonically and never returns to baseline → real leak. gc.collect() isn’t recovering memory between tasks. Use subprocess mode (above); that’s the only reliable cure.
RSS spikes during each task but returns to baseline between them → no leak. Per-task peak just exceeds your RAM. Reduce n_workers, reduce pad, or close other applications.
The pre-flight check tries to predict the second case before any
task runs, but the tracker is what you want when you’ve gotten
past pre-flight and still have problems. Requires the optional
psutil dependency:
pip install psutil
Without psutil, memory_debug=True no-ops with a one-time
warning.
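The two patterns described above can also be told apart mechanically from the “after gc” readings. The sketch below is a hypothetical helper, not part of MemoryTracker: diagnose treats strictly rising readings that end more than 0.5 GB above baseline (the threshold shown in the warning line) as a leak, which is a simplification.

```python
def diagnose(baseline_gb, after_gc_gb, threshold_gb=0.5):
    """Classify a run from its baseline RSS and each task's 'after gc' RSS."""
    readings = [baseline_gb] + list(after_gc_gb)
    # True when every reading is higher than the one before it
    climbing = all(later > earlier
                   for earlier, later in zip(readings, readings[1:]))
    above_baseline = readings[-1] - baseline_gb > threshold_gb
    if climbing and above_baseline:
        return 'leak: consider subprocess mode'
    return 'no leak detected between tasks'

# Readings taken from the example output above
verdict = diagnose(2.34, [8.91, 14.22])
print(verdict)
```

A result of “no leak detected between tasks” does not rule out a per-task peak that exceeds your RAM; it only says memory is being returned between tasks.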
What exo2micro does on its own to manage memory
A few things happen automatically that you don’t need to think about:
In serial mode, the pipeline explicitly closes all matplotlib figures and runs Python’s garbage collector between tasks. This is more aggressive than relying on Python’s default cleanup and is the main reason serial mode is the right choice on low-RAM machines.
Within a task, intermediate image data is released as soon as each pipeline stage finishes. Stage 2’s alignment debug data (downsampled images used for the diagnostic plots) is dropped as soon as those plots are saved. Stage 3’s warp matrices are dropped at the end of stage 4. Only the small scalar scale estimates survive into the returned result.
All intermediate images are float32 (4 bytes per pixel), both on disk and in memory, rather than float64 (8 bytes), which halves the footprint without sacrificing visible precision.
You shouldn’t normally need to do anything to make these happen. They’re built into the pipeline. They just mean that for the same hardware, exo2micro can usually process larger batches than a naive implementation could.