Memory and Performance

This page is about choosing between serial and parallel mode, and how to avoid running out of memory when processing big batches. If you’re not sure what those words mean, the short answer is:

Leave parallel mode OFF. It’s the default. Read on if you want to know when it’s safe to turn it on, or if you’ve hit memory problems.

What “serial” and “parallel” mean here

exo2micro processes one (sample, dye) combination at a time as a “task”. When you have many samples and many dyes, you have many tasks. There are two ways to run them:

Serial mode (parallel=False, the default): exo2micro processes one task, then the next, then the next. Each task fully finishes before the next one starts.
Parallel mode (parallel=True): exo2micro launches multiple worker processes that each process one task at a time. If you set n_workers=4, four tasks run at the same time.

Parallel mode can be a lot faster when you have many tasks. But it comes with a real memory cost.

Why parallel mode uses more memory

Each parallel worker is a separate Python process. Each one holds its own copy of the current sample’s images, padded canvases, alignment buffers, and so on. If a single sample uses 12 GB of RAM at peak, four parallel workers use 48 GB.

Worse: when you exceed your computer’s physical RAM, the operating system starts using disk as “virtual memory” (called swapping on Linux/Mac, paging on Windows). Disk is hundreds of times slower than RAM, so a run that’s swapping will be vastly slower than the same run in serial mode — and may also crash the Python process entirely if it runs out completely.

When to leave parallel mode OFF

Leave it off if any of these are true:

You have 16 GB of RAM or less.
You’re not sure how much RAM you have.
Your images are large (typical exo2micro images are 30,000 × 25,000 pixels; one of those is about 2.3 GB on disk and even larger in memory).
You’re going to use the computer for other things while the batch runs.
You only have a handful of tasks (3 or fewer). The overhead of starting worker processes usually makes serial just as fast.

A specific anti-pattern: don’t set parallel=True, n_workers=1 to “try parallel mode safely”. That gives you the worst of both worlds — the spawn overhead of parallel mode without any of the speed benefit, and serial mode’s explicit memory cleanup doesn’t run between tasks the way it does in actual serial mode. If you have RAM for only one worker, just use parallel=False.

When to turn parallel mode ON

Turn it on when all of these are true:

You have 5 or more tasks to process.
You have enough RAM to hold multiple full-resolution images at once (see the table below).
You’re not planning to use the computer for anything else during the run.

How many workers

Start conservative. Rough rule of thumb based on your total RAM:

Total RAM	Recommended max workers
8 GB or less	1 (use `parallel=False`)
16 GB	1 (use `parallel=False`)
32 GB	2
64 GB	4
128 GB	8

Also never exceed (your CPU core count) − 1, so your computer stays responsive for system tasks.

Checking your RAM

macOS: Apple menu → About This Mac → Memory shows installed RAM. Applications → Utilities → Activity Monitor → Memory tab shows live usage.
Windows: Settings → System → About → Device specifications → Installed RAM. Ctrl+Shift+Esc → Task Manager → Performance tab → Memory shows live usage.

Watching memory during a run

The first time you run a large batch in parallel, open your system’s task/activity monitor and watch memory usage. If RAM usage climbs past 90%, or if your computer becomes sluggish:

Click the Abort button in the GUI (or interrupt the kernel in JupyterLab: Kernel → Interrupt Kernel).
Reduce n_workers and try again.
If even n_workers=2 runs out of memory, switch to parallel=False.

Already-completed tasks are preserved when you abort. The pipeline saves checkpoints after each stage of each task, so re-running will pick up where you left off rather than starting over.

Pre-flight resource checks (new in 2.4)

Starting in 2.4, both run_batch() and SampleDye.run() run a quick resource check before any task starts. The check reads only the raw TIFF headers (no pixel data, fast even on networked drives), estimates the peak RAM and total disk output your batch will produce, and compares each to the available headroom on the machine.

Three severity levels per resource:

≤ 80% of available — silent. Run proceeds normally.
80%-100% — warning, run proceeds. A “⚠️ HIGH” line is printed; you should consider closing other applications or reducing n_workers but the run continues.
> 100% — hard fail. MemoryError (for RAM) or OSError (for disk) is raised before any task runs. The error message includes a remediation list with concrete suggestions (reduce n_workers, reduce pad, switch checkpoint_format to one format only, free disk, etc.) with your current values inline.

A typical successful check looks like this:

=== Pre-flight resource check ===
  RAM: estimated peak 2.8 GB vs 16.0 GB available (17%)  ✓
  Disk: estimated total 4.1 GB vs 412 GB free (1%)  ✓
=================================

This catches the case that previously caused most “kernel dies mid-batch” reports: starting an 8-worker batch on a 32 GB machine that needs 6 GB per task. Before 2.4, you’d see the kernel die with no useful diagnostic. In 2.4, the same configuration raises MemoryError immediately with a message telling you exactly how many workers your machine can handle and why.

Overriding the check

If you know the estimate is conservative for your specific data — for example you’ve cleared other applications since the estimate was computed, or your samples are unusually compressible — pass force_run=True to downgrade the hard fail to a warning:

results = e2m.run_batch(
    samples=['CD070', 'CD063'],
    dyes=['SybrGld'],
    n_workers=8,
    force_run=True,
)

This is not recommended for normal use. If a run that’s flagged ❌ EXCEEDS AVAILABLE actually does OOM-kill the Python process mid-batch, you may end up with corrupted checkpoint files (a half-written TIFF that the next run can’t read), so the default behavior is to refuse the run rather than risk that.

The 6× factor

The RAM estimate is:

peak per task ≈ (H + 2·pad) × (W + 2·pad) × 4 bytes × 6

The 6× factor reflects how many full-resolution float32 image copies coexist at the worst point of a single task (stage 2 or stage 3, where padded post + padded pre + downsampled working copies + warp output buffer + SIFT internals all live in memory simultaneously). It’s a conservative estimate. If you find the check is consistently refusing batches that actually fit on your machine, the constant PEAK_FACTOR_PER_TASK at the top of the memory-diagnostics section in exo2micro/utils.py can be tuned. We expect most users won’t need to touch it.

Subprocess mode for low-RAM machines (new in 2.4)

Even in serial mode, some memory can accumulate across tasks that gc.collect() between tasks can’t fully reclaim: matplotlib figure state held by the pyplot module, Jupyter Out[] cell references, cv2/tifffile internal caches. If you’re seeing your collaborator’s kernel die partway through a serial batch even though the pre-flight check passed, the cause is likely one of these slow accumulating leaks.

The fix is subprocess mode: run each task in a fresh Python subprocess, exited and reclaimed by the OS between tasks.

results = e2m.run_batch(
    samples=['CD070', 'CD063'],
    dyes=['SybrGld', 'DAPI'],
    parallel='subprocess',
)

This is a third value for the parallel argument, alongside False (serial in-process, the default) and True (multiprocessing pool). Each task runs in a fresh process. Tasks run one at a time (not concurrently — for that, use parallel=True).

When to use subprocess mode:

Your pre-flight check passes (per-task RAM fits) but the kernel still dies after a few tasks complete successfully.
The MemoryTracker summary (below) shows RSS climbing monotonically across tasks.
You want overnight unattended batches to be robust to wedged tasks (see timeout_per_task below).

Important: subprocess mode is not the same as parallel=True, n_workers=1. That uses multiprocessing.Pool, which keeps a single worker process alive across every task, so leaks accumulate in it just as they do in serial mode. Subprocess mode spawns a new process per task and tears it down after.

Subprocess mode adds ~1-2 seconds of process-spawn overhead per task. For typical exo2micro tasks that take minutes to align, this is invisible.

Timeouts and OOM detection

In subprocess mode you can also set timeout_per_task to abort any task that runs too long:

results = e2m.run_batch(
    samples=samples,
    dyes=dyes,
    parallel='subprocess',
    timeout_per_task=1800,   # 30 minutes per task
)

Recommended for unattended overnight batches so a wedged task doesn’t block the rest.

If a subprocess gets killed by the OS (most often SIGKILL from the kernel’s OOM killer), the parent detects this and records the task as 'error: subprocess killed (likely OOM)' rather than crashing the batch. The remaining tasks continue normally.

Diagnosing memory issues (new in 2.4)

If you’ve hit a memory problem and you’re not sure whether it’s a per-task peak overrun or an accumulating leak, the MemoryTracker class can tell you. Pass memory_debug=True to run_batch():

results = e2m.run_batch(
    samples=['CD070', 'CD063'],
    dyes=['SybrGld', 'DAPI'],
    memory_debug=True,
)

This prints RSS (resident set size) snapshots before and after each task, with an explicit gc.collect() pass in between:

[mem]   2.34 GB  batch start
[mem]   2.34 GB  before CD070/SybrGld
[mem]   8.91 GB  after gc CD070/SybrGld
[mem]   8.91 GB  before CD070/DAPI
[mem]  14.22 GB  after gc CD070/DAPI
[mem]  14.22 GB  before CD063/SybrGld
...
[mem] === memory summary ===
[mem] baseline:   2.34 GB
[mem] peak:      14.22 GB  (+11.88 GB)
[mem] final:     14.22 GB  (+11.88 GB)
[mem] WARNING: final RSS is >0.5 GB above baseline. ...

The pattern of those numbers tells you which problem you have:

RSS climbs monotonically and never returns to baseline → real leak. gc.collect() isn’t recovering memory between tasks. Use subprocess mode (above) — that’s the only reliable cure.
RSS spikes during each task but returns to baseline between them → no leak. Per-task peak just exceeds your RAM. Reduce n_workers, reduce pad, or close other applications.

The pre-flight check tries to predict the second case before any task runs, but the tracker is what you want when you’ve gotten past pre-flight and still have problems. Requires the optional psutil dependency:

pip install psutil

Without psutil, memory_debug=True no-ops with a one-time warning.

What exo2micro does on its own to manage memory

A few things happen automatically that you don’t need to think about:

In serial mode, the pipeline explicitly closes all matplotlib figures and runs Python’s garbage collector between tasks. This is more aggressive than relying on Python’s default cleanup and is the main reason serial mode is the right choice on low-RAM machines.
Within a task, intermediate image data is released as soon as each pipeline stage finishes. Stage 2’s alignment debug data (downsampled images used for the diagnostic plots) is dropped as soon as those plots are saved. Stage 3’s warp matrices are dropped at the end of stage 4. Only the small scalar scale estimates survive into the returned result.
All intermediate images are float32 on disk (4 bytes per pixel) rather than float64 (8 bytes), which halves the working memory footprint without sacrificing visible precision.

You shouldn’t normally need to do anything to make these happen. They’re built into the pipeline. They just mean that for the same hardware, exo2micro can usually process larger batches than a naive implementation could.