Troubleshooting
This page catalogues common problems and what to do about them.
“No raw files found” / “Raw image directory not found”
When you click Auto-detect (or call survey_raw_channels()
or run_batch()) and exo2micro reports a layout problem,
the message will be one of the following. All four come from
diagnose_raw_layout() and include the canonical
directory layout in the message itself, so you can copy-paste straight
into your file manager.
Raw image directory not found.
The raw_dir you pointed exo2micro at doesn’t exist. Most often this
means the GUI was launched from a different folder than the one
containing your raw/ directory. Either move your raw directory to
the working directory, cd into the right place before launching
JupyterLab, or pass an absolute path: raw_dir='/full/path/to/raw'.
Raw image directory is empty.
The directory exists but contains nothing. Drop your sample folders into it (one folder per sample, with paired pre/post TIFFs inside).
Found N TIFF file(s) directly inside <raw_dir>, but no per-sample
subfolders.
The most common mistake. exo2micro requires each sample to live in
its own subdirectory under raw_dir. Putting all images flat in
raw/ is the natural thing to do but exo2micro can’t tell which
files belong to which sample. Fix: make a folder for each sample
(raw/Sample001/, raw/Sample002/, …) and move that sample’s
pre/post files into its folder.
Found N sample subdirectory(ies) under <raw_dir>, but none of them
contain any TIFF files.
The folder structure is right but the folders themselves are empty
(or contain only non-TIFF files). Check that your images are
.tif or .tiff (case-insensitive) and that they’re actually
inside the sample folder, not next to it.
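For the flat-layout case above (TIFFs sitting directly inside raw_dir), the move into per-sample folders can be scripted. The following is a minimal sketch, not part of exo2micro: the helper names are hypothetical, and it assumes the sample name is the text before the first underscore in each filename (as in Sample001_PreStain_DAPI.tif). Inspect the plan before applying it.

```python
import shutil
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}

def plan_sample_folders(raw_dir):
    """Map each TIFF directly inside raw_dir to a per-sample destination."""
    raw_dir = Path(raw_dir)
    plan = {}
    for path in sorted(raw_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in TIFF_SUFFIXES:
            continue
        if "_" not in path.stem:
            continue  # cannot guess a sample name; leave the file alone
        # ASSUMPTION: sample name = text before the first underscore.
        sample = path.stem.split("_", 1)[0]
        plan[path] = raw_dir / sample / path.name
    return plan

def apply_plan(plan):
    """Create the sample folders and move the files into them."""
    for src, dst in plan.items():
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
```

Calling plan_sample_folders('raw') and printing the result shows what would move where; apply_plan(plan) then performs the moves.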
“Some requested (sample, dye) pairs have no raw files”
When you call run_batch() with the default
strict_dyes=True, requesting a (sample, dye) combination
that doesn’t exist on disk raises FileNotFoundError with
a single message listing every missing pair. The most common
causes:
Typo in a sample or dye name. Check the listed dyes against your raw filenames. Dye matching is case-sensitive: DAPI and dapi are different.
Heterogeneous samples. Not every dye exists for every sample. Either trim your sample/dye lists, or pass strict_dyes=False to skip missing pairs and process the rest.
Files not copied over yet. The pair really doesn’t exist: add the missing files, or remove the affected sample/dye from your run.
In the GUI, the same situation produces a warning banner above the
Run button with a “Confirm and run anyway” option that automatically
sets strict_dyes=False for that run. Missing pairs render as
muted gray tiles with a “(no files)” label so you can see what was
filtered out at a glance.
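To check the requested pairs against the raw filenames before launching a batch, something like the following sketch can help. It is not an exo2micro API: both helpers are hypothetical, and the only rule borrowed from this page is that the dye is the text between the last underscore and the extension. It checks only that some file for the dye exists, not that both the pre and post halves are present.

```python
from pathlib import Path

TIFF_SUFFIXES = {".tif", ".tiff"}

def dyes_on_disk(sample_dir):
    """Collect the dye names that appear in one sample folder."""
    dyes = set()
    for path in Path(sample_dir).iterdir():
        if path.suffix.lower() in TIFF_SUFFIXES and "_" in path.stem:
            # Dye = text after the last underscore (case-sensitive).
            dyes.add(path.stem.rsplit("_", 1)[1])
    return dyes

def missing_pairs(raw_dir, samples, dyes):
    """List every requested (sample, dye) pair with no file on disk."""
    missing = []
    for sample in samples:
        sample_dir = Path(raw_dir) / sample
        found = dyes_on_disk(sample_dir) if sample_dir.is_dir() else set()
        missing.extend((sample, dye) for dye in dyes if dye not in found)
    return missing
```

An empty list from missing_pairs('raw', samples, dyes) suggests the strict check is unlikely to trip on missing dyes.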
“Estimated peak RAM exceeds available RAM”
Starting in v2.4, run_batch() and
SampleDye.run() run a pre-flight resource check before any
task starts. When the estimated peak RAM for your batch exceeds
the available memory on the machine, the run raises
MemoryError immediately with a multi-line message:
MemoryError: Estimated peak RAM (24.0 GB) exceeds available RAM (16.0 GB).
- 12 task(s), 4 concurrent worker(s), pad=2000.
- Per-task peak estimate uses a 6.0x factor over single-image float32 size.
Options:
1. Reduce n_workers (currently 4).
2. Reduce pad (currently 2000; try 1000 or 500).
3. Close other applications to free RAM.
4. Use parallel='subprocess' mode if leaks are suspected.
5. Override this check with force_run=True (NOT recommended; will likely OOM-kill the kernel).
Pick the remedy that fits your situation:
Working alone on the machine, just want it to finish: lower n_workers until the estimate fits. Start by halving and iterate.
Images larger than you remembered: reduce pad from 2000 to 1000 or 500. The estimate is dominated by (H + 2·pad) × (W + 2·pad), so cutting pad in half reduces the peak noticeably, and more so when the image is small relative to pad.
You know the estimate is wrong for your data: pass force_run=True. Not recommended for routine use; see Memory and Performance for why.
See Memory and Performance for the full pre-flight discussion and how to tune the per-task RAM estimate if needed.
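To see how n_workers and pad trade off, the arithmetic can be sketched as below. This is an illustration, not exo2micro's actual estimator: it assumes the peak is the 6× factor times the padded float32 image size, multiplied by the number of concurrent workers, and the 10,000 × 10,000 image size is a made-up example.

```python
BYTES_PER_FLOAT32 = 4

def estimate_peak_ram_bytes(height, width, pad, n_workers, factor=6.0):
    """Rough peak RAM: factor x padded float32 image x concurrent workers."""
    padded_pixels = (height + 2 * pad) * (width + 2 * pad)
    per_task = factor * padded_pixels * BYTES_PER_FLOAT32
    # ASSUMPTION: concurrent workers simply add up.
    return per_task * n_workers

# Hypothetical 10,000 x 10,000 pixel image.
base = estimate_peak_ram_bytes(10_000, 10_000, pad=2000, n_workers=4)
half_pad = estimate_peak_ram_bytes(10_000, 10_000, pad=1000, n_workers=4)
half_workers = estimate_peak_ram_bytes(10_000, 10_000, pad=2000, n_workers=2)

for label, value in [("base", base), ("pad halved", half_pad),
                     ("workers halved", half_workers)]:
    print(f"{label}: {value / 1e9:.1f} GB")
```

Under these assumptions, halving n_workers halves the estimate, while halving pad shrinks the padded area from 14,000² to 12,000² pixels, about 27% less.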
“Estimated output size exceeds free disk space”
The disk side of the pre-flight check raises OSError
when the estimated total output won’t fit at output_dir:
OSError: Estimated output size (180.0 GB) exceeds free disk space (95.0 GB) at processed/.
- 47 task(s).
- checkpoint_format='both' (switch to 'tiff' or 'fits' alone to roughly halve).
Options:
1. Free up disk space at processed/.
2. Switch checkpoint_format to 'tiff' or 'fits' (not 'both').
3. Run a smaller subset of samples/dyes.
4. Override this check with force_run=True.
The most common cause is checkpoint_format='both' — every
intermediate is written twice. Switching to 'tiff' alone (or
'fits' alone) roughly halves the output footprint.
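To check the free space yourself before a run, the standard library is enough. The sketch below is not how exo2micro performs its check; the helper name is hypothetical and the estimated size is whatever figure you want to test against.

```python
import shutil

def output_fits(output_dir, estimated_bytes):
    """Return (fits, free_bytes) for an existing output directory."""
    free_bytes = shutil.disk_usage(output_dir).free
    return estimated_bytes <= free_bytes, free_bytes

# Example: would 180 GB fit in the current directory's filesystem?
fits, free_bytes = output_fits(".", 180 * 10**9)
print(f"free: {free_bytes / 1e9:.1f} GB, fits: {fits}")
```

If the answer is no with checkpoint_format='both', try the same check with roughly half the estimate to see whether a single format would fit.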
“Subprocess killed (likely OOM)”
In subprocess mode (parallel='subprocess'), if a task’s
subprocess gets killed by the operating system rather than
exiting cleanly, the parent records the task as:
error: subprocess killed (likely OOM)
The most common cause is the OS OOM killer (SIGKILL, return code 137 on Linux/macOS). The subprocess hit the system’s RAM limit and was terminated to protect the rest of the system.
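For background on the 137 figure: shells report a signal-killed process as 128 plus the signal number, and SIGKILL is signal 9. Python's subprocess module instead reports a negative return code. The sketch below demonstrates both conventions on a POSIX system; it is a generic illustration and not exo2micro's own detection logic, and describe_exit is a hypothetical helper.

```python
import signal
import subprocess
import sys

def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        sig = signal.Signals(-returncode)
        return f"killed by {sig.name} (shell-style exit code {128 + sig.value})"
    return f"exited with code {returncode}"

# Child process that sends SIGKILL to itself, imitating an OOM kill.
child = subprocess.run(
    [sys.executable, "-c",
     "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)
print(describe_exit(child.returncode))
```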
This shouldn’t normally happen if the pre-flight check passed, but it can happen if:
The estimate was too optimistic for this particular sample (an unusual image with extreme dimensions, for example).
Another process on the machine grew during the run and consumed the RAM the pre-flight check thought was available.
The 6× per-task multiplier was too low — see Memory and Performance.
The remaining tasks in the batch continue normally. To recover:
Look at the summary for which tasks failed and which succeeded.
Rerun just the failed tasks, with a smaller pad or a different machine.
If many tasks failed the same way, the pre-flight estimate needs tuning for your data; see the discussion of the 6× factor in Memory and Performance.
“Kernel dies mid-batch” or “kernel restarts during run”
Symptoms: Jupyter or JupyterLab reports the kernel has died or been restarted partway through a batch, with no useful Python traceback. The pre-flight check passed and individual tasks were processing cleanly, but at some point the whole process disappeared.
The cause is usually one of two things:
Accumulating memory leak across tasks. Per-task usage looks fine, but each task leaves a bit of memory behind that gc.collect() can’t reach. After many tasks, the cumulative leak crosses the system’s RAM ceiling and the OS kills the kernel.
Per-task peak occasionally spikes above the estimate for unusual samples in an otherwise normal batch.
To distinguish them, enable MemoryTracker by passing
memory_debug=True to run_batch():
import exo2micro as e2m  # alias assumed from the e2m prefix used here

results = e2m.run_batch(
    samples=samples, dyes=dyes,
    memory_debug=True,
)
Then watch the printed RSS values across tasks. If RSS climbs
monotonically and never returns to baseline, you have a leak —
switch to subprocess mode (parallel='subprocess'), which
runs each task in a fresh process the OS reclaims after.
If RSS spikes per task but returns to baseline between them,
peak usage is the problem — reduce pad or n_workers.
See Memory and Performance for the full diagnosis flow.
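If you copy the RSS value printed after each task into a list, the leak-versus-spike rule above can be applied mechanically. This is a hypothetical helper, not part of MemoryTracker, and the numbers are illustrative only. Real readings are noisy, so treat a strict monotonic test like this one as a first pass.

```python
def looks_like_leak(rss_after_each_task_mb, tolerance=0.10):
    """True if RSS never drops between tasks and ends well above its start."""
    values = list(rss_after_each_task_mb)
    if len(values) < 2:
        return False
    climbing = all(later >= earlier
                   for earlier, later in zip(values, values[1:]))
    grew = values[-1] > values[0] * (1 + tolerance)
    return climbing and grew

# Illustrative readings in MB, taken after each task completes.
leaky = [900, 1400, 1900, 2500]
healthy = [900, 910, 895, 905]
print(looks_like_leak(leaky), looks_like_leak(healthy))
```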
Alignment doesn’t look right
Symptoms. The boundary contours in registration.png don’t
overlap cleanly, or the difference image shows ghost-like
doubling of features, or the blink comparison shows features
jumping between A and B.
First, figure out which stage went wrong. Use the blink comparison panel with A = post (stage 1) and B = one of the alignment stages:
If B = ICP-aligned pre (stage 2) looks bad, the boundary alignment failed. Move on to “Boundary alignment failing” below.
If stage 2 is good but B = interior-aligned pre (stage 3) is worse, the SIFT interior match went wrong. Move on to “Interior alignment failing”.
Boundary alignment failing
The coarse (stage 2) pass extracts the outer tissue boundary of each image as a soft ring and finds the best rigid transform to overlap the two rings. It can fail when the boundary is ill-defined, too thin, or has very different shapes between pre and post.
Things to try (in order):
Increase boundary_width. Default 15. Try 20 or 25. A thicker ring is more tolerant of small shape differences between the two images.
Increase boundary_smooth. Default 10. Try 15 or 20. More smoothing gives the phase-correlation search a gentler gradient landscape.
Widen the rotation search. If your samples might be rotated more than 20° between imaging sessions, increase angle_range to 45 or 90.
Widen the scale search. If magnification differs noticeably between pre and post, increase scale_max and decrease scale_min.
You can sweep any of these with the Parameter Comparison panel in the GUI to find the best value for your sample.
Interior alignment failing
Stage 3 uses SIFT feature matching on the interior of the tissue (not the boundary). Common failure modes:
Too few features detected. Very uniform samples don’t give SIFT much to match. The console output will say something like interior alignment: only 50 features.
Too many false matches. If the sample has many similar-looking repeating structures, RANSAC may reject everything.
Things to try:
Adjust interior_blur_base. Default 8. Lower it (4-6) if your features are fine-grained; raise it (10-15) if the sample is noisy at the pixel level.
Lower interior_min_inlier_ratio. Default 0.4. If RANSAC is rejecting otherwise reasonable matches, try 0.3 or 0.25. Too low risks accepting bad alignments, so check the result visually.
Disable interior refinement entirely. Set
interior_ecc=False. Stage 3 will then just copy the stage 2 result forward. This is a reasonable fallback when the sample just doesn’t have enough interior structure for SIFT.
Wrong channel detected
Symptoms. exo2micro loaded a channel that doesn’t look right
— the image is black, or it’s clearly the wrong colour plane from
what you expected. Or the Moffat fit produces a weird scale
and the diagnostic plots look nonsensical.
Diagnose first. Click Survey raw channels in the GUI, or from Python:
from exo2micro import survey_raw_channels
survey_raw_channels('raw')
This prints, for each raw TIFF, which channels have non-zero values and what their maxima and means are. You can confirm that the dye you expect really is in the channel you expect.
If the auto-detection is picking the wrong channel, it usually
means the “wrong” channel has a higher mean due to noise or
background contamination. The fix is out-of-scope for this
troubleshooter — see Conceptual Overview for how
_extract_signal_channel works, or file an issue.
Scale estimate looks implausible
Symptoms. The Moffat fit produces a scale that doesn’t match what the data visibly demands — e.g. the difference image has a large negative (cool) region where you expected zeros.
Things to try:
Check the ratio histogram. If it has multiple peaks or the peak is broad, the Moffat fit may have latched onto the wrong mode.
Run with scale_percentile=50 as a sanity check. The median of the log-ratio distribution is a simple, robust background estimator. If it disagrees significantly with the Moffat fit, something’s off and you should probably use the percentile value.
Run with manual_scale set to a value you think is right and compare the three difference images in the excess_heatmap.png plot. Pick whichever visually tracks the background ridge best.
Extreme percentiles are dangerous. Asking for the 99th percentile of the ratio distribution when even 1% of pixels are microbes will land you in the microbe tail, not the background. Prefer moderate percentiles (30-70).
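The warning about extreme percentiles can be illustrated with synthetic numbers. The sketch below assumes the ratio is post/pre per pixel, invents a background ratio of 1.3 with 2% of pixels behaving like microbes, and uses only the standard library. None of the values come from real data.

```python
import math
import random
import statistics

random.seed(0)

# Synthetic per-pixel ratios (ASSUMPTION: ratio = post / pre).
background = [random.lognormvariate(math.log(1.3), 0.1) for _ in range(9800)]
microbes = [random.lognormvariate(math.log(5.0), 0.2) for _ in range(200)]
ratios = background + microbes

# quantiles(n=100) returns 99 cut points; cuts[k - 1] approximates
# the k-th percentile.
cuts = statistics.quantiles(ratios, n=100)
p50, p99 = cuts[49], cuts[98]

print(f"50th percentile: {p50:.2f}")
print(f"99th percentile: {p99:.2f}")
```

The median stays close to the background ratio, while the 99th percentile sits inside the microbe population.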
Large negative patches in the difference image
Symptoms. Parts of the difference image are strongly negative — cool colours dominating where you expected near-zero.
Most common cause. The scale factor is too high — you’re oversubtracting. The auto Moffat fit can occasionally overshoot.
Fix. Either:
Lower the scale manually: set manual_scale to a value slightly below the Moffat estimate and compare. Try stepping down in increments of 0.05.
Use scale_percentile=40 or scale_percentile=50, which often lands closer to the true background.
Less common cause. Your alignment is off, so “pre has features where post doesn’t” creates spurious negative residuals. Check the blink comparison at a few locations. If things are jumping, fix the alignment first.
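The effect of oversubtraction is easy to see on synthetic data. The sketch below assumes the difference image is post − scale × pre, which is consistent with "scale too high means oversubtracting" but is not confirmed as exo2micro's exact formula. The pixel values and the true scale of 1.3 are invented.

```python
import random

random.seed(1)

TRUE_SCALE = 1.3  # invented ground truth for the synthetic data
pre = [random.uniform(100, 1000) for _ in range(5000)]
post = [TRUE_SCALE * p + random.gauss(0, 10) for p in pre]

def negative_fraction(scale):
    """Fraction of pixels where post - scale * pre is below zero."""
    diffs = [q - scale * p for p, q in zip(pre, post)]
    return sum(d < 0 for d in diffs) / len(diffs)

# Step the scale down in increments of 0.05, as suggested above.
for scale in (1.40, 1.35, 1.30, 1.25):
    print(f"scale={scale:.2f}: {negative_fraction(scale):.0%} negative")
```

At the right scale about half of the background pixels are slightly negative from noise alone; a much larger negative share points to oversubtraction.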
Banding or striping in the difference image
Cause. Residual rotation or shear misalignment — your alignment transform is slightly wrong in an angular sense.
Fix.
Check the blink comparison on features near opposite edges of the sample and on one near the centre. If the edge features jump more than the central one, it’s a rotational error.
Try widening angle_range (default 20 degrees) and lowering angle_step (default 1 degree) for a finer rotation search.
Enable save_all_intermediates=True and use the blink panel to compare the stage-2 coarse result against the stage-2 ICP result. If ICP is making it worse, something’s wrong with boundary extraction.
Empty or nearly-empty difference image
Cause. Either the alignment completely failed (producing an all-zero aligned pre-stain) or the scale factor is far too low, so too little of the pre-stain image is subtracted and unsubtracted background swamps the stain signal everywhere.
Fix. Check the pre_post_heatmap.png plot first. If it’s
nearly empty or the ridge is weird, the alignment is the problem
— go back to the alignment troubleshooting section above.
If the heatmap looks fine but the difference image is still
empty, double-check your scale_percentile / manual_scale
values — a scale of 0.1 on a dataset that really wants 1.3 will
produce a difference image that’s almost entirely positive and
indistinguishable from the original post-stain image.
Pipeline says a checkpoint is missing
Symptom. You set from_stage=3 and get a message like
upstream checkpoints missing: [(1, 'post'), (2, 'pre')] with
the pipeline then falling back to running from stage 1.
Cause. Either you haven’t run the earlier stages yet, or the
parameters you’ve set change the filename of an upstream
checkpoint and exo2micro can’t find the one that exists. Filenames
include non-default parameter values as suffixes, so changing
boundary_width from 15 to 20 means the pipeline looks for the stage 2 checkpoint
02_icp_aligned_pre_bw20.tiff instead of 02_icp_aligned_pre.tiff.
Fix. Either reset the parameters to match an existing run, or just let the pipeline re-run from stage 1 — it’s only slow if you have many samples.
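The suffix rule can be sketched as follows, which helps when working out which checkpoint file a given parameter set expects. This is a hypothetical reconstruction: only the boundary_width default (15) and its bw suffix are taken from this page, and the real naming code may order or abbreviate parameters differently.

```python
# Only this default and this abbreviation are confirmed by the text above.
DEFAULTS = {"boundary_width": 15}
SUFFIX_CODES = {"boundary_width": "bw"}

def checkpoint_name(stem, params, ext=".tiff"):
    """Append a suffix for every known parameter that differs from default."""
    suffix = "".join(
        f"_{SUFFIX_CODES[name]}{value}"
        for name, value in sorted(params.items())
        if name in SUFFIX_CODES and value != DEFAULTS[name]
    )
    return f"{stem}{suffix}{ext}"

print(checkpoint_name("02_icp_aligned_pre", {"boundary_width": 15}))
print(checkpoint_name("02_icp_aligned_pre", {"boundary_width": 20}))
```

Comparing the expected name with what is actually in the output directory usually shows at once which parameter changed.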
The GUI is slow or unresponsive
Cause. This usually means Jupyter is busy rendering many inline figures. With Show diagnostic plots inline enabled and many samples in the batch, the output cell can balloon.
Fix. Uncheck Show diagnostic plots inline for large
batches. The plots are still saved to pipeline_output/ — you
can inspect them afterward from disk, or use the Zoom & Inspect
panel to reload any particular one.
Filename problems
exo2micro is strict about raw filename conventions (see
Installation). When something is wrong with a filename, you
get a clear “FILE PROBLEM” message during the run, the affected
(sample, dye) task fails, and the pipeline continues with the
next task. All failed tasks are listed in a “PROBLEMS” section in
the summary at the end of the batch.
This section catalogues each error message you might see and what to do about it.
AMBIGUOUS: <filename> contains both ‘pre’ and ‘post’ in the filename
The filename contains both substrings somewhere in it (e.g.
Sample_pre_post_DAPI.tif). The loader can’t tell whether you
meant pre-stain or post-stain.
Fix: rename the file so only one of pre or post appears
anywhere in the name. Most often this means dropping a confusing
date or run identifier — Sample_2024-03-pre_DAPI.tif is fine,
Sample_pre_post_DAPI.tif is not.
NO STAIN MARKER: <filename> contains neither ‘pre’ nor ‘post’
The filename has no recognizable stain-type indicator. The loader can’t classify it as pre-stain or post-stain.
Fix: rename the file to include pre or post (or
PreStain/PostStain) somewhere in the basename. Where in the
name doesn’t matter — the loader does a substring search.
NO UNDERSCORE: <filename> has no underscore before the extension
The filename has no _ before .tif/.tiff, so there’s no
way to extract a dye name. For example: DAPI.tif.
Fix: prepend an underscore-separated prefix:
Sample001_PreStain_DAPI.tif.
EMPTY DYE: <filename> has nothing between the last underscore and the extension
There’s an underscore right before the extension (e.g.
Sample_PreStain_.tif), so the dye name is empty.
Fix: add the dye name between the trailing underscore and the extension.
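The four filename rules described so far can be checked ahead of time with a small validator. The sketch below is a hypothetical re-implementation, not exo2micro's loader: it assumes the pre/post substring search is case-insensitive (since PreStain is accepted) and that the underscore checks run before the stain-marker checks (since DAPI.tif is reported as NO UNDERSCORE).

```python
from pathlib import PurePath

def parse_raw_filename(filename):
    """Return (stain, dye) for a raw TIFF name, or raise ValueError."""
    stem = PurePath(filename).stem  # name without the extension
    if "_" not in stem:
        raise ValueError(f"NO UNDERSCORE: {filename}")
    dye = stem.rsplit("_", 1)[1]  # text after the last underscore
    if not dye:
        raise ValueError(f"EMPTY DYE: {filename}")
    lowered = stem.lower()  # ASSUMPTION: marker search ignores case
    has_pre, has_post = "pre" in lowered, "post" in lowered
    if has_pre and has_post:
        raise ValueError(f"AMBIGUOUS: {filename}")
    if not (has_pre or has_post):
        raise ValueError(f"NO STAIN MARKER: {filename}")
    return ("pre" if has_pre else "post"), dye

print(parse_raw_filename("Sample001_PreStain_DAPI.tif"))
```

Running it over every file in a sample folder and collecting the ValueError messages gives a quick preview of the problems a batch is likely to report.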
Incomplete pair for <sample> / <dye>: no file containing ‘pre’ ends with _<dye>.tif
The loader found a post-stain file for that dye but no matching
pre-stain file (or vice versa). Each (sample, dye) combination
needs both halves to process.
Fix: either add the missing file, or remove the orphan side and drop that dye from your run. Note that the missing-side message mentions the exact filename pattern the loader is looking for, which is usually enough to spot the typo.
Duplicate pair for <sample> / <dye>: expected exactly one pre-stain and one post-stain file but found N
There are multiple files in the same sample directory that all
match _<dye>.tif and all contain pre (or all contain
post). The loader can’t decide which one to use.
Fix: rename or remove the extras. The error message lists every candidate filename so you can see exactly which files are colliding. A common cause is leaving an old run’s output in the raw directory — move it to an archive folder.
No raw files matching dye ‘<dye>’ in <directory>
You requested a dye name that doesn’t appear in any of the raw filenames in that sample directory. The error message lists every dye exo2micro did find in that directory.
Fix: either remove that dye from your run for that sample, or
check for typos against the listed dyes. Note that dye matching is
case-sensitive — DAPI and dapi are different.
If the requested dye name contains an underscore, the error message
also explicitly flags that as a likely problem and suggests the
correct dye name. The most common mistake is asking for
'SybrGld_microbe' when the dye is actually 'SybrGld' and
'microbe' was just a descriptor in the filename. Dye names must
not contain underscores; the dye is whatever comes between the
last underscore and the extension.
Sample directory not found: <path>
The sample subdirectory doesn’t exist under your raw_dir.
Fix: check the spelling of the sample name in your input list,
and confirm the directory actually exists at the expected path. If
your samples live somewhere else, pass a custom raw_dir to the
GUI or to SampleDye.
A note on partial success
exo2micro processes each (sample, dye) task independently. If
one task in your batch fails — say, CD070 is missing its DAPI
pre-stain image — every other task in the batch still runs. You’ll
see a clear error message for the failed task in real time, and at
the end of the batch the summary table shows a “PROBLEMS” section
listing every failed task with its full error message. There’s no
need to scroll back through the mid-stream output to see what
broke.
This applies even within a single sample folder: if CD070
contains clean SybrGld images and broken DAPI images,
SybrGld will process successfully and DAPI will fail with a
clear message about exactly what’s wrong.
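The same pattern is useful in your own scripts that loop over tasks. The sketch below is a generic illustration of per-task error isolation, not exo2micro's implementation; run_all and fake_process are hypothetical names.

```python
def run_all(tasks, process):
    """Run every task; collect failures instead of stopping at the first."""
    results, problems = {}, {}
    for task in tasks:
        try:
            results[task] = process(task)
        except Exception as exc:
            # Record the problem and carry on with the next task.
            problems[task] = f"{type(exc).__name__}: {exc}"
    return results, problems

def fake_process(task):
    """Stand-in worker: fails for one (sample, dye) pair."""
    if task == ("CD070", "DAPI"):
        raise FileNotFoundError("no pre-stain image")
    return "ok"

tasks = [("CD070", "SybrGld"), ("CD070", "DAPI"), ("CD071", "DAPI")]
results, problems = run_all(tasks, fake_process)

print("PROBLEMS")
for task, message in problems.items():
    print(f"  {task}: {message}")
```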