Troubleshooting

This page catalogues common problems and what to do about them.

“No raw files found” / “Raw image directory not found”

When you click Auto-detect (or call survey_raw_channels() or run_batch()) and exo2micro reports a layout problem, the message will be one of the following. All four come from diagnose_raw_layout() and include the canonical directory layout in the message itself, so you can copy-paste straight into your file manager.

Raw image directory not found. The raw_dir you pointed exo2micro at doesn’t exist. Most often this means the GUI was launched from a different folder than the one containing your raw/ directory. Either move your raw directory to the working directory, cd into the right place before launching JupyterLab, or pass an absolute path: raw_dir='/full/path/to/raw'.

Raw image directory is empty. The directory exists but contains nothing. Drop your sample folders into it (one folder per sample, with paired pre/post TIFFs inside).

Found N TIFF file(s) directly inside <raw_dir>, but no per-sample subfolders. The most common mistake. exo2micro requires each sample to live in its own subdirectory under raw_dir. Putting all images flat in raw/ is the natural thing to do but exo2micro can’t tell which files belong to which sample. Fix: make a folder for each sample (raw/Sample001/, raw/Sample002/, …) and move that sample’s pre/post files into its folder.

Found N sample subdirectory(ies) under <raw_dir>, but none of them contain any TIFF files. The folder structure is right but the folders themselves are empty (or contain only non-TIFF files). Check that your images are .tif or .tiff (case-insensitive) and that they’re actually inside the sample folder, not next to it.

“Some requested (sample, dye) pairs have no raw files”

When you call run_batch() with the default strict_dyes=True, requesting a (sample, dye) combination that doesn’t exist on disk raises FileNotFoundError with a single message listing every missing pair. The most common causes:

  • Typo in a sample or dye name. Check the listed dyes against your raw filenames. Dye matching is case-sensitive: DAPI and dapi are different.

  • Heterogeneous samples. Not every dye exists for every sample. Either trim your sample/dye lists, or pass strict_dyes=False to skip missing pairs and process the rest.

  • Files not copied over yet. The pair really doesn’t exist — add the missing files, or remove the affected sample/dye from your run.

In the GUI, the same situation produces a warning banner above the Run button with a “Confirm and run anyway” option that automatically sets strict_dyes=False for that run. Missing pairs render as muted gray tiles with a “(no files)” label so you can see what was filtered out at a glance.

“Estimated peak RAM exceeds available RAM”

Starting in v2.4, run_batch() and SampleDye.run() run a pre-flight resource check before any task starts. When the estimated peak RAM for your batch exceeds the available memory on the machine, the run raises MemoryError immediately with a multi-line message:

MemoryError: Estimated peak RAM (24.0 GB) exceeds available RAM (16.0 GB).
  - 12 task(s), 4 concurrent worker(s), pad=2000.
  - Per-task peak estimate uses a 6.0x factor over single-image float32 size.
Options:
  1. Reduce n_workers (currently 4).
  2. Reduce pad (currently 2000; try 1000 or 500).
  3. Close other applications to free RAM.
  4. Use parallel='subprocess' mode if leaks are suspected.
  5. Override this check with force_run=True (NOT recommended; will likely OOM-kill the kernel).

Pick the remedy that fits your situation:

  • Working alone on the machine, just want it to finish — lower n_workers until the estimate fits. Start by halving and iterate.

  • Images larger than you remembered — reduce pad from 2000 to 1000 or 500. The estimate is dominated by (H + 2·pad) × (W + 2·pad), so cutting pad in half significantly reduces peak.

  • You know the estimate is wrong for your data — pass force_run=True. Not recommended for routine use; see Memory and Performance for why.

See Memory and Performance for the full pre-flight discussion and how to tune the per-task RAM estimate if needed.

“Estimated output size exceeds free disk space”

The disk side of the pre-flight check raises OSError when the estimated total output won’t fit at output_dir:

OSError: Estimated output size (180.0 GB) exceeds free disk space (95.0 GB) at processed/.
  - 47 task(s).
  - checkpoint_format='both' (switch to 'tiff' or 'fits' alone to roughly halve).
Options:
  1. Free up disk space at processed/.
  2. Switch checkpoint_format to 'tiff' or 'fits' (not 'both').
  3. Run a smaller subset of samples/dyes.
  4. Override this check with force_run=True.

The most common cause is checkpoint_format='both' — every intermediate is written twice. Switching to 'tiff' alone (or 'fits' alone) roughly halves the output footprint.

“Subprocess killed (likely OOM)”

In subprocess mode (parallel='subprocess'), if a task’s subprocess gets killed by the operating system rather than exiting cleanly, the parent records the task as:

error: subprocess killed (likely OOM)

The most common cause is the OS OOM killer (SIGKILL, return code 137 on Linux/macOS). The subprocess hit the system’s RAM limit and was terminated to protect the rest of the system.

This shouldn’t normally happen if the pre-flight check passed, but it can happen if:

  • The estimate was too optimistic for this particular sample (an unusual image with extreme dimensions, for example).

  • Another process on the machine grew during the run and consumed the RAM the pre-flight check thought was available.

  • The 6× per-task multiplier was too low — see Memory and Performance.

The remaining tasks in the batch continue normally. To recover:

  1. Look at the summary for which tasks failed and which succeeded.

  2. Rerun just the failed tasks, with a smaller pad or a different machine.

  3. If many tasks failed the same way, the pre-flight estimate needs tuning for your data — see the discussion of the 6× factor in Memory and Performance.

“Kernel dies mid-batch” or “kernel restarts during run”

Symptoms: Jupyter or JupyterLab reports the kernel has died or been restarted partway through a batch, with no useful Python traceback. The pre-flight check passed and individual tasks were processing cleanly, but at some point the whole process disappeared.

The most common cause is one of two things:

  • Accumulating memory leak across tasks. Per-task usage looks fine, but each task leaves a bit of memory behind that gc.collect() can’t reach. After many tasks, the cumulative leak crosses the system’s RAM ceiling and the OS kills the kernel.

  • Per-task peak occasionally spikes above the estimate for unusual samples in an otherwise normal batch.

To distinguish them, enable MemoryTracker by passing memory_debug=True to run_batch():

results = e2m.run_batch(
    samples=samples, dyes=dyes,
    memory_debug=True,
)

Then watch the printed RSS values across tasks. If RSS climbs monotonically and never returns to baseline, you have a leak — switch to subprocess mode (parallel='subprocess'), which runs each task in a fresh process the OS reclaims after.

If RSS spikes per task but returns to baseline between them, peak usage is the problem — reduce pad or n_workers.

See Memory and Performance for the full diagnosis flow.

Alignment doesn’t look right

Symptoms. The boundary contours in registration.png don’t overlap cleanly, or the difference image shows ghost-like doubling of features, or the blink comparison shows features jumping between A and B.

First, figure out which stage went wrong. Use the blink comparison panel with A = post (stage 1) and B = one of the alignment stages:

  • If B = ICP-aligned pre (stage 2) looks bad, the boundary alignment failed. Move on to “Boundary alignment failing” below.

  • If stage 2 is good but B = interior-aligned pre (stage 3) is worse, the SIFT interior match went wrong. Move on to “Interior alignment failing”.

Boundary alignment failing

The coarse (stage 2) pass extracts the outer tissue boundary as a soft ring and finds the best rigid transform to overlap them. It can fail when the boundary is ill-defined, too thin, or has very different shapes between pre and post.

Things to try (in order):

  1. Increase ``boundary_width``. Default 15. Try 20 or 25. A thicker ring is more tolerant of small shape differences between the two images.

  2. Increase ``boundary_smooth``. Default 10. Try 15 or 20. More smoothing gives the phase-correlation search a gentler gradient landscape.

  3. Widen the rotation search. If your samples might be rotated more than 20° between imaging sessions, increase angle_range to 45 or 90.

  4. Widen the scale search. If magnification differs noticeably between pre and post, increase scale_max and decrease scale_min.

You can sweep any of these with the Parameter Comparison panel in the GUI to find the best value for your sample.

Interior alignment failing

Stage 3 uses SIFT feature matching on the interior of the tissue (not the boundary). Common failure modes:

  1. Too few features detected. Very uniform samples don’t give SIFT much to match. The console output will say something like interior alignment: only 50 features.

  2. Too many false matches. If the sample has many similar-looking repeating structures, RANSAC may reject everything.

Things to try:

  1. Adjust ``interior_blur_base``. Default 8. Lower it (4-6) if your features are fine-grained; raise it (10-15) if the sample is noisy at the pixel level.

  2. Lower ``interior_min_inlier_ratio``. Default 0.4. If RANSAC is rejecting otherwise reasonable matches, try 0.3 or 0.25. Too low risks accepting bad alignments, so check the result visually.

  3. Disable interior refinement entirely. Set interior_ecc=False. Stage 3 will then just copy the stage 2 result forward. This is a reasonable fallback when the sample just doesn’t have enough interior structure for SIFT.

Wrong channel detected

Symptoms. exo2micro loaded a channel that doesn’t look right — the image is black, or it’s clearly the wrong colour plane from what you expected. Or the Moffat fit produces a weird scale and the diagnostic plots look nonsensical.

Diagnose first. Click Survey raw channels in the GUI, or from Python:

from exo2micro import survey_raw_channels
survey_raw_channels('raw')

This prints, for each raw TIFF, which channels have non-zero values and what their maxima and means are. You can confirm that the dye you expect really is in the channel you expect.

If the auto-detection is picking the wrong channel, it usually means the “wrong” channel has a higher mean due to noise or background contamination. The fix is out-of-scope for this troubleshooter — see Conceptual Overview for how _extract_signal_channel works, or file an issue.

Scale estimate looks implausible

Symptoms. The Moffat fit produces a scale that doesn’t match what the data visibly demands — e.g. the difference image has a large negative (cool) region where you expected zeros.

Things to try:

  1. Check the ratio histogram. If it has multiple peaks or the peak is broad, the Moffat fit may have latched onto the wrong mode.

  2. Run with ``scale_percentile=50`` as a sanity check. The median of the log-ratio distribution is a simple, robust background estimator. If it disagrees significantly with the Moffat fit, something’s off and you should probably use the percentile value.

  3. Run with ``manual_scale`` set to a value you think is right and compare the three difference images in the excess_heatmap.png plot. Pick whichever visually tracks the background ridge best.

  4. Extreme percentiles are dangerous. Asking for the 99th percentile of the ratio distribution when even 1% of pixels are microbes will land you in the microbe tail, not the background. Prefer moderate percentiles (30-70).

Large negative patches in the difference image

Symptoms. Parts of the difference image are strongly negative — cool colours dominating where you expected near-zero.

Most common cause. The scale factor is too high — you’re oversubtracting. The auto Moffat fit can occasionally overshoot.

Fix. Either:

  1. Lower the scale manually — set manual_scale to a value slightly below the Moffat estimate and compare. Try stepping down in increments of 0.05.

  2. Use scale_percentile=40 or scale_percentile=50 — often lands closer to the true background.

Less common cause. Your alignment is off, so “pre has features where post doesn’t” creates spurious negative residuals. Check the blink comparison at a few locations. If things are jumping, fix the alignment first.

Banding or striping in the difference image

Cause. Residual rotation or shear misalignment — your alignment transform is slightly wrong in an angular sense.

Fix.

  1. Check the blink comparison on a feature near one edge of the sample and another near the opposite edge. If the feature near the edge jumps more than the feature near the centre, it’s a rotational error.

  2. Try widening angle_range (default 20 degrees) and lowering angle_step (default 1 degree) for a finer rotation search.

  3. Enable save_all_intermediates=True and use the blink panel to compare the stage-2 coarse result against the stage-2 ICP result. If ICP is making it worse, something’s wrong with boundary extraction.

Empty or nearly-empty difference image

Cause. Either the alignment completely failed (producing an all-zero aligned pre-stain) or the scale factor is way too low and the pre-stain signal is swamping the post-stain signal everywhere.

Fix. Check the pre_post_heatmap.png plot first. If it’s nearly empty or the ridge is weird, the alignment is the problem — go back to the alignment troubleshooting section above.

If the heatmap looks fine but the difference image is still empty, double-check your scale_percentile / manual_scale values — a scale of 0.1 on a dataset that really wants 1.3 will produce a difference image that’s almost entirely positive and indistinguishable from the original post-stain image.

Pipeline says a checkpoint is missing

Symptom. You set from_stage=3 and get a message like upstream checkpoints missing: [(1, 'post'), (2, 'pre')] with the pipeline then falling back to running from stage 1.

Cause. Either you haven’t run the earlier stages yet, or the parameters you’ve set change the filename of an upstream checkpoint and exo2micro can’t find the one that exists. Filenames include non-default parameter values as suffixes, so changing boundary_width from 15 to 20 means stage 2 is looking for 02_icp_aligned_pre_bw20.tiff instead of 02_icp_aligned_pre.tiff.

Fix. Either reset the parameters to match an existing run, or just let the pipeline re-run from stage 1 — it’s only slow if you have many samples.

The GUI is slow or unresponsive

Cause. Usually means Jupyter is busy rendering many inline figures. With Show diagnostic plots inline enabled and many samples in the batch, the output cell can balloon.

Fix. Uncheck Show diagnostic plots inline for large batches. The plots are still saved to pipeline_output/ — you can inspect them afterward from disk, or use the Zoom & Inspect panel to reload any particular one.

Filename problems

exo2micro is strict about raw filename conventions (see Installation). When something is wrong with a filename, you get a clear “FILE PROBLEM” message during the run, the affected (sample, dye) task fails, and the pipeline continues with the next task. All failed tasks are listed in a “PROBLEMS” section in the summary at the end of the batch.

This section catalogues each error message you might see and what to do about it.

AMBIGUOUS: <filename> contains both ‘pre’ and ‘post’ in the filename

The filename contains both substrings somewhere in it (e.g. Sample_pre_post_DAPI.tif). The loader can’t tell whether you meant pre-stain or post-stain.

Fix: rename the file so only one of pre or post appears anywhere in the name. Most often this means dropping a confusing date or run identifier — Sample_2024-03-pre_DAPI.tif is fine, Sample_pre_post_DAPI.tif is not.

NO STAIN MARKER: <filename> contains neither ‘pre’ nor ‘post’

The filename has no recognizable stain-type indicator. The loader can’t classify it as pre-stain or post-stain.

Fix: rename the file to include pre or post (or PreStain/PostStain) somewhere in the basename. Where in the name doesn’t matter — the loader does a substring search.

NO UNDERSCORE: <filename> has no underscore before the extension

The filename has no _ before .tif/.tiff, so there’s no way to extract a dye name. For example: DAPI.tif.

Fix: prepend an underscore-separated prefix: Sample001_PreStain_DAPI.tif.

EMPTY DYE: <filename> has nothing between the last underscore and the extension

There’s an underscore right before the extension (e.g. Sample_PreStain_.tif), so the dye name is empty.

Fix: add the dye name between the trailing underscore and the extension.

Incomplete pair for <sample> / <dye>: no file containing ‘pre’ ends with ``_<dye>.tif``

The loader found a post-stain file for that dye but no matching pre-stain file (or vice versa). Each (sample, dye) combination needs both halves to process.

Fix: either add the missing file, or remove the orphan side and drop that dye from your run. Note that the missing-side message mentions the exact filename pattern the loader is looking for, which is usually enough to spot the typo.

Duplicate pair for <sample> / <dye>: expected exactly one pre-stain and one post-stain file but found N

There are multiple files in the same sample directory that all match _<dye>.tif and all contain pre (or all contain post). The loader can’t decide which one to use.

Fix: rename or remove the extras. The error message lists every candidate filename so you can see exactly which files are colliding. A common cause is leaving an old run’s output in the raw directory — move it to an archive folder.

No raw files matching dye ‘<dye>’ in <directory>

You requested a dye name that doesn’t appear in any of the raw filenames in that sample directory. The error message lists every dye exo2micro did find in that directory.

Fix: either remove that dye from your run for that sample, or check for typos against the listed dyes. Note that dye matching is case-sensitive — DAPI and dapi are different.

If the requested dye name contains an underscore, the error message also explicitly flags that as a likely problem and suggests the correct dye name. The most common mistake is asking for 'SybrGld_microbe' when the dye is actually 'SybrGld' and 'microbe' was just a descriptor in the filename. Dye names must not contain underscores; the dye is whatever comes between the last underscore and the extension.

Sample directory not found: <path>

The sample subdirectory doesn’t exist under your raw_dir.

Fix: check the spelling of the sample name in your input list, and confirm the directory actually exists at the expected path. If your samples live somewhere else, pass a custom raw_dir to the GUI or to SampleDye.

A note on partial success

exo2micro processes each (sample, dye) task independently. If one task in your batch fails — say, CD070 is missing its DAPI pre-stain image — every other task in the batch still runs. You’ll see a clear error message for the failed task in real time, and at the end of the batch the summary table shows a “PROBLEMS” section listing every failed task with its full error message. There’s no need to scroll back through the mid-stream output to see what broke.

This applies even within a single sample folder: if CD070 contains clean SybrGld images and broken DAPI images, SybrGld will process successfully and DAPI will fail with a clear message about exactly what’s wrong.