Troubleshooting
===============

This page catalogues common problems and what to do about them.

"No raw files found" / "Raw image directory not found"
------------------------------------------------------

When you click Auto-detect (or call
:func:`~exo2micro.survey_raw_channels` or :func:`~exo2micro.run_batch`)
and exo2micro reports a layout problem, the message will be one of the
following. All four come from :func:`~exo2micro.diagnose_raw_layout`
and include the canonical directory layout in the message itself, so
you can copy-paste straight into your file manager.

**Raw image directory not found.** The ``raw_dir`` you pointed
exo2micro at doesn't exist. Most often this means the GUI was launched
from a different folder than the one containing your ``raw/``
directory. Either move your raw directory to the working directory,
``cd`` into the right place before launching JupyterLab, or pass an
absolute path: ``raw_dir='/full/path/to/raw'``.

**Raw image directory is empty.** The directory exists but contains
nothing. Drop your sample folders into it (one folder per sample, with
paired pre/post TIFFs inside).

**Found N TIFF file(s) directly inside , but no per-sample
subfolders.** The most common mistake. exo2micro requires each sample
to live in its own subdirectory under ``raw_dir``. Putting all images
flat in ``raw/`` is the natural thing to do but exo2micro can't tell
which files belong to which sample. Fix: make a folder for each sample
(``raw/Sample001/``, ``raw/Sample002/``, ...) and move that sample's
pre/post files into its folder.

**Found N sample subdirectory(ies) under , but none of them contain
any TIFF files.** The folder structure is right but the folders
themselves are empty (or contain only non-TIFF files). Check that your
images are ``.tif`` or ``.tiff`` (case-insensitive) and that they're
actually inside the sample folder, not next to it.

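
The four layout states above can be reproduced with a few lines of
``pathlib``, which is handy for checking a directory before launching
the GUI. The sketch below is not exo2micro's implementation of
``diagnose_raw_layout``; the function name ``classify_raw_layout`` and
the labels it returns are hypothetical, and it assumes TIFFs are
recognised by a case-insensitive ``.tif``/``.tiff`` extension as
described above.

.. code-block:: python

    from pathlib import Path

    TIFF_SUFFIXES = {".tif", ".tiff"}


    def _is_tiff(path):
        """True for a regular file with a .tif/.tiff extension (any case)."""
        return path.is_file() and path.suffix.lower() in TIFF_SUFFIXES


    def classify_raw_layout(raw_dir):
        """Return a short label for the layout state of raw_dir.

        Hypothetical helper: the labels are illustrative and are not the
        messages exo2micro prints.
        """
        root = Path(raw_dir)
        if not root.is_dir():
            return "not found"
        entries = list(root.iterdir())
        if not entries:
            return "empty"
        subdirs = [p for p in entries if p.is_dir()]
        if not subdirs:
            # Files sit directly in raw_dir instead of per-sample folders.
            return "flat files, no sample subfolders"
        has_tiffs = any(_is_tiff(f) for d in subdirs for f in d.iterdir())
        return "ok" if has_tiffs else "sample subfolders contain no TIFFs"

Calling ``classify_raw_layout('raw')`` from the folder where you launch
JupyterLab shows which of the four situations applies before any
pipeline code runs.
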
"Some requested (sample, dye) pairs have no raw files" ------------------------------------------------------ When you call :func:`~exo2micro.run_batch` with the default ``strict_dyes=True``, requesting a ``(sample, dye)`` combination that doesn't exist on disk raises :class:`FileNotFoundError` with a single message listing every missing pair. The most common causes: - **Typo in a sample or dye name.** Check the listed dyes against your raw filenames. Dye matching is case-sensitive: ``DAPI`` and ``dapi`` are different. - **Heterogeneous samples.** Not every dye exists for every sample. Either trim your sample/dye lists, or pass ``strict_dyes=False`` to skip missing pairs and process the rest. - **Files not copied over yet.** The pair really doesn't exist — add the missing files, or remove the affected sample/dye from your run. In the GUI, the same situation produces a warning banner above the Run button with a "Confirm and run anyway" option that automatically sets ``strict_dyes=False`` for that run. Missing pairs render as muted gray tiles with a "(no files)" label so you can see what was filtered out at a glance. "Estimated peak RAM exceeds available RAM" ------------------------------------------ Starting in v2.4, :func:`~exo2micro.run_batch` and :meth:`SampleDye.run` run a pre-flight resource check before any task starts. When the estimated peak RAM for your batch exceeds the available memory on the machine, the run raises :class:`MemoryError` immediately with a multi-line message:: MemoryError: Estimated peak RAM (24.0 GB) exceeds available RAM (16.0 GB). - 12 task(s), 4 concurrent worker(s), pad=2000. - Per-task peak estimate uses a 6.0x factor over single-image float32 size. Options: 1. Reduce n_workers (currently 4). 2. Reduce pad (currently 2000; try 1000 or 500). 3. Close other applications to free RAM. 4. Use parallel='subprocess' mode if leaks are suspected. 5. Override this check with force_run=True (NOT recommended; will likely OOM-kill the kernel). 

Pick the remedy that fits your situation:

- **Working alone on the machine, just want it to finish** — lower
  ``n_workers`` until the estimate fits. Start by halving and iterate.
- **Images larger than you remembered** — reduce ``pad`` from 2000 to
  1000 or 500. The estimate is dominated by
  ``(H + 2·pad) × (W + 2·pad)``, so cutting pad in half significantly
  reduces peak.
- **You know the estimate is wrong for your data** — pass
  ``force_run=True``. Not recommended for routine use; see
  :doc:`memory_and_performance` for why.

See :doc:`memory_and_performance` for the full pre-flight discussion
and how to tune the per-task RAM estimate if needed.

"Estimated output size exceeds free disk space"
-----------------------------------------------

The disk side of the pre-flight check raises :class:`OSError` when the
estimated total output won't fit at ``output_dir``::

    OSError: Estimated output size (180.0 GB) exceeds free disk space (95.0 GB) at processed/.
      - 47 task(s).
      - checkpoint_format='both' (switch to 'tiff' or 'fits' alone to roughly halve).
    Options:
      1. Free up disk space at processed/.
      2. Switch checkpoint_format to 'tiff' or 'fits' (not 'both').
      3. Run a smaller subset of samples/dyes.
      4. Override this check with force_run=True.

The most common cause is ``checkpoint_format='both'`` — every
intermediate is written twice. Switching to ``'tiff'`` alone (or
``'fits'`` alone) roughly halves the output footprint.

"Subprocess killed (likely OOM)"
--------------------------------

In subprocess mode (``parallel='subprocess'``), if a task's subprocess
gets killed by the operating system rather than exiting cleanly, the
parent records the task as::

    error: subprocess killed (likely OOM)

The most common cause is the OS OOM killer (SIGKILL, return code 137 on
Linux/macOS). The subprocess hit the system's RAM limit and was
terminated to protect the rest of the system.

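
When checking a failed task by hand, note that the same kill can show
up as two different numbers. A POSIX shell reports a process killed by
signal N as exit status 128 + N (137 for SIGKILL), while Python's
``subprocess`` module reports it as a negative ``returncode`` (-9). The
helper below is a hypothetical sketch for interpreting either form; it
is not part of exo2micro, and how exo2micro itself inspects the return
code is not shown here.

.. code-block:: python

    import signal

    # SIGKILL is signal 9 on Linux and macOS; fall back to 9 where the
    # constant is not defined.
    SIGKILL = int(getattr(signal, "SIGKILL", 9))


    def describe_returncode(returncode):
        """Translate a child-process return code into a short description."""
        if returncode == 0:
            return "exited cleanly"
        if returncode in (-SIGKILL, 128 + SIGKILL):
            # -9 is subprocess.Popen's convention, 137 is the shell's.
            return "killed by SIGKILL (likely OOM)"
        if returncode < 0:
            return f"killed by signal {-returncode}"
        return f"exited with error code {returncode}"

Either value points to the same event: the operating system sent
SIGKILL to the task's process.
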
This shouldn't normally happen if the pre-flight check passed, but it
can happen if:

- The estimate was too optimistic for this particular sample (an
  unusual image with extreme dimensions, for example).
- Another process on the machine grew during the run and consumed the
  RAM the pre-flight check thought was available.
- The 6× per-task multiplier was too low — see
  :doc:`memory_and_performance`.

The remaining tasks in the batch continue normally. To recover:

1. Look at the summary for which tasks failed and which succeeded.
2. Rerun just the failed tasks, with a smaller ``pad`` or a different
   machine.
3. If many tasks failed the same way, the pre-flight estimate needs
   tuning for your data — see the discussion of the 6× factor in
   :doc:`memory_and_performance`.

"Kernel dies mid-batch" or "kernel restarts during run"
-------------------------------------------------------

Symptoms: Jupyter or JupyterLab reports the kernel has died or been
restarted partway through a batch, with no useful Python traceback. The
pre-flight check passed and individual tasks were processing cleanly,
but at some point the whole process disappeared.

The cause is usually one of two things:

- **Accumulating memory leak** across tasks. Per-task usage looks fine,
  but each task leaves a bit of memory behind that ``gc.collect()``
  can't reach. After many tasks, the cumulative leak crosses the
  system's RAM ceiling and the OS kills the kernel.
- **Per-task peak occasionally spikes** above the estimate for unusual
  samples in an otherwise normal batch.

To distinguish them, enable :class:`MemoryTracker` by passing
``memory_debug=True`` to :func:`run_batch`::

    import exo2micro as e2m

    # samples and dyes are the same lists you pass to a normal run
    results = e2m.run_batch(
        samples=samples,
        dyes=dyes,
        memory_debug=True,
    )

Then watch the printed RSS values across tasks. If RSS climbs
monotonically and never returns to baseline, you have a leak — switch
to subprocess mode (``parallel='subprocess'``), which runs each task in
a fresh process whose memory the OS reclaims when the task ends.
If RSS spikes per task but returns to baseline between them, peak usage
is the problem — reduce ``pad`` or ``n_workers``.

See :doc:`memory_and_performance` for the full diagnosis flow.

Alignment doesn't look right
----------------------------

**Symptoms.** The boundary contours in ``registration.png`` don't
overlap cleanly, or the difference image shows ghost-like doubling of
features, or the blink comparison shows features jumping between A and
B.

**First, figure out which stage went wrong.** Use the blink comparison
panel with A = post (stage 1) and B = one of the alignment stages:

- If **B = ICP-aligned pre (stage 2)** looks bad, the boundary
  alignment failed. Move on to "Boundary alignment failing" below.
- If **stage 2 is good but B = interior-aligned pre (stage 3)** is
  worse, the SIFT interior match went wrong. Move on to "Interior
  alignment failing".

Boundary alignment failing
~~~~~~~~~~~~~~~~~~~~~~~~~~

The coarse (stage 2) pass extracts the outer tissue boundary of each
image as a soft ring and finds the best rigid transform to overlap the
two rings. It can fail when the boundary is ill-defined, too thin, or
has very different shapes between pre and post.

Things to try (in order):

1. **Increase ``boundary_width``.** Default 15. Try 20 or 25. A thicker
   ring is more tolerant of small shape differences between the two
   images.
2. **Increase ``boundary_smooth``.** Default 10. Try 15 or 20. More
   smoothing gives the phase-correlation search a gentler gradient
   landscape.
3. **Widen the rotation search.** If your samples might be rotated more
   than 20° between imaging sessions, increase ``angle_range`` to 45 or
   90.
4. **Widen the scale search.** If magnification differs noticeably
   between pre and post, increase ``scale_max`` and decrease
   ``scale_min``.

You can sweep any of these with the **Parameter Comparison** panel in
the GUI to find the best value for your sample.

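
If you prefer to plan a sweep in a script rather than in the GUI, one
simple approach is to vary a single parameter at a time around its
default. The sketch below only builds the list of candidate settings
from the defaults and suggested values listed above. It does not call
exo2micro, the helper name ``one_at_a_time`` is hypothetical, and
passing these dictionaries to the pipeline as keyword arguments is an
assumption to check against the API reference.

.. code-block:: python

    # Defaults and suggested values as quoted in this section.
    DEFAULTS = {"boundary_width": 15, "boundary_smooth": 10, "angle_range": 20}

    CANDIDATES = {
        "boundary_width": [20, 25],
        "boundary_smooth": [15, 20],
        "angle_range": [45, 90],
    }


    def one_at_a_time(defaults, candidates):
        """Yield (label, settings) pairs, changing one parameter per trial."""
        yield "defaults", dict(defaults)
        for name, values in candidates.items():
            for value in values:
                settings = dict(defaults)
                settings[name] = value
                yield f"{name}={value}", settings


    trials = list(one_at_a_time(DEFAULTS, CANDIDATES))
    for label, settings in trials:
        print(label, settings)

Changing one parameter per trial keeps the comparison interpretable:
if a trial improves the stage 2 overlay, you know which setting was
responsible.
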

Interior alignment failing
~~~~~~~~~~~~~~~~~~~~~~~~~~

Stage 3 uses SIFT feature matching on the interior of the tissue (not
the boundary). Common failure modes:

1. **Too few features detected.** Very uniform samples don't give SIFT
   much to match. The console output will say something like
   ``interior alignment: only 50 features``.
2. **Too many false matches.** If the sample has many similar-looking
   repeating structures, RANSAC may reject everything.

Things to try:

1. **Adjust ``interior_blur_base``.** Default 8. Lower it (4-6) if your
   features are fine-grained; raise it (10-15) if the sample is noisy
   at the pixel level.
2. **Lower ``interior_min_inlier_ratio``.** Default 0.4. If RANSAC is
   rejecting otherwise reasonable matches, try 0.3 or 0.25. Too low
   risks accepting bad alignments, so check the result visually.
3. **Disable interior refinement entirely.** Set
   ``interior_ecc=False``. Stage 3 will then just copy the stage 2
   result forward. This is a reasonable fallback when the sample just
   doesn't have enough interior structure for SIFT.

Wrong channel detected
----------------------

**Symptoms.** exo2micro loaded a channel that doesn't look right — the
image is black, or it's clearly the wrong colour plane from what you
expected. Or the ``Moffat fit`` produces a weird scale and the
diagnostic plots look nonsensical.

**Diagnose first.** Click **Survey raw channels** in the GUI, or from
Python::

    from exo2micro import survey_raw_channels
    survey_raw_channels('raw')

This prints, for each raw TIFF, which channels have non-zero values and
what their maxima and means are. You can confirm that the dye you
expect really is in the channel you expect.

If the auto-detection is picking the wrong channel, it usually means
the "wrong" channel has a higher mean due to noise or background
contamination. The fix is out-of-scope for this troubleshooter — see
:doc:`../developers/concepts` for how ``_extract_signal_channel``
works, or file an issue.

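
To see why a mean-based choice can pick the wrong channel, it helps to
compute the same kind of per-channel summary on a synthetic image. The
sketch below is a hypothetical illustration using NumPy. It is not the
logic of ``_extract_signal_channel``, and the array layout (height,
width, channel) and the function name ``channel_summary`` are
assumptions.

.. code-block:: python

    import numpy as np


    def channel_summary(image):
        """Per-channel non-zero flag, maximum and mean for an (H, W, C) array."""
        image = np.asarray(image)
        return [
            {
                "channel": c,
                "nonzero": bool(np.any(image[..., c])),
                "max": float(image[..., c].max()),
                "mean": float(image[..., c].mean()),
            }
            for c in range(image.shape[-1])
        ]


    # Synthetic example: faint uniform haze versus sparse bright signal.
    image = np.zeros((100, 100, 3), dtype=np.float32)
    image[..., 0] = 5.0            # channel 0: background haze everywhere
    image[::10, ::10, 1] = 200.0   # channel 1: 100 bright pixels
    # channel 2 is left empty

    stats = channel_summary(image)
    by_mean = max(stats, key=lambda s: s["mean"])["channel"]
    by_max = max(stats, key=lambda s: s["max"])["channel"]
    print("highest mean:", by_mean, "highest max:", by_max)

In this synthetic case the haze channel wins on mean even though the
bright pixels are in the other channel, which is the pattern to look
for in the survey output when the loaded image looks wrong.
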

Scale estimate looks implausible
--------------------------------

**Symptoms.** The Moffat fit produces a scale that doesn't match what
the data visibly demands — e.g. the difference image has a large
negative (cool) region where you expected zeros.

Things to try:

1. **Check the ratio histogram.** If it has multiple peaks or the peak
   is broad, the Moffat fit may have latched onto the wrong mode.
2. **Run with ``scale_percentile=50``** as a sanity check. The median
   of the log-ratio distribution is a simple, robust background
   estimator. If it disagrees significantly with the Moffat fit,
   something's off and you should probably use the percentile value.
3. **Run with ``manual_scale``** set to a value you think is right and
   compare the three difference images in the ``excess_heatmap.png``
   plot. Pick whichever visually tracks the background ridge best.
4. **Extreme percentiles are dangerous.** Asking for the 99th
   percentile of the ratio distribution when even 1% of pixels are
   microbes will land you in the microbe tail, not the background.
   Prefer moderate percentiles (30-70).

Large negative patches in the difference image
-----------------------------------------------

**Symptoms.** Parts of the difference image are strongly negative —
cool colours dominating where you expected near-zero.

**Most common cause.** The scale factor is too high — you're
oversubtracting. The auto Moffat fit can occasionally overshoot.

**Fix.** Either:

1. Lower the scale manually — set ``manual_scale`` to a value slightly
   below the Moffat estimate and compare. Try stepping down in
   increments of 0.05.
2. Use ``scale_percentile=40`` or ``scale_percentile=50`` — often lands
   closer to the true background.

**Less common cause.** Your alignment is off, so "pre has features
where post doesn't" creates spurious negative residuals. Check the
blink comparison at a few locations. If things are jumping, fix the
alignment first.

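
The effect of the percentile choice, and of an overshooting scale, can
be checked on synthetic data. The sketch below is not exo2micro's scale
estimator. It assumes the scale is the chosen percentile of the
per-pixel post/pre ratio, computed in log space, and that the
difference image is ``post - scale * pre``; the function name
``percentile_scale`` and all the synthetic numbers are hypothetical.

.. code-block:: python

    import numpy as np


    def percentile_scale(pre, post, percentile=50.0):
        """Scale = exp(percentile of log(post / pre)) over positive pixels."""
        pre = np.asarray(pre, dtype=np.float64)
        post = np.asarray(post, dtype=np.float64)
        valid = (pre > 0) & (post > 0)
        log_ratio = np.log(post[valid] / pre[valid])
        return float(np.exp(np.percentile(log_ratio, percentile)))


    # Synthetic pixels: background ratio near 1.3, plus 2% "microbe"
    # pixels that are four times brighter in the post-stain image.
    rng = np.random.default_rng(0)
    pre = rng.uniform(50.0, 100.0, size=10_000)
    post = 1.3 * pre * rng.normal(1.0, 0.02, size=pre.size)
    microbes = rng.choice(pre.size, size=200, replace=False)
    post[microbes] *= 4.0

    scale_median = percentile_scale(pre, post, 50)
    scale_p99 = percentile_scale(pre, post, 99)

    # Fraction of pixels that go negative when the extreme scale is used.
    negative_fraction = float(np.mean(post - scale_p99 * pre < 0))
    print(scale_median, scale_p99, negative_fraction)

With these synthetic inputs the median stays close to the background
ratio, while the 99th percentile lands among the microbe pixels and
drives almost the whole difference image negative, which is the
oversubtraction pattern described above.
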

Banding or striping in the difference image
-------------------------------------------

**Cause.** Residual rotation or shear misalignment — your alignment
transform is slightly wrong in an angular sense.

**Fix.**

1. Check the blink comparison on a feature near the edge of the sample
   and another near the centre. If the feature near the edge jumps more
   than the feature near the centre, it's a rotational error.
2. Try widening ``angle_range`` (default 20 degrees) and lowering
   ``angle_step`` (default 1 degree) for a finer rotation search.
3. Enable ``save_all_intermediates=True`` and use the blink panel to
   compare the stage-2 coarse result against the stage-2 ICP result. If
   ICP is making it worse, something's wrong with boundary extraction.

Empty or nearly-empty difference image
--------------------------------------

**Cause.** Either the alignment completely failed (producing an
all-zero aligned pre-stain) or the scale factor is way too low and the
pre-stain signal is swamping the post-stain signal everywhere.

**Fix.** Check the ``pre_post_heatmap.png`` plot first. If it's nearly
empty or the ridge is weird, the alignment is the problem — go back to
the alignment troubleshooting section above. If the heatmap looks fine
but the difference image is still empty, double-check your
``scale_percentile`` / ``manual_scale`` values — a scale of 0.1 on a
dataset that really wants 1.3 will produce a difference image that's
almost entirely positive and indistinguishable from the original
post-stain image.

Pipeline says a checkpoint is missing
-------------------------------------

**Symptom.** You set ``from_stage=3`` and get a message like
``upstream checkpoints missing: [(1, 'post'), (2, 'pre')]`` with the
pipeline then falling back to running from stage 1.

**Cause.** Either you haven't run the earlier stages yet, or the
parameters you've set change the filename of an upstream checkpoint and
exo2micro can't find the one that exists.
Filenames include non-default parameter values as suffixes, so changing
``boundary_width`` from 15 to 20 means stage 2 is looking for
``02_icp_aligned_pre_bw20.tiff`` instead of
``02_icp_aligned_pre.tiff``.

**Fix.** Either reset the parameters to match an existing run, or just
let the pipeline re-run from stage 1 — it's only slow if you have many
samples.

The GUI is slow or unresponsive
-------------------------------

**Cause.** Usually means Jupyter is busy rendering many inline figures.
With **Show diagnostic plots inline** enabled and many samples in the
batch, the output cell can balloon.

**Fix.** Uncheck **Show diagnostic plots inline** for large batches.
The plots are still saved to ``pipeline_output/`` — you can inspect
them afterward from disk, or use the Zoom & Inspect panel to reload any
particular one.

Filename problems
-----------------

exo2micro is strict about raw filename conventions (see
:doc:`installation`). When something is wrong with a filename, you get
a clear "FILE PROBLEM" message during the run, the affected
``(sample, dye)`` task fails, and the pipeline continues with the next
task. All failed tasks are listed in a "PROBLEMS" section in the
summary at the end of the batch.

This section catalogues each error message you might see and what to do
about it.

**AMBIGUOUS: contains both 'pre' and 'post' in the filename**

The filename contains both substrings somewhere in it (e.g.
``Sample_pre_post_DAPI.tif``). The loader can't tell whether you meant
pre-stain or post-stain.

*Fix:* rename the file so only one of ``pre`` or ``post`` appears
anywhere in the name. Most often this means dropping a confusing date
or run identifier — ``Sample_2024-03-pre_DAPI.tif`` is fine,
``Sample_pre_post_DAPI.tif`` is not.

**NO STAIN MARKER: contains neither 'pre' nor 'post'**

The filename has no recognizable stain-type indicator. The loader can't
classify it as pre-stain or post-stain.

*Fix:* rename the file to include ``pre`` or ``post`` (or
``PreStain``/``PostStain``) somewhere in the basename. Where in the
name doesn't matter — the loader does a substring search.

**NO UNDERSCORE: has no underscore before the extension**

The filename has no ``_`` before ``.tif``/``.tiff``, so there's no way
to extract a dye name. For example: ``DAPI.tif``.

*Fix:* prepend an underscore-separated prefix:
``Sample001_PreStain_DAPI.tif``.

**EMPTY DYE: has nothing between the last underscore and the
extension**

There's an underscore right before the extension (e.g.
``Sample_PreStain_.tif``), so the dye name is empty.

*Fix:* add the dye name between the trailing underscore and the
extension.

**Incomplete pair for / : no file containing 'pre' ends with
``_.tif``**

The loader found a post-stain file for that dye but no matching
pre-stain file (or vice versa). Each ``(sample, dye)`` combination
needs both halves to process.

*Fix:* either add the missing file, or remove the orphan side and drop
that dye from your run. Note that the missing-side message mentions the
*exact* filename pattern the loader is looking for, which is usually
enough to spot the typo.

**Duplicate pair for / : expected exactly one pre-stain and one
post-stain file but found N**

There are multiple files in the same sample directory that all match
``_.tif`` and all contain ``pre`` (or all contain ``post``). The loader
can't decide which one to use.

*Fix:* rename or remove the extras. The error message lists every
candidate filename so you can see exactly which files are colliding. A
common cause is leaving an old run's output in the raw directory — move
it to an archive folder.

**No raw files matching dye '' in**

You requested a dye name that doesn't appear in any of the raw
filenames in that sample directory. The error message lists every dye
exo2micro *did* find in that directory.

*Fix:* either remove that dye from your run for that sample, or check
for typos against the listed dyes.
Note that dye matching is case-sensitive — ``DAPI`` and ``dapi`` are
different.

If the requested dye name contains an underscore, the error message
also explicitly flags that as a likely problem and suggests the correct
dye name. The most common mistake is asking for ``'SybrGld_microbe'``
when the dye is actually ``'SybrGld'`` and ``'microbe'`` was just a
descriptor in the filename. Dye names must not contain underscores; the
dye is whatever comes between the last underscore and the extension.

**Sample directory not found:**

The sample subdirectory doesn't exist under your ``raw_dir``.

*Fix:* check the spelling of the sample name in your input list, and
confirm the directory actually exists at the expected path. If your
samples live somewhere else, pass a custom ``raw_dir`` to the GUI or to
:class:`SampleDye`.

A note on partial success
~~~~~~~~~~~~~~~~~~~~~~~~~

exo2micro processes each ``(sample, dye)`` task independently. If one
task in your batch fails — say, ``CD070`` is missing its DAPI pre-stain
image — every other task in the batch still runs. You'll see a clear
error message for the failed task in real time, and at the end of the
batch the summary table shows a "PROBLEMS" section listing every failed
task with its full error message. There's no need to scroll back
through the mid-stream output to see what broke.

This applies even within a single sample folder: if ``CD070`` contains
clean ``SybrGld`` images and broken ``DAPI`` images, ``SybrGld`` will
process successfully and ``DAPI`` will fail with a clear message about
exactly what's wrong.

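
The per-file rules catalogued under "Filename problems" can be sketched
in a few lines, which is useful for checking a folder of names before a
run. This is not exo2micro's loader. The function name
``classify_raw_filename`` is hypothetical, the order of the checks is a
guess, and treating the ``pre``/``post`` substring search as
case-insensitive is an assumption based on ``PreStain``/``PostStain``
being accepted.

.. code-block:: python

    from pathlib import Path


    def classify_raw_filename(filename):
        """Return (stain, dye) for a raw TIFF name, or raise ValueError.

        Hypothetical sketch of the rules described in this section.
        """
        path = Path(filename)
        if path.suffix.lower() not in {".tif", ".tiff"}:
            raise ValueError(f"NOT A TIFF: {filename}")
        stem = path.stem
        if "_" not in stem:
            raise ValueError(f"NO UNDERSCORE: {filename}")
        # The dye is whatever follows the last underscore.
        dye = stem.rsplit("_", 1)[1]
        if not dye:
            raise ValueError(f"EMPTY DYE: {filename}")
        lowered = stem.lower()
        has_pre, has_post = "pre" in lowered, "post" in lowered
        if has_pre and has_post:
            raise ValueError(f"AMBIGUOUS: {filename}")
        if not (has_pre or has_post):
            raise ValueError(f"NO STAIN MARKER: {filename}")
        return ("pre" if has_pre else "post"), dye


    print(classify_raw_filename("Sample001_PreStain_DAPI.tif"))

Because the dye is taken after the last underscore, a name such as
``Sample_PreStain_SybrGld_microbe.tif`` yields the dye ``microbe`` under
this sketch, which illustrates why dye names must not contain
underscores.
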