Extending exo2micro
===================

This page covers how to modify exo2micro for uses the shipping
pipeline doesn't cover — custom plots, new workflows, integration
with other tools.

Adding a new diagnostic plot
----------------------------

Every stage-4 diagnostic plot lives as a function in
``exo2micro.plotting`` that takes ``(post_im, pre_im, ..., save_path=None)``
and is called from ``SampleDye._run_stage_4_diagnostics``.

To add a new plot:

1. **Write the plotting function.** Put it in
   ``exo2micro/plotting.py``. Follow the existing signature
   pattern — take ``post_im``, ``pre_im``, any method-specific
   kwargs, and ``sample``, ``dye``, ``save_path``. Return the
   matplotlib figure. Close the figure if ``save_path`` is given;
   call ``plt.show()`` otherwise.

2. **Export it** from ``exo2micro/__init__.py``::

      from .plotting import plot_my_new_diagnostic

3. **Call it from stage 4.** In
   ``_run_stage_4_diagnostics``::

      my_path = self._check_path('my_new_diagnostic')
      if force or not os.path.exists(my_path):
          plotting.plot_my_new_diagnostic(
              post_full, pre_aligned,
              sample=self.sample, dye=self.dye,
              save_path=my_path)

4. **Add it to the inline preview** in ``gui.py`` under
   ``_show_inline_results`` so the GUI displays it after a run::

      plots_to_show.append(
          ('my_new_diagnostic', 'Short caption explaining it.'))

5. **Add it to ``SampleDye.status()``** in ``pipeline.py`` so
   status checks include it.

Adding a new parameter
----------------------

Parameters live in ``exo2micro/defaults.py`` in
``PARAMETER_REGISTRY``. Each entry is::

   ('param_name', (default_value, 'abbrev', stage_number, 'description')),

Steps:

1. **Add to the registry** with a short abbreviation that
   doesn't collide with any existing one. Check
   ``ABBREVIATIONS`` first. The abbreviation appears in
   checkpoint filenames as ``{abbrev}{value}``.

2. **Consume it** in the relevant stage method. Access via
   ``self._params['param_name']``. Parameters are type-checked
   by ``set_params`` against their default value's type.

3. **Choose the type carefully.** Booleans, ints, floats, and
   ``None`` round-trip through filename suffixes via
   ``params_from_suffix``. Strings work but rarely needed.

4. **Update the parameter reference docs** in
   ``docs/source/developers/parameters.rst``.

Boolean parameters serialize as ``0`` / ``1`` in filenames (e.g.
``_iecc0``). None-able parameters serialize as ``none`` (e.g.
``_sp_none``). Floats use ``:g`` format, so ``1.42`` →
``msc1.42``, ``1.0`` → ``msc1``.

Adding a new pipeline stage
---------------------------

This is a deeper change and touches several files.

1. **Update ``STAGE_NAMES`` and ``MAX_STAGE``** in ``defaults.py``.
   Stages are numbered 1-indexed; filenames embed the number
   zero-padded to two digits.

2. **Add a ``_run_stage_N_...`` method** on ``SampleDye``
   following the pattern of the existing ones:

   - Take ``force=False`` as the only argument.
   - Check for existing checkpoints via ``self._has_checkpoint``
     and skip if present (unless ``force``).
   - Auto-run missing upstream stages.
   - Load inputs via ``self._load_image``.
   - Save outputs via ``self._save_image``.

3. **Wire it into ``run()``.** Add the stage guard block to
   ``SampleDye.run``::

      if from_stage <= N and to_stage >= N:
          self._run_stage_N_mystage(force)

4. **Add its files to ``_check_upstream``** so downstream stages
   can detect when it's missing.

5. **Add its files to ``status()``** so users can see its
   checkpoint status.

6. **Update ``gui.py``** — the ``from_stage`` / ``to_stage``
   dropdowns are built from ``STAGE_NAMES`` automatically, so
   they'll pick up the new stage without code changes.

Custom image loaders
--------------------

If your raw data isn't in the TIFF-with-``PreStain``/``PostStain``
naming convention exo2micro expects, you have two options:

**Option 1: Write files in the expected format before running.**
Simplest — just rename or copy your files into the structure
exo2micro wants.

**Option 2: Replace ``load_image_pair``.** It lives in
``exo2micro.utils`` and returns a 4-tuple of
``(post_im, pre_im, post_path, pre_path)``. Write your own
function with the same signature, then patch it::

   import exo2micro.utils as u
   import exo2micro.pipeline as p

   def my_loader(sample, dye, raw_dir='...'):
       # your logic here
       return post, pre, post_path, pre_path

   u.load_image_pair = my_loader
   p.load_image_pair = my_loader

Ugly but effective. A cleaner alternative is to subclass
``SampleDye`` and override ``_run_stage_1_padding``.

Accessing pipeline internals
----------------------------

The stage methods are prefixed with underscores
(``_run_stage_1_padding``, etc.) to signal that they're internal
implementation details. However, they're stable across minor
releases within a major version, and sometimes you need them.

Common things you might want:

- :meth:`SampleDye._load_image(stage, name)` — load any
  checkpoint as a numpy array. Returns ``None`` if not found.
- :meth:`SampleDye._save_image(image, stage, name, extra_headers=None)`
  — save an array as TIFF + FITS with the usual metadata.
- :meth:`SampleDye._has_checkpoint(stage, name)` — check if a
  specific checkpoint file exists for the current parameters.
- :attr:`SampleDye._tiff_path(stage, name)` /
  :attr:`SampleDye._fits_path(stage, name)` /
  :attr:`SampleDye._check_path(name)` — construct the full
  on-disk paths for checkpoints and check plots.
- :attr:`SampleDye._results` — transient dict of in-memory
  results from the current run. Keys include ``warp_matrix``,
  ``debug_data``, ``scale_estimate``,
  ``scale_percentile_value``. Not persisted across instances.

Integration with downstream analysis
------------------------------------

A typical downstream analysis reads the difference image and
segments the positive signal::

   from astropy.io import fits
   from skimage import morphology, measure

   hdul = fits.open(
       'processed/CD070/SybrGld_microbe/fits/04_difference_difference.fits')
   diff = hdul[0].data
   scale = hdul[0].header['SCALE']

   # Threshold at e.g. 5 sigma above background noise
   bg_std = diff[diff != 0].std()
   mask = diff > 5 * bg_std

   # Clean up small artefacts
   mask = morphology.remove_small_objects(mask, min_size=20)

   # Label connected components
   labels = measure.label(mask)
   props = measure.regionprops(labels, intensity_image=diff)

   for p in props:
       print(f"feature {p.label}: centroid {p.centroid}  "
             f"area {p.area}  mean intensity {p.mean_intensity:.1f}")

The FITS header carries the scale factor and all non-default
parameters, so this downstream script has full provenance
information for every feature it detects. For fully reproducible
pipelines, log ``hdul[0].header`` alongside your segmentation
results.

Where to look in the code
-------------------------

The package is organised around modules with clear
responsibilities:

- ``pipeline.py`` — the :class:`SampleDye` class, parameter
  management, stage orchestration, file I/O routing.
- ``alignment.py`` — image registration. ``register_highorder``
  is the main entry point and does boundary correlation + ICP
  in one call. ``refine_interior_ecc`` is the SIFT refinement.
- ``plotting.py`` — active visualization functions
  (diagnostics, zoom, registration plots).
- ``utils.py`` — file I/O, preprocessing, masking, channel
  detection. Most things in here are used by ``pipeline.py`` but
  also exported for standalone use.
- ``defaults.py`` — parameter registry and filename conventions.
- ``parallel.py`` — batch processing (serial and parallel).
- ``gui.py`` — ipywidgets GUI, layered on top of ``SampleDye``.
- ``legacy.py`` — deprecated functions kept for back-compat.

For autodoc-generated details on every public function, see
:doc:`api/index`.