Scale Estimation Methods
This page explains the math behind exo2micro’s scale estimation methods. For a higher-level, user-facing version, see Scale Methods.
The goal
After alignment, we need a scalar s such that:
diff = post − s × pre
cleanly separates background from signal. Specifically we want
s to equal the ratio between post-stain and pre-stain
autofluorescent background, so that background pixels (where
post ≈ s × pre) subtract out near zero and microbe pixels
(where post > s × pre) stand out as positive excess.
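As a minimal sketch of the subtraction above: background pixels cancel to zero when s matches the true ratio, while a microbe pixel keeps its excess. The helper name subtract_background and the toy arrays are illustrative and not part of exo2micro.

```python
import numpy as np

def subtract_background(post, pre, s):
    """Return post - s * pre as a float array (hypothetical helper)."""
    post = np.asarray(post, dtype=float)
    pre = np.asarray(pre, dtype=float)
    return post - s * pre

# Toy 2x2 images: three background pixels and one microbe pixel (top right).
pre = np.array([[10.0, 20.0], [30.0, 40.0]])
true_scale = 1.5
post = true_scale * pre
post[0, 1] += 50.0  # microbe signal sitting on top of the background

diff = subtract_background(post, pre, true_scale)
print(diff)
```

With s too small the background pixels would stay positive, and with s too large they would go negative, which is why the rest of this page is about estimating s well.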
The Moffat-fit method (default)
The canonical v2.3 method fits a Moffat profile to the left wing of the log-ratio distribution.
The setup
Compute per-pixel ratios for the “both have signal” pixels:
r = post[both] / pre[both]
and take log₁₀:
x = log10(r)
For pixels where the post-stain is pure autofluorescence (same as
pre-stain × true ratio), r clusters tightly around the true
background scale. For pixels where the post-stain also contains
microbe signal, r is larger — these pixels live in the right
tail of the log-ratio distribution.
So the distribution of x has:
A peak at log10(true_background_scale).
A left wing from noise fluctuations around that peak.
A right tail from microbe-contaminated pixels.
We want to find the peak centre, ignoring the right tail.
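To make this shape concrete, the sketch below builds synthetic data with a known background scale and a minority of microbe-contaminated pixels, then computes x. The noise level, the contamination fraction, and the definition of the both mask as "both images positive" are assumptions made for illustration; the mask used inside exo2micro may be defined differently.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_scale = 2.0

# Pre-stain autofluorescence and a noisy post-stain copy of it
pre = rng.uniform(50, 200, size=n)
post = true_scale * pre * rng.normal(1.0, 0.05, size=n)

# Roughly 10% of pixels also carry microbe signal (assumed fraction)
microbe = rng.random(n) < 0.10
post[microbe] += rng.uniform(100, 500, size=microbe.sum())

# Assumed "both have signal" mask: both images strictly positive
both = (pre > 0) & (post > 0)
x = np.log10(post[both] / pre[both])

print("log10(true scale):", np.log10(true_scale))
print("median of x:      ", np.median(x))
print("mean of x:        ", x.mean())
```

The mean is dragged upward by the right tail far more than the median is, which is the asymmetry the estimators below are designed around.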
The Moffat profile
M(x; amp, μ, α, β) = amp × (1 + ((x − μ) / α)²)^(−β)
Moffat profiles were originally developed for astronomical point-spread functions. They have:
A sharper peak than a Gaussian (when β is small), which matches empirical microscopy noise better.
Power-law wings rather than exponential ones, which handle outliers more gracefully.
The family smoothly interpolates: β = 1 is a Lorentzian
(heavy tails), large β approaches a Gaussian. We fit β
freely and let the data decide.
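The two limits can be checked numerically. The sketch below defines the profile exactly as written above and compares it with a Lorentzian at β = 1 and with a Gaussian at large β. Holding the Gaussian width fixed by setting α = σ√(2β) is a standard way to take the large-β limit and is chosen here for illustration; it is not something the exo2micro fit does.

```python
import numpy as np

def moffat(x, amp, mu, alpha, beta):
    """Moffat profile M(x; amp, mu, alpha, beta)."""
    return amp * (1.0 + ((x - mu) / alpha) ** 2) ** (-beta)

x = np.linspace(-1.0, 1.0, 2001)

# beta = 1: identical to a Lorentzian with half-width alpha
alpha = 0.1
lorentzian = 1.0 / (1.0 + (x / alpha) ** 2)
moffat_beta1 = moffat(x, 1.0, 0.0, alpha, 1.0)

# Large beta: approaches a Gaussian of width sigma when alpha = sigma*sqrt(2*beta)
sigma = 0.1
beta = 1e4
gaussian = np.exp(-x ** 2 / (2.0 * sigma ** 2))
moffat_large_beta = moffat(x, 1.0, 0.0, sigma * np.sqrt(2.0 * beta), beta)

print("max |Moffat(beta=1) - Lorentzian|:", np.max(np.abs(moffat_beta1 - lorentzian)))
print("max |Moffat(beta=1e4) - Gaussian|:", np.max(np.abs(moffat_large_beta - gaussian)))
```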
Why not just fit a Gaussian? Because the pre/post ratio
distribution in real microscopy data consistently has sharper
peaks and longer tails than a Gaussian would predict. The
pre-v2.2 code tried a Voigt profile (the convolution of a Gaussian
and a Lorentzian) and it worked acceptably, but Moffat is cleaner to
fit: one fewer parameter, and curve_fit converges more reliably.
Why not use a simple histogram peak finder? The log-ratio
histogram has quantization artifacts near log10(1) = 0 from
integer pixel values (e.g. post = 5, pre = 5 produces
exactly zero). These form a spurious spike that can fool a naive
peak finder.
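The spike is easy to reproduce. The sketch below uses small integer pixel values with a true ratio close to 1 (both are assumptions chosen to exaggerate the effect) and counts how many log-ratios are exactly zero. It also builds the kind of exclusion mask a peak finder can use to skip the bins around x = 0.

```python
import numpy as np

rng = np.random.default_rng(2)

# Small integer pixel values; post differs from pre by at most 2 counts
pre = rng.integers(1, 20, size=50_000)
post = pre + rng.integers(-2, 3, size=pre.size)

both = (pre > 0) & (post > 0)
x = np.log10(post[both] / pre[both])

# Every pixel with post == pre lands on exactly x = 0
n_exact_zero = np.count_nonzero(x == 0.0)
frac_zero = n_exact_zero / x.size
print(f"fraction of pixels with x exactly 0: {frac_zero:.2f}")

# Bins allowed in a peak search: everything outside 3 bin widths of x = 0
counts, edges = np.histogram(x, bins=200)
centres = 0.5 * (edges[:-1] + edges[1:])
bin_width = edges[1] - edges[0]
keep = np.abs(centres) > 3 * bin_width
```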
The fitting procedure
The plot_ratio_histogram_simple function in exo2micro.plotting does the following:
Histogram x with 200 bins over its full range.
Smooth with a uniform filter (kernel size 2*sigma+1, where sigma is a constant 3). This gives a clean peak to find.
Exclude bins within 3 × bin_width of x = 0 to avoid the quantization spike.
Find the peak of the smoothed, spike-excluded histogram. Call this μ₀, the initial guess.
Select the left wing: bins with x ≤ μ₀, excluding the near-zero band.
Mirror it across μ₀: for each (x, y) point in the left wing, synthesize (2μ₀ − x, y).
Fit a Moffat profile to the combined real-left-wing + synthetic-right-wing data using scipy.optimize.curve_fit().
The fitted μ is the refined peak centre, and scale = 10^μ.
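The steps above can be sketched as follows. This is not the source of plot_ratio_histogram_simple: the function name moffat_scale, the choice to fit the smoothed (rather than raw) counts, the normalisation of the counts, the starting values, and the parameter bounds are all assumptions made to get a self-contained, well-conditioned example.

```python
import numpy as np
from scipy.ndimage import uniform_filter1d
from scipy.optimize import curve_fit

def moffat(x, amp, mu, alpha, beta):
    """Moffat profile M(x; amp, mu, alpha, beta)."""
    return amp * (1.0 + ((x - mu) / alpha) ** 2) ** (-beta)

def moffat_scale(x, bins=200, sigma=3):
    """Left-wing Moffat fit on log-ratios x; returns (scale, mu0)."""
    counts, edges = np.histogram(x, bins=bins)
    centres = 0.5 * (edges[:-1] + edges[1:])
    bin_width = edges[1] - edges[0]

    # Uniform (boxcar) smoothing with kernel size 2*sigma+1
    smooth = uniform_filter1d(counts.astype(float), size=2 * sigma + 1)

    # Skip the band around x = 0 that holds the quantization spike
    keep = np.abs(centres) > 3 * bin_width

    # Initial guess: highest smoothed bin outside the excluded band
    mu0 = centres[keep][np.argmax(smooth[keep])]

    # Real left wing plus its mirror image across mu0
    left = keep & (centres <= mu0)
    x_left, y_left = centres[left], smooth[left]
    x_fit = np.concatenate([x_left, 2.0 * mu0 - x_left])
    y_fit = np.concatenate([y_left, y_left]) / y_left.max()  # assumed normalisation

    # Assumed starting values and bounds (amp, mu, alpha, beta)
    p0 = [1.0, mu0, 5 * bin_width, 2.0]
    lower = [0.0, mu0 - 10 * bin_width, 1e-6, 0.5]
    upper = [np.inf, mu0 + 10 * bin_width, np.inf, 50.0]
    popt, _ = curve_fit(moffat, x_fit, y_fit, p0=p0,
                        bounds=(lower, upper), maxfev=10_000)
    return 10.0 ** popt[1], mu0

# Synthetic check: true background scale 2.0, about 10% microbe pixels
rng = np.random.default_rng(0)
pre = rng.uniform(50, 200, size=20_000)
post = 2.0 * pre * rng.normal(1.0, 0.05, size=pre.size)
microbe = rng.random(pre.size) < 0.10
post[microbe] += rng.uniform(100, 500, size=microbe.sum())

scale, mu0 = moffat_scale(np.log10(post / pre))
print(f"estimated scale: {scale:.3f} (histogram peak alone: {10 ** mu0:.3f})")
```

Bounding β keeps the optimiser out of the flat direction where α and β grow together on nearly Gaussian data; whether the real implementation constrains the fit is not stated on this page.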
Why mirror the left wing? Because the right side of the real
distribution is contaminated by microbe signal (the long
positive tail we want to ignore). Mirroring the left wing gives
curve_fit a symmetric target that represents what the noise
distribution would look like without the microbe contribution,
and that’s what we want to match.
Failure modes
curve_fit doesn’t converge. Rare with 200 bins, but possible on pathological data. The pipeline falls back to the smoothed-histogram peak μ₀ and prints a message. Still a usable scale, just less refined.
Peak lands in the wrong place. Can happen when the data has multiple modes (e.g. two distinct tissue types with different autofluorescence ratios). In that case, check the ratio_histogram.png plot; if the peak is clearly wrong, fall back to a percentile-based method.
Very few pixels. If len(both) < ~1000, the histogram is too sparse for a meaningful fit. This usually indicates a failed alignment.
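A fallback of this kind might look like the sketch below. The function refine_or_fall_back, the MIN_PIXELS constant, the choice to raise on too few pixels, and the injectable fit argument (used here only to simulate a failure) are hypothetical; the one library fact relied on is that scipy.optimize.curve_fit raises RuntimeError when the least-squares fit fails.

```python
import numpy as np
from scipy.optimize import curve_fit

MIN_PIXELS = 1000  # approximate threshold quoted above; treat as tunable

def moffat(x, amp, mu, alpha, beta):
    """Moffat profile M(x; amp, mu, alpha, beta)."""
    return amp * (1.0 + ((x - mu) / alpha) ** 2) ** (-beta)

def refine_or_fall_back(x_fit, y_fit, mu0, n_pixels, fit=curve_fit):
    """Return (scale, refined); uses 10**mu0 if the fit does not converge."""
    if n_pixels < MIN_PIXELS:
        raise ValueError(
            f"only {n_pixels} usable pixels; check the alignment first")
    try:
        popt, _ = fit(moffat, x_fit, y_fit,
                      p0=[np.max(y_fit), mu0, 0.05, 2.0])
    except RuntimeError as err:
        # Non-convergence: keep the smoothed-histogram peak as the answer
        print(f"Moffat fit did not converge ({err}); using histogram peak")
        return 10.0 ** mu0, False
    return 10.0 ** popt[1], True

def failing_fit(*args, **kwargs):
    """Stand-in for curve_fit that always fails (for demonstration only)."""
    raise RuntimeError("simulated non-convergence")

# Noiseless Moffat data centred at 0.30, with an initial guess of 0.29
xs = np.linspace(0.2, 0.4, 41)
ys = moffat(xs, 100.0, 0.30, 0.03, 2.5)

scale_ok, refined_ok = refine_or_fall_back(xs, ys, 0.29, n_pixels=5000)
scale_fb, refined_fb = refine_or_fall_back(xs, ys, 0.29, n_pixels=5000,
                                           fit=failing_fit)
print(scale_ok, refined_ok)
print(scale_fb, refined_fb)
```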
The ratio percentile method
Sometimes you want a simpler, more transparent alternative. The
scale_percentile parameter produces one.
Mathematically:
x = log10(post[both] / pre[both])
scale = 10^percentile(x, p)
for a user-chosen percentile p. This is a one-line reduction
with no fitting, no histogramming, no assumptions about noise
distribution shape.
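A minimal sketch of this reduction follows. The function name scale_from_percentile is illustrative, and treating "both have signal" as "both strictly positive" is an assumption; the real implementation lives in SampleDye._compute_scale_percentile and may differ in detail.

```python
import numpy as np

def scale_from_percentile(post, pre, p):
    """Return 10**percentile(log10(post/pre), p), or None without valid pixels."""
    post = np.asarray(post, dtype=float)
    pre = np.asarray(pre, dtype=float)
    both = (post > 0) & (pre > 0)  # assumed "both have signal" mask
    if not both.any():
        return None
    x = np.log10(post[both] / pre[both])
    return float(10.0 ** np.percentile(x, p))

# Ratios of the valid pixels are 1, 2, 4, 8, 16; the last pixel is masked out
pre = np.array([10, 10, 10, 10, 10, 0])
post = np.array([10, 20, 40, 80, 160, 5])

median_scale = scale_from_percentile(post, pre, 50)
no_pixels = scale_from_percentile([0, 0], [1, 1], 50)
print(median_scale, no_pixels)
```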
When it beats the Moffat fit. When the log-ratio distribution is multi-modal or non-symmetric in a way that the Moffat assumption can’t handle, the percentile method can be more robust — you’re explicitly telling it “I want the value at this fraction of the sorted distribution”, which bypasses any notion of “peak”. The median (p = 50) in particular is a very stable estimator that just asks “what’s the typical ratio?”.
When it loses. When microbe signal pushes the upper tail around, extreme percentiles (p > 90) can land in the microbe cluster. Always prefer moderate percentiles (20-70) unless you have a strong reason.
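The sensitivity to the choice of p can be seen on synthetic data. In the sketch below the true scale, the noise level, and the roughly 10% microbe fraction are assumed values; the point is only that a high percentile falls inside the contaminated tail while moderate ones stay near the background peak.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
true_scale = 2.0

pre = rng.uniform(50, 200, size=n)
post = true_scale * pre * rng.normal(1.0, 0.05, size=n)

# About 10% of pixels carry extra microbe signal (assumed fraction)
microbe = rng.random(n) < 0.10
post[microbe] += rng.uniform(100, 500, size=microbe.sum())

x = np.log10(post / pre)
scales = {p: float(10.0 ** np.percentile(x, p)) for p in (20, 50, 70, 95)}

for p, s in scales.items():
    print(f"p = {p:2d}: scale = {s:.3f} (true {true_scale})")
```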
Implementation. See SampleDye._compute_scale_percentile
in pipeline.py. It’s three lines — compute both-signal mask,
take the per-pixel ratio, take the percentile.
The manual method
manual_scale is the simplest: the user specifies an exact
scale factor and exo2micro uses it verbatim. No estimation, no
fitting.
Use cases:
Reproducing a published result that specified a particular scale factor.
Sensitivity analysis (running the same sample with several nearby scale values).
Bypassing the Moffat fit when it’s clearly misbehaving on an unusual dataset.
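A sensitivity scan can be as simple as repeating the subtraction for a handful of nearby scale values and watching a summary statistic. The sketch below works on plain arrays rather than through the exo2micro pipeline, and the choice of the median of the difference image as the statistic is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic images: background ratio exactly 2.0, one small microbe patch
pre = rng.uniform(50, 200, size=(64, 64))
post = 2.0 * pre
post[:8, :8] += 300.0

candidate_scales = [1.8, 1.9, 2.0, 2.1, 2.2]
medians = []
for s in candidate_scales:
    diff = post - s * pre
    medians.append(float(np.median(diff)))
    print(f"manual_scale = {s:.1f}   median(diff) = {medians[-1]:+.2f}")
```

A median near zero indicates that the typical background pixel is neither under- nor over-subtracted at that scale.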
Coexistence
All three methods can be active simultaneously:
The Moffat fit is always computed (it drives the canonical difference image and the scale line on the excess heatmap).
If scale_percentile is set, an additional difference image is produced at that percentile scale.
If manual_scale is set, an additional difference image is produced at that exact scale.
The excess heatmap overplots all active scale lines in different colors, so you can visually compare them side-by-side:
Green (#00cc88): Moffat fit
Orange (#ff9933): ratio percentile
Pink (#ff3366): manual value
This is the recommended workflow for a new dataset: run with all three active, compare in the excess heatmap, and decide which method you trust most for that sample type.
Internal code paths
The stage-4 diagnostic code in pipeline.py calls:
exo2micro.plot_ratio_histogram_simple(), which returns the Moffat-fit scale and saves the histogram PNG.
SampleDye._compute_scale_percentile(), which returns the percentile scale (or None if no valid pixels).
SampleDye._save_difference(), which produces TIFF + FITS + PNG for a single scale value, labelled 'moffat', 'percentile_p<value>', or 'manual'.
exo2micro.plot_excess_heatmap() with a scales= list of (label, value, colour) tuples, which draws one line per active scale.
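For orientation, a scales list of that shape could be assembled as in the sketch below. The helper active_scales is hypothetical, the way the percentile value is formatted into the label is a guess, and reusing the difference-image labels for the heatmap is also an assumption. The colours are the ones listed under Coexistence.

```python
def active_scales(moffat_scale, percentile=None, percentile_scale=None,
                  manual_scale=None):
    """Build (label, value, colour) tuples for every active scale method."""
    scales = [("moffat", moffat_scale, "#00cc88")]  # always present
    if percentile is not None and percentile_scale is not None:
        label = f"percentile_p{percentile:g}"  # assumed label format
        scales.append((label, percentile_scale, "#ff9933"))
    if manual_scale is not None:
        scales.append(("manual", manual_scale, "#ff3366"))
    return scales

all_three = active_scales(2.01, percentile=50, percentile_scale=2.03,
                          manual_scale=2.0)
moffat_only = active_scales(2.01)

for label, value, colour in all_three:
    print(f"{label:16s} {value:.2f}  {colour}")
```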
Legacy scaling methods
exo2micro 2.1 and earlier had two additional methods,
least-squares (LS) and robust-percentile, implemented in the
removed scaling.py module. Both are preserved verbatim in
exo2micro.legacy for back-compat but are not called by the
v2.3 pipeline. The reasons they were dropped:
LS. Minimizes squared residuals over a tissue mask. This is biased upward by bright microbe pixels, which pull the dot-product estimate toward higher scale values. On well-aligned images with visible microbe signal, LS commonly oversubtracted by ~10-30%.
Robust percentile. The old robust_percentile=90 default was too aggressive: with well-aligned images, the 90th percentile of the ratio distribution consistently landed in the microbe tail and produced scale estimates ~4× too high. Moderate percentiles worked fine; the current scale_percentile parameter is the same math with a clearer default (None, meaning “don’t use this method unless you explicitly set a percentile”).
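The upward bias of least squares is easy to reproduce on synthetic data. The sketch below uses the closed-form dot-product estimate over all pixels (no tissue mask, which is a simplification) and compares it with the median ratio. The contamination level is an assumed value, so the size of the bias here should not be read as the figure quoted above for real images.

```python
import numpy as np

rng = np.random.default_rng(3)
true_scale = 2.0

# Noise-free background plus about 10% microbe-contaminated pixels
pre = rng.uniform(50, 200, size=10_000)
post = true_scale * pre
microbe = rng.random(pre.size) < 0.10
post[microbe] += rng.uniform(100, 500, size=microbe.sum())

# Least squares: minimise sum((post - s * pre)**2) over s
s_ls = np.sum(post * pre) / np.sum(pre * pre)

# Median of the per-pixel ratio, for comparison
s_median = np.median(post / pre)

print(f"true scale:    {true_scale:.3f}")
print(f"least squares: {s_ls:.3f}")
print(f"median ratio:  {s_median:.3f}")
```

Every contaminated pixel adds a positive term to the numerator of the dot-product estimate, so the bias can only go one way.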
If you need the old behaviour for reproducing an earlier result,
use exo2micro.legacy.optimize_subtraction directly:
from exo2micro.legacy import optimize_subtraction

# post_im and pre_im are the aligned post-stain and pre-stain images
opt_scale, scale_sig, tissue_mask, plot_data = optimize_subtraction(
    post_im, pre_im, method='least_squares')