Utils Module
utils.py
Shared utility functions used across the exo2micro sub-modules.
- Includes:
Image I/O helpers (TIFF / FITS reading and writing with metadata)
Preprocessing (median subtraction, normalisation, padding, trimming)
Masking helpers (tissue masks, joint masks, fill holes)
Gaussian smoothing with NaN conservation
Intensity equalisation for image pairs
Display helpers (robust vmax, RGB overlays)
- class exo2micro.utils.MemoryTracker(enabled=False)[source]
Bases: object
Track resident-set-size (RSS) across pipeline tasks.
Use this to confirm whether memory is actually being released between tasks in a batch. If RSS climbs monotonically across tasks, there is a leak somewhere (matplotlib figures, Jupyter Out[] history, retained widget state, unfreed numpy temporaries). If RSS returns to baseline after the explicit gc.collect() in each task footer, then the per-task peak is simply exceeding available RAM, and the remedy is reducing pad, using a smaller working resolution, or switching to subprocess-per-task mode (see exo2micro.parallel.run_batch_subprocess()).
When enabled=False (the default) all methods are cheap no-ops, so leaving the calls in production code costs essentially nothing.
Requires psutil to actually do anything. If psutil is missing the tracker prints a one-time warning and no-ops.
Example
>>> from exo2micro.utils import MemoryTracker
>>> tracker = MemoryTracker(enabled=True)
>>> tracker.snapshot('start')
>>> for sample, dye in tasks:
...     tracker.snapshot(f'before {sample}/{dye}')
...     SampleDye(sample, dye).run()
...     tracker.collect_and_snapshot(f'after gc {sample}/{dye}')
>>> tracker.summary()
- class exo2micro.utils.MemoryWatchdog(min_available_gb=0.5, poll_interval_sec=5.0, verbose=False)[source]
Bases: object
Background thread that polls available RAM and signals when it falls below a threshold.
Usage
>>> from exo2micro.utils import MemoryWatchdog
>>> wd = MemoryWatchdog(min_available_gb=0.5)
>>> wd.start()
>>> try:
...     for sample, dye in tasks:
...         wd.check_or_raise(f"before {sample}/{dye}")
...         SampleDye(sample, dye).run()
... finally:
...     wd.stop()
The watchdog itself doesn’t interrupt anything; it just sets a flag. The caller must poll via check_or_raise() at safe abort points (typically stage boundaries) to actually halt.
- Parameters:
min_available_gb (float) – Threshold in gibibytes. When psutil.virtual_memory().available drops below this, the watchdog trips. Default 0.5 GB.
poll_interval_sec (float) – How often to poll RAM. Default 5.0 seconds. Polling too fast wastes CPU; too slow misses fast-growing leaks.
verbose (bool) – If True, prints a warning line each time RAM drops below the threshold. Default False (silent until tripped).
- check_or_raise(label='')[source]
Raise MemoryError if the watchdog is tripped.
Call at safe abort points (e.g. between pipeline stages). label is included in the error message to identify where the abort happened.
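The flag-and-poll design described for MemoryWatchdog can be sketched with the standard library alone. This is an illustration under stated assumptions, not the exo2micro implementation: the class name `SketchWatchdog` and the injectable `get_available_bytes` callable are hypothetical, introduced so the example runs without psutil.

```python
import threading


class SketchWatchdog:
    """Hypothetical watchdog: polls a probe, sets a flag, never interrupts."""

    def __init__(self, get_available_bytes, min_available_gb=0.5,
                 poll_interval_sec=5.0):
        # Assumption: the caller supplies the probe that reports free RAM.
        self._get_available = get_available_bytes
        self._threshold = min_available_gb * 1024 ** 3
        self._interval = poll_interval_sec
        self._tripped = threading.Event()
        self._stop = threading.Event()
        self._thread = None

    def _poll(self):
        while not self._stop.is_set():
            if self._get_available() < self._threshold:
                self._tripped.set()          # only a flag; nothing is aborted
            self._stop.wait(self._interval)  # sleep, but wake early on stop()

    def start(self):
        self._thread = threading.Thread(target=self._poll, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def check_or_raise(self, label=''):
        # Called by the pipeline at safe abort points.
        if self._tripped.is_set():
            raise MemoryError(f'available RAM below threshold ({label})')
```

Because the thread only sets an event, the pipeline keeps control over where it stops, which is why check_or_raise() has to be called explicitly between stages.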
- class exo2micro.utils.TeeStdout(log_path)[source]
Bases: object
File-like object that writes to both stdout and a log file.
Used as a context manager during pipeline runs to capture every line of pipeline output (including text from inside library functions that use print() directly) into the persistent run log, without disturbing the normal stdout flow that the GUI’s widgets.Output context manager captures.
Usage:
with TeeStdout(log_path):
    with widget_output:
        run.run()  # all prints go to widget AND log file
Failures writing to the file are silently swallowed; the underlying stdout writes always succeed.
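A tee of this kind can be sketched as a small context manager that swaps sys.stdout for an object forwarding each write to both destinations. The class name `SketchTee` is hypothetical, and the error handling shown is one plausible reading of "silently swallowed", not the library's actual code.

```python
import sys


class SketchTee:
    """Hypothetical tee: forwards writes to the current stdout and a log file."""

    def __init__(self, log_path):
        self._log_path = log_path
        self._log = None
        self._stdout = None

    def __enter__(self):
        self._stdout = sys.stdout
        try:
            self._log = open(self._log_path, 'a', encoding='utf-8')
        except OSError:
            self._log = None  # the log is best-effort; stdout still works
        sys.stdout = self
        return self

    def write(self, text):
        written = self._stdout.write(text)  # stdout write is never skipped
        if self._log is not None:
            try:
                self._log.write(text)
            except (OSError, ValueError):
                pass  # swallow log failures
        return written

    def flush(self):
        self._stdout.flush()
        if self._log is not None:
            try:
                self._log.flush()
            except (OSError, ValueError):
                pass

    def __exit__(self, exc_type, exc, tb):
        sys.stdout = self._stdout  # always restore the original stream
        if self._log is not None:
            try:
                self._log.close()
            except OSError:
                pass
        return False  # do not suppress exceptions from the body
```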
- exo2micro.utils.append_to_run_log(output_dir, message)[source]
Append a line to the persistent run log.
Failures (e.g. permission errors, missing directory) are silently ignored — the log is a best-effort persistence aid, not a critical path.
- exo2micro.utils.build_clean_tissue_mask(post, pre)[source]
Build a clean tissue mask using binary_fill_holes only (no dilation).
This is the mask used for residual histograms and as the base for the signal-only fitting mask.
- Parameters:
post (ndarray) – Post-stain image (float).
pre (ndarray) – Pre-stain aligned image (float).
- Returns:
tissue_mask
- Return type:
ndarray of bool
- exo2micro.utils.build_tissue_mask(post_im, pre_im, signal_threshold=0, dilation_iters=50)[source]
Build a joint tissue mask from post-stain and pre-stain images.
Each image is independently thresholded, dilated, and hole-filled, then the two masks are intersected.
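- Parameters:
post_im (ndarray) – Post-stain image (2D).
pre_im (ndarray) – Pre-stain image (2D).
signal_threshold (float) – Threshold applied to each image before dilation (default 0).
dilation_iters (int) – Number of dilation iterations applied to each thresholded mask (default 50).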
- Returns:
joint_mask – Intersection of post and pre tissue masks.
- Return type:
ndarray of bool
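The threshold, dilate, fill and intersect sequence can be sketched with scipy.ndimage. The function names below are hypothetical, and two details are assumptions rather than documented behaviour: the threshold is taken as a strict `>` comparison, and the default structuring element of binary_dilation is used.

```python
import numpy as np
from scipy import ndimage


def sketch_side_mask(im, signal_threshold=0, dilation_iters=50):
    """Threshold, dilate, then fill holes for one image (assumed order)."""
    mask = im > signal_threshold  # assumption: strict inequality
    if dilation_iters > 0:
        # iterations < 1 would mean "repeat until stable" in scipy, so guard it
        mask = ndimage.binary_dilation(mask, iterations=dilation_iters)
    return ndimage.binary_fill_holes(mask)


def sketch_joint_mask(post_im, pre_im, signal_threshold=0, dilation_iters=50):
    """Intersect the two per-image masks."""
    post_mask = sketch_side_mask(post_im, signal_threshold, dilation_iters)
    pre_mask = sketch_side_mask(pre_im, signal_threshold, dilation_iters)
    return post_mask & pre_mask
```

Passing dilation_iters=0 to the sketch reduces it to the fill-holes-only behaviour described for build_clean_tissue_mask().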
- exo2micro.utils.classify_raw_files(sample_dir)[source]
Classify TIFF files in a sample directory by stain type and dye.
Filename rules
A valid raw image filename must:
End with .tif or .tiff (case-insensitive).
Contain pre or post (case-insensitive) somewhere in the basename, marking it as a pre-stain or post-stain image.
End with _<DyeName>.tif (or .tiff), where <DyeName> is the dye identifier and contains no underscores. The dye name is the substring between the last underscore and the extension.
Examples of valid filenames:
Sample001_PreStain_SybrGld.tif      -> pre,  dye=SybrGld
Sample001_PostStain_SybrGld.tiff    -> post, dye=SybrGld
my_2024_pre_run3_DAPI.tif           -> pre,  dye=DAPI
whatever_post_Cy5.tiff              -> post, dye=Cy5
Examples of invalid filenames (will be flagged in warnings):
Sample001_PreStain_SybrGld_microbe.tif – dye name contains an underscore. Would be parsed as dye microbe. Rename SybrGld_microbe to SybrGldmicrobe or similar.
Sample001_pre_post_SybrGld.tif – contains both pre and post. Ambiguous, skipped.
Sample001_SybrGld.tif – contains neither pre nor post. Cannot be classified, skipped.
This function is non-raising: it returns whatever it could parse plus a list of human-readable warnings about anything it couldn’t. Callers that need to fail hard on missing or duplicate pairs (e.g. load_image_pair()) should check the returned structures themselves.
- Parameters:
sample_dir (str) – Path to a single sample’s directory.
- Returns:
pairs (dict) – Maps each detected dye name to a dict of candidate file paths:
{
    'SybrGld': {'pre': ['/.../...PreStain_SybrGld.tif'],
                'post': ['/.../...PostStain_SybrGld.tiff']},
    'DAPI': {'pre': ['/.../...Pre_DAPI.tiff'],
             'post': ['/.../...Post_DAPI.tiff']},
}
Each list may have 0, 1, or many entries. Callers decide what to do about duplicates and missing sides.
warnings (list of str) – Human-readable problem descriptions for individual files that couldn’t be classified. One entry per problematic file.
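The filename rules can be illustrated with a per-file classifier. This is a sketch only: `sketch_classify_name` is a hypothetical helper, its reason strings are made up, and it cannot detect a dye name that itself contains an underscore (it parses such a file as the text after the last underscore, as the rules describe).

```python
import os


def sketch_classify_name(filename):
    """Return (stain, dye) for a parseable name, else (None, reason)."""
    base, ext = os.path.splitext(os.path.basename(filename))
    if ext.lower() not in ('.tif', '.tiff'):
        return None, 'not a TIFF file'

    lower = base.lower()
    has_pre, has_post = 'pre' in lower, 'post' in lower
    if has_pre and has_post:
        return None, 'contains both pre and post (ambiguous)'
    if not (has_pre or has_post):
        return None, 'contains neither pre nor post'

    if '_' not in base:
        return None, 'no underscore before the dye name'
    dye = base.rsplit('_', 1)[1]  # substring after the last underscore
    if not dye:
        return None, 'empty dye name'
    return ('pre' if has_pre else 'post'), dye
```

Running it over the documented examples gives ('pre', 'SybrGld') for Sample001_PreStain_SybrGld.tif and a (None, reason) pair for Sample001_pre_post_SybrGld.tif.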
- exo2micro.utils.clear_run_log(output_dir)[source]
Delete the persistent run log file, if it exists.
- exo2micro.utils.diagnose_raw_layout(raw_dir='raw')[source]
Diagnose the layout of a raw image directory and return a structured report.
Catches the common “I don’t see any images” failure modes before the pipeline gets a chance to fail confusingly downstream:
raw_dir doesn’t exist at all.
raw_dir exists but is empty.
raw_dir contains TIFF files directly (no per-sample folders). This is the most common mistake: users dump all their files in one place instead of separating by sample.
raw_dir contains subdirectories but none of them contain any TIFF files.
When the layout looks correct, returns ok=True and a short informational summary. When something is wrong, returns ok=False and a multi-line message explaining what’s wrong and how the directory should be structured.
- Parameters:
raw_dir (str) – Path to the raw image directory (default 'raw').
- Returns:
report – Keys:
ok (bool): True if the layout looks usable.
message (str): Human-readable multi-line message. Empty string when ok=True and there’s nothing to report.
raw_dir (str): The directory that was inspected.
exists (bool): Whether raw_dir itself exists.
subdirs (list of str): Subdirectory names found (sorted).
loose_tiffs (list of str): TIFF filenames found directly in raw_dir (sorted). Non-empty implies the layout is wrong even if subdirs is also non-empty.
empty_subdirs (list of str): Subdirectory names that contain no TIFF files (sorted). Informational only.
- Return type:
dict
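The four failure modes map onto a short directory walk. The sketch below returns a dict with the documented keys; the function name `sketch_diagnose_layout` and the message wording are hypothetical, and the real function may inspect more than this.

```python
import os


def sketch_diagnose_layout(raw_dir='raw'):
    """Hypothetical layout check returning a report with the documented keys."""
    def is_tiff(name):
        return name.lower().endswith(('.tif', '.tiff'))

    report = {'ok': False, 'message': '', 'raw_dir': raw_dir,
              'exists': os.path.isdir(raw_dir),
              'subdirs': [], 'loose_tiffs': [], 'empty_subdirs': []}
    if not report['exists']:
        report['message'] = f'{raw_dir} does not exist.'
        return report

    entries = sorted(os.listdir(raw_dir))
    for name in entries:
        path = os.path.join(raw_dir, name)
        if os.path.isdir(path):
            report['subdirs'].append(name)
            if not any(is_tiff(f) for f in os.listdir(path)):
                report['empty_subdirs'].append(name)
        elif is_tiff(name):
            report['loose_tiffs'].append(name)

    if not entries:
        report['message'] = f'{raw_dir} is empty.'
    elif report['loose_tiffs']:
        # Loose TIFFs make the layout wrong even when subdirs exist.
        report['message'] = ('TIFF files found directly in raw_dir; '
                             'move them into per-sample folders.')
    elif len(report['empty_subdirs']) == len(report['subdirs']):
        report['message'] = 'No subdirectory contains any TIFF files.'
    else:
        report['ok'] = True
    return report
```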
- exo2micro.utils.discover_tasks(samples, dyes, raw_dir='raw')[source]
Resolve a (samples, dyes) request into the actual list of tasks present on disk.
Given a list of sample names and a list of dye names the user wants to process, walk each sample directory and return:
present: list of (sample, dye) tuples that have both a pre-stain and a post-stain file. These are runnable.
skipped: list of (sample, dye, reason) tuples that were requested but can’t run, with a short human-readable reason.
warnings: list of (sample, warning_str) tuples for filename problems encountered along the way (ambiguous, no underscore, etc.; same wording as classify_raw_files() returns).
This is the canonical “what tasks should we actually run?” helper. Both the batch processor (exo2micro.parallel.build_task_list()) and the GUI use it so they share one source of truth.
- Parameters:
samples (list of str) – Sample names the user wants to process.
dyes (list of str) – Dye names the user wants to process.
raw_dir (str) – Root raw image directory (default 'raw').
- Returns:
result – Keys:
present (list of (sample, dye)).
skipped (list of (sample, dye, reason)).
warnings (list of (sample, warning_str)).
layout_ok (bool): False if diagnose_raw_layout() flagged a fatal layout problem; in that case present and skipped will both be empty and a single layout warning is added to warnings.
- Return type:
dict
- exo2micro.utils.equalize_pair(post, pre)[source]
Intensity-equalize a pair of images for registration.
Histogram-matches the pre-stain image’s intensity distribution to the post-stain’s, then jointly normalises both to [0, 1] using the shared 99th percentile of post-stain nonzero pixels.
- Parameters:
post (ndarray) – Post-stain image as float32 (2D).
pre (ndarray) – Pre-stain image as float32 (2D).
- Returns:
post_eq (ndarray) – Post-stain normalised to [0, 1].
pre_eq (ndarray) – Pre-stain histogram-matched and normalised.
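The two steps, histogram matching followed by a shared percentile normalisation, can be sketched in plain NumPy. Several details here are assumptions, not documented behaviour: matching is done over all pixels with a CDF-interpolation method, "nonzero" is read as strictly positive, and values above the 99th percentile are clipped to 1. The function names are hypothetical.

```python
import numpy as np


def sketch_match_histogram(source, reference):
    """Remap source so its empirical CDF follows the reference's CDF."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference value at that quantile.
    matched_vals = np.interp(src_cdf, ref_cdf, ref_vals)
    return matched_vals[src_idx].reshape(source.shape)


def sketch_equalize_pair(post, pre):
    """Match pre to post, then scale both by one shared factor."""
    pre_matched = sketch_match_histogram(pre, post)
    nonzero = post[post > 0]  # assumption: 'nonzero' means > 0
    scale = np.percentile(nonzero, 99) if nonzero.size else 1.0
    if scale <= 0:
        scale = 1.0
    # Assumption: values above the shared percentile are clipped to 1.
    post_eq = np.clip(post / scale, 0, 1).astype(np.float32)
    pre_eq = np.clip(pre_matched / scale, 0, 1).astype(np.float32)
    return post_eq, pre_eq
```

Using one scale factor for both images keeps their relative intensities comparable, which is the point of normalising them jointly rather than separately.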
- exo2micro.utils.estimate_gauss_sigma(im, down_scale, sparse_threshold=0.1, sparse_sigma=5, dense_sigma=0)[source]
Estimate an appropriate Gaussian pre-smoothing sigma for ECC registration based on image density.
- Parameters:
im (ndarray) – Full-resolution image (2D).
down_scale (float) – Downsample factor that will be applied before ECC.
sparse_threshold (float) – Nonzero pixel fraction below which the image is sparse (default 0.1).
sparse_sigma (float) – Sigma for sparse images at downsampled resolution (default 5).
dense_sigma (float) – Sigma for dense images; 0 disables smoothing (default 0).
- Returns:
Recommended gauss_sigma value.
- Return type:
float
- exo2micro.utils.estimate_pipeline_memory(sample_dye_pairs, raw_dir='raw', pad=2000, n_workers=1)[source]
Estimate peak RAM required to process a list of (sample, dye) pairs.
Reads only TIFF headers (no pixel data) to get raw image dimensions, inflates by the padding, multiplies by float32 bytes/pixel and by PEAK_FACTOR_PER_TASK to account for the several full-resolution arrays that coexist at peak. For parallel runs, multiplies by n_workers.
The estimate is the worst-case peak across tasks, not the sum, because tasks run sequentially in serial mode (only one task in memory at a time) and concurrently in parallel mode (n_workers tasks at a time, but each could be the worst one).
- Parameters:
sample_dye_pairs (list of tuple) – List of (sample, dye) combinations to estimate.
raw_dir (str) – Root raw image directory (default 'raw').
pad (int) – Padding value (default 2000).
n_workers (int) – Concurrent workers. 1 for serial.
- Returns:
estimate – Keys:
peak_bytes (int): worst-case single-task peak.
concurrent_peak_bytes (int): peak_bytes times n_workers.
per_task_bytes (list of int): one entry per pair.
warnings (list of str).
n_resolvable (int).
- Return type:
dict
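The arithmetic behind the estimate can be written out directly. In this sketch the image shapes are passed in rather than read from TIFF headers, padding is assumed to be added on every side, and `peak_factor=6` is a placeholder: the actual value of PEAK_FACTOR_PER_TASK is not stated here. The function name is hypothetical.

```python
def sketch_estimate_memory(shapes, pad=2000, n_workers=1, peak_factor=6):
    """Worst-case RAM estimate from (height, width) pairs.

    peak_factor is a hypothetical stand-in for PEAK_FACTOR_PER_TASK.
    """
    bytes_per_pixel = 4  # float32
    per_task = [
        (h + 2 * pad) * (w + 2 * pad) * bytes_per_pixel * peak_factor
        for h, w in shapes
    ]
    peak = max(per_task, default=0)  # worst single task, not the sum
    return {
        'per_task_bytes': per_task,
        'peak_bytes': peak,
        'concurrent_peak_bytes': peak * n_workers,
    }
```

For a 1000 x 1000 raw image with pad=2000 the padded canvas is 5000 x 5000 pixels, so a factor of 6 gives 5000 * 5000 * 4 * 6 = 600,000,000 bytes for that task.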
- exo2micro.utils.estimate_pipeline_output_size(sample_dye_pairs, raw_dir='raw', pad=2000, save_all_intermediates=False, n_scale_methods=1, checkpoint_format='tiff')[source]
Estimate the on-disk footprint of a pipeline run.
Returns a best-effort estimate of how much disk space the pipeline will consume if run on the given (sample, dye) combinations with the given parameters. Used by the GUI to pre-warn users when the estimate would exceed available disk space.
The estimate is based on the raw TIFF dimensions: exo2micro pads each raw image by pad pixels on every side, converts to float32 (4 bytes per pixel), and saves intermediates at each pipeline stage. Approximate breakdown per (sample, dye):
Stage 1: padded post + padded pre (2 files, float32)
Stage 2: ICP-aligned pre (1 file); +coarse-aligned pre if save_all_intermediates=True
Stage 3: interior-aligned pre (1 file)
Stage 4: difference image (n_scale_methods files, one per active scale method: Moffat-only = 1, Moffat+percentile = 2, Moffat+manual = 2, all three = 3)
Each intermediate can be written as TIFF, FITS, or both depending on checkpoint_format. TIFF-only and FITS-only runs use roughly half the disk space of 'both'. Diagnostic PNG plots add a small fixed overhead (~10 MB per (sample, dye) regardless).
- Parameters:
sample_dye_pairs (list of tuple) – List of (sample, dye) combinations to estimate.
raw_dir (str) – Root raw image directory (default 'raw').
pad (int) – Padding value (default 2000).
save_all_intermediates (bool) – If True, adds the stage-2 coarse intermediate to the estimate.
n_scale_methods (int) – How many difference images stage 4 will produce (1-3).
checkpoint_format ({'tiff', 'fits', 'both'}) – Which file format(s) each checkpoint gets written as. TIFF and FITS are roughly the same size on disk; 'both' doubles the per-file footprint.
- Returns:
estimate – {'bytes_per_task': [list], 'total_bytes': int, 'n_tasks': int, 'n_resolvable': int, 'warnings': [list]}
- Return type:
dict
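The per-task breakdown reduces to a file count multiplied by the padded float32 image size. The sketch below follows the stage list literally; the function name is hypothetical, the PNG overhead is taken as exactly 10,000,000 bytes, and compression or header overhead is ignored.

```python
def sketch_output_bytes(height, width, pad=2000, save_all_intermediates=False,
                        n_scale_methods=1, checkpoint_format='tiff'):
    """Approximate on-disk bytes for one (sample, dye) task."""
    bytes_per_file = (height + 2 * pad) * (width + 2 * pad) * 4  # float32

    n_files = 2                                   # stage 1: padded post + pre
    n_files += 2 if save_all_intermediates else 1  # stage 2: ICP (+ coarse)
    n_files += 1                                  # stage 3: interior-aligned pre
    n_files += n_scale_methods                    # stage 4: difference images

    copies = 2 if checkpoint_format == 'both' else 1
    png_overhead = 10_000_000  # assumption: "~10 MB" taken as 10^7 bytes
    return n_files * copies * bytes_per_file + png_overhead
```

With the default pad=2000 a 1000 x 1000 raw image becomes 5000 x 5000, so each checkpoint file is about 100 MB and a default task writes five of them.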
- exo2micro.utils.filter_nan_gaussian_conserving(arr, sigma)[source]
Apply a Gaussian smooth to an array that may contain NaNs, conserving total intensity. NaN positions remain NaN in the output.
- Parameters:
arr (ndarray) – Input 2D array, may contain NaNs.
sigma (float) – Gaussian smoothing sigma in pixels.
- Returns:
Smoothed array with NaNs preserved.
- Return type:
ndarray
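One known way to get an intensity-conserving, NaN-preserving Gaussian smooth is to zero the NaNs, filter with zero boundary values, and hand each pixel back the fraction of its own intensity that would have leaked into NaNs or off the edge. The sketch below implements that approach with scipy.ndimage.gaussian_filter; whether exo2micro uses exactly this scheme is an assumption, and the function name is hypothetical.

```python
import numpy as np
from scipy import ndimage


def sketch_nan_gaussian_conserving(arr, sigma):
    """Gaussian smooth that keeps NaNs in place and conserves the total."""
    nan_mask = np.isnan(arr)

    # Fraction of each pixel's kernel that lands on NaNs or outside the array.
    loss = np.zeros(arr.shape)
    loss[nan_mask] = 1.0
    loss = ndimage.gaussian_filter(loss, sigma=sigma, mode='constant', cval=1.0)

    # Smooth the data with NaNs and the outside treated as zero intensity.
    smoothed = arr.astype(float)  # astype returns a copy
    smoothed[nan_mask] = 0.0
    smoothed = ndimage.gaussian_filter(smoothed, sigma=sigma,
                                       mode='constant', cval=0.0)

    smoothed[nan_mask] = np.nan
    # Return the leaked share to its source pixel (NaN stays NaN here).
    smoothed += loss * arr
    return smoothed
```

Because the kernel is symmetric and normalised, the share returned to each pixel equals what it lost, so np.nansum of the output matches np.nansum of the input up to floating-point error.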
- exo2micro.utils.get_available_memory()[source]
Return available RAM in bytes, or None if psutil unavailable.
- exo2micro.utils.get_run_log_path(output_dir)[source]
Return the path to the persistent run log file.
The log lives at {output_dir}/.exo2micro_run_log.txt. The leading dot keeps it out of casual file listings since it’s mostly for recovery/debugging, not regular browsing.
- exo2micro.utils.load_checkpoint(filepath)[source]
Load a checkpoint image from TIFF.
- Parameters:
filepath (str) – Base filepath WITHOUT extension (same as passed to save_checkpoint).
- Returns:
image – The loaded image, or None if not found.
- Return type:
ndarray or None
- exo2micro.utils.load_image_pair(sample, dye, raw_dir='raw')[source]
Load a pre-stain and post-stain image pair for a given sample and dye.
Automatically detects which RGB channel carries the fluorescence signal and extracts it at full 8-bit precision, rather than using PIL.Image.convert(), which loses ~41% of the dynamic range.
Filename convention
Each sample directory must contain exactly one pre-stain file and exactly one post-stain file per dye, named so that:
The filename ends with .tif or .tiff (case-insensitive).
The basename contains pre or post (case-insensitive) to mark the stain type.
The basename ends with _<dye>.<ext>, where <dye> matches the dye argument and contains no underscores.
See classify_raw_files() for full details and examples.
Behaviour on problems
This function is strict: it raises rather than returning placeholder values when anything goes wrong. The exception message is multi-line and tells the user exactly what to fix.
Missing sample directory -> FileNotFoundError
No file matches the requested dye -> FileNotFoundError
Only one side of the pair found -> FileNotFoundError
Multiple pre-stain or post-stain files for the same dye -> ValueError
When other dyes in the same directory are misnamed (ambiguous, no underscore, etc.), warnings about them are printed but do not block loading the requested dye.
- Parameters:
sample (str) – Sample name, e.g. 'CD070'. Must match the name of a subdirectory under raw_dir.
dye (str) – Dye name, e.g. 'SybrGld' or 'DAPI'. Must match the substring after the last underscore in the raw filenames.
raw_dir (str) – Base directory containing sample subdirectories (default 'raw').
- Returns:
post_im (ndarray) – Post-stain image as a 2-D numpy array.
pre_im (ndarray) – Pre-stain image as a 2-D numpy array.
post_path (str) – Path to the post-stain file.
pre_path (str) – Path to the pre-stain file.
- Raises:
FileNotFoundError – If the sample directory is missing, the requested dye has no matching files, or only one side of the pair exists.
ValueError – If the requested dye matches more than one pre-stain or post-stain file in the directory.
- exo2micro.utils.make_rgb_overlay(post, pre, post_edges=None, pre_edges=None)[source]
Build a 3-channel RGB overlay for alignment assessment.
Post-stain in Red, pre-stain in Green. Overlap appears yellow. Optional boundary edges drawn in cyan (post) and magenta (pre).
- Parameters:
post (ndarray) – Post-stain image (float32, 2D).
pre (ndarray) – Pre-stain image (float32, 2D).
post_edges (ndarray or None) – Post-stain boundary ring.
pre_edges (ndarray or None) – Pre-stain boundary ring.
- Returns:
rgb – uint8 array of shape (H, W, 3).
- Return type:
ndarray
- exo2micro.utils.normalize_image(image, norm_percentile=None)[source]
Normalize an image to its maximum or to a specified percentile value.
- Parameters:
image (ndarray) – 2D image array.
norm_percentile (float or None) – If None, normalize by the image maximum. Otherwise normalize by this percentile value.
- Returns:
Normalized image.
- Return type:
ndarray
- exo2micro.utils.pad_images(post_im, pre_im, pad=50)[source]
Pad two images with zeros onto a common canvas plus a border.
The extra border gives the registration algorithm room to shift the pre-stain image without it falling off the canvas edge.
- Parameters:
post_im (ndarray) – Post-stain image (2D).
pre_im (ndarray) – Pre-stain image (2D).
pad (int) – Number of zero-padding pixels on each side (default 50).
- Returns:
post_im_pad (ndarray) – Zero-padded post-stain image.
pre_im_pad (ndarray) – Zero-padded pre-stain image on the same canvas.
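Padding onto a common canvas can be sketched as follows. The canvas is sized to the larger extent in each dimension plus the border; placing both images at the same (pad, pad) offset is an assumption, since the alignment of differently sized images on the canvas is not specified here. The function name is hypothetical.

```python
import numpy as np


def sketch_pad_pair(post_im, pre_im, pad=50):
    """Zero-pad two 2D images onto one shared canvas with a border."""
    rows = max(post_im.shape[0], pre_im.shape[0]) + 2 * pad
    cols = max(post_im.shape[1], pre_im.shape[1]) + 2 * pad

    def place(im):
        canvas = np.zeros((rows, cols), dtype=im.dtype)
        # Assumption: both images are anchored at the same (pad, pad) corner.
        canvas[pad:pad + im.shape[0], pad:pad + im.shape[1]] = im
        return canvas

    return place(post_im), place(pre_im)
```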
- exo2micro.utils.preflight_check(sample_dye_pairs, output_dir='processed', raw_dir='raw', pad=2000, n_workers=1, checkpoint_format='tiff', n_scale_methods=1, save_all_intermediates=False, force_run=False)[source]
Combined RAM + disk pre-flight check before a batch or single run.
Estimates both peak RAM (across all concurrent tasks) and total disk output. Compares each against the available headroom on the system and either warns or raises MemoryError / OSError depending on severity.
Severity bands (each resource checked independently):
estimate ≤ 80% of available: silent.
80%-100%: print a warning, proceed.
> 100%: raise MemoryError (RAM) or OSError (disk).
Callers can pass force_run=True to downgrade the hard fail to a warning. Useful when the estimate is known to be conservative or when the user has already cleared other processes.
- Parameters:
output_dir (str) – Where checkpoints will be written. Free disk space is measured at this path.
raw_dir (str) – Source raw images, needed to read TIFF dimensions.
pad (int) – Padding parameter.
n_workers (int) – Concurrent workers. 1 for serial.
checkpoint_format – Passed through to estimate_pipeline_output_size().
n_scale_methods – Passed through to estimate_pipeline_output_size().
save_all_intermediates – Passed through to estimate_pipeline_output_size().
force_run (bool) – If True, hard-fail conditions are downgraded to warnings.
- Raises:
MemoryError – When the RAM estimate exceeds 100% of available and force_run=False.
OSError – When the disk estimate exceeds 100% of free space and force_run=False.
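The severity bands translate into a small decision function. The names `sketch_severity` and `sketch_check`, the band labels and the message text are hypothetical; only the 80% and 100% thresholds and the exception types come from the description above.

```python
def sketch_severity(estimate_bytes, available_bytes):
    """Classify an estimate against available headroom."""
    if available_bytes <= 0:
        return 'fail'
    ratio = estimate_bytes / available_bytes
    if ratio <= 0.8:
        return 'ok'
    if ratio <= 1.0:
        return 'warn'
    return 'fail'


def sketch_check(resource, estimate_bytes, available_bytes, force_run=False):
    """Warn or raise for one resource ('ram' or 'disk')."""
    band = sketch_severity(estimate_bytes, available_bytes)
    message = (f'{resource}: estimated {estimate_bytes} bytes, '
               f'{available_bytes} available')
    if band == 'fail' and not force_run:
        # RAM shortfalls raise MemoryError, disk shortfalls raise OSError.
        raise (MemoryError if resource == 'ram' else OSError)(message)
    if band != 'ok':
        print(f'WARNING: {message}')  # includes downgraded hard fails
    return band
```

Checking RAM and disk through separate calls mirrors the statement that each resource is evaluated independently.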
- exo2micro.utils.read_run_log_tail(output_dir, max_lines=500)[source]
Read the tail of the persistent run log.
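- Parameters:
output_dir (str) – Directory containing the persistent run log.
max_lines (int) – Maximum number of trailing lines to return (default 500).
- Returns:
text – The last max_lines lines of the file, joined into a single string, or None if the file doesn’t exist.
- Return type:
str or None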
- exo2micro.utils.robust_vmax(im, n_mad=5)[source]
Compute a display vmax robust to bright outliers.
Uses median + n_mad * MAD over nonzero pixels.
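The formula can be sketched in a few lines of NumPy. The MAD here is the raw median absolute deviation with no 1.4826 scaling, and an all-zero image returns 0.0; both choices are assumptions, as is the function name.

```python
import numpy as np


def sketch_robust_vmax(im, n_mad=5):
    """median + n_mad * MAD over nonzero pixels (unscaled MAD assumed)."""
    vals = im[im != 0]
    if vals.size == 0:
        return 0.0  # assumption: nothing to display
    median = np.median(vals)
    mad = np.median(np.abs(vals - median))
    return float(median + n_mad * mad)
```

For nonzero values 1, 2, 3, 4 and 1000, the median is 3 and the MAD is 1, so the result is 8.0 and the single bright outlier does not stretch the display range.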
- exo2micro.utils.save_checkpoint(image, filepath, sample='', dye='', stage='', params=None, extra_headers=None)[source]
Save an intermediate image as both TIFF and FITS, with metadata.
The TIFF is saved in the ‘tiff/’ subdirectory and the FITS in the ‘fits/’ subdirectory of the same parent.
- Parameters:
image (ndarray) – 2D image array to save.
filepath (str) – Base filepath WITHOUT extension, e.g. ‘processed/CD070/SybrGld_microbe/01_padded_post’. The function appends .tiff and .fits and places them in the appropriate subdirectories.
sample (str) – Sample name for FITS header.
dye (str) – Dye name for FITS header.
stage (str) – Pipeline stage name for FITS header.
params (dict or None) – Non-default parameters to record in FITS header.
extra_headers (dict or None) – Additional FITS header keywords (e.g., warp matrix elements).
- exo2micro.utils.subtract_median(image, region=(0, 5000, 0, 5000))[source]
Subtract the median background level estimated from a rectangular region.
- Parameters:
image (ndarray) – 2D image array.
region (tuple of 4 ints) – (row_min, row_max, col_min, col_max) region for background estimation.
- Returns:
Background-subtracted image.
- Return type:
ndarray
- exo2micro.utils.survey_raw_channels(raw_dir='raw', crop_size=1000)[source]
Survey all raw TIFF files to report which RGB channels carry signal.
Reads a small centre crop from each file to avoid loading full images into memory.
- Parameters:
raw_dir (str) – Root raw image directory (default 'raw').
crop_size (int) – Size of the centre crop read from each file (default 1000).
- Returns:
results – One entry per file with keys: ‘path’, ‘size’, ‘mode’, ‘channels’. ‘channels’ is a dict mapping channel name (‘R’, ‘G’, ‘B’ or ‘gray’) to {‘max’: int, ‘mean’: float, ‘nonzero’: int}.
- Return type:
list of dict
Notes
If raw_dir is missing, empty, or has TIFFs in the wrong place (e.g. directly in raw_dir rather than in per-sample subdirectories), this function prints a human-readable layout diagnosis via diagnose_raw_layout() and returns an empty list.
- exo2micro.utils.tiff_to_fits(tiff_file, return_data=False)[source]
Convert a three-channel RGB TIFF file to a FITS file.
Each colour channel is stored as a named image extension (RED1, GREEN2, BLUE3).
- exo2micro.utils.tifffile_save(image, path)[source]
Save image as TIFF using tifffile for full-precision support.
- exo2micro.utils.trim_to_signal(post_im, pre_im, threshold=0)[source]
Trim both images to the bounding box of their combined nonzero signal.
Discards large empty margins before padding and registration. This is critical when images have significant zero-padded borders, because those empty regions confuse phase correlation and ECC.
- Parameters:
post_im (ndarray) – Post-stain image (2D).
pre_im (ndarray) – Pre-stain image (2D).
threshold (float) – Pixel values <= this are treated as empty background (default 0).
- Returns:
post_trimmed (ndarray)
pre_trimmed (ndarray)
bbox (tuple) – (row_min, row_max, col_min, col_max) bounding box applied.
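Trimming to the combined signal amounts to finding the first and last rows and columns where either image exceeds the threshold. The sketch assumes both images share one shape, that row_max and col_max are exclusive slice bounds, and that an image pair with no signal is returned untrimmed; none of these is confirmed by the description above. The function name is hypothetical.

```python
import numpy as np


def sketch_trim_to_signal(post_im, pre_im, threshold=0):
    """Crop both images to the bounding box of their combined signal."""
    signal = (post_im > threshold) | (pre_im > threshold)
    rows = np.flatnonzero(signal.any(axis=1))
    cols = np.flatnonzero(signal.any(axis=0))
    if rows.size == 0:
        # Assumption: with no signal at all, leave the images untouched.
        bbox = (0, post_im.shape[0], 0, post_im.shape[1])
        return post_im, pre_im, bbox

    # Assumption: upper bounds are exclusive, so they can be used as slices.
    r0, r1 = int(rows[0]), int(rows[-1]) + 1
    c0, c1 = int(cols[0]), int(cols[-1]) + 1
    return post_im[r0:r1, c0:c1], pre_im[r0:r1, c0:c1], (r0, r1, c0, c1)
```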