growthcurves.preprocessing module

Data preprocessing utilities for growth curve analysis.

This module provides functions for common preprocessing steps such as blank subtraction and path length correction.

growthcurves.preprocessing.blank_subtraction(N: ndarray, blank: ndarray) → ndarray

Subtract blank values from a time series of growth measurements.

Performs element-wise subtraction of the blank measurements from the data series. This is commonly used for baseline correction of optical density (OD) measurements.

Parameters:

N (numpy.ndarray) – Data series to be corrected (e.g., OD measurements)
blank (numpy.ndarray) – Blank/background measurements to subtract. Either an array of the same length as N, or a scalar value that is subtracted from every point in N.

Returns:

Blank-subtracted N

Return type:

numpy.ndarray

Examples

>>> import numpy as np
>>> N = np.array([0.5, 0.6, 0.7, 0.8])
>>> blank = np.array([0.05, 0.05, 0.05, 0.05])
>>> corrected = blank_subtraction(N, blank)
>>> corrected
array([0.45, 0.55, 0.65, 0.75])
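
The parameter description above says blank may also be a scalar. A minimal sketch of that case, written with plain NumPy broadcasting rather than the library call itself (that blank_subtraction treats a Python float the same way is an assumption based on that description):

```python
import numpy as np

# OD readings and a single background value shared by all time points
N = np.array([0.5, 0.6, 0.7, 0.8])
blank = 0.05

# NumPy broadcasts the scalar across the array, giving the same
# element-wise subtraction that the array form performs.
corrected = N - blank
print(np.round(corrected, 2))
```

The result has the same shape as N, so it can be passed on to later preprocessing steps unchanged.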

growthcurves.preprocessing.detect_outliers(N: ndarray, method: str = 'ecod', **kwargs) → ndarray

Detect outliers in a growth curve time series.

Entry point that dispatches to the chosen detection method. All methods return a boolean mask of the same length as N.

Parameters:

N (numpy.ndarray) – Input time series of OD values.
method ({"iqr", "ecod", "hampel"}, default="ecod") –
Outlier detection method to use:
- "iqr" — sliding-window IQR method (detect_outliers_iqr()). Kwargs: window_size (int, required), factor (float, default 1.5).
- "ecod" — ECOD method (detect_outliers_ecod()). Kwargs: factor (float, default 3.5).
- "hampel" — Hampel identifier (detect_outliers_hampel()). Kwargs: window (int, default 15), factor (float, default 3.0).
**kwargs – Additional keyword arguments forwarded to the chosen method.

Returns:

Boolean mask of the same length as N where True indicates an outlier.

Return type:

numpy.ndarray

Raises:

ValueError – If method is not recognised.

Examples

>>> mask = detect_outliers(N, method="iqr", window_size=11, factor=1.5)
>>> mask = detect_outliers(N, method="ecod", factor=3.5)
>>> mask = detect_outliers(N, method="hampel", window=15, factor=3.0)
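
Since every method returns a boolean mask, the flagged points can be handled with ordinary NumPy indexing. The sketch below uses a hand-written mask in place of a real detect_outliers() result; the arrays t, N and mask are hypothetical illustration data:

```python
import numpy as np

# Hypothetical inputs: time points, OD readings, and a mask shaped like
# the one detect_outliers() is documented to return (True = outlier).
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
N = np.array([0.10, 0.12, 0.95, 0.17, 0.21])
mask = np.array([False, False, True, False, False])

# Option 1: drop flagged points from both the time and OD arrays
t_clean, N_clean = t[~mask], N[~mask]

# Option 2: keep the original length and replace flagged points with NaN
N_nan = np.where(mask, np.nan, N)
```

Dropping points keeps the data free of NaN for fitting routines, while the NaN variant keeps the time axis aligned with other wells or replicates.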

growthcurves.preprocessing.detect_outliers_ecod(N: ndarray, factor: float = 3.5) → ndarray

Return a boolean array indicating whether each value is an outlier using ECOD (Empirical Cumulative Distribution-based Outlier Detection).

Builds a 3-feature matrix per point — absolute rolling-mean residual, raw OD value, and first difference — then fits ECOD and flags points whose MAD z-score of the decision score exceeds factor.

Parameters:

N (numpy.ndarray) – Input time series of OD values.
factor (float, default=3.5) – MAD z-score threshold for flagging outliers. Higher values flag fewer, more extreme points.

Returns:

Boolean mask of the same length as N where True indicates an outlier.

Return type:

numpy.ndarray
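
The final thresholding step can be illustrated in isolation. The sketch below assumes the common "modified z-score" form, 0.6745 * (x - median) / MAD, applied one-sidedly to the decision scores; the exact scaling and sidedness used by the library are not stated above, and the function name mad_zscore and the scores array are hypothetical. The ECOD fit itself is not reproduced here.

```python
import numpy as np

def mad_zscore(scores):
    """Robust z-score: 0.6745 * (x - median) / MAD (assumed scaling)."""
    scores = np.asarray(scores, dtype=float)
    median = np.median(scores)
    mad = np.median(np.abs(scores - median))
    if mad == 0:
        # More than half the scores are identical; this sketch flags nothing
        return np.zeros_like(scores)
    return 0.6745 * (scores - median) / mad

# Hypothetical decision scores; the last one is far from the rest
scores = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 6.0])
mask = mad_zscore(scores) > 3.5  # factor = 3.5, the documented default
```

Because the median and MAD are barely affected by a few extreme scores, the threshold stays meaningful even when the series contains the outliers it is meant to find.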

growthcurves.preprocessing.detect_outliers_hampel(N: ndarray, window: int = 15, factor: float = 3.0) → ndarray

Return a boolean array indicating whether each value is an outlier using the Hampel identifier.

For each point, computes the median and MAD of a symmetric neighbourhood window and flags points whose MAD z-score exceeds factor.

Parameters:

N (numpy.ndarray) – Input time series of OD values.
window (int, default=15) – Total number of neighbours to include (window // 2 on each side).
factor (float, default=3.0) – MAD z-score threshold. Points with score > factor are flagged.

Returns:

Boolean mask of the same length as N where True indicates an outlier.

Return type:

numpy.ndarray
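
The Hampel identifier described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the library's implementation: the name hampel_mask is hypothetical, the neighbourhood is simply truncated at the series edges, and the MAD is scaled by 0.6745 to be comparable to a standard deviation.

```python
import numpy as np

def hampel_mask(N, window=15, factor=3.0):
    """Flag points whose MAD z-score within a local window exceeds factor."""
    N = np.asarray(N, dtype=float)
    half = window // 2
    mask = np.zeros(N.size, dtype=bool)
    for i in range(N.size):
        # Symmetric neighbourhood, truncated at the edges (assumption)
        neighbourhood = N[max(0, i - half): i + half + 1]
        median = np.median(neighbourhood)
        mad = np.median(np.abs(neighbourhood - median))
        if mad == 0:
            continue  # flat neighbourhood: skip rather than divide by zero
        score = 0.6745 * abs(N[i] - median) / mad
        mask[i] = score > factor
    return mask

# Hypothetical OD series with one spike at index 4
N = np.array([0.10, 0.11, 0.12, 0.13, 0.90, 0.15, 0.16, 0.17, 0.18])
mask = hampel_mask(N, window=5, factor=3.0)
```

In this sketch only the spike at index 4 is flagged; its neighbours stay below the threshold because the local median and MAD are not pulled toward the spike.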

growthcurves.preprocessing.detect_outliers_iqr(N: array, window_size: int, factor: float = 1.5) → array

Return a boolean array indicating whether each value is an outlier based on the IQR method.

A sliding window of window_size points is moved along N. For points in the middle of the series, the value at the center of the window is tested against the IQR bounds of that window. For the first and last points of N, where a centered window does not fit, the first or last value in the window is tested instead.

growthcurves.preprocessing.out_of_iqr_window(values: ndarray, factor: float = 1.5, position: str = 'center') → bool

Return True if the selected value is an outlier based on the IQR method.

Parameters:

values (numpy.ndarray) – Input window of values.
factor (float, default=1.5) – IQR multiplier used to define outlier bounds.
position ({"center", "first", "last"}, default="center") – Which value in the window to test as the target point.

Raises:

ValueError – If position is invalid, or if position="center" and the input array does not have an odd number of elements.
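
A minimal sketch of such a window test, using the conventional IQR fences (Q1 - factor * IQR and Q3 + factor * IQR). The name iqr_window_test is hypothetical, and computing the quartiles over the whole window, target value included, is an assumption rather than confirmed library behaviour:

```python
import numpy as np

def iqr_window_test(values, factor=1.5, position="center"):
    """Return True if the selected value lies outside the IQR fences."""
    values = np.asarray(values, dtype=float)
    if position == "center":
        if values.size % 2 == 0:
            raise ValueError("position='center' needs an odd-length window")
        target = values[values.size // 2]
    elif position == "first":
        target = values[0]
    elif position == "last":
        target = values[-1]
    else:
        raise ValueError(f"invalid position: {position!r}")
    # Quartiles over the whole window, target included (assumption)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return bool(target < q1 - factor * iqr or target > q3 + factor * iqr)

# Hypothetical window with a spike in the middle
window = np.array([0.10, 0.11, 0.90, 0.12, 0.13])
center_is_outlier = iqr_window_test(window)
first_is_outlier = iqr_window_test(window, position="first")
```

Here the center value 0.90 falls above the upper fence, while the first value 0.10 stays inside the fences.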

growthcurves.preprocessing.path_correct(N: ndarray, path_length_cm: float) → ndarray

Correct optical density measurements to a standard 1 cm path length.

Normalizes OD measurements taken at a specific path length to the values they would have at a 1 cm path length, using the Beer-Lambert law (OD is proportional to path length).

Parameters:

N (numpy.ndarray) – Optical density measurements to correct
path_length_cm (float) – Actual path length of the measurement in centimeters (must be > 0)

Returns:

Path-corrected N normalized to 1 cm path length

Return type:

numpy.ndarray

Raises:

ValueError – If path_length_cm is not positive

Examples

>>> import numpy as np
>>> # Measurement taken with 0.5 cm path length
>>> N = np.array([0.25, 0.30, 0.35])
>>> corrected = path_correct(N, path_length_cm=0.5)
>>> corrected  # OD values as if measured with 1 cm path
array([0.5, 0.6, 0.7])

>>> # Measurement taken with 2 cm path length
>>> N = np.array([1.0, 1.2, 1.4])
>>> corrected = path_correct(N, path_length_cm=2.0)
>>> corrected
array([0.5, 0.6, 0.7])

Notes

The correction uses the relationship: OD_1cm = OD_measured / path_length

This assumes the Beer-Lambert law holds (linear relationship between absorbance and path length), which is typically valid only for OD values below roughly 1.0 to 1.5.
