Composition Module

Compositional data analysis utilities tailored for nuee.

This module provides NumPy-only adaptations of scikit-bio’s composition utilities so they can be used in environments where SciPy is unavailable (for example, Pyodide). The implementations are derived from the scikit-bio project (Modified BSD License), with minimal adjustments for integration into nuee.

nuee.composition.closure(mat: ndarray, *, out: ndarray | None = None) -> ndarray

Perform closure so that each composition sums to 1.

Parameters:
  • mat – A matrix where rows are compositions and columns are components.

  • out – Optional array where the result is stored.

Returns:

Matrix of proportions with non-negative entries that sum to 1 per row.

Return type:

numpy.ndarray
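As a rough illustration of what closure does numerically, the following NumPy-only sketch divides each row by its total. The helper name `closure_sketch` is hypothetical and this is not nuee's implementation; the `out` parameter is not modelled here.

```python
import numpy as np


def closure_sketch(mat):
    """Hypothetical stand-in: rescale each row so that it sums to 1."""
    mat = np.atleast_2d(np.asarray(mat, dtype=float))
    if np.any(mat < 0):
        raise ValueError("compositions cannot contain negative values")
    totals = mat.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("a composition cannot consist solely of zeros")
    return mat / totals


counts = np.array([[2.0, 2.0, 6.0], [4.0, 4.0, 2.0]])
proportions = closure_sketch(counts)
print(proportions)  # rows become (0.2, 0.2, 0.6) and (0.4, 0.4, 0.2)
```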

nuee.composition.multiplicative_replacement(mat: ndarray, delta: float | None = None) -> ndarray

Replace structural zeros with a small non-zero value.
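The sketch below shows one common form of multiplicative replacement, assuming the behaviour follows scikit-bio: rows are closed, zeros become `delta`, and the non-zero parts are shrunk so the row still sums to 1. The default `delta = (1 / D) ** 2` and the helper name are assumptions, not confirmed details of nuee.

```python
import numpy as np


def multiplicative_replacement_sketch(mat, delta=None):
    """Hypothetical sketch of multiplicative zero replacement."""
    mat = np.atleast_2d(np.asarray(mat, dtype=float))
    mat = mat / mat.sum(axis=1, keepdims=True)  # close the rows first
    is_zero = mat == 0
    n_parts = mat.shape[1]
    if delta is None:
        delta = (1.0 / n_parts) ** 2  # assumed default, as in scikit-bio
    n_zeros = is_zero.sum(axis=1, keepdims=True)
    shrink = 1.0 - n_zeros * delta  # mass left for the non-zero parts
    if np.any(shrink < 0):
        raise ValueError("delta is too large for the number of zeros")
    return np.where(is_zero, delta, shrink * mat)


x = np.array([[0.2, 0.4, 0.4, 0.0]])
filled = multiplicative_replacement_sketch(x)
print(filled)  # the zero becomes 1/16 and the other parts shrink by 15/16
```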

nuee.composition.power(x: ndarray, a: float) -> ndarray

Raise each component to a power and renormalise via closure.
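The powering operation described above can be sketched in a few lines of NumPy. The helper name `power_sketch` is hypothetical and stands in for the library function.

```python
import numpy as np


def power_sketch(x, a):
    """Hypothetical sketch: raise every part to the power a, then close."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    powered = x ** a
    return powered / powered.sum(axis=1, keepdims=True)


x = np.array([[0.1, 0.3, 0.4, 0.2]])
halved = power_sketch(x, 0.5)
print(halved)
```

Powering by 1 returns the closed input unchanged, and powering by 0 returns the uniform composition.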

nuee.composition.clr(mat: ndarray, ignore_zero: bool = False) -> ndarray

Compute the centred log-ratio transformation.
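For strictly positive data, the centred log-ratio is the log of each part minus the row mean of the logs. A minimal sketch follows; `clr_sketch` is a hypothetical name and the `ignore_zero` option is not modelled.

```python
import numpy as np


def clr_sketch(mat):
    """Hypothetical sketch: log of each part minus the row-wise mean log."""
    mat = np.atleast_2d(np.asarray(mat, dtype=float))
    log_mat = np.log(mat)
    return log_mat - log_mat.mean(axis=1, keepdims=True)


x = np.array([[0.1, 0.3, 0.4, 0.2]])
coords = clr_sketch(x)
print(coords)
```

Each row of the result sums to zero, and rescaling the input leaves the coordinates unchanged.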

nuee.composition.clr_inv(mat: ndarray) -> ndarray

Inverse centred log-ratio transformation.
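The inverse transformation amounts to exponentiating and closing, which is the softmax function. The sketch below uses a hypothetical helper name and subtracts the row maximum purely for numerical stability.

```python
import numpy as np


def clr_inv_sketch(mat):
    """Hypothetical sketch: exponentiate, then close each row."""
    mat = np.atleast_2d(np.asarray(mat, dtype=float))
    shifted = mat - mat.max(axis=1, keepdims=True)  # avoids overflow in exp
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)


x = np.array([[0.1, 0.3, 0.4, 0.2]])
log_x = np.log(x)
coords = log_x - log_x.mean(axis=1, keepdims=True)  # forward clr
recovered = clr_inv_sketch(coords)
print(recovered)
```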

nuee.composition.inner(mat: ndarray) -> ndarray

Compute the inner product matrix in the Aitchison simplex.
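One plausible reading of this function is a Gram matrix of Aitchison inner products between the rows of `mat`, which equals the ordinary dot products of their clr coordinates. The sketch below implements that reading; both the interpretation and the helper name are assumptions.

```python
import numpy as np


def aitchison_gram_sketch(mat):
    """Hypothetical sketch: pairwise Aitchison inner products between rows."""
    mat = np.atleast_2d(np.asarray(mat, dtype=float))
    log_mat = np.log(mat)
    clr = log_mat - log_mat.mean(axis=1, keepdims=True)
    return clr @ clr.T


comps = np.array([
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
    [1 / 3, 1 / 3, 1 / 3],  # the uniform composition is the neutral element
])
gram = aitchison_gram_sketch(comps)
print(gram)
```

The uniform composition has zero inner product with every row, including itself.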

nuee.composition.ilr(mat: ndarray, basis: ndarray | None = None) -> ndarray

Perform the isometric log-ratio transformation.
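The isometric log-ratio projects clr coordinates onto an orthonormal basis of the zero-sum hyperplane. The sketch below uses a Helmert-type basis as the default. That choice is an assumption for illustration: the default basis used by nuee may differ, and both helper names are hypothetical.

```python
import numpy as np


def helmert_basis_sketch(n_parts):
    """Hypothetical default: (D - 1) orthonormal zero-sum rows."""
    basis = np.zeros((n_parts - 1, n_parts))
    for i in range(1, n_parts):
        basis[i - 1, :i] = 1.0 / i
        basis[i - 1, i] = -1.0
        basis[i - 1] *= np.sqrt(i / (i + 1.0))  # scale the row to unit length
    return basis


def ilr_sketch(mat, basis=None):
    """Hypothetical sketch: clr coordinates projected onto the basis rows."""
    mat = np.atleast_2d(np.asarray(mat, dtype=float))
    if basis is None:
        basis = helmert_basis_sketch(mat.shape[1])
    log_mat = np.log(mat)
    clr = log_mat - log_mat.mean(axis=1, keepdims=True)
    return clr @ basis.T


x = np.array([[0.1, 0.3, 0.4, 0.2]])
coords = ilr_sketch(x)
print(coords.shape)  # D parts give D - 1 coordinates
```

Because the basis is orthonormal, the Euclidean norm of the ilr coordinates equals the norm of the clr coordinates.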

nuee.composition.ilr_inv(mat: ndarray, basis: ndarray | None = None) -> ndarray

Inverse isometric log-ratio transformation.
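Inverting the transformation means mapping the coordinates back to clr space through the basis, then exponentiating and closing. The sketch below requires an explicit basis, unlike the signature above, where it is optional. The three-part basis and the helper name are illustrative assumptions.

```python
import numpy as np


def ilr_inv_sketch(coords, basis):
    """Hypothetical sketch: back to clr space, then exponentiate and close."""
    coords = np.atleast_2d(np.asarray(coords, dtype=float))
    clr = coords @ basis
    expd = np.exp(clr - clr.max(axis=1, keepdims=True))
    return expd / expd.sum(axis=1, keepdims=True)


# Hand-built orthonormal basis for three parts (rows sum to zero).
basis = np.array([
    [1.0, -1.0, 0.0],
    [1.0, 1.0, -2.0],
])
basis /= np.linalg.norm(basis, axis=1, keepdims=True)

x = np.array([[0.2, 0.3, 0.5]])
log_x = np.log(x)
coords = (log_x - log_x.mean(axis=1, keepdims=True)) @ basis.T  # forward ilr
recovered = ilr_inv_sketch(coords, basis)
print(recovered)
```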

nuee.composition.alr(mat: ndarray, denominator_idx: int = -1) -> ndarray

Perform the additive log-ratio transformation.
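The additive log-ratio takes the log of every part relative to one reference part and drops the reference column. A minimal sketch with a hypothetical helper name:

```python
import numpy as np


def alr_sketch(mat, denominator_idx=-1):
    """Hypothetical sketch: log-ratios against one reference column."""
    mat = np.atleast_2d(np.asarray(mat, dtype=float))
    log_mat = np.log(mat)
    log_ratios = log_mat - log_mat[:, [denominator_idx]]
    # The reference column is identically zero, so it is removed.
    return np.delete(log_ratios, denominator_idx, axis=1)


x = np.array([[0.1, 0.3, 0.4, 0.2]])
coords_last = alr_sketch(x)  # last part as denominator
coords_first = alr_sketch(x, denominator_idx=0)
print(coords_last, coords_first)
```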

nuee.composition.alr_inv(mat: ndarray, denominator_idx: int = -1) -> ndarray

Inverse additive log-ratio transformation.
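To invert the additive log-ratio, a zero is reinserted at the denominator position (the log-ratio of the reference part with itself), and the result is exponentiated and closed. The helper name below is hypothetical.

```python
import numpy as np


def alr_inv_sketch(mat, denominator_idx=-1):
    """Hypothetical sketch: reinsert the reference part, exponentiate, close."""
    mat = np.atleast_2d(np.asarray(mat, dtype=float))
    n_parts = mat.shape[1] + 1
    pos = denominator_idx % n_parts  # turn a negative index into a position
    full = np.insert(mat, pos, 0.0, axis=1)
    expd = np.exp(full)
    return expd / expd.sum(axis=1, keepdims=True)


coords = np.log([[0.5, 1.5, 2.0]])  # alr of (0.1, 0.3, 0.4, 0.2), last part as denominator
recovered = alr_inv_sketch(coords)
print(recovered)
```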

nuee.composition.sbp_basis(sbp: ndarray) -> ndarray

Construct an orthonormal basis from a sequential binary partition.
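A sequential binary partition is usually encoded as a sign matrix: +1 marks numerator parts, -1 denominator parts, and 0 parts not involved in that split. The sketch below builds the standard balance basis in clr coordinates from such a matrix. Whether nuee returns the basis in clr coordinates or mapped back to the simplex is not stated here, so treat that detail and the helper name as assumptions.

```python
import numpy as np


def sbp_basis_sketch(sbp):
    """Hypothetical sketch: balance basis (clr coordinates) from a sign matrix."""
    sbp = np.asarray(sbp, dtype=float)
    n_pos = (sbp > 0).sum(axis=1, keepdims=True)  # parts in the numerator
    n_neg = (sbp < 0).sum(axis=1, keepdims=True)  # parts in the denominator
    total = n_pos + n_neg
    pos_val = np.sqrt(n_neg / (n_pos * total))
    neg_val = -np.sqrt(n_pos / (n_neg * total))
    return np.where(sbp > 0, pos_val, np.where(sbp < 0, neg_val, 0.0))


sbp = np.array([
    [1, 1, -1, -1],  # parts {1, 2} against parts {3, 4}
    [1, -1, 0, 0],   # part 1 against part 2
    [0, 0, 1, -1],   # part 3 against part 4
])
basis = sbp_basis_sketch(sbp)
print(basis)
```

Each row has unit length, sums to zero, and is orthogonal to the others.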

nuee.composition.center(mat: ndarray) -> ndarray

Alias for centralize() kept for API parity.

nuee.composition.centralize(mat: ndarray) -> ndarray

Center compositions by their geometric mean.
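Centering can be sketched as dividing each row by the column-wise geometric mean of the closed data and closing again. This assumes the scikit-bio behaviour carries over; the helper name is hypothetical.

```python
import numpy as np


def centralize_sketch(mat):
    """Hypothetical sketch: perturb by the inverse of the geometric centre."""
    mat = np.atleast_2d(np.asarray(mat, dtype=float))
    mat = mat / mat.sum(axis=1, keepdims=True)
    center = np.exp(np.log(mat).mean(axis=0))  # column-wise geometric mean
    shifted = mat / center
    return shifted / shifted.sum(axis=1, keepdims=True)


mat = np.array([
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.4, 0.4, 0.2],
])
centered = centralize_sketch(mat)
print(centered)
```

After centering, the column-wise geometric means of the result are all equal, so the geometric centre of the data is the uniform composition.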

nuee.composition.replace_zeros(X: ndarray | DataFrame, detection_limits: ndarray | None = None, delta: float | None = None) -> ndarray | DataFrame

Multiplicative zero replacement for compositional data.

Replaces zeros with a small value proportional to the detection limit (or column minimum of non-zero values) and adjusts non-zero entries so that each row sum is preserved exactly.

Parameters:
  • X (array-like or DataFrame, shape (n, D)) – Compositional data matrix. Zeros mark below-detection-limit values.

  • detection_limits (array-like of shape (D,), optional) – Per-component detection limits. When None, the column-wise minimum of strictly positive values is used as a proxy.

  • delta (float, optional) – Fraction of the detection limit used as the replacement value. Default is 0.65 (Martín-Fernández et al. 2003).

Returns:

Data with zeros replaced. Row sums match the input exactly.

Return type:

numpy.ndarray or DataFrame
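The row-sum-preserving adjustment described above can be sketched as follows. Zeros are set to `delta` times the detection limit, and the non-zero entries of each row are scaled down by the fraction of the row total that was added. The exact adjustment used by nuee is an assumption here, the helper name is hypothetical, and DataFrame input is not handled.

```python
import numpy as np


def replace_zeros_sketch(X, detection_limits=None, delta=0.65):
    """Hypothetical sketch of detection-limit based zero replacement."""
    X = np.asarray(X, dtype=float)
    is_zero = X == 0
    if detection_limits is None:
        # Proxy: smallest strictly positive value in each column.
        detection_limits = np.where(is_zero, np.inf, X).min(axis=0)
    fill = delta * np.asarray(detection_limits, dtype=float)
    row_sums = X.sum(axis=1, keepdims=True)
    added = np.where(is_zero, fill, 0.0).sum(axis=1, keepdims=True)
    scale = 1.0 - added / row_sums  # shrink non-zero parts to keep the total
    return np.where(is_zero, fill, X * scale)


X = np.array([
    [0.0, 0.3, 0.7],
    [0.2, 0.0, 0.8],
    [0.1, 0.4, 0.5],
])
filled = replace_zeros_sketch(X)
print(filled)
```

Rows without zeros pass through unchanged because their scale factor is exactly 1. A column consisting only of zeros has no positive minimum, so this sketch would need explicit `detection_limits` in that case.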


nuee.composition.impute_missing(X: ndarray | DataFrame, method: str = 'lrEM', max_iter: int = 100, tol: float = 0.0001, random_state: int | None = None) -> ndarray | DataFrame

Impute missing values in compositional data using the lrEM algorithm.

Uses the ALR (additive log-ratio) EM algorithm of Palarea-Albaladejo & Martín-Fernández (2008), matching the approach in R’s zCompositions package. Observed values are preserved exactly in the output.

Ideally one column should be fully observed (no NaN values) to serve as the ALR denominator. When no column is complete, the column with the fewest missing values is chosen and its gaps are pre-filled using row-proportional estimation from column-mean ratios before running EM.

Parameters:
  • X (array-like or DataFrame, shape (n, D)) – Compositional data matrix. NaN marks missing components. Observed (non-NaN) values must be strictly positive.

  • method ({"lrEM", "lrDA"}, default "lrEM") – "lrEM" returns the conditional expectation (deterministic). "lrDA" adds noise from the conditional covariance for multiple imputation / data augmentation.

  • max_iter (int, default 100) – Maximum number of EM iterations.

  • tol (float, default 1e-4) – Convergence tolerance on the relative change of the log-likelihood.

  • random_state (int, optional) – Seed for the random number generator (only used when method="lrDA").

Returns:

Completed data. Observed values are unchanged; imputed values are scaled consistently with the original observed components.

Return type:

numpy.ndarray or DataFrame
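To make the idea concrete, here is a simplified sketch of an EM loop in ALR coordinates under a multivariate normal model. It is not nuee's implementation and makes several assumptions: the denominator column must be fully observed (the pre-fill step for incomplete denominators is omitted), convergence is checked on the change in imputed values rather than on the log-likelihood, and only the deterministic "lrEM" variant is covered. The function name `lr_em_sketch` is hypothetical.

```python
import numpy as np


def lr_em_sketch(X, denom=-1, max_iter=100, tol=1e-4):
    """Hypothetical, simplified EM imputation in ALR coordinates."""
    X = np.asarray(X, dtype=float)
    n, D = X.shape
    denom = denom % D
    if np.isnan(X[:, denom]).any():
        raise ValueError("this sketch needs a fully observed denominator")
    others = [j for j in range(D) if j != denom]
    Y = np.log(X[:, others] / X[:, [denom]])  # NaN where a part is missing
    miss = np.isnan(Y)
    Y = np.where(miss, np.nanmean(Y, axis=0), Y)  # start from column means
    mu = Y.mean(axis=0)
    resid = Y - mu
    sigma = resid.T @ resid / n + 1e-8 * np.eye(D - 1)  # small ridge

    for _ in range(max_iter):
        Y_prev = Y.copy()
        cond_cov = np.zeros_like(sigma)
        # E-step: conditional mean of the missing log-ratios in each row.
        for i in np.flatnonzero(miss.any(axis=1)):
            m, o = miss[i], ~miss[i]
            if o.any():
                s_oo = sigma[np.ix_(o, o)]
                s_mo = sigma[np.ix_(m, o)]
                coef = np.linalg.solve(s_oo, s_mo.T).T  # s_mo @ inv(s_oo)
                Y[i, m] = mu[m] + coef @ (Y[i, o] - mu[o])
                cond_cov[np.ix_(m, m)] += sigma[np.ix_(m, m)] - coef @ s_mo.T
            else:
                Y[i, m] = mu[m]
                cond_cov += sigma
        # M-step: update the mean and covariance from the completed data.
        mu = Y.mean(axis=0)
        resid = Y - mu
        sigma = (resid.T @ resid + cond_cov) / n
        if np.max(np.abs(Y - Y_prev)) < tol:
            break

    # Back-transform imputed log-ratios; observed entries are copied as-is.
    completed = X.copy()
    completed[:, others] = np.where(miss, np.exp(Y) * X[:, [denom]], X[:, others])
    return completed


rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=0.5, size=(40, 4))
X[::5, 0] = np.nan   # missing values in the first column
X[1::7, 1] = np.nan  # and in the second; the last column stays complete
completed = lr_em_sketch(X)
print(np.isnan(completed).sum())
```

Working on log-ratios against the denominator keeps every imputed value strictly positive and on the same scale as the observed parts of its row.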

References