Non-metric Multidimensional Scaling with automatic transformation.
This function provides a high-level interface for NMDS ordination, following
the conventions of the R vegan package’s metaMDS function. It automatically
handles data transformation and uses multiple random starts to find the best
ordination solution.
Parameters:
X (np.ndarray or pd.DataFrame) – Community data matrix with samples in rows and species in columns.
Values should be non-negative abundances or counts.
k (int, default=2) – Number of dimensions for the ordination. Common choices are 2 or 3.
distance (str, default="bray") – Distance metric to use for calculating dissimilarities.
See nuee.vegdist() for available options.
trymax (int, default=20) – Maximum number of random starts to find the best solution.
Higher values increase computation time but may find better solutions.
maxit (int, default=200) – Maximum number of iterations for each random start.
trace (bool, default=False) – If True, print progress information including stress values.
autotransform (bool, default=True) – If True, automatically apply a square-root transformation to abundance data
(values > 1) to reduce the influence of dominant species.
wascores (bool, default=True) – If True, calculate weighted average species scores based on site scores
and species abundances.
expand (bool, default=True) – If True, expand the result to include additional information.
random_state (int, optional) – Seed used for reproducible random starts. If None, each run is
initialised independently.
**kwargs (dict) – Additional parameters passed to the NMDS class.
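The wascores option in the parameter list above can be illustrated with a minimal sketch. It assumes that a species score on each axis is the abundance-weighted mean of the site scores on that axis. The helper name `weighted_average_scores` is hypothetical and not part of nuee, and any further rescaling the library applies is not shown.

```python
def weighted_average_scores(abundances, site_scores):
    """Return one score vector per species (hypothetical helper, not nuee API).

    abundances:  one row per site, each row a list of species abundances.
    site_scores: one row per site, each row a list of k ordination scores.
    """
    n_species = len(abundances[0])
    n_axes = len(site_scores[0])
    species_scores = []
    for j in range(n_species):
        total = sum(row[j] for row in abundances)
        if total == 0:
            # A species absent from every site has no defined average.
            species_scores.append([float("nan")] * n_axes)
            continue
        species_scores.append([
            sum(row[j] * site[a] for row, site in zip(abundances, site_scores)) / total
            for a in range(n_axes)
        ])
    return species_scores


# Toy data: 3 sites x 2 species, and 3 sites x 2 ordination axes.
abundances = [[10, 0], [0, 5], [10, 5]]
site_scores = [[-1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
scores = weighted_average_scores(abundances, site_scores)
print(scores)
```

Each species ends up at the centre of gravity of the sites where it occurs, so abundant occurrences pull the species point towards those sites.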
Notes
NMDS is a rank-based ordination method that attempts to preserve the rank
order of dissimilarities between samples. Unlike metric methods such as PCA,
NMDS does not assume that relationships are linear.
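To make "rank-based" concrete, the sketch below fits ordination distances to dissimilarities by monotone (isotonic) regression, using the pool-adjacent-violators algorithm that classical NMDS is usually described with. Whether nuee uses exactly this procedure is an assumption, and the names `monotone_fit` and `disparities` are hypothetical.

```python
def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


def disparities(dissimilarities, distances):
    """Fit distances monotonically against the rank order of dissimilarities."""
    order = sorted(range(len(dissimilarities)), key=lambda i: dissimilarities[i])
    fitted_sorted = monotone_fit([distances[i] for i in order])
    result = [0.0] * len(distances)
    for rank, i in enumerate(order):
        result[i] = fitted_sorted[rank]
    return result


dissim = [0.9, 0.2, 0.5]   # observed dissimilarities for three sample pairs
dist = [2.0, 1.0, 3.0]     # distances between the same pairs in the ordination
fit_raw = disparities(dissim, dist)
# Squaring is a monotone transformation, so the rank order is unchanged.
fit_squared = disparities([d ** 2 for d in dissim], dist)
print(fit_raw, fit_squared)
```

Because only the ordering of the dissimilarities enters the fit, any monotone rescaling of them yields the same fitted values.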
Stress values provide a measure of fit:
- < 0.05: excellent
- 0.05 - 0.10: good
- 0.10 - 0.20: acceptable
- > 0.20: poor (consider increasing k or using a different method)
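The bands above can be turned into a small helper. The sketch also computes Kruskal's stress type 1 from ordination distances and fitted disparities. That the library reports stress type 1 is an assumption, as is the assignment of the exact boundary values 0.05 and 0.10 to the upper band. Both function names are hypothetical.

```python
import math


def kruskal_stress1(distances, disparities):
    """Stress type 1: sqrt(sum((d - d_hat)^2) / sum(d^2))."""
    numerator = sum((d - h) ** 2 for d, h in zip(distances, disparities))
    denominator = sum(d ** 2 for d in distances)
    return math.sqrt(numerator / denominator)


def describe_stress(stress):
    """Map a stress value to the rule-of-thumb labels listed above."""
    if stress < 0.05:
        return "excellent"
    if stress < 0.10:
        return "good"
    if stress <= 0.20:
        return "acceptable"
    return "poor"


stress = kruskal_stress1([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
print(stress, describe_stress(stress))
```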
The function uses multiple random starts (trymax) because NMDS can get stuck
in local optima. The solution with the lowest stress is returned.
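The multi-start strategy can be sketched independently of NMDS itself. The example below runs a simple local search from several random starting points on a toy one-dimensional objective with several local minima, then keeps the run with the lowest value. The objective, the local search, and all function names are illustrative stand-ins, not the optimiser used by the library.

```python
import math
import random


def toy_stress(x):
    # Stand-in objective with several local minima; NOT an NMDS stress function.
    return 0.1 * x * x + math.sin(3.0 * x) + 1.0


def local_descent(objective, x, step=0.5, max_iter=200, min_step=1e-9):
    """Greedy 1-D search: move to a better neighbour, otherwise halve the step."""
    fx = objective(x)
    for _ in range(max_iter):
        best = min((x - step, x + step), key=objective)
        f_best = objective(best)
        if f_best < fx:
            x, fx = best, f_best
        else:
            step *= 0.5
            if step < min_step:
                break
    return x, fx


def best_of_random_starts(objective, trymax=20, max_iter=200, random_state=None):
    """Run `trymax` local searches and return (best_run, all_runs)."""
    rng = random.Random(random_state)
    runs = [
        local_descent(objective, rng.uniform(-5.0, 5.0), max_iter=max_iter)
        for _ in range(trymax)
    ]
    return min(runs, key=lambda run: run[1]), runs


best, runs = best_of_random_starts(toy_stress, trymax=20, random_state=42)
print("distinct end points:", len({round(x, 3) for x, _ in runs}))
print("best value:", best[1])
```

Different starts typically end in different local minima, which is why a single run is not a reliable estimate of the best achievable fit. Fixing the seed makes the set of starts, and therefore the result, reproducible.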
Non-metric Multidimensional Scaling for community ecology.
This implementation follows the approach used in vegan’s metaMDS function,
including multiple random starts and stress evaluation. NMDS is a rank-based
ordination method that attempts to represent ecological distances in a
low-dimensional space.
Parameters:
n_components (int, default=2) – Number of dimensions for the embedding (ordination axes).
max_iter (int, default=200) – Maximum number of iterations for the optimization algorithm.
n_init (int, default=20) – Number of random initializations. The solution with minimum stress
is returned.
eps (float, default=1e-3) – Convergence tolerance for the stress value.
random_state (int, optional) – Random seed for reproducibility. If None, the random state is not fixed.
dissimilarity (str, default="bray") – Distance metric to use. See nuee.vegdist() for available options.
n_jobs (int, optional) – Number of parallel jobs for computation. If None, uses a single core.
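The default dissimilarity, "bray", refers to Bray-Curtis dissimilarity. The pure-Python sketch below shows the standard formula, the sum of absolute differences divided by the sum of all abundances in the two samples. The helper names are hypothetical, and nuee.vegdist() remains the library's own implementation, whose handling of empty samples may differ from the choice made here.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    if denominator == 0:
        # Two empty samples: the ratio is undefined (assumed convention).
        return float("nan")
    return numerator / denominator


def pairwise_dissimilarity(matrix, metric=bray_curtis):
    """Square, symmetric dissimilarity matrix for samples stored in rows."""
    n = len(matrix)
    return [[metric(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]


community = [
    [6, 7, 4],    # sample A
    [10, 0, 6],   # sample B
    [0, 3, 0],    # sample C
]
D = pairwise_dissimilarity(community)
for row in D:
    print([round(value, 3) for value in row])
```

For non-negative abundances the values lie between 0 (identical samples) and 1 (no shared species).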
See also
metaMDS
High-level interface with automatic transformations
MDS
Scikit-learn’s MDS implementation
Notes
NMDS is particularly useful when:
- Relationships between samples are non-linear
- You want to use a specific distance metric
- You have presence/absence or abundance data
The stress value indicates goodness-of-fit:
- Values < 0.05 indicate excellent fit
- Values 0.05-0.10 indicate good fit
- Values 0.10-0.20 indicate acceptable fit
- Values > 0.20 indicate poor fit
Canonical Correspondence Analysis (or CA when X is None).
Parameters:
Y – Species data matrix (sites x species).
X – Environmental data matrix (sites x variables) or DataFrame for
formula evaluation. When None and no formula is given, an
unconstrained Correspondence Analysis (CA) is performed.
formula – R-style formula string referencing columns in X.
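For the unconstrained case, the textbook first step of Correspondence Analysis can be sketched in plain Python: convert Y to proportions, compare each cell with the value expected under independence of rows and columns, and scale the difference. The sum of squares of the resulting matrix is the total inertia, and a singular value decomposition of that matrix (not shown) would give the ordination axes. That the library follows exactly these steps is an assumption, and the function names are hypothetical. The sketch requires every row and column of Y to have a positive total.

```python
import math


def ca_standardized_residuals(Y):
    """Chi-square standardized residuals of a sites x species table."""
    total = sum(sum(row) for row in Y)
    P = [[value / total for value in row] for row in Y]
    row_mass = [sum(row) for row in P]
    col_mass = [sum(col) for col in zip(*P)]
    return [
        [
            (P[i][j] - row_mass[i] * col_mass[j])
            / math.sqrt(row_mass[i] * col_mass[j])
            for j in range(len(col_mass))
        ]
        for i in range(len(row_mass))
    ]


def total_inertia(Y):
    """Sum of squared standardized residuals."""
    return sum(q * q for row in ca_standardized_residuals(Y) for q in row)


separated = [[10, 0], [0, 10]]     # each species confined to one site
proportional = [[2, 4], [3, 6]]    # same species composition at every site
print(total_inertia(separated), total_inertia(proportional))
```

A table with identical composition at every site has essentially zero inertia, so there is no structure for CA to display, while strongly separated sites give a large value.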
For PCA, the species loadings are shown as arrows from the origin.
For constrained ordination (RDA / CCA), environmental variables
are shown as arrows while species are shown as points.