User Guide¶

This guide provides detailed information on using nuee for community ecology analysis.

Overview ¶

nuee is designed to provide a Pythonic interface to community ecology analyses, following the conventions of the R vegan package while leveraging the power of the scientific Python ecosystem.

Data Format ¶

Community Data ¶

Community data should be provided as a matrix where:

Rows represent samples (sites, plots, etc.)
Columns represent species (taxa, OTUs, etc.)
Values represent abundances (counts, biomass, etc.)

nuee accepts data in several formats:

>>> import nuee
>>> import numpy as np
>>> import pandas as pd

>>> # NumPy array
>>> data_array = np.random.rand(10, 20)  # 10 samples, 20 species

>>> # Pandas DataFrame (recommended)
>>> data_df = pd.DataFrame(
...     data_array,
...     index=[f"Site{i}" for i in range(10)],
...     columns=[f"Species{i}" for i in range(20)]
... )
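
A quick sanity check before analysis can catch layout mistakes such as negative values or empty samples. The sketch below is illustrative only: `check_community_matrix` is a hypothetical helper, not part of nuee, and the checks it runs are assumptions about what is usually worth verifying.

```python
import numpy as np
import pandas as pd


def check_community_matrix(data):
    """Return a list of problems found in a samples-by-species matrix.

    Hypothetical helper for illustration; not part of nuee.
    """
    values = np.asarray(data, dtype=float)
    problems = []
    if values.ndim != 2:
        problems.append("expected a 2-D samples x species matrix")
        return problems
    if np.isnan(values).any():
        problems.append("contains missing values")
    if (values < 0).any():
        problems.append("contains negative abundances")
    if (np.nansum(values, axis=1) == 0).any():
        problems.append("contains samples with no species")
    if (np.nansum(values, axis=0) == 0).any():
        problems.append("contains species absent from every sample")
    return problems


rng = np.random.default_rng(0)
community = pd.DataFrame(
    rng.integers(1, 10, size=(4, 3)),  # counts between 1 and 9
    index=[f"Site{i}" for i in range(4)],
    columns=[f"Species{i}" for i in range(3)],
)
ok_report = check_community_matrix(community)

community_bad = community.copy()
community_bad.iloc[0, :] = 0  # turn the first site into an empty sample
bad_report = check_community_matrix(community_bad)
print(ok_report, bad_report)
```

An empty list means none of the listed problems were found; it does not guarantee that the data are suitable for a particular method.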

Environmental Data ¶

Environmental data should have one row per sample, in the same order as the rows of the community data:

>>> env_data = pd.DataFrame({
...     "Temperature": np.random.rand(10),
...     "pH": np.random.rand(10),
...     "Moisture": np.random.rand(10)
... }, index=data_df.index)
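
When the two tables come from different files, the row order may not match. This guide does not state whether nuee aligns tables on their index, so aligning them explicitly is a cautious default. A minimal pandas sketch, with made-up site names and values:

```python
import numpy as np
import pandas as pd

sites = [f"Site{i}" for i in range(5)]
community = pd.DataFrame(np.ones((5, 3)), index=sites)

# Environmental table holding the same samples in reverse order
env = pd.DataFrame({"pH": [5.0, 6.0, 7.0, 8.0, 9.0]}, index=sites[::-1])

same_order = env.index.equals(community.index)  # False: the order differs

# Reorder the environmental rows to follow the community matrix
env_aligned = env.reindex(community.index)
if env_aligned.isna().any().any():
    raise ValueError("some samples have no environmental data")
```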

Distance Matrices ¶

Distance matrices should be square and symmetric, with zeros on the diagonal:

>>> # Calculate distances
>>> distances = nuee.vegdist(data_df, method="bray")
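
To make the expected shape concrete, the sketch below computes Bray-Curtis dissimilarities by hand with NumPy and checks the three properties. It is a hand-rolled illustration of the standard formula, not nuee's implementation, and `bray_curtis_matrix` is a hypothetical name.

```python
import numpy as np


def bray_curtis_matrix(x):
    """Pairwise Bray-Curtis dissimilarities between the rows of x.

    Assumes no two rows are both empty (that would divide 0 by 0).
    """
    x = np.asarray(x, dtype=float)
    # Sum over species of |x_i - x_j| and (x_i + x_j) for every pair of rows
    diff = np.abs(x[:, None, :] - x[None, :, :]).sum(axis=2)
    total = (x[:, None, :] + x[None, :, :]).sum(axis=2)
    return diff / total


counts = np.array([
    [10, 0, 5],
    [10, 0, 5],  # identical to the first site
    [0, 8, 0],   # shares no species with the others
])
d = bray_curtis_matrix(counts)

is_square = d.shape[0] == d.shape[1]
is_symmetric = np.allclose(d, d.T)
has_zero_diagonal = np.allclose(np.diag(d), 0.0)
```

Identical sites get a dissimilarity of 0 and sites with no species in common get 1, which are the two ends of the Bray-Curtis range.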

Workflow Examples ¶

Basic Ordination Workflow ¶

Note: nuee.metaMDS follows vegan’s data transformations and SMACOF optimisation, but the underlying implementation is still evolving. Regression tests currently show small differences in the reported stress compared with vegan::metaMDS. This does not invalidate the ordination, but if you need results identical to vegan's, re-run the analysis in R for the time being.

Diversity Analysis Workflow ¶

>>> import nuee
>>> import pandas as pd

>>> # Load data
>>> species = nuee.datasets.BCI()

>>> # Calculate multiple diversity indices
>>> diversity_df = pd.DataFrame({
...     "Shannon": nuee.shannon(species).values,
...     "Gini-Simpson": nuee.simpson(species).values,
...     "Richness": nuee.specnumber(species).values,
...     "Fisher": nuee.fisher_alpha(species).values
... })

>>> # Summary statistics
>>> print(diversity_df.describe())
[...]

>>> # Compare groups, if you have one group label per sample
>>> # diversity_by_group = diversity_df.groupby(groups).mean()
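
For readers who want to see what the indices measure, the sketch below computes Shannon and Gini-Simpson diversity by hand. It assumes the textbook definitions (natural logarithm for Shannon, one minus the sum of squared proportions for Gini-Simpson); the exact defaults of nuee.shannon and nuee.simpson are not stated in this guide, and the function names here are hypothetical.

```python
import numpy as np


def shannon_index(counts):
    """Shannon entropy H = -sum(p * ln p) for each row (natural log assumed)."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum(axis=1, keepdims=True)
    # Treat 0 * log(0) as 0 by leaving absent species at zero
    logp = np.log(p, where=p > 0, out=np.zeros_like(p))
    return -(p * logp).sum(axis=1)


def gini_simpson_index(counts):
    """Gini-Simpson index 1 - sum(p^2) for each row."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum(axis=1, keepdims=True)
    return 1.0 - (p ** 2).sum(axis=1)


counts = np.array([
    [25, 25, 25, 25],  # perfectly even community
    [97, 1, 1, 1],     # dominated by a single species
])
h = shannon_index(counts)
d = gini_simpson_index(counts)
richness = (counts > 0).sum(axis=1)
```

Both sites have the same richness, but the even community scores higher on both indices, which is why richness alone is rarely reported on its own.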

Hypothesis Testing Workflow ¶

>>> import nuee

>>> # Load data
>>> species = nuee.datasets.dune()
>>> env = nuee.datasets.dune_env()

>>> # Calculate distances
>>> dist = nuee.vegdist(species, method="bray")

>>> # Test for group differences (PERMANOVA)
>>> perm_result = nuee.adonis2(dist, env['Management'])
>>> print(f"R^2: {perm_result.R2.iloc[0]:.3f}")
R^2: 0.342
>>> print(f"p-value: {perm_result['Pr(>F)'].iloc[0]:.3f}")
p-value: ...

>>> # Test for homogeneity of dispersions
>>> betadisp = nuee.betadisper(dist, env['Management'])
>>> print(betadisp)

Tips and Best Practices ¶

Choosing an Ordination Method ¶

NMDS: Robust, works with any distance metric, no linearity assumptions
RDA: Linear relationships, environmental variables available
CCA: Unimodal relationships, long environmental gradients
PCA: Quick exploration, linear relationships

Choosing a Distance Metric ¶

Bray-Curtis: General purpose, abundance data
Jaccard: Presence/absence data
Euclidean: Environmental data, PCA
Hellinger: Before RDA, avoids double-zero problem
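
As an illustration of a presence/absence metric, the sketch below computes the binary Jaccard dissimilarity between two samples. It is a hand-written version of the textbook formula; how nuee.vegdist treats abundances under method="jaccard" is not covered here, and `jaccard_dissimilarity` is a hypothetical name.

```python
import numpy as np


def jaccard_dissimilarity(a, b):
    """1 - (shared species / species present in either sample)."""
    a = np.asarray(a) > 0  # convert abundances to presence/absence
    b = np.asarray(b) > 0
    union = np.logical_or(a, b).sum()
    if union == 0:
        raise ValueError("both samples are empty")
    shared = np.logical_and(a, b).sum()
    return 1.0 - shared / union


site_a = [12, 0, 3, 0, 7]
site_b = [1, 4, 0, 0, 9]
jac = jaccard_dissimilarity(site_a, site_b)
print(jac)
```

The fourth species is absent from both sites and does not enter the calculation, so shared absences do not make the sites look more alike.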

Data Transformation ¶

>>> import numpy as np

>>> # Hellinger transformation (for RDA)
>>> def hellinger(x):
...     x = np.asarray(x, dtype=float)  # accepts DataFrames too; returns an array
...     row_sums = x.sum(axis=1, keepdims=True)
...     return np.sqrt(x / row_sums)

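>>> # Wisconsin double standardization
>>> def wisconsin(x):
...     x = np.asarray(x, dtype=float)  # accepts DataFrames too; returns an array
...     # Divide each species (column) by its maximum
...     x_std = x / x.max(axis=0)
...     # Then divide each site (row) by its total
...     x_std = x_std / x_std.sum(axis=1, keepdims=True)
...     return x_std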
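
Each transformation leaves a simple invariant that is easy to verify on a small matrix: Hellinger-transformed rows have unit Euclidean length, and Wisconsin-standardized rows sum to one. The sketch below restates both functions for NumPy input so that it runs on its own; the data are made up, and it assumes every site and every species has at least one non-zero value.

```python
import numpy as np


def hellinger(x):
    """Square root of row-wise relative abundances."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(x / x.sum(axis=1, keepdims=True))


def wisconsin(x):
    """Standardize by species maxima, then by site totals."""
    x = np.asarray(x, dtype=float)
    x_std = x / x.max(axis=0)
    return x_std / x_std.sum(axis=1, keepdims=True)


counts = np.array([
    [10.0, 0.0, 30.0],
    [5.0, 5.0, 0.0],
    [1.0, 20.0, 4.0],
])

hel = hellinger(counts)
wis = wisconsin(counts)

# Hellinger-transformed rows have unit Euclidean length
row_lengths = np.sqrt((hel ** 2).sum(axis=1))
# Wisconsin-standardized rows sum to one
row_totals = wis.sum(axis=1)
```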

Compositional Data Workflows ¶

nuee.composition brings compositional data analysis tools into the package without requiring SciPy. These utilities are NumPy-only ports of scikit-bio’s composition module.

>>> from nuee import composition
>>> import numpy as np

>>> # Raw counts with zeros
>>> counts = np.array([[0, 5, 10], [3, 0, 9]])

>>> # Replace zeros and apply closure
>>> replaced = composition.multiplicative_replacement(counts)
>>> closed = composition.closure(replaced)

>>> # Transform to log-ratio space
>>> clr_coords = composition.clr(closed)
>>> ilr_coords = composition.ilr(closed)

>>> # Invert transforms if required
>>> recovered = composition.ilr_inv(ilr_coords)
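
To show what the centered log-ratio transform does, the sketch below implements it by hand with NumPy. It is not nuee's code; the function names are hypothetical, and the input is assumed to contain strictly positive proportions (zeros must be replaced first, as in the workflow above).

```python
import numpy as np


def clr_by_hand(composition):
    """Centered log-ratio: log of each part minus the row mean of the logs."""
    composition = np.asarray(composition, dtype=float)
    log_parts = np.log(composition)
    return log_parts - log_parts.mean(axis=1, keepdims=True)


def clr_inverse_by_hand(coords):
    """Map CLR coordinates back to proportions that sum to one."""
    expd = np.exp(np.asarray(coords, dtype=float))
    return expd / expd.sum(axis=1, keepdims=True)


proportions = np.array([
    [0.1, 0.3, 0.6],
    [0.2, 0.2, 0.6],
])
coords = clr_by_hand(proportions)
round_trip = clr_inverse_by_hand(coords)
```

CLR coordinates of each sample sum to zero, and inverting the transform recovers the original proportions.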

Interpretation Guidelines ¶

NMDS Stress Values ¶

< 0.05: Excellent representation
0.05 - 0.10: Good representation
0.10 - 0.20: Acceptable
> 0.20: Poor (try a different k or method)
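
The thresholds above can be wrapped in a small lookup for reports or logs. `interpret_stress` is a hypothetical helper, not part of nuee. It assumes stress is reported on a 0 to 1 scale, and the handling of values that fall exactly on a boundary is an arbitrary choice made here.

```python
def interpret_stress(stress):
    """Map an NMDS stress value (0 to 1 scale) to a rule-of-thumb label."""
    if stress < 0:
        raise ValueError("stress cannot be negative")
    if stress < 0.05:
        return "excellent"
    if stress < 0.10:
        return "good"
    if stress <= 0.20:
        return "acceptable"
    return "poor"


labels = [interpret_stress(s) for s in (0.03, 0.07, 0.15, 0.25)]
print(labels)
```

These labels are rules of thumb; stress tends to increase with the number of samples, so they are best read alongside a plot of the ordination.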

RDA/CCA Interpretation ¶

Eigenvalues: Variance explained by each axis
Species scores: Optimal position for each species
Site scores: Position of each site
Environmental vectors: Direction and strength of correlation
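
Eigenvalues are usually read as proportions of the total. The sketch below shows the arithmetic for a plain PCA computed with NumPy's SVD on simulated data. It does not use nuee, and in RDA or CCA the eigenvalues are additionally split into constrained and unconstrained axes, which this sketch does not cover.

```python
import numpy as np

rng = np.random.default_rng(42)
# Simulated data: 30 sites, 3 variables with decreasing spread
x = rng.normal(size=(30, 3)) * np.array([5.0, 2.0, 0.5])

centered = x - x.mean(axis=0)
# Squared singular values of the centered matrix, divided by n - 1,
# are the eigenvalues of its covariance matrix
_, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
eigenvalues = singular_values ** 2 / (x.shape[0] - 1)

proportion_explained = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(proportion_explained)
```

The eigenvalues add up to the total variance of the data, so `proportion_explained` sums to one and `cumulative` shows how much variation the first few axes capture together.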

PERMANOVA Results ¶

R^2: Proportion of the total variation explained by the grouping factor
F-statistic: Pseudo-F, the ratio of between-group to within-group variation
p-value: Permutation-based significance, compared against a chosen alpha (typically 0.05)
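
To connect these three quantities, the sketch below computes them by hand for a one-factor design, using the standard partition of squared distances into within-group and among-group sums. It illustrates the idea and is not nuee.adonis2's implementation; `pseudo_f` and `permanova` are hypothetical names and the data are made up.

```python
import numpy as np


def pseudo_f(dist, groups):
    """One-way PERMANOVA pseudo-F and R^2 from a square distance matrix."""
    dist = np.asarray(dist, dtype=float)
    groups = np.asarray(groups)
    n = dist.shape[0]
    labels = np.unique(groups)
    a = len(labels)

    d2 = dist ** 2
    # Each pair appears twice in the full matrix, hence the factor of 2
    ss_total = d2.sum() / (2 * n)
    ss_within = 0.0
    for label in labels:
        idx = np.flatnonzero(groups == label)
        ss_within += d2[np.ix_(idx, idx)].sum() / (2 * len(idx))
    ss_among = ss_total - ss_within

    f_stat = (ss_among / (a - 1)) / (ss_within / (n - a))
    r2 = ss_among / ss_total
    return f_stat, r2


def permanova(dist, groups, permutations=999, seed=0):
    """Observed pseudo-F, R^2 and a permutation p-value."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    f_obs, r2 = pseudo_f(dist, groups)
    # Shuffle the group labels to build the null distribution of F
    f_perm = np.array([
        pseudo_f(dist, rng.permutation(groups))[0]
        for _ in range(permutations)
    ])
    # The observed labelling counts as one of the permutations
    p_value = (np.sum(f_perm >= f_obs) + 1) / (permutations + 1)
    return f_obs, r2, p_value


# Two well-separated groups of points on a line
points = np.array([0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3])
dist = np.abs(points[:, None] - points[None, :])
groups = np.array(["A"] * 4 + ["B"] * 4)

f_obs, r2, p_value = permanova(dist, groups)
```

Because the groups are far apart relative to their internal spread, R^2 is close to 1 and the pseudo-F is large. The p-value cannot go below 1 / (permutations + 1), which is why the number of permutations should be reported with it.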
