lambda_corr introduces and implements the Repeated-Average Rank Correlation Λ (Lambda), a new family of robust, symmetric, and asymmetric measures of monotone association
based on pairwise rank slopes. Compared with traditional rank-based measures (Spearman’s ρ and Kendall’s τ [1,2]), Lambda is:
- Substantially more resistant to noise and outliers (see /results/*Robustness*.png).
- Much less biased relative to Pearson’s r [3] linear correlation (see /results/*bias*.png).
- Competitive or superior in accuracy, especially for moderate–strong signals (see /results/*accuracy*.png).
- Competitive in efficiency for moderate–strong signals, and slightly less efficient asymptotically under the null (~81% vs. ~91% for ρ and τ). See /results/*efficiency*.png and /results/*power*.png.
(Code for the figures is in /tests/test_lambdacorr2.py.)
The canonical statistic,
- Kendall’s $\mathbf{\tau_b}$ can be written as the signed geometric mean of Somers’ D(y|x) and D(x|y);
- Pearson’s r is the signed geometric mean of the two OLS slopes $m_{y\mid x} = \dfrac{\mathrm{cov}(x,y)}{\mathrm{var}(x)}$ and $m_{x\mid y} = \dfrac{\mathrm{cov}(x,y)}{\mathrm{var}(y)}$;
- Spearman’s $\mathbf{\rho}$ has the same construction as Pearson's applied to the rank-transformed variables ($r_x$, $r_y$).
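The Pearson case of the signed-geometric-mean construction can be checked numerically. The following is a minimal sketch using only the standard library; the helper names (`ols_slopes`, `signed_geometric_mean`, `pearson_r`) are illustrative and are not part of lambda_corr.

```python
import math
import random


def ols_slopes(x, y):
    """Return the two OLS slopes (m_y|x, m_x|y) = (cov/var(x), cov/var(y))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / var_x, cov / var_y


def signed_geometric_mean(a, b):
    """sign(a) * sqrt(a * b); both slopes share the sign of cov(x, y)."""
    return math.copysign(math.sqrt(a * b), a)


def pearson_r(x, y):
    """Textbook Pearson r = cov / (sd_x * sd_y), used as the reference."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.6 * a + 0.8 * random.gauss(0, 1) for a in x]

m_yx, m_xy = ols_slopes(x, y)
r_from_slopes = signed_geometric_mean(m_yx, m_xy)
r_direct = pearson_r(x, y)
print(r_from_slopes, r_direct)  # the two values agree to rounding error
```

Applying the same functions to the ranks of x and y gives Spearman’s ρ, which is the sense in which ρ shares Pearson's construction.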
Given paired samples
- Compute average ranks:
Replace the raw
where ties are assigned their average (mid) rank.
- Standardize ranks to zero mean / unit variance:
Standardization doesn't affect
- Compute median slope in rank space at each sample i:
- Compute the asymmetric rank-slope correlations as the outer mean over i slopes:
- Λ(y|x):
- Λ(x|y): repeat with x and y swapped.
- Apply a fold-back transform to the asymmetric components enforcing the range [-1, 1], and restoring the correct ordering relative to τ/ρ, for extremely rare, highly structured near-(anti)monotone rank configurations (see Fold-Back Transform section below):
That is equivalent to:
- Define the symmetric
$\mathbf{\Lambda_s}$ using the classical signed geometric mean:
If the asymmetric signs disagree,
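The steps above can be sketched in pure Python. This is an illustrative sketch, not the library's implementation: the function names are hypothetical, the fold-back transform is omitted, pairs with equal x-ranks are skipped when forming slopes (an assumption about tie handling), and the symmetric value is set to 0 when the asymmetric signs disagree (also an assumption). Inputs are assumed to be non-constant.

```python
import math
from statistics import mean, median


def average_ranks(values):
    """1-based ranks; tied values receive their average (mid) rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid
        i = j + 1
    return ranks


def standardize(r):
    """Shift and scale ranks to zero mean and unit variance."""
    m = mean(r)
    sd = math.sqrt(mean([(v - m) ** 2 for v in r]))
    return [(v - m) / sd for v in r]


def lambda_asym(rx, ry):
    """Mean over anchors i of the median slope (ry_j - ry_i) / (rx_j - rx_i)."""
    n = len(rx)
    medians = []
    for i in range(n):
        slopes = [(ry[j] - ry[i]) / (rx[j] - rx[i])
                  for j in range(n) if j != i and rx[j] != rx[i]]
        if slopes:  # assumption: anchors with no usable pairs are skipped
            medians.append(median(slopes))
    return mean(medians)


def lambda_sketch(x, y):
    """Return (Lambda_s, Lambda(y|x), Lambda(x|y)) without fold-back."""
    rx = standardize(average_ranks(x))
    ry = standardize(average_ranks(y))
    lam_yx = lambda_asym(rx, ry)
    lam_xy = lambda_asym(ry, rx)
    if lam_yx * lam_xy <= 0:
        lam_s = 0.0  # assumption: disagreeing signs give zero
    else:
        lam_s = math.copysign(math.sqrt(lam_yx * lam_xy), lam_yx)
    return lam_s, lam_yx, lam_xy


x = list(range(1, 9))
y_monotone = [math.exp(0.5 * v) for v in x]   # strictly increasing in x
y_swapped = [2, 1, 4, 3, 6, 5, 8, 7]          # adjacent pairs swapped

print(lambda_sketch(x, y_monotone))  # perfect monotone relation gives 1.0
print(lambda_sketch(x, y_swapped))
```

The double loop is O(n²) in time, which is fine for illustration; the library itself relies on NumPy and Numba for speed.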
The mean-of-medians construction can very rarely produce
Within this overshoot regime, larger
In the Monte Carlo calibration runs used for the null Beta-mixture fits (for p-values) and the bivariate-Gaussian benchmarks, fold-back was never activated (zero occurrences in billions of draws). Therefore, it had no effect on the calibrated null distribution or benchmark results.
Alternative stabilizations (e.g., Harrell–Davis quantile estimator per anchor, or Monte Carlo/permutation-based bias correction) can only reduce overshoot frequency and magnitude, but they materially change Λ and its null behavior; fold-back is used as a simple, deterministic guardrail.
Examples of Overshoot Behavior
Shown are rank configurations that produce the largest observed untransformed value of the symmetric statistics for different sample sizes (found via stochastic annealing rank swap search). Listed in the legend are the
- Range: $\mathbf{\Lambda_s}$ ∈ [-1, 1].
- Symmetric: $\mathbf{\Lambda_s}(x,y)$ = $\mathbf{\Lambda_s}(y,x)$.
- Invariant under strictly increasing transforms: $\mathbf{\Lambda_s}(x,y)$ is unchanged under $x \mapsto f(x)$ or $y \mapsto g(y)$ for any strictly increasing functions $f, g$; a strictly decreasing transform of one variable reverses its ranks and therefore flips the sign.
- Robust: Very robust to outliers and noise; extremely high sign-breakdown point (median-of-slopes core) under adversarial contamination (see /results/*Robustness*.png).
- Less biased: Much less biased than Spearman or Kendall relative to Pearson (see /results/*bias*.png).
- Accurate: Competitive or superior in accuracy for moderate–strong signals.
- Efficiency: Asymptotic efficiency ~81% (ρ, τ ≈ 91%) with var_opt/var($\mathbf{\Lambda_s}$) = (1/N)/(1.112^2/N). (Siegel's median-of-medians slope is ~41%.) See /results/*efficiency*.png and /results/*power*.png.
- Null distribution: centered, symmetric, slightly heavier tails than Spearman. Beta-mixture null model for |Λ_s| with point masses at 0 and ±1 (Beta on (0,1) and a mirrored Beta on (-1, 0)).
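The ~81% efficiency figure follows directly from the variance ratio quoted above, since N cancels. A short check of the arithmetic (the variable names are illustrative only):

```python
# Relative efficiency = var_opt / var(Lambda_s)
#                     = (1/N) / (1.112**2 / N) = 1 / 1.112**2
null_sd_scale = 1.112                 # value quoted in the text
efficiency = 1.0 / null_sd_scale**2   # N cancels out of the ratio
sample_inflation = null_sd_scale**2   # extra samples needed to match var_opt

print(f"efficiency       = {efficiency:.3f}")        # 0.809
print(f"sample inflation = {sample_inflation:.3f}")  # 1.237
```

In other words, under the null Λ_s needs roughly 24% more samples than the optimal estimator to reach the same variance.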
- A fully repeated-median Λ has maximal robustness but reduced asymptotic efficiency, while the mean-of-medians $\mathbf{\Lambda_s}$ recovers much of the efficiency at minimal loss of breakdown.
- A mean-of-means Λ is Theil-Sen in rank-space and is essentially Spearman in both efficiency and null spread, but gives up most of the robustness advantage compared to the mean of medians.
- Continuum of Λ variants' behavior (outside loop - inside loop):
  Spearman (ρ) ≈ $\mathbf{\Lambda_s}^{(mean-mean)}$ <-> $[\mathbf{\Lambda_s}^{(mean-median)}]$ <-> $\mathbf{\Lambda_s}^{(median-mean)}$ <-> $\mathbf{\Lambda_s}^{(median-median)}$ ≈ Siegel's slope

Canonical choice: $\mathbf{\Lambda_s}^{(mean-median)}$, which gives the best efficiency/robustness balance (especially at low statistics).
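lambda_corr supports three p-value modes, selected with the ptype keyword. In all cases, if ties=False and n ≤ 10, an exact lookup table is used for the symmetric statistic.
P-values for the asymmetric components ($\mathbf{\Lambda_{xy}}$, $\mathbf{\Lambda_{yx}}$) are returned only when a permutation test is run; otherwise they are NaN.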
ptype="default"
- Changes behavior based on the ties keyword.
- ties=True → Monte Carlo permutation test. P-values for $\mathbf{\Lambda_s}$, $\mathbf{\Lambda_{xy}}$, $\mathbf{\Lambda_{yx}}$ are returned.
- n ≤ 10 → Exact p-value for $\mathbf{\Lambda_s}$ if ties=False (default); otherwise Monte Carlo permutation test.
- n > 10 and ties=False → Beta-mixture null model approximation for Λ_s. Asymmetric p-values are NaN.
ptype="perm"
- Uses a Monte Carlo permutation test (all n). Special case: if ties=False and n ≤ 10, $\mathbf{\Lambda_s}$ uses the exact lookup table.
- Returns p-values for $\mathbf{\Lambda_s}$, $\mathbf{\Lambda_{xy}}$, $\mathbf{\Lambda_{yx}}$.
- Valid with ties or arbitrary marginals (conditional null; see below).
- Early stopping when p-uncertainty < p_tol.
- This calculation is stochastic, so permutation p-values vary across runs. Re-running can help the user gauge Monte Carlo uncertainty, if desired.
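The general shape of a Monte Carlo permutation test with early stopping can be sketched as follows. This is a generic illustration, not lambda_corr's implementation: the function `perm_pvalue`, its `batch` and `max_perms` parameters, and the stopping rule (binomial standard error of the running estimate below `p_tol`) are assumptions made for the example, and a plain Pearson correlation stands in for the statistic.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation, used here only as a stand-in statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def perm_pvalue(x, y, stat, p_tol=0.005, batch=200, max_perms=20000, seed=None):
    """Two-sided Monte Carlo permutation p-value for stat(x, y).

    y is permuted with x held fixed (the conditional null). Sampling stops
    once the binomial standard error of the running estimate drops below
    p_tol, or after max_perms permutations.
    """
    rng = random.Random(seed)
    observed = abs(stat(x, y))
    y_perm = list(y)
    hits = total = 0
    p_hat = 1.0
    while total < max_perms:
        for _ in range(batch):
            rng.shuffle(y_perm)
            if abs(stat(x, y_perm)) >= observed:
                hits += 1
        total += batch
        p_hat = (hits + 1) / (total + 1)  # add-one estimate, never exactly 0
        std_err = math.sqrt(p_hat * (1.0 - p_hat) / total)
        if std_err < p_tol:
            break
    return p_hat, total


x = list(range(20))
y = [v ** 3 for v in x]  # strictly increasing in x, so a strong association

p_value, n_perms = perm_pvalue(x, y, pearson, seed=1)
print(p_value, n_perms)
```

Because the permutations are random, repeated calls with different seeds return slightly different p-values, and the spread across runs is a direct view of the Monte Carlo uncertainty.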
ptype="approx"
- n ≤ 10 → Exact lookup table for $\mathbf{\Lambda_s}$. This is only an exact p-value if there are no ties.
- n > 10 → Fast approximate p-values for $\mathbf{\Lambda_s}$.
- Directional components Λ_xy and Λ_yx are returned as NaN as they require permutation for valid p-values.
- Assumes no ties; accuracy degrades as tie frequency increases.
- Approximate p-value from an n-dependent Beta-mixture unconditional null for |$\mathbf{\Lambda_s}$| with point masses at 0 and ±1 and a Beta fit on (0,1). Model parameters (p0(n), p1(n), α(n), β(n)) are calibrated from extremely large Monte Carlo null simulations (n>11) at increasing sample sizes, parametrically interpolated (n>30) for intermediate values, and extrapolated for large samples (n>1000).
The permutation test samples from the conditional null distribution, generated by permuting the observed y values while keeping x fixed. This distribution depends directly on the observed marginals and tie structure. Therefore, when the underlying population is genuinely discrete, the permutation method can be more accurate because it automatically reflects the correct amount and pattern of ties.
In contrast, the approximate p-values target an unconditional null distribution for Λ_s, calibrated from extremely large Monte Carlo simulations under continuous no-tie assumptions. As a result, they tend to be more stable (and often more accurate) for moderate–large n, especially when the underlying population is continuous (even if the sample exhibits ties due to rounding, censoring, or finite precision).
Repeated points for emphasis:
- p_xy and p_yx are returned only when a permutation test is run; otherwise they are NaN.
- In ptype="perm" with ties=False and n ≤ 10, the code still runs permutations, but p_s is replaced by the exact lookup value.
- ptype="approx" assumes no ties; if ties are present, results may be biased (especially for small n).
| Condition | ptype="default" | ptype="approx" | ptype="perm" |
|---|---|---|---|
| ties=True, n ≤ 10 | permutation (p_s, p_yx, p_xy) | table p_s (not exact); p_yx/p_xy = NaN | permutation (p_s, p_yx, p_xy) |
| ties=True, n > 10 | permutation (p_s, p_yx, p_xy) | Beta-mixture p_s; p_yx/p_xy = NaN | permutation (p_s, p_yx, p_xy) |
| ties=False, n ≤ 10 | exact p_s; p_yx/p_xy = NaN | exact p_s; p_yx/p_xy = NaN | exact p_s; permutation (p_yx, p_xy) |
| ties=False, n > 10 | Beta-mixture p_s; p_yx/p_xy = NaN | Beta-mixture p_s; p_yx/p_xy = NaN | permutation (p_s, p_yx, p_xy) |
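The table can also be read as a small decision function. The sketch below only encodes the table rows; `pvalue_methods` is a hypothetical helper written for illustration and is not a function provided by lambda_corr.

```python
def pvalue_methods(ptype, ties, n):
    """Return (method for p_s, method for p_yx and p_xy) per the table above."""
    if ptype not in ("default", "approx", "perm"):
        raise ValueError(f"unknown ptype: {ptype!r}")

    small = n <= 10
    # A permutation test runs for ptype="perm", and for "default" when ties=True.
    runs_permutation = ptype == "perm" or (ptype == "default" and ties)

    # Directional p-values exist only when permutations are run.
    directional = "permutation" if runs_permutation else "NaN"

    if not ties and small:
        symmetric = "exact table"         # exact lookup in every mode
    elif runs_permutation:
        symmetric = "permutation"
    elif small:
        symmetric = "table (not exact)"   # ptype="approx" with ties, n <= 10
    else:
        symmetric = "Beta-mixture"
    return symmetric, directional


for ties in (True, False):
    for n in (8, 50):
        for ptype in ("default", "approx", "perm"):
            print(ties, n, ptype, pvalue_methods(ptype, ties, n))
```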
Lambda_s, p_s, Lambda_yx, p_yx, Lambda_xy, p_xy, Lambda_a
Where:
- $\mathbf{\Lambda_s}$: symmetric correlation.
- Λ(y|x) / Λ(x|y): asymmetric directional correlations.
- p-values correspond to the chosen alt = {"two-sided","greater","less"}.
- $\mathbf{\Lambda_a}$: normalized asymmetry index with range [0, 1].
with
The library targets Python 3.8+ and uses NumPy and Numba for speed.
# Install lambda-corr from PyPI with pip
pip install lambda-corr
# Or install locally from source
pip install -e .
# Install optional test dependencies (SciPy); the quotes keep shells such as zsh from expanding the brackets
pip install -e ".[tests]"
# Prerequisites, if necessary
pip install numba numpy
# Optional: statistical tests make use of SciPy
pip install scipy
# Optional: for Numba fast-math optimizations on Intel CPUs
pip install icc_rt
Requirements:
- Python ≥ 3.8
- NumPy ≥ 1.23
- Numba ≥ 0.61
- SciPy ≥ 1.9 (only needed for some validation tests)
Compute the symmetric Lambda correlation
import numpy as np
import math
from lambda_corr import lambda_corr
rng = np.random.default_rng(seed=0)
n = 50
rho = 0.5 # correlation strength
x = rng.standard_normal(n)
z = rng.standard_normal(n)
c = math.sqrt((1 - rho) * (1 + rho))
y = np.exp(rho * x + c * z) # any monotonic transformation
# Compute Lambda correlations
Lambda_s, p_s, Lambda_yx, p_yx, Lambda_xy, p_xy, Lambda_a = lambda_corr(x, y)
# Or, inside Numba @njit functions:
# Lambda_s, p_s, Lambda_yx, p_yx, Lambda_xy, p_xy, Lambda_a = lambda_corr_nb(x, y, y.size)
# Nicely formatted output
print(f"Λ_s = {Lambda_s: .4f} (p = {p_s: .4g})")
print(f"Λ(y|x) = {Lambda_yx: .4f} (p = {p_yx: .4g})")
print(f"Λ(x|y) = {Lambda_xy: .4f} (p = {p_xy: .4g})")
print(f"Asymmetry = {Lambda_a: .4f}")
# Example output:
# Λ_s = 0.4130 (p = 0.0087) #Result will be close to rho
# Λ(y|x) = 0.4145 (p = 0.008419)
# Λ(x|y) = 0.4114 (p = 0.008988)
# Asymmetry = 0.0038
Code in example/ shows how to apply my Telescope Array analysis,
“Evidence for a Supergalactic Structure of Magnetic Deflection Multiplets of Ultra-high-energy Cosmic Rays”
(arXiv:2005.07312v2) to Pierre Auger Observatory public data using the
[1] Spearman, C. The proof and measurement of association between two things. American Journal of Psychology, 15(1), 72–101, 1904.
[2] Kendall, M.G., Rank Correlation Methods (4th Edition), Charles Griffin & Co., 1970.
[3] https://en.wikipedia.org/wiki/Pearson_correlation_coefficient
[4] Siegel, A.F., Robust Regression Using Repeated Medians, Biometrika, Vol. 69, pp. 242-244, 1982.
If you use lambda_corr in academic or scientific work, please cite:
Lundquist, J.P. lambda_corr: Robust Repeated-Average Rank Correlation Λ (Lambda).
GitHub repository: https://github.com/JonPaulLundquist/lambda_corr
@misc{lundquist2025lambda_corr,
author = {Lundquist, Jon Paul},
title = {lambda\_corr: Robust Repeated-Average Rank Correlation (Λ)},
year = {2025},
publisher = {GitHub},
howpublished = {\url{https://github.com/JonPaulLundquist/lambda_corr}},
note = {Version X.Y.Z. Accessed: YYYY-MM-DD}
}




