Numerics and Stability¶
This page summarizes the numerical practices used throughout ExactStep to ensure stable, precise, and fast closed-form solves.
- Float dtype: computations use float64 by default.
- Whitening: whiten input features to reduce condition numbers.
- Small L2: add tiny ℓ2 regularization to stabilize inverses when needed.
- SPD solves: prefer Cholesky-based solvers with robust fallbacks.
- Tight tolerances: tests assert near machine precision where appropriate.
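As a quick illustration of why float64 and tolerances near 1e-12 go together, the following minimal sketch compares machine epsilon for the two common float dtypes. It uses only NumPy, not any ExactStep function, and the variable names are illustrative.

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable float.
eps64 = np.finfo(np.float64).eps  # roughly 2.2e-16
eps32 = np.finfo(np.float32).eps  # roughly 1.2e-7

# An absolute tolerance of 1e-12 sits above float64 rounding noise
# but far below what float32 can resolve for values of order 1.
target_atol = 1e-12
print(f"float64 eps = {eps64:.3e}, float32 eps = {eps32:.3e}")
print("float64 resolves atol:", eps64 < target_atol)
print("float32 resolves atol:", eps32 < target_atol)
```

In other words, an assertion at atol = 1e-12 is only meaningful when the whole computation stays in float64.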
Whitening¶
Whitening improves conditioning by centering each feature at zero and scaling it to unit variance. Use an epsilon floor on the standard deviation to avoid dividing by near-zero values (for example, for constant features).
import numpy as np
from transformer_instant import compute_whitener, apply_whitener
X = np.random.randn(256, 32)
mu, inv_std = compute_whitener(X, eps=1e-12)
Xw = apply_whitener(X, mu, inv_std)
Recommended epsilon: 1e-12.
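To make the effect on conditioning concrete, here is a minimal NumPy sketch of per-feature whitening with an epsilon floor. The helper `whiten_sketch` is hypothetical and only mirrors the description above; it is not the implementation of `compute_whitener` or `apply_whitener`, and flooring with `np.maximum` is an assumption (the library may apply the epsilon differently).

```python
import numpy as np

def whiten_sketch(X, eps=1e-12):
    """Hypothetical stand-in: center each column and scale it to unit variance."""
    mu = X.mean(axis=0)
    std = X.std(axis=0)
    # Assumed form of the epsilon floor: never divide by less than eps.
    inv_std = 1.0 / np.maximum(std, eps)
    return (X - mu) * inv_std, mu, inv_std

rng = np.random.default_rng(0)
# Four features on very different scales, plus one constant column.
scales = np.array([1e-3, 1.0, 1e3, 1e6])
X = rng.standard_normal((256, 4)) * scales + 5.0
X = np.hstack([X, np.full((256, 1), 7.0)])

Xw, mu, inv_std = whiten_sketch(X)

# Compare condition numbers on the non-constant columns.
cond_before = np.linalg.cond(X[:, :4])
cond_after = np.linalg.cond(Xw[:, :4])
print(f"cond before: {cond_before:.3e}, after: {cond_after:.3e}")
print("constant column stays finite:", np.all(np.isfinite(Xw[:, 4])))
```

The constant column has zero standard deviation, so without the floor the scaling would divide by zero; with it, the column simply maps to zeros.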
Small ℓ2 regularization¶
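Add a small λ to the diagonal to stabilize inverses. For whitened features the resulting bias is usually negligible, but it is not zero and can exceed tight test tolerances. Good defaults: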
- λ in [1e-6, 1e-3]
- λ = 0 for exact interpolation in controlled tests
Examples:
from transformer_instant import ridge_fit_closed_form
# Assumed layout: H is (n_samples, n_features), Y is (n_samples, n_outputs)
W = ridge_fit_closed_form(H, Y, lambda_reg=1e-3)
from transformer_instant import krr_fit, krr_predict, compute_whitener, apply_whitener
mu, inv_std = compute_whitener(X, eps=1e-12)
Xw = apply_whitener(X, mu, inv_std)
mdl = krr_fit(Xw, Y, lambda_reg=1e-3, length_scale=0.8, variance=1.0)
Y_hat = krr_predict(mdl, Xw)
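The trade-off between λ = 0 and a small positive λ can be sketched in plain NumPy. The function `ridge_sketch` below is a hypothetical stand-in that solves the normal equations (HᵀH + λI) W = HᵀY directly; it is not the code behind `ridge_fit_closed_form`, and the data are synthetic.

```python
import numpy as np

def ridge_sketch(H, Y, lambda_reg=1e-6):
    """Hypothetical stand-in: solve (H^T H + lambda I) W = H^T Y."""
    d = H.shape[1]
    A = H.T @ H + lambda_reg * np.eye(d)
    return np.linalg.solve(A, H.T @ Y)

rng = np.random.default_rng(0)
H = rng.standard_normal((128, 8))
W_true = rng.standard_normal((8, 3))
Y = H @ W_true  # noiseless targets, so an exact fit exists

W_exact = ridge_sketch(H, Y, lambda_reg=0.0)
W_reg = ridge_sketch(H, Y, lambda_reg=1e-3)

err_exact = np.max(np.abs(W_exact - W_true))
err_reg = np.max(np.abs(W_reg - W_true))
print(f"lambda = 0:    max error {err_exact:.2e}")
print(f"lambda = 1e-3: max error {err_reg:.2e}")
```

On this well-conditioned problem, λ = 0 recovers the weights to rounding error, while λ = 1e-3 leaves a small but measurable shrinkage. That is why controlled equivalence tests set λ = 0 and production fits keep a small positive value.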
SPD solves via Cholesky¶
For symmetric positive definite systems, use Cholesky with a safe fallback:
from transformer_instant import solve_spd
# Solve A W = B with A symmetric positive definite, e.g.
#   A = X^T X + λI, B = X^T Y   (ridge regression)
#   A = K + λI,     B = Y       (kernel ridge)
W = solve_spd(A, B)
Under the hood, SciPy's cho_factor and cho_solve are used when SciPy is available. Otherwise the solver falls back to NumPy's cholesky with triangular solves, and to a general solve as a last resort.
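A fallback chain of this kind might look like the sketch below. `solve_spd_sketch` is a hypothetical name and this is not the library's `solve_spd`. One simplification is labeled in the comments: NumPy has no dedicated triangular solver, so `np.linalg.solve` is applied to the Cholesky factors instead.

```python
import numpy as np

try:
    from scipy.linalg import cho_factor, cho_solve
    HAVE_SCIPY = True
except ImportError:
    HAVE_SCIPY = False

def solve_spd_sketch(A, B):
    """Hypothetical fallback chain: SciPy Cholesky, NumPy Cholesky, general solve."""
    if HAVE_SCIPY:
        try:
            return cho_solve(cho_factor(A), B)
        except np.linalg.LinAlgError:
            pass  # factorization failed; try the next option
    try:
        L = np.linalg.cholesky(A)  # A = L @ L.T
        # Simplification: np.linalg.solve stands in for triangular solves.
        Z = np.linalg.solve(L, B)
        return np.linalg.solve(L.T, Z)
    except np.linalg.LinAlgError:
        # Last resort for matrices that are not numerically positive definite.
        return np.linalg.solve(A, B)

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 5))
Y = rng.standard_normal((64, 2))
lam = 1e-6
A = X.T @ X + lam * np.eye(5)
B = X.T @ Y
W = solve_spd_sketch(A, B)

# A symmetric but indefinite matrix exercises the last-resort branch.
A_indef = np.array([[1.0, 2.0], [2.0, 1.0]])
b_indef = np.array([1.0, 0.0])
w_indef = solve_spd_sketch(A_indef, b_indef)
print("SPD residual:", np.max(np.abs(A @ W - B)))
```

Ordering the attempts from most specialized to most general keeps the common SPD case fast while still returning a result for borderline inputs.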
Test tolerances¶
Use strict tolerances for equivalence tests:
- Linear/ridge, kernel ridge, deep linear, ELM: atol = 1e-12, rtol = 0.0
- 1D barycentric with Chebyshev–Lobatto nodes: atol = 1e-12
- 1D barycentric with uniform nodes: equispaced interpolation is much worse conditioned, so use a looser atol ≈ 1e-5
Example:
import numpy as np
np.testing.assert_allclose(Y_pred, Y_true, rtol=0.0, atol=1e-12)
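The difference between the two barycentric node sets can be explored with a small self-contained sketch. It uses the standard barycentric formula with the textbook weights for each node family; the helper `barycentric_eval`, the test function `f`, and the degree `n = 40` are illustrative choices, not taken from ExactStep.

```python
import math
import numpy as np

def barycentric_eval(x_nodes, weights, f_nodes, x_eval):
    """Barycentric interpolation; returns the node value on exact node hits."""
    diff = x_eval[:, None] - x_nodes[None, :]
    hit = diff == 0.0
    diff[hit] = 1.0  # placeholder; these rows are overwritten below
    terms = weights / diff
    with np.errstate(divide="ignore", invalid="ignore"):
        p = (terms @ f_nodes) / terms.sum(axis=1)
    rows, cols = np.nonzero(hit)
    p[rows] = f_nodes[cols]
    return p

def f(x):
    # Low-degree polynomial: the interpolant equals f in exact arithmetic.
    return x**3 - 0.5 * x + 0.25

n = 40
j = np.arange(n + 1)
signs = (-1.0) ** j

# Chebyshev-Lobatto nodes: weights (-1)^j, halved at both endpoints.
x_cheb = np.cos(np.pi * j / n)
w_cheb = signs.copy()
w_cheb[[0, -1]] *= 0.5

# Uniform nodes: weights (-1)^j * C(n, j), which vary over many orders of magnitude.
x_unif = np.linspace(-1.0, 1.0, n + 1)
w_unif = signs * np.array([math.comb(n, k) for k in range(n + 1)], dtype=float)

x_eval = np.linspace(-1.0, 1.0, 1001)
err_cheb = np.max(np.abs(barycentric_eval(x_cheb, w_cheb, f(x_cheb), x_eval) - f(x_eval)))
err_unif = np.max(np.abs(barycentric_eval(x_unif, w_unif, f(x_unif), x_eval) - f(x_eval)))
print(f"Chebyshev-Lobatto max error: {err_cheb:.2e}")
print(f"uniform nodes max error:     {err_unif:.2e}")
```

Since the interpolant is exact in exact arithmetic here, any error printed is rounding error. The uniform-node weights span many orders of magnitude, which amplifies that rounding and is the reason a looser tolerance is appropriate for that case.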
Summary of best practices¶
- Prefer float64, vectorized operations, and SPD Cholesky solves.
- Whiten features, especially for kernel and ELM paths.
- Use small ℓ2 regularization (≈ 1e-6 to 1e-3) by default.
- Keep asserts at ~1e-12 except where node geometry (e.g., uniform 1D nodes) makes the problem ill-conditioned and requires a looser tolerance.