Numerics and Stability

This page summarizes the numerical practices used throughout ExactStep to ensure stable, precise, and fast closed-form solves.

  • Float dtype: computations use float64 by default.
  • Whitening: whiten input features to reduce condition numbers.
  • Small L2: add tiny ℓ2 regularization to stabilize inverses when needed.
  • SPD solves: prefer Cholesky-based solvers with robust fallbacks.
  • Tight tolerances: tests assert near machine precision where appropriate.

Whitening

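Whitening improves conditioning by centering each feature at zero and scaling it to unit variance (per-feature standardization, not full decorrelation). Floor the standard deviation with a small epsilon so that constant or near-constant features do not cause division by a near-zero value.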

import numpy as np
from transformer_instant import compute_whitener, apply_whitener

X = np.random.randn(256, 32)
mu, inv_std = compute_whitener(X, eps=1e-12)
Xw = apply_whitener(X, mu, inv_std)

Recommended epsilon: 1e-12.
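
To make the role of the epsilon floor concrete, here is a minimal NumPy sketch of per-feature whitening. The names compute_whitener_sketch and apply_whitener_sketch are hypothetical stand-ins, and flooring with max(std, eps) is an assumption; the library's compute_whitener may apply eps differently.

```python
import numpy as np

def compute_whitener_sketch(X, eps=1e-12):
    """Return per-feature mean and inverse std (hypothetical helper)."""
    mu = X.mean(axis=0)
    std = X.std(axis=0)
    # Assumption: floor the std at eps so constant features do not divide by zero.
    inv_std = 1.0 / np.maximum(std, eps)
    return mu, inv_std

def apply_whitener_sketch(X, mu, inv_std):
    """Center and scale each feature (hypothetical helper)."""
    return (X - mu) * inv_std

rng = np.random.default_rng(0)
# Features on very different scales, plus one constant column.
X = rng.standard_normal((256, 4)) * np.array([1e-3, 1.0, 1e3, 1.0])
X[:, 3] = 5.0

mu, inv_std = compute_whitener_sketch(X, eps=1e-12)
Xw = apply_whitener_sketch(X, mu, inv_std)
```

In this sketch the constant column maps to zeros rather than NaN or inf, because its centered values are exactly zero before the (large but finite) scale factor is applied.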

Small ℓ2 regularization

Adding a small λ to the diagonal stabilizes the inverse at the cost of a slight shrinkage bias, which is typically negligible at these magnitudes. Good defaults:

  • λ in [1e-6, 1e-3]
  • λ = 0 for exact interpolation in controlled tests

Examples:

from transformer_instant import ridge_fit_closed_form

# Ridge: H (features) and Y (targets) are assumed to be defined already
W = ridge_fit_closed_form(H, Y, lambda_reg=1e-3)

from transformer_instant import krr_fit, krr_predict, compute_whitener, apply_whitener

# Kernel ridge: whiten the inputs X first, then fit and predict
mu, inv_std = compute_whitener(X, eps=1e-12)
Xw = apply_whitener(X, mu, inv_std)
mdl = krr_fit(Xw, Y, lambda_reg=1e-3, length_scale=0.8, variance=1.0)
Y_hat = krr_predict(mdl, Xw)
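
The stabilizing effect of λ can be illustrated with a short NumPy sketch of closed-form ridge on nearly collinear features. The function ridge_closed_form_sketch is a hypothetical illustration of the normal equations (HᵀH + λI) W = HᵀY, not the library's ridge_fit_closed_form.

```python
import numpy as np

def ridge_closed_form_sketch(H, Y, lambda_reg=1e-3):
    """Solve (H^T H + lambda I) W = H^T Y (hypothetical helper)."""
    d = H.shape[1]
    A = H.T @ H + lambda_reg * np.eye(d)
    return np.linalg.solve(A, H.T @ Y)

rng = np.random.default_rng(0)
H = rng.standard_normal((128, 8))
# Make the last column almost identical to the previous one.
H[:, 7] = H[:, 6] + 1e-9 * rng.standard_normal(128)
Y = rng.standard_normal((128, 2))

G = H.T @ H
cond_plain = np.linalg.cond(G)                      # nearly singular Gram matrix
cond_ridge = np.linalg.cond(G + 1e-3 * np.eye(8))   # bounded by roughly ||G|| / lambda

W = ridge_closed_form_sketch(H, Y, lambda_reg=1e-3)
```

Because λ lower-bounds the smallest eigenvalue of the regularized matrix, its condition number is at most about ‖G‖/λ, regardless of how close to singular G itself is.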

SPD solves via Cholesky

For symmetric positive definite systems, use Cholesky with a safe fallback:

from transformer_instant import solve_spd
# A is SPD, e.g. X^T X + λI (ridge) or K + λI (kernels); B is the right-hand side, e.g. X^T Y
W = solve_spd(A, B)

Under the hood, solve_spd uses SciPy's cho_factor/cho_solve when SciPy is available; otherwise it falls back to NumPy's cholesky with triangular solves, and to a general solve as a last resort.
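
The fallback chain described above can be sketched as follows. This is an illustration under stated assumptions, not the library's solve_spd: the name solve_spd_sketch is hypothetical, and because NumPy has no dedicated triangular solver, the NumPy path here applies np.linalg.solve to the triangular factors.

```python
import numpy as np

try:
    from scipy.linalg import cho_factor, cho_solve
    HAVE_SCIPY = True
except ImportError:
    HAVE_SCIPY = False

def solve_spd_sketch(A, B):
    """Solve A W = B, preferring Cholesky (hypothetical helper)."""
    try:
        if HAVE_SCIPY:
            # Factor once, then solve by forward/back substitution.
            return cho_solve(cho_factor(A, lower=True), B)
        L = np.linalg.cholesky(A)       # A = L @ L.T, L lower triangular
        Z = np.linalg.solve(L, B)       # forward step: L Z = B
        return np.linalg.solve(L.T, Z)  # backward step: L.T W = Z
    except np.linalg.LinAlgError:
        # Last resort: A was not numerically positive definite.
        return np.linalg.solve(A, B)

rng = np.random.default_rng(0)
M = rng.standard_normal((32, 6))
A = M.T @ M + 1e-3 * np.eye(6)          # SPD by construction
B = rng.standard_normal((6, 2))
W = solve_spd_sketch(A, B)

# An indefinite (but invertible) matrix exercises the general-solve fallback.
A_indef = np.array([[1.0, 2.0], [2.0, 1.0]])
b = np.array([1.0, 0.0])
w_fallback = solve_spd_sketch(A_indef, b)
```

Catching np.linalg.LinAlgError covers both paths, since SciPy re-exports the same exception class.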

Test tolerances

Use strict tolerances for equivalence tests:

  • Linear/ridge, kernel ridge, deep linear, ELM: atol = 1e-12, rtol = 0.0
  • 1D barycentric with Chebyshev–Lobatto nodes: atol = 1e-12
  • 1D barycentric with uniform nodes: use a looser atol ≈ 1e-5, since interpolation on equispaced nodes is ill-conditioned

Example:

import numpy as np
np.testing.assert_allclose(Y_pred, Y_true, rtol=0.0, atol=1e-12)
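
With rtol=0.0 the check is purely absolute: assert_allclose passes only if |actual − desired| ≤ atol for every element. The sketch below shows such a test end to end on synthetic, noiseless linear data, using plain NumPy normal equations with λ = 0 rather than the library's solvers; the data and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 4))
W_true = rng.standard_normal((4, 2))
Y_true = X @ W_true                      # noiseless targets

# lambda = 0: exact interpolation is expected on well-conditioned data.
W_hat = np.linalg.solve(X.T @ X, X.T @ Y_true)
Y_pred = X @ W_hat

max_abs_err = float(np.max(np.abs(Y_pred - Y_true)))

# Purely absolute criterion: fails if any element differs by more than 1e-12.
np.testing.assert_allclose(Y_pred, Y_true, rtol=0.0, atol=1e-12)
```

An absolute tolerance is only meaningful relative to the scale of the targets, so a test like this assumes outputs of roughly unit magnitude.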

Summary of best practices

  • Prefer float64, vectorized operations, and SPD Cholesky solves.
  • Whiten features, especially for kernel and ELM paths.
  • Use small ℓ2 regularization (≈ 1e-6 to 1e-3) by default.
  • Keep assertion tolerances at ~1e-12 except where the problem's conditioning (e.g., uniform 1D nodes) requires relaxation.