CosmoStat is a software package for cosmo-statistics developed by the LCS group at CEA Paris-Saclay. My contribution is pycs/sparsity/sparse3d/ — a self-contained module for wavelet-based denoising of 3D IFU spectral cubes (velocity × y × x). It implements two iterative sparse-recovery algorithms on top of a hybrid 2D-1D undecimated wavelet transform, targeting low-SNR submillimetre and radio interferometric data such as ALMA [CII] cubes.
The 2D-1D wavelet decomposition
Standard 3D wavelets treat the spectral axis the same as spatial axes, which destroys spectral line profiles by mixing spatial morphology with kinematic structure. The 2D-1D approach separates the two:
- 2D spatial axis — undecimated isotropic starlet transform (à trous algorithm). Each scale \(j\) captures emission at a characteristic angular size \(\sim 2^j\) pixels while keeping the cube at full resolution (no downsampling). The starlet is isotropic and shift-invariant, well-matched to compact or extended emission without preferred direction.
- 1D spectral axis — undecimated filterbank along the frequency/velocity axis. Each scale captures emission across a characteristic line width, from narrow features at fine scales to broad wings at coarser scales.
- Cross product — the full decomposition produces \(N_{2D} \times N_{1D}\) detail sub-bands (one per spatial-scale × spectral-scale pair) plus a single coarse approximation band, all at the original cube size. The coarse band carries the large-scale mean structure and is always passed through unthresholded.
Because neither transform decimates, the frame is overcomplete: the total number of coefficients is \(N_{2D} \times N_{1D}\) times the cube size. This redundancy means the analysis operator \(A\) and its adjoint \(A^*\) satisfy \(A A^* \neq I\) — the frame operator is a blurring in coefficient space, not the identity. Iterative algorithms must account for this.
The backend is the pysparse MR2D1D C extension wrapped in Wavelet2D1DTransform:
PYTHON
from pycs.sparsity.sparse3d.wavelet2d1d_transform import Wavelet2D1DTransform
wt = Wavelet2D1DTransform()
# inds: nested list of (start, end) slices per sub-band
# w: flat 1-D coefficient array, length ≈ n2d * n1d * cube.size
inds, shapes, w = wt.decompose(cube, num_scales_2d=5, num_scales_1d=5)
# access a specific detail band
scale2d, scale1d = 2, 1
start, end = inds[scale2d][scale1d]
band_coeffs = w[start:end] # shape: (nz, ny, nx) flattened
# reconstruct from modified coefficients
cube_reconstructed = wt.reconstruct(w)
A critical implementation detail: pysparse keeps the coarse band in an internal C-level cache after each decompose call. If you call decompose again before reconstruct, the cache is overwritten. All algorithms therefore copy the coefficient buffer immediately after decomposition and restore the coarse band slice explicitly before every reconstruction call.
Per-band noise estimation
Noise characteristics differ across sub-bands because the starlet and spectral filters have different frequency responses. The module estimates noise separately for each of the \(N_{2D} \times N_{1D} - 1\) detail bands (the coarse band is excluded):
- With a noise cube: an independent noise realization (e.g. from a signal-free region of the observation) is transformed; the standard deviation of each band gives an unbiased per-band noise level \(\hat{\sigma}_{j,k}\).
- Without a noise cube: the median absolute deviation (MAD) of each band's coefficients is used: \(\hat{\sigma} = 1.48 \times \mathrm{median}(|c - \mathrm{median}(c)|)\). The factor 1.48 makes MAD consistent with the Gaussian standard deviation. MAD is robust because most coefficients in a sparse signal are noise-dominated.
The finest 2D scale (scale2d == 0) is the noisiest — its starlet filter passes the highest spatial frequencies, which carry the largest noise fraction. Both algorithms apply an additional threshold increment \(\Delta\) at this scale only: the effective detection threshold becomes \((\lambda + \Delta)\,\hat{\sigma}_{0,k}\) instead of \(\lambda\,\hat{\sigma}_{0,k}\). This prevents spurious noise peaks at the finest scale from being mistaken for real signal.
Denoiser2D1D — two algorithms
The Denoiser2D1D class exposes both algorithms through a unified interface. Algorithm selection is made at construction time via threshold_type.
Iterative Hard Thresholding (IHT) — threshold_type='hard'
Hard thresholding keeps a coefficient if \(|c| > \lambda\,\hat{\sigma}\) and zeros it otherwise — no shrinkage, no bias from the threshold operation itself. The algorithm runs in two phases:
- Phase 1 — single threshold pass: decompose the data once, apply hard thresholds per band (with the finest-scale increment), reconstruct the initial model \(\hat{x}_0\). This establishes the signal support \(S\) — the set of coefficient positions where signal was detected.
- Phase 2 — iterative L0 debias: the Eddington bias from thresholding means the model slightly overestimates flux (noise that happens to push a coefficient above threshold is kept). The debias corrects this: at each iteration, decompose the residual \(r = y - \hat{x}\), apply the hard threshold restricted to support \(S\), reconstruct the delta, and add it to the model. This peels off the overcomplete frame correction \((I - AA^*)^n\) iteratively.
PYTHON
from pycs.sparsity.sparse3d import Denoiser2D1D
denoiser = Denoiser2D1D(threshold_type='hard', verbose=True)
result = denoiser.denoise(
x=cube, # noisy data cube (nz, ny, nx)
y=cube, # reference cube (pass x for real data)
method='iterative',
threshold_level=5, # λ: detection threshold in σ
threshold_increment_high_freq=2, # Δ: extra threshold at finest 2D scale
num_scales_2d=5,
num_scales_1d=5,
noise_cube=None, # use MAD if no noise cube available
positivity=False, # clip to ≥ 0 after each iteration
num_iter=20,
patience=3,
)
model, deltas, residual_stds, best_iter, noise_levels = result
Iterative Soft Thresholding (IST) — threshold_type='soft'
Soft thresholding shrinks each coefficient toward zero by the threshold amount: \(\mathrm{sign}(c)\,\max(|c| - \lambda\hat{\sigma},\,0)\). This introduces a systematic flux bias (every detected coefficient is shrunk), which the reweighting and debias phases correct.
- Phase 1 — reweighted IST (IRLS): at each iteration, decompose the current model to compute adaptive per-coefficient weights \(w_{j} = \lambda\hat{\sigma}_{\mathrm{model}} / (|c_{\mathrm{model}}| + \varepsilon)\). For bright, well-detected coefficients \(|c| \gg \lambda\hat{\sigma}\), the weight approaches zero — effectively removing the shrinkage bias for signal. For noise-level coefficients the weight stays near 1, maintaining suppression. The effective soft threshold becomes \(w_j \cdot \lambda\hat{\sigma}\), which drives toward a hard-threshold result asymptotically.
- Phase 2 — L1 debias: after reweighting converges, residual flux lost to soft-shrinkage is recovered by a weighted soft-threshold applied to the residual \(r = y - \hat{x}\) within the detected support, iterating to convergence.
PYTHON
denoiser = Denoiser2D1D(threshold_type='soft', verbose=True)
result = denoiser.denoise(
x=cube,
y=cube,
method='iterative',
threshold_level=5,
threshold_increment_high_freq=2,
num_scales_2d=5,
num_scales_1d=5,
positivity=False,
num_iter_reweight=20, # Phase 1: reweighting iterations
num_iter_debias=20, # Phase 2: debias iterations
patience=3,
)
model, model_1step, model_no_rw, deltas, stds_rw, stds_db, best_iter, dists, noise_levels = result
Convergence
Both algorithms use a plateau-based convergence test with a decreasing-patience outer loop. At each debias/reweight iteration, convergence is declared if the relative change in residual standard deviation satisfies \(|\sigma_t - \sigma_{t-1}|/\sigma_{t-1} \leq \varepsilon\) for patience consecutive iterations. If convergence is not reached, patience is decremented and the loop retries from scratch. The final model is always taken from the iteration with the lowest residual STD across all trials.
Positivity constraint
For pure-emission sources (no absorption features, no mean subtraction), negative flux is unphysical. Setting positivity=True clips the reconstructed model to non-negative values after each iteration. This is applied in real space, not in the wavelet domain — highly negative wavelet coefficients can legitimately correspond to signal structure (e.g. the negative sidelobe of a starlet basis function around a compact source), so wavelet-domain clipping would remove real signal.
Simple (single-pass) denoising
For high-SNR data or quick inspection, method='simple' runs only Phase 1 of IHT — a single hard threshold pass with no iteration:
PYTHON
denoiser = Denoiser2D1D(threshold_type='hard', verbose=False)
model, noise_levels = denoiser.denoise(
x=cube, y=cube,
method='simple',
threshold_level=4,
threshold_increment_high_freq=2,
num_scales_2d=5, num_scales_1d=5,
)
Authors
- Arnab Lahiry —
pycs/sparsity/sparse3d/ (sparse3d submodule, Denoiser2D1D, Wavelet2D1DTransform wrapper)
- CosmoStat / LCS group, CEA Paris-Saclay — core library