Changelog
Source:NEWS.md
RcppML 1.0.0
New Features
Statistical Distributions via IRLS
NMF now supports six distribution-appropriate loss functions via Iteratively Reweighted Least Squares: - "mse" (Gaussian, default), "gp" (Generalized Poisson), "nb" (Negative Binomial), "gamma", "inverse_gaussian", "tweedie" - Automatic distribution selection via auto_nmf_distribution() - Zero-inflation models (zi = "row" or zi = "col") for ZINB and ZIGP
StreamPress I/O (.spz Format)
- Column-oriented binary format with 10–20× compression via rANS entropy coding
- Streaming NMF directly from
.spzfiles for datasets larger than RAM -
st_write(),st_read(),st_info()for reading and writing.spzfiles - Embedded obs/var metadata tables via
st_read_obs()andst_read_var() - GPU-direct reading with
st_read_gpu()for zero-copy GPU NMF
FactorNet Graph API
-
factor_net()composes multi-modal, deep, and branching NMF pipelines - Multi-modal NMF with shared embeddings across data modalities
- Deep NMF with hierarchical decomposition
- Conditional factorization by sample groups
- Cross-validation for graph architectures via
cross_validate_graph()
GPU Acceleration
- Optional CUDA backend via cuBLAS/cuSPARSE with automatic CPU fallback
-
gpu_available()andgpu_info()for device queries - Streaming GPU NMF for matrices exceeding VRAM
- Dense and sparse GPU NMF paths
Cross-Validation Improvements
- Speckled holdout masks for principled rank selection
-
mask_zerosparameter for recommendation data (only non-zero entries held out) - Automatic rank search with
k = "auto" - Multiple replicates via
cv_seedvector - Early stopping with configurable
patience
SVD and PCA
-
svd()for truncated SVD with five methods: deflation, Krylov, Lanczos, IRLBA, randomized -
pca()convenience wrapper (centered and optionally scaled SVD) - Constrained PCA: non-negative (
nonneg = TRUE), sparse (L1), and combined -
variance_explained()for scree plots -
predict()for SVD objects (out-of-sample projection)
Projective and Symmetric NMF
-
projective = TRUE: H is computed as instead of solved independently, producing more orthogonal factors -
symmetric = TRUE: For symmetric matrices, enforces
Additional Regularization
- L1 / L2 penalties with separate values for W and H
- L21 group sparsity for automatic factor selection
- Angular decorrelation penalty (replaces
ortho) - Upper bound constraints on factor values
- Graph Laplacian regularization for spatial/network smoothness
New Datasets
-
golub: Leukemia gene expression (38 × 5,000 sparse) -
olivetti: Olivetti face images (400 × 4,096 sparse) -
digits: Handwritten digit images (1,797 × 64 dense) -
pbmc3k: PBMC 3k scRNA-seq with cell type annotations (13,714 × 2,638, StreamPress compressed)
Other
- Multiple random initializations with best-model selection via
seed = c(...) - Automatic NA detection and masking
-
consensus_nmf()for robust factorizations across multiple random starts
Enhancements
- S4 class system with backward-compatible
$accessor (migrated from implicit S4) -
on_iterationcallback for custom per-iteration logging - Lanczos and IRLBA initialization for faster NMF convergence
Internal Changes
- Unified NMF configuration into
NMFConfigstruct - Pure C++ header library (Rcpp-free core algorithms in
inst/include/RcppML/) -
DataAccessor<MatrixType>template for unified sparse/dense dispatch - Consolidated header structure and eliminated code duplication
- OpenMP parallelization with proper reduction clauses