RcppML is an R package for high-performance Non-negative Matrix Factorization (NMF), truncated SVD/PCA, and divisive clustering of large sparse and dense matrices. It supports six statistical distributions via IRLS, cross-validation for automatic rank selection, optional CUDA GPU acceleration, out-of-core streaming for datasets larger than memory, and a composable graph DSL for multi-modal and deep NMF.
## Overview
RcppML decomposes a matrix A into lower-rank non-negative factors:
A ≈ W · diag(d) · H
where W (features × k) and H (k × samples) have columns/rows scaled to unit sum via a diagonal d, ensuring interpretable, scale-invariant factors.
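To see that the diagonal rescaling leaves the product unchanged, here is a small base-R check (illustrative only; no RcppML functions involved): an exact rank-2 product is renormalized so columns of W and rows of H sum to one, with d absorbing the scale.

```r
set.seed(1)
k <- 2
W0 <- matrix(runif(4 * k), 4, k)  # features x k, non-negative
H0 <- matrix(runif(k * 3), k, 3)  # k x samples, non-negative
A  <- W0 %*% H0                   # the matrix being factored (here, exact)

# Normalize: columns of W and rows of H sum to 1; d absorbs the scale
d <- colSums(W0) * rowSums(H0)
W <- sweep(W0, 2, colSums(W0), "/")
H <- sweep(H0, 1, rowSums(H0), "/")

# W %*% diag(d) %*% H reproduces A exactly
max(abs(W %*% diag(d) %*% H - A))  # ~0
```

Because the scale lives entirely in d, factors can be compared across ranks and runs without magnitude effects.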
Key capabilities:
- Fast NMF with alternating least squares, coordinate descent or Cholesky NNLS, and OpenMP parallelism via the Eigen C++ library
- Six statistical distributions — Gaussian (MSE), Generalized Poisson, Negative Binomial, Gamma, Inverse Gaussian, Tweedie — fitted via Iteratively Reweighted Least Squares (IRLS)
- Cross-validation with speckled holdout masks for principled rank selection
- GPU acceleration via CUDA (cuBLAS/cuSPARSE) with automatic fallback to CPU
- Streaming NMF from StreamPress (`.spz`) files for datasets that exceed available memory
- FactorNet graph API for composable multi-modal, deep, and branching NMF pipelines
- Truncated SVD with five methods: deflation, Krylov, Lanczos, IRLBA, randomized
- Divisive clustering via recursive rank-2 factorizations
## Installation
```r
# Stable release from CRAN
install.packages("RcppML")

# Development version from GitHub (requires Rcpp and RcppEigen)
devtools::install_github("zdebruine/RcppML")
```

GPU support (optional): requires CUDA Toolkit ≥ 11.0 installed on the system. See `vignette("gpu-acceleration")` for build instructions.
## Quick Start
```r
library(RcppML)

# Load a built-in gene expression dataset
data(aml)  # 824 genomic regions × 135 samples (dense matrix)

# Run NMF with rank k = 6
model <- nmf(aml, k = 6, seed = 42)
model
#> nmf model
#> k = 6
#> w: 824 x 6
#> d: 6
#> h: 6 x 135

# Inspect factor loadings
head(model@w)
model@d
model@h[, 1:5]

# Reconstruction error
evaluate(model, aml)

# Project onto new data
H_new <- predict(model, aml[, 1:50])

# Plot the model summary
plot(model)
```

## Statistical Distributions
RcppML goes beyond standard Frobenius-norm NMF by supporting distribution-appropriate loss functions via IRLS. This is critical for count data (scRNA-seq, text mining) where Gaussian assumptions are wrong.
| `loss=` | Distribution | Variance V(μ) | Use Case |
|---|---|---|---|
| `"mse"` | Gaussian | constant | Dense/general data (default) |
| `"gp"` | Generalized Poisson | μ(1+θ)² | Overdispersed counts |
| `"nb"` | Negative Binomial | μ + μ²/r | scRNA-seq, quadratic variance-mean |
| `"gamma"` | Gamma | μ² | Positive continuous data |
| `"inverse_gaussian"` | Inverse Gaussian | μ³ | Heavy right-tailed positive |
| `"tweedie"` | Tweedie | μ^p | Flexible power-law variance |
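The variance functions in the table can be written down directly. This base-R snippet (an illustration of the table, not RcppML internals; the defaults for the dispersion parameters θ, r, and p are arbitrary) shows how NB and Gamma variance grow with the mean while Gaussian stays flat:

```r
# Variance functions V(mu) for each supported loss
V <- list(
  mse              = function(mu) rep(1, length(mu)),          # constant
  gp               = function(mu, theta = 0.5) mu * (1 + theta)^2,
  nb               = function(mu, r = 10) mu + mu^2 / r,
  gamma            = function(mu) mu^2,
  inverse_gaussian = function(mu) mu^3,
  tweedie          = function(mu, p = 1.5) mu^p
)

mu <- c(1, 10, 100)
V$nb(mu)     # 1.1, 20, 1100: overdispersion grows quadratically in the mean
V$gamma(mu)  # 1, 100, 10000
```

An IRLS fit reweights each entry by 1/V(μ), so entries where the model predicts high variance contribute less to the least-squares update.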
### Example: Negative Binomial NMF for scRNA-seq
```r
data(aml)

# NB is the natural choice for count data
model_nb <- nmf(aml, k = 6, loss = "nb")

# Fine-grained control via ... parameters
model_nb <- nmf(aml, k = 6,
                loss = "nb",
                nb_size_init = 10,  # Initial size parameter r
                nb_size_max = 1e6)  # Upper bound for r

# Automatic distribution selection via BIC
auto <- auto_nmf_distribution(aml, k = 6)
auto$best        # best distribution name
auto$comparison  # BIC/AIC table for each distribution
```

## Zero-Inflation Models
For data with excess zeros (e.g., droplet-based scRNA-seq), zero-inflated GP and NB models estimate structural dropout probabilities separately from the count model:
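For a quick numeric intuition (plain base R using `dnbinom`, not RcppML internals), a zero-inflated NB mixes a structural dropout probability π with the zeros the count model produces on its own:

```r
# P(X = 0) under a zero-inflated NB: structural zeros plus sampling zeros
pi_drop <- 0.3                              # dropout (structural zero) probability
mu <- 2; r <- 10                            # NB mean and size
p0_nb   <- dnbinom(0, size = r, mu = mu)    # zeros from the count model alone
p0_zinb <- pi_drop + (1 - pi_drop) * p0_nb  # total zero probability under ZI-NB

p0_zinb > p0_nb  # TRUE: zero inflation always adds zeros
```

Estimating π separately from the count model prevents dropout zeros from dragging down the fitted means.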
```r
# Zero-inflated Negative Binomial with per-row dropout
model_zinb <- nmf(aml, k = 6, loss = "nb", zi = "row")

# Per-column dropout estimation
model <- nmf(aml, k = 6, loss = "nb", zi = "col")
```

## Cross-Validation
Cross-validation uses a speckled holdout mask — a random subset of matrix entries is held out during training, and test error is computed on those entries. This enables principled rank selection.
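A speckled mask is just a uniform random scatter of held-out entries. A minimal base-R sketch of the idea (RcppML constructs this internally; nothing here is package code):

```r
set.seed(42)
A <- matrix(rpois(20 * 10, lambda = 3), 20, 10)

# Hold out ~10% of entries, scattered uniformly over the matrix
test_fraction <- 0.1
mask <- matrix(runif(length(A)) < test_fraction, nrow(A), ncol(A))
mean(mask)  # roughly 0.1

# Masked entries are invisible during fitting; test error is scored on them
train <- A
train[mask] <- NA
```

In RcppML this is what `test_fraction = 0.1` does in `nmf()`; sweeping a vector of ranks (e.g. `k = 2:15`) and plotting test error on the masked entries against k gives the rank-selection curve.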
## Regularization
RcppML supports a comprehensive suite of regularization methods that can be combined freely:
### L21 (Group Sparsity) — Factor Selection
L21-norm regularization drives entire rows of W or columns of H toward zero, effectively performing automatic factor selection:
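The penalty itself is easy to state: the L21 norm of W is the sum of the Euclidean norms of its rows, so shrinking it favors rows that are exactly zero. A base-R illustration (not RcppML internals) comparing two matrices with identical L2 energy:

```r
l21 <- function(M) sum(sqrt(rowSums(M^2)))  # sum of row-wise L2 norms

W_dense  <- matrix(1, 4, 3)                            # every row active
W_sparse <- rbind(matrix(sqrt(2), 2, 3),               # same total energy,
                  matrix(0, 2, 3))                     # but two rows exactly zero

abs(sum(W_dense^2) - sum(W_sparse^2)) < 1e-12  # TRUE: equal Frobenius norm
l21(W_sparse) < l21(W_dense)                   # TRUE: L21 prefers dead rows
```

This is why an L21 penalty on W performs factor selection: a plain L2 (ridge) penalty cannot distinguish the two matrices, while L21 strictly prefers the one with zeroed-out rows.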
### Graph Laplacian — Spatial/Network Smoothness
If features or samples have a known graph structure (e.g., gene regulatory network, spatial coordinates), graph regularization encourages connected nodes to have similar factor representations:
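The penalty being minimized is tr(H L Hᵀ) = Σ over edges (i, j) of ‖h_i − h_j‖², where L is the graph Laplacian and h_i is the factor profile of node i. A small base-R check on a 3-node chain (illustrative; not RcppML internals):

```r
# 3-node chain graph: 1 - 2 - 3
Adj <- matrix(c(0, 1, 0,
                1, 0, 1,
                0, 1, 0), 3, 3)
L <- diag(rowSums(Adj)) - Adj  # graph Laplacian

# H is k x samples; columns are the nodes of the graph
H_smooth <- matrix(c(1, 1, 1, 2, 2, 2), 2, 3, byrow = TRUE)  # neighbors agree
H_rough  <- matrix(c(1, 5, 1, 2, 0, 2), 2, 3, byrow = TRUE)  # neighbors disagree

pen <- function(H) sum(diag(H %*% L %*% t(H)))  # tr(H L H^T)
pen(H_smooth)                 # 0: perfectly smooth along the graph
pen(H_rough) > pen(H_smooth)  # TRUE: disagreement across edges is penalized
```

Adding this term to the objective pulls connected nodes toward shared factor representations without forcing them to be identical.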
## GPU Acceleration
RcppML optionally uses CUDA for GPU-accelerated NMF, delivering 10–20× speedups on large matrices.
```r
# Check GPU availability
gpu_available()
#> [1] TRUE
gpu_info()
#> $device_name: "NVIDIA A100"
#> $total_memory_mb: 40960

# Sparse NMF on GPU
model <- nmf(data, k = 20, resource = "gpu")

# Dense NMF on GPU
model <- nmf(as.matrix(data), k = 20, resource = "gpu")

# Auto-dispatch (default): uses GPU if available, falls back to CPU
model <- nmf(data, k = 20, resource = "auto")
```

The GPU backend supports:

- Standard NMF (sparse and dense)
- Cross-validation NMF
- MSE loss (GPU-native); all other losses via automatic CPU fallback
- OpenMP + CUDA hybrid execution
- Zero-copy NMF with `st_read_gpu()` for pre-loaded GPU data
## Streaming Large Data (StreamPress .spz Files)
For datasets that exceed available RAM, RcppML streams data from StreamPress (.spz) compressed files — a column-oriented binary format with 10–20× compression via rANS entropy coding.
### Writing and Reading SPZ Files
```r
library(RcppML)
data(pbmc3k)

# Write sparse matrix to .spz file
spz_path <- tempfile(fileext = ".spz")
st_write(pbmc3k, spz_path, include_transpose = TRUE)

# Read back into memory
mat <- st_read(spz_path)
identical(dim(mat), dim(pbmc3k))  # TRUE

# File size comparison
file.size(spz_path)  # Much smaller than RDS/RDA
```

### Streaming NMF
```r
# NMF directly from .spz file — data never fully loaded into memory
model <- nmf(spz_path, k = 10)

# Streaming cross-validation
cv <- nmf(spz_path, k = 2:15, test_fraction = 0.1)

# Streaming with non-MSE distributions
model <- nmf(spz_path, k = 10, loss = "nb")
```

Streaming NMF processes data in column panels with double-buffered asynchronous I/O, maintaining O(m·k + n·k + chunk) memory regardless of total matrix size.
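The panel idea can be mimicked in a few lines of base R: the statistics an alternating update needs (here WᵀA, as one example) decompose column-block by column-block, so only one panel ever needs to be resident. A sketch under that assumption — not the actual StreamPress reader:

```r
set.seed(7)
m <- 200; n <- 1000; k <- 5
A <- matrix(runif(m * n), m, n)  # stands in for the on-disk matrix
W <- matrix(runif(m * k), m, k)

# Accumulate W^T A one column panel at a time
chunk <- 100
WtA <- matrix(0, k, n)
for (start in seq(1, n, by = chunk)) {
  cols  <- start:min(start + chunk - 1, n)
  panel <- A[, cols, drop = FALSE]   # in real streaming: decode this panel from .spz
  WtA[, cols] <- crossprod(W, panel)
}

max(abs(WtA - crossprod(W, A)))  # ~0: identical to the in-memory product
```

Because each panel's contribution is independent, the next panel can be decoded asynchronously while the current one is being multiplied — the double-buffering mentioned above.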
## FactorNet Graph API
The factor_net() API composes complex factorization pipelines from simple building blocks. Multi-modal, deep, and branching NMF networks are expressed as directed graphs:
### Multi-Modal NMF
Share a common embedding H across two data matrices (e.g., RNA + ATAC from the same cells):
```r
# Define inputs
inp_rna  <- factor_input(rna_matrix, "rna")
inp_atac <- factor_input(atac_matrix, "atac")

# Create shared input node
shared <- factor_shared(inp_rna, inp_atac)

# Build NMF layer with per-factor regularization
layer <- shared |> nmf_layer(k = 10, name = "shared",
                             W = W(L1 = 0.1),
                             H = H(L1 = 0.05))

# Compile and fit the network
net <- factor_net(
  inputs = list(inp_rna, inp_atac),
  output = layer,
  config = factor_config(maxit = 100, seed = 42))
result <- fit(net)

# Access results (layer name = "shared")
result$shared$W$rna   # RNA feature loadings
result$shared$W$atac  # ATAC feature loadings
result$shared$H       # Shared cell embedding (k × n_cells)
```

### Deep NMF
Stack layers for hierarchical decomposition:
```r
inp <- factor_input(data, "X")

# Encoder layer: reduce to 20 factors
enc <- inp |> nmf_layer(k = 20, name = "encoder")

# Bottleneck: compress to 5 factors
bot <- enc |> nmf_layer(k = 5, name = "bottleneck")

net <- factor_net(inputs = list(inp), output = bot,
                  config = factor_config(maxit = 100, seed = 42))
result <- fit(net)
```

### Conditional Factorization
Append conditioning metadata to a layer’s H before passing downstream:
```r
inp <- factor_input(data, "X")
layer1 <- inp |> nmf_layer(k = 5)

# Z is a conditioning matrix (n × p)
Z <- matrix(rep(c(1, 0, 0, 1), c(50, 50, 50, 50)), nrow = 100, ncol = 2)
conditioned <- factor_condition(layer1, Z)

layer2 <- conditioned |> nmf_layer(k = 10, name = "output")
net <- factor_net(inputs = list(inp), output = layer2,
                  config = factor_config(seed = 42))
result <- fit(net)
```

## Truncated SVD / PCA
RcppML provides five SVD algorithms with optional constraints and cross-validation:
```r
# Standard truncated SVD
result <- svd(data, k = 10)

# PCA (centered SVD)
result <- pca(data, k = 10)

# Non-negative PCA
result <- pca(data, k = 10, nonneg = TRUE)

# Sparse PCA with L1 penalty
result <- pca(data, k = 10, L1 = c(0, 0.1))

# Auto-rank selection
result <- svd(data, k = "auto")

# Method selection
result <- svd(data, k = 10, method = "lanczos")     # Fast unconstrained
result <- svd(data, k = 10, method = "krylov")      # Block method, all constraints
result <- svd(data, k = 10, method = "randomized")  # Approximate, very fast
```

| Method | Constraints | Speed | Best For |
|---|---|---|---|
| `deflation` | All | Moderate | Mixed constraints, small k |
| `krylov` | All | Fast | Large k with constraints |
| `lanczos` | None | Very fast | Unconstrained SVD |
| `irlba` | None | Fast | General unconstrained |
| `randomized` | None | Very fast | Approximate large-scale |
## Divisive Clustering
RcppML implements spectral clustering via recursive rank-2 NMF:
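The principle can be sketched in base R with a rank-2 factorization by Lee–Seung multiplicative updates (a standard NMF algorithm used here for illustration, not RcppML's ALS solver): each sample is assigned to whichever of the two factors dominates its H column, and in divisive clustering the resulting clusters are then split recursively.

```r
set.seed(3)
# Two blocks of samples with disjoint feature profiles, on a small background
A <- matrix(0.1, 50, 40)
A[1:25,  1:20]  <- A[1:25,  1:20]  + 5
A[26:50, 21:40] <- A[26:50, 21:40] + 5

k <- 2
W <- matrix(runif(50 * k), 50, k)
H <- matrix(runif(k * 40), k, 40)
for (it in 1:200) {  # multiplicative updates for Frobenius loss
  H <- H * crossprod(W, A) / (crossprod(W) %*% H + 1e-9)
  W <- W * (A %*% t(H)) / (W %*% tcrossprod(H) + 1e-9)
}

# Bipartition: assign each sample to its dominant factor
split <- as.integer(H[1, ] > H[2, ])
split  # samples 1:20 and 21:40 should land in opposite groups
```

Rank-2 factorization is cheap relative to full-rank NMF, which is what makes recursive bipartition practical as a clustering strategy.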
```r
# Single bipartition
bp <- bipartition(data)
bp$samples  # Cluster assignments (0/1)

# Recursive divisive clustering
clusters <- dclust(data, min_samples = 50, min_dist = 0.05)
clusters$id  # Cluster labels

# Consensus clustering (multiple runs for stability)
cons <- consensus_nmf(data, k = 5, n_runs = 20)
plot(cons)
```

## Non-Negative Least Squares (NNLS)
Project factor matrices onto new data:
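For intuition, here is a minimal coordinate-descent NNLS in base R (a toy analogue of the `solver = "cd"` path, not the package's implementation): it solves min ‖a − Wh‖² subject to h ≥ 0 by cycling over coordinates, taking each coordinate's exact minimizer, and clipping at zero.

```r
nnls_cd <- function(W, a, iters = 500) {
  WtW <- crossprod(W)       # precomputed Gram matrix
  Wta <- crossprod(W, a)
  h <- rep(0, ncol(W))
  for (it in 1:iters) {
    for (j in seq_along(h)) {
      # Exact minimizer in coordinate j, projected onto h_j >= 0
      h[j] <- max(0, h[j] + (Wta[j] - sum(WtW[j, ] * h)) / WtW[j, j])
    }
  }
  h
}

set.seed(9)
W <- matrix(runif(30 * 4), 30, 4)
h_true <- c(2, 0, 1, 0)        # sparse non-negative truth
a <- W %*% h_true
nnls_cd(W, a)                  # ≈ c(2, 0, 1, 0)
```

Note that only WᵀW and Wᵀa are needed inside the loop, which is why solving for many columns of H against a fixed W is cheap: the Gram matrix is computed once and reused.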
```r
# Given W from NMF, solve for H on new data
H_new <- nnls(w = model@w, A = new_data)

# Solve for W given H (transpose projection)
W_new <- nnls(h = model@h, A = new_data)

# Unconstrained LS (semi-NMF projection)
H_ls <- nnls(w = model@w, A = new_data, nonneg = c(TRUE, FALSE))

# With regularization
H_sparse <- nnls(w = model@w, A = new_data, L1 = c(0, 0.1))
```

## Semi-NMF
Allow negative values in W (unconstrained) while keeping H non-negative:
```r
model <- nmf(data, k = 10, nonneg = c(FALSE, TRUE))
any(model@w < 0)   # TRUE — W can have negative entries
all(model@h >= 0)  # TRUE — H remains non-negative
```

## Key Parameters Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| `k` | int/vector/"auto" | — | Rank (vector for CV sweep, "auto" for search) |
| `loss` | string | `"mse"` | Loss function: mse, gp, nb, gamma, inverse_gaussian, tweedie |
| `L1` | numeric(2) | `c(0, 0)` | LASSO penalty [W, H] |
| `L2` | numeric(2) | `c(0, 0)` | Ridge penalty [W, H] |
| `L21` | numeric(2) | `c(0, 0)` | Group sparsity [W, H] |
| `angular` | numeric(2) | `c(0, 0)` | Decorrelation penalty [W, H] |
| `nonneg` | logical(2) | `c(TRUE, TRUE)` | Non-negativity [W, H] |
| `upper_bound` | numeric(2) | `c(0, 0)` | Box constraint [W, H] (0 = none) |
| `zi` | string | `"none"` | Zero-inflation: none, row, col |
| `robust` | logical/numeric | `FALSE` | Huber robustness (TRUE = δ = 1.345, numeric = custom δ) |
| `solver` | string | `"auto"` | NNLS solver: auto, cd, cholesky |
| `seed` | int/matrix/string | `NULL` | Initialization: NULL (random), integer (reproducible), "lanczos", "irlba" |
| `mask` | various | `NULL` | Missing data: NULL, "zeros", "NA", or a mask matrix |
| `resource` | string | `"auto"` | Compute: auto, cpu, gpu |
| `test_fraction` | numeric | `0` | CV holdout fraction (0 = no CV) |
| `tol` | numeric | `1e-4` | Convergence tolerance |
| `maxit` | int | `100` | Maximum iterations |
| `threads` | int | `0` | OpenMP threads (0 = all) |
| `verbose` | logical | `FALSE` | Print progress |
## Built-in Datasets
| Dataset | Description | Dimensions | Format |
|---|---|---|---|
| `pbmc3k` | PBMC single-cell RNA-seq (10x Genomics) | 13,714 × 2,638 | SPZ raw bytes |
| `aml` | AML leukemia ATAC-seq | 824 × 135 | Dense matrix |
| `golub` | Golub leukemia microarray | 38 × 5,000 | Sparse dgCMatrix |
| `movielens` | MovieLens ratings | 3,867 × 610 | Sparse dgCMatrix |
| `hawaiibirds` | Hawaii bird survey counts | 183 × 1,183 | Sparse dgCMatrix |
| `olivetti` | Olivetti face images | 400 × 4,096 | Sparse dgCMatrix |
| `digits` | Handwritten digits (MNIST subset) | 1,797 × 64 | Dense matrix |
## Performance Tips
- **Sparse input:** Use `dgCMatrix` format (from the `Matrix` package) for sparse data — RcppML auto-detects and uses optimized sparse routines.
- **Solver selection:** For large k (> 32), `solver = "cholesky"` can be faster than coordinate descent. Use `solver = "auto"` (default) for automatic selection.
- **Initialization:** `seed = "lanczos"` provides better starting points and can reduce iteration count by 30–50%, but adds upfront SVD cost.
- **Threads:** RcppML uses OpenMP. Set `options(RcppML.threads = 4)` to control parallelism, or `threads = 0` for all available cores.
- **GPU:** For matrices with > 10K rows and k > 8, GPU acceleration provides significant speedup. Use `resource = "gpu"`.
- **Streaming:** For datasets larger than available RAM, write to `.spz` format with `st_write()` and factorize directly from the file path.
## Contributing
See CONTRIBUTING.md for guidelines on reporting bugs, requesting features, and submitting pull requests.
## Citation
```r
citation("RcppML")
```

DeBruine ZJ, Melcher K, Triche TJ (2021). "High-performance non-negative matrix factorization for large single-cell data." bioRxiv. doi:10.1101/2021.09.01.458620.
