Determine best rank for NMF using cross-validation — cross_validate

Find the rank that minimizes the mean squared error of test set reconstruction using cross-validation.

Usage

cross_validate_nmf(
  A,
  ranks,
  n_replicates = 3,
  tol = 1e-04,
  maxit = 100,
  verbose = 1,
  L1 = 0.01,
  L2 = 0,
  threads = 0,
  test_density = 0.05
)

# S3 method for cross_validate_nmf_data
plot(x, ...)
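
A call following the signature above might look like the sketch below. The toy matrix, the chosen ranks, and the `exists()` guard are illustrative assumptions. The guard is there only because this page does not name the package that provides `cross_validate_nmf`, so the package is assumed to be attached already.

```r
library(Matrix)

set.seed(123)
# Toy non-negative sparse matrix standing in for a genes x cells matrix
# (500 rows x 100 columns, about 10% non-zero values)
A <- rsparsematrix(500, 100, density = 0.1, rand.x = function(n) rexp(n))

# Run only if the package providing cross_validate_nmf is attached
if (exists("cross_validate_nmf")) {
  cv <- cross_validate_nmf(
    A,
    ranks = c(2, 5, 10),   # candidate ranks (arbitrary choice)
    n_replicates = 3,
    tol = 1e-3,            # looser tolerance, as suggested for cross-validation
    test_density = 0.05
  )
  plot(cv)
}
```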

Arguments

A: sparse matrix (ideally variance-stabilized) of data for genes x cells (rows x columns)
ranks: a vector of ranks at which to fit a model and compute test set reconstruction error
n_replicates: number of random test sets
tol: tolerance of the fit (1e-5 for publication quality, 1e-3 for cross-validation)
maxit: maximum number of iterations
verbose: verbosity level
L1: L1/LASSO penalty to increase sparsity of model
L2: L2/Ridge penalty to increase angles between factors
threads: number of threads for parallelization across CPUs, 0 = use all available threads
test_density: fraction of values to include in the test set
x: the result of cross_validate_nmf
...: additional arguments (not implemented)
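
To make test_density concrete, the base R sketch below holds out a random 5% of the entries of a toy matrix and computes the mean squared error on those entries only. It is an illustration of the idea, not the package's implementation: how the function masks values internally is not described on this page, and the reconstruction `A_hat` is a hypothetical stand-in for a fitted NMF model.

```r
set.seed(1)
# Toy dense non-negative matrix standing in for A (genes x cells)
A <- matrix(runif(200 * 50), nrow = 200, ncol = 50)
test_density <- 0.05

# Randomly choose about 5% of all entries as the held-out test set
n_test <- round(test_density * length(A))
test_idx <- sample(length(A), n_test)

# Hypothetical stand-in for a low-rank reconstruction of A: a rank-1
# approximation built from row and column means. A real cross-validation
# run would fit the model with the test entries masked out.
A_hat <- outer(rowMeans(A), colMeans(A)) / mean(A)

# Mean squared error restricted to the held-out entries
test_mse <- mean((A[test_idx] - A_hat[test_idx])^2)
```

Repeating this with different random test sets corresponds to what n_replicates controls.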

Value

A data.frame of test set reconstruction error vs. rank, of class cross_validate_nmf_data. Use the plot method to visualize the result, or min to find the optimal rank (the one with the lowest test set error).
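
As a rough sketch of picking the rank by hand, the example below averages the error over replicates and takes the rank with the lowest mean. The column names `rank`, `replicate`, and `error`, and all the numbers, are made up for illustration. The actual columns of the returned data.frame are not listed on this page.

```r
# Hypothetical result shape: one row per (rank, replicate) pair
cv <- data.frame(
  rank = rep(c(2, 4, 8, 16), each = 3),
  replicate = rep(1:3, times = 4),
  error = c(0.90, 0.92, 0.91,
            0.70, 0.72, 0.71,
            0.60, 0.61, 0.62,
            0.66, 0.65, 0.67)
)

# Average test set error across replicates for each rank
mean_err <- aggregate(error ~ rank, data = cv, FUN = mean)

# Rank with the lowest mean test set error
best_rank <- mean_err$rank[which.min(mean_err$error)]
best_rank
#> [1] 8
```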