Given a baseline MSE-fitted NMF model and the original data, computes score-test statistics for the power-variance family (\(V(\mu) = \mu^p\)) to determine the best-fitting distribution without refitting. Optionally tests for NB overdispersion.
Usage
score_test_distribution(
data,
model,
powers = c(0, 1, 2, 3),
test_nb = TRUE,
min_mu = 1e-06
)Arguments
- data
Original data matrix (sparse or dense)
- model
A fitted NMF object (from
nmf()withloss="mse")- powers
Numeric vector of variance powers to test. Default
c(0, 1, 2, 3)covers Gaussian, Poisson, Gamma, Inverse Gaussian.- test_nb
Logical; if
TRUEand data is integer-valued, also test the NB overdispersion diagnostic.- min_mu
Floor for predicted values to avoid division by zero. Default 1e-6.
Value
A list with:
- scores
Data frame with columns
power,T_stat,abs_T,distribution(label)- best_power
Numeric: the power p with smallest
|T_p|- best_distribution
Character: name of the best-matching distribution
- nb_diagnostic
If
test_nb=TRUE: list withT_NBandoverdispersed(logical)
Details
The score test statistic for variance power \(p\) is: $$T_p = \text{mean}\left(\frac{r_{ij}^2}{\mu_{ij}^p} - 1\right)$$ where \(r_{ij} = x_{ij} - \mu_{ij}\) are residuals and \(\mu_{ij} = (WH)_{ij}\) are predicted values.
Under the correct model, \(E[T_p] = 0\). The power minimizing \(|T_p|\) best matches the observed variance-mean relationship.
The NB diagnostic tests for quadratic overdispersion: $$T_{NB} = \text{mean}\left(\frac{r_{ij}^2 - \mu_{ij}}{\mu_{ij}^2}\right)$$ If \(T_{NB} > 0.1\), there is substantial overdispersion beyond Poisson, suggesting NB may be preferable to GP.
Examples
# \donttest{
A <- abs(rsparsematrix(200, 100, 0.3))
model <- nmf(A, k = 5, loss = "mse")
diag <- score_test_distribution(A, model)
print(diag$scores)
#> power T_stat abs_T distribution
#> 1 0 -4.365022e-01 4.365022e-01 gaussian
#> 2 1 5.237193e+02 5.237193e+02 gp
#> 3 2 5.222982e+08 5.222982e+08 gamma
#> 4 3 5.222981e+14 5.222981e+14 inverse_gaussian
cat("Best distribution:", diag$best_distribution, "\n")
#> Best distribution: gaussian
# }