StreamPress is a header-only C++ library (with Python and R bindings) for high-performance compressed sparse matrix I/O using the .spz file format.
Documentation: Python · R · C++ headers
Standard serialization formats used for sparse matrices (.rds, .pickle, MATLAB .mat) are general-purpose and inefficient for large biological datasets. StreamPress is designed specifically for large sparse floating-point matrices such as single-cell RNA-seq count data:
- Optionally attach obs (cell) and var (gene) annotation tables to the file
Copy include/streampress/ into your project, or use it as a Meson subproject:
# subprojects/streampress.wrap
[wrap-git]
url = https://github.com/zdebruine/streampress.git
revision = main
depth = 1
# meson.build
streampress_dep = dependency('streampress',
  fallback: ['streampress', 'streampress_dep'])
No external dependencies: StreamPress requires only C++17 and the standard library headers.
pip install streampress
Requires Python ≥ 3.9, NumPy, and SciPy.
install.packages("streampress")
# Development version:
remotes::install_github("zdebruine/streampress", subdir = "r")
#include <streampress/streampress_api.hpp>
int main() {
    // Build a 4 x 3 sparse matrix in CSC format
    streampress::CSCMatrix mat;
    mat.m = 4; mat.n = 3; mat.nnz = 6;
    mat.p = {0, 2, 3, 6};                    // column pointers (n + 1 entries)
    mat.i = {0, 2, 1, 0, 2, 3};              // row indices
    mat.x = {1.0, 4.0, 3.0, 2.0, 5.0, 6.0};  // values
    // Write to file (empty obs/var tables, default options)
    streampress::api::WriteOptions opts;
    streampress::api::write_sparse("matrix.spz", mat, {}, {}, opts);
    // Read back
    auto result = streampress::api::read_sparse("matrix.spz");
    // result.m, result.n, result.nnz, result.col_ptr, result.row_ind, result.values
    // Inspect metadata (no decompression)
    auto info = streampress::api::info("matrix.spz");
    // info.m, info.n, info.nnz, info.chunk_cols, info.has_transpose, ...
    // Read columns [0, 2); the end index is exclusive
    auto cols = streampress::api::slice_cols("matrix.spz", 0, 2);
    return 0;
}
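The arrays p, i, x in this example follow the standard compressed-sparse-column layout: the entries of column j sit at positions p[j] through p[j+1] - 1 of i and x. As a pure-Python sketch that is independent of StreamPress (the helpers csc_to_dense and csc_slice_cols are hypothetical), the following expands the example matrix and extracts a half-open column range, matching the [start, end) convention the API table gives for slice_cols:

```python
def csc_to_dense(m, n, p, i, x):
    """Expand CSC arrays (column pointers p, row indices i, values x) into a dense m x n list."""
    dense = [[0.0] * n for _ in range(m)]
    for col in range(n):
        # Entries of column `col` occupy positions p[col] .. p[col + 1] - 1 of i and x.
        for k in range(p[col], p[col + 1]):
            dense[i[k]][col] = x[k]
    return dense


def csc_slice_cols(p, i, x, start, end):
    """Return CSC arrays for columns [start, end); end is exclusive."""
    lo, hi = p[start], p[end]
    new_p = [q - lo for q in p[start:end + 1]]  # re-base pointers so they start at 0
    return new_p, i[lo:hi], x[lo:hi]


# The 4 x 3 matrix from the C++ example.
m, n = 4, 3
p = [0, 2, 3, 6]
i = [0, 2, 1, 0, 2, 3]
x = [1.0, 4.0, 3.0, 2.0, 5.0, 6.0]

for row in csc_to_dense(m, n, p, i, x):
    print(row)
# [1.0, 0.0, 2.0]
# [0.0, 3.0, 0.0]
# [4.0, 0.0, 5.0]
# [0.0, 0.0, 6.0]

print(csc_slice_cols(p, i, x, 0, 2))
# ([0, 2, 3], [0, 2, 1], [1.0, 4.0, 3.0])
```

Because column pointers are monotone, a column slice is two pointer lookups plus contiguous copies of i and x; no other columns need to be touched.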
import scipy.sparse as sp
import streampress as stp
# Write a sparse matrix
A = sp.random(10000, 500, density=0.05, format="csc", dtype="float64")
stp.st_write(A, "matrix.spz")
# Read back
B = stp.st_read("matrix.spz")
# Inspect metadata (fast, no decompression)
info = stp.st_info("matrix.spz")
print(f"{info['nrow']}×{info['ncol']}, density={info['density']:.4f}")
# Read a column subset
cols = stp.st_slice_cols("matrix.spz", list(range(0, 100)))
# Streaming chunk iteration (process() stands in for your own per-chunk code)
for chunk in stp.st_map_chunks("matrix.spz", lambda x: x):
    process(chunk)
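Streaming works because the matrix is stored as consecutive column chunks, so a function can be applied to one chunk at a time. The pure-Python sketch below shows only that iteration pattern on in-memory CSC arrays; it is not StreamPress code, the helper names iter_column_chunks and map_chunks are hypothetical, and it assumes a fixed chunk width chunk_cols (the last chunk may be narrower). The real st_map_chunks decompresses each chunk from disk, and the type of object it hands to fn is not specified here.

```python
def iter_column_chunks(p, i, x, chunk_cols):
    """Yield (start, end, (block_p, block_i, block_x)) for column ranges [start, end)."""
    n = len(p) - 1  # number of columns
    for start in range(0, n, chunk_cols):
        end = min(start + chunk_cols, n)
        lo, hi = p[start], p[end]
        block_p = [q - lo for q in p[start:end + 1]]  # re-base pointers to 0
        yield start, end, (block_p, i[lo:hi], x[lo:hi])


def map_chunks(p, i, x, chunk_cols, fn):
    """Apply fn to each column chunk in turn and collect the results."""
    return [fn(block) for _, _, block in iter_column_chunks(p, i, x, chunk_cols)]


# Example: per-chunk sum of stored values for the 4 x 3 matrix from the C++ example.
p = [0, 2, 3, 6]
i = [0, 2, 1, 0, 2, 3]
x = [1.0, 4.0, 3.0, 2.0, 5.0, 6.0]
print(map_chunks(p, i, x, chunk_cols=2, fn=lambda block: sum(block[2])))
# [8.0, 13.0]
```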
library(streampress)
library(Matrix)
A <- rsparsematrix(10000, 500, density = 0.05)
# Write
st_write(A, "matrix.spz")
# Read
B <- st_read("matrix.spz")
# Inspect (fast)
info <- st_info("matrix.spz")
# Partial read
B_sub <- st_read("matrix.spz", cols = 1:100)
For each column chunk, StreamPress applies:
┌──────────────────────────────────────────────┐
│ File Header (128 bytes)                      │
│  • dimensions, nnz, format version           │
│  • chunk index (offset + size per chunk)     │
│  • obs/var table offsets                     │
├──────────────────────────────────────────────┤
│ Chunk 0 │ Chunk 1 │ ... │ Chunk N-1          │
│ (nnz values + indices, rANS-compressed)      │
├──────────────────────────────────────────────┤
│ Transpose Section (optional)                 │
│  • CSC(Aᵀ) stored independently              │
├──────────────────────────────────────────────┤
│ obs table (optional) │ var table (optional)  │
└──────────────────────────────────────────────┘
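The chunk index (offset + size per chunk) is what lets a column-range read seek straight to the chunks it needs instead of decompressing the whole file. A pure-Python sketch of that lookup, assuming every chunk except possibly the last holds exactly chunk_cols columns and that offsets are measured from the first chunk; the sizes are made-up illustration values, chunks_for_columns is a hypothetical helper, and the actual on-disk encoding of the index is not shown:

```python
from itertools import accumulate


def chunks_for_columns(start, end, chunk_cols):
    """Indices of the column chunks overlapping columns [start, end).

    Assumes every chunk except possibly the last holds exactly chunk_cols columns.
    """
    if start >= end:
        return range(0)
    return range(start // chunk_cols, (end - 1) // chunk_cols + 1)


# Hypothetical compressed sizes (bytes) for 5 chunks of chunk_cols = 1000 columns each.
sizes = [5000, 4800, 5100, 4900, 2500]
# Each chunk's offset (relative to the first chunk) is the sum of the sizes before it.
offsets = [0] + list(accumulate(sizes))[:-1]
index = list(zip(offsets, sizes))  # (offset, size) per chunk

needed = chunks_for_columns(1500, 3200, chunk_cols=1000)
print(list(needed))                # [1, 2, 3]
print([index[k] for k in needed])  # [(5000, 4800), (9800, 5100), (14900, 4900)]
```

Only the byte spans listed at the end would have to be read and decompressed; the remaining chunks are never touched.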
C++ API (namespace streampress::api)

| Function | Description |
|---|---|
| write_sparse(path, mat, obs, var, opts) | Compress a CSC matrix to .spz |
| read_sparse(path, opts) | Decompress .spz to a ReadResult |
| slice_cols(path, start, end, opts) | Read column range [start, end) |
| info(path) | Read file metadata (no decompression) |
| add_transpose(path) | Add a pre-computed transpose section |
Python API (streampress)

| Function | Description |
|---|---|
| st_write(mat, path, **kwargs) | Write a scipy.sparse.csc_matrix |
| st_read(path, **kwargs) | Read .spz → csc_matrix |
| st_info(path) | Read metadata as a dict |
| st_slice_cols(path, cols) | Read a column subset |
| st_slice_rows(path, rows) | Read a row subset (requires the transpose section) |
| st_slice(path, rows, cols) | Read a submatrix |
| st_add_transpose(path) | Add a transpose section |
| st_map_chunks(path, fn) | Apply a function to each chunk |
| st_write_dense(X, path) | Write a dense NumPy array |
| st_read_dense(path) | Read a dense file |
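Row slicing needs the transpose section because CSC only gives cheap access to columns. Storing CSC(Aᵀ), which holds the same entries as A in row-major (CSR) order, turns a row slice of A into a column slice of Aᵀ. A pure-Python sketch of that idea, not StreamPress's implementation (csc_transpose and slice_rows_via_transpose are hypothetical names):

```python
def csc_transpose(m, n, p, i, x):
    """Return CSC arrays of A^T (equivalently, CSR arrays of A) for an m x n CSC matrix A."""
    # Count entries per row of A; each row of A becomes a column of A^T.
    counts = [0] * m
    for r in i:
        counts[r] += 1
    tp = [0] * (m + 1)
    for r in range(m):
        tp[r + 1] = tp[r] + counts[r]
    ti = [0] * len(i)
    tx = [0.0] * len(x)
    nxt = tp[:-1]  # slicing copies: next free slot in each column of A^T
    for col in range(n):
        for k in range(p[col], p[col + 1]):
            r = i[k]
            ti[nxt[r]] = col  # the row index in A^T is the column index in A
            tx[nxt[r]] = x[k]
            nxt[r] += 1
    return tp, ti, tx


def slice_rows_via_transpose(tp, ti, tx, r0, r1):
    """Rows [r0, r1) of A, read as columns [r0, r1) of the stored transpose."""
    lo, hi = tp[r0], tp[r1]
    return [q - lo for q in tp[r0:r1 + 1]], ti[lo:hi], tx[lo:hi]


# The 4 x 3 matrix from the C++ example.
tp, ti, tx = csc_transpose(4, 3, [0, 2, 3, 6], [0, 2, 1, 0, 2, 3], [1.0, 4.0, 3.0, 2.0, 5.0, 6.0])
print((tp, ti, tx))  # ([0, 2, 3, 5, 6], [0, 2, 1, 0, 2, 2], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(slice_rows_via_transpose(tp, ti, tx, 1, 3))  # ([0, 1, 3], [1, 0, 2], [3.0, 4.0, 5.0])
```

Building the transpose costs one pass over all nonzeros, which is why it makes sense to compute it once (st_add_transpose / add_transpose) and store it rather than recompute it for every row query.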
R API (streampress)

| Function | Description |
|---|---|
| st_write(x, path, ...) | Write a dgCMatrix to .spz |
| st_read(path, ...) | Read .spz → dgCMatrix |
| st_info(path) | Read file metadata as a list |
| st_write_dense(x, path) | Write a dense matrix |
| st_read_dense(path) | Read a dense file |
| st_read_obs(path) | Read row (obs) metadata as a data.frame |
| st_read_var(path) | Read column (var) metadata as a data.frame |
| st_convert(input, output, ...) | Convert to .spz with options |
streampress/
├── include/streampress/     # Header-only C++ library
│   ├── streampress_api.hpp  # Public API (write, read, slice, info)
│   ├── sparse.hpp           # Sparse chunked format (primary)
│   ├── dense.hpp            # Dense column-panel format
│   ├── codec/               # rANS, Golomb-Rice, VarInt, bitstream
│   ├── core/                # CSCMatrix, PRNG, platform I/O
│   ├── format/              # Binary header structures
│   ├── model/               # Compressor models
│   └── transform/           # Delta encoding, value mapping
├── tests/                   # C++ unit tests (meson)
├── python/                  # PyPI package (scikit-build-core + nanobind)
│   └── src/streampress/
└── r/                       # CRAN package (Rcpp)
    ├── R/
    └── src/
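The transform/ and codec/ directories list delta encoding and VarInt among the building blocks. As a generic illustration of why these suit sparse row indices (within a column they are typically sorted, so the gaps between them are small), here is a pure-Python sketch of delta encoding followed by LEB128-style varints. It is not StreamPress's actual bitstream, and all helper names are hypothetical.

```python
def delta_encode(indices):
    """Replace sorted indices by gaps between consecutive values (first value kept as-is)."""
    prev, out = 0, []
    for v in indices:
        out.append(v - prev)
        prev = v
    return out


def delta_decode(gaps):
    """Invert delta_encode with a running sum."""
    total, out = 0, []
    for g in gaps:
        total += g
        out.append(total)
    return out


def varint_encode(values):
    """Encode non-negative ints as LEB128-style varints: 7 data bits per byte, high bit = 'more'."""
    buf = bytearray()
    for v in values:
        if v < 0:
            raise ValueError("varint encoding requires non-negative integers")
        while True:
            byte = v & 0x7F
            v >>= 7
            if v:
                buf.append(byte | 0x80)  # more bytes follow
            else:
                buf.append(byte)         # last byte of this value
                break
    return bytes(buf)


def varint_decode(data):
    """Decode a byte string produced by varint_encode back into a list of ints."""
    values, cur, shift = [], 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values


rows = [3, 17, 130, 131, 9000]  # sorted row indices of one column
gaps = delta_encode(rows)       # [3, 14, 113, 1, 8869]
packed = varint_encode(gaps)
print(len(packed), "bytes vs", 4 * len(rows), "bytes as int32")  # 6 bytes vs 20 bytes as int32
assert delta_decode(varint_decode(packed)) == rows
```

Small gaps fit in a single byte, so dense-ish columns compress well even before an entropy coder such as rANS is applied.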
MIT (see LICENSE).