[r] Support graph list inputs in clustering by koefoeden · Pull Request #337 · bnprks/BPCells

koefoeden · 2026-05-06T08:34:46Z

Context

I ran into a large transient memory spike when clustering a shared-nearest-neighbor graph built from about 463k cells and 20 embedding dimensions. The KNN search completed, but memory climbed sharply during graph clustering. Smaller benchmarks pointed to the igraph::graph_from_adjacency_matrix() conversion inside the current clustering path as the expensive step, rather than KNN search, SNN construction, or the final membership vector.

Representative peak RSS measurements from the same graph construction/clustering path were:

Input / stage	Peak RSS
20k cells, KNN only	~0.59-0.66 GiB
20k cells, KNN + SNN	~0.68-0.70 GiB
20k cells, KNN + SNN + Leiden through sparse matrix	~5.98-6.00 GiB
30k cells, sparse-matrix clustering route	~12.58 GiB
30k cells, direct edge-list route	~0.87 GiB
50k cells, sparse-matrix clustering route	~42.9 GiB
50k cells, direct edge-list route	~1.04 GiB

A separate 20k-cell equivalence check produced identical Leiden memberships between the sparse matrix route and the direct edge-list route. This PR upstreams that lower-memory route for BPCells graph-list outputs while leaving the existing sparse matrix behavior unchanged.

Summary

Allow cluster_graph_leiden() and cluster_graph_louvain() to accept graph-list outputs from knn_to_snn_graph(return_type = "list") and knn_to_geodesic_graph(return_type = "list").
Build igraph objects directly from the edge list for list inputs, while keeping the existing sparse matrix path unchanged.
Convert BPCells 0-based edge-list indices to igraph 1-based vertex IDs internally.
Ignore self-loops for list inputs to match the existing diag = FALSE matrix behavior.
Add regression coverage that matrix and graph-list inputs produce identical Leiden and Louvain memberships on the same graph.

Testing

testthat::test_file("tests/testthat/test-clustering.R")

Result: 13 passed; 2 skipped because optional packages RcppAnnoy and RcppHNSW were not installed in the test environment.

The new regression assertions are in the existing igraph clustering doesn't crash test and did run. The two skipped tests are separate optional ANN tests guarded by skip_if_not_installed("RcppAnnoy") and skip_if_not_installed("RcppHNSW"); I did not install those optional packages for this focused test run.

Disclosure

GPT-5.5 was utilized to help draft this PR and implementation.

koefoeden · 2026-05-06T10:39:25Z

Changed to draft status due to possible regression detected. Will update!

koefoeden · 2026-05-06T11:09:32Z

No regression afterall. It was a high number of single-tons in my case that resulted in >100 "distinct" clusters. I implemented my own helper to discard clusters below a given number of barcodes.

[r] Support graph list inputs in clustering

bc19b92

koefoeden marked this pull request as draft May 6, 2026 08:37

koefoeden marked this pull request as ready for review May 6, 2026 08:59

koefoeden marked this pull request as draft May 6, 2026 10:36

koefoeden marked this pull request as ready for review May 6, 2026 11:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[r] Support graph list inputs in clustering#337

[r] Support graph list inputs in clustering#337
koefoeden wants to merge 1 commit into
bnprks:mainfrom
koefoeden:cluster-graph-list-input

koefoeden commented May 6, 2026 •

edited

Loading

Uh oh!

koefoeden commented May 6, 2026 •

edited

Loading

Uh oh!

koefoeden commented May 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

koefoeden commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Context

Summary

Testing

Disclosure

Uh oh!

koefoeden commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

koefoeden commented May 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

koefoeden commented May 6, 2026 •

edited

Loading

koefoeden commented May 6, 2026 •

edited

Loading