[r] Support graph list inputs in clustering#337
Open
koefoeden wants to merge 1 commit into
Open
Conversation
Contributor
Author
|
Changed to draft status due to possible regression detected. Will update! |
Contributor
Author
|
No regression afterall. It was a high number of single-tons in my case that resulted in >100 "distinct" clusters. I implemented my own helper to discard clusters below a given number of barcodes. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Context
I ran into a large transient memory spike when clustering a shared-nearest-neighbor graph built from about 463k cells and 20 embedding dimensions. The KNN search completed, but memory climbed sharply during graph clustering. Smaller benchmarks pointed to the
igraph::graph_from_adjacency_matrix()conversion inside the current clustering path as the expensive step, rather than KNN search, SNN construction, or the final membership vector.Representative peak RSS measurements from the same graph construction/clustering path were:
A separate 20k-cell equivalence check produced identical Leiden memberships between the sparse matrix route and the direct edge-list route. This PR upstreams that lower-memory route for BPCells graph-list outputs while leaving the existing sparse matrix behavior unchanged.
Summary
cluster_graph_leiden()andcluster_graph_louvain()to accept graph-list outputs fromknn_to_snn_graph(return_type = "list")andknn_to_geodesic_graph(return_type = "list").diag = FALSEmatrix behavior.Testing
Result: 13 passed; 2 skipped because optional packages
RcppAnnoyandRcppHNSWwere not installed in the test environment.The new regression assertions are in the existing
igraph clustering doesn't crashtest and did run. The two skipped tests are separate optional ANN tests guarded byskip_if_not_installed("RcppAnnoy")andskip_if_not_installed("RcppHNSW"); I did not install those optional packages for this focused test run.Disclosure
GPT-5.5 was utilized to help draft this PR and implementation.