Fix ConsensusCluster.fit_predict to use consensus matrix#29
Open
sharifhsn wants to merge 1 commit intosaeyslab:mainfrom
Open
Fix ConsensusCluster.fit_predict to use consensus matrix#29sharifhsn wants to merge 1 commit intosaeyslab:mainfrom
sharifhsn wants to merge 1 commit intosaeyslab:mainfrom
Conversation
fit_predict() was ignoring the consensus clustering entirely and running a single AgglomerativeClustering on the raw data. Now it calls fit() to build the consensus matrix (H resamplings), then clusters on the distance matrix (1 - consensus_matrix) with metric='precomputed'.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #25.
What does this fix?
ConsensusCluster.fit_predict()was ignoring the consensus matrix and running a singleAgglomerativeClusteringon the raw data:This bypasses
fit()entirely — the H=100 resamplings that build the consensus matrix never run. The method now callsfit()to build the consensus matrix, then clusters on1 - Mkwithmetric='precomputed':This matches the consensus clustering paper and the R FlowSOM implementation.
Notes
Performance:
fit_predictnow does real work (H=100 resamplings on 100 SOM codes), taking ~425ms instead of ~0.4ms. This is the intended behavior — the old version was silently skipping the consensus step.self.clusterparameter: The final clustering on the consensus distance matrix hardcodesAgglomerativeClusteringwithmetric="precomputed"rather than using the pluggableself.clusterclass. This is intentional —metric="precomputed"requires an algorithm that supports it, and a user-supplied clustering class may not. The user'sclusterclass is still used for the H resamplings insidefit().linkage="ward"incompatibility:AgglomerativeClusteringwithmetric="precomputed"does not supportlinkage="ward". The default is"average"which works fine. A user who explicitly passeslinkage="ward"would get aValueErrorfrom sklearn — this is an existing limitation of clustering on a precomputed distance matrix, not something introduced by this fix.One file, +6/-4 lines. All 38 tests pass.
🤖 Generated with Claude Code