Fix ConsensusCluster.fit_predict to use consensus matrix by sharifhsn · Pull Request #29 · saeyslab/FlowSOM_Python

sharifhsn · 2026-03-22T21:33:27Z

Fixes #25.

What does this fix?

ConsensusCluster.fit_predict() was ignoring the consensus matrix and running a single AgglomerativeClustering on the raw data:

def fit_predict(self, data):
    if self.z_score:
        data = self._z_score(data)
    return self.cluster(n_clusters=self.n_clusters, linkage=self.linkage).fit_predict(data)

This bypasses fit() entirely, so the H=100 resamplings that build the consensus matrix never run. The method now calls fit() to build the consensus matrix Mk, then clusters on the distance matrix 1 - Mk with metric='precomputed':

def fit_predict(self, data):
    self.fit(data)
    distance_matrix = 1 - self.Mk
    return AgglomerativeClustering(
        n_clusters=self.n_clusters, metric="precomputed", linkage=self.linkage
    ).fit_predict(distance_matrix)

This matches the consensus clustering paper and the R FlowSOM implementation.

Notes

Performance: fit_predict now does real work (H=100 resamplings on 100 SOM codes), taking ~425ms instead of ~0.4ms. This is the intended behavior — the old version was silently skipping the consensus step.

self.cluster parameter: The final clustering on the consensus distance matrix hardcodes AgglomerativeClustering with metric="precomputed" rather than using the pluggable self.cluster class. This is intentional — metric="precomputed" requires an algorithm that supports it, and a user-supplied clustering class may not. The user's cluster class is still used for the H resamplings inside fit().

linkage="ward" incompatibility: AgglomerativeClustering with metric="precomputed" does not support linkage="ward". The default is "average", which works fine. A user who explicitly passes linkage="ward" would get a ValueError from sklearn. This is an existing limitation of clustering on a precomputed distance matrix, not something introduced by this fix.

One file, +6/-4 lines. All 38 tests pass.

🤖 Generated with Claude Code

fit_predict() was ignoring the consensus clustering entirely and running a single AgglomerativeClustering on the raw data. Now it calls fit() to build the consensus matrix (H resamplings), then clusters on the distance matrix (1 - consensus_matrix) with metric='precomputed'.

sharifhsn wants to merge 1 commit into saeyslab:main from sharifhsn:fix/consensus-cluster
