
Optimize ImmutableHashSet<T>.IsSubsetOf to avoid unnecessary allocations #127920

Draft
aw0lid wants to merge 1 commit into dotnet:main from aw0lid:fix-immutablehashset-IsSubsetOf-allocs


aw0lid commented May 7, 2026

Part of #127279

Summary

ImmutableHashSet<T>.IsSubsetOf always creates a new intermediate HashSet<T> for the other collection, which causes avoidable allocations and GC pressure, especially for large datasets.

Optimization Logic

  • O(1) Pre-Scan: Returns false immediately if other is an ICollection whose count is smaller than the source's. Because this check runs up front, tracking variables such as matches are no longer needed: once the count check has passed, finding every source element in other is enough to establish the subset relation.

  • Fast-Path Pattern Matching: Detects ImmutableHashSet<T> and HashSet<T> to bypass intermediate allocations.

  • Comparer Guard: Validates EqualityComparer compatibility before triggering fast paths to ensure logical consistency.

  • Short-Circuit Validation: Re-validates Count within specialized paths for an immediate exit before O(n) enumeration.

  • Reverse-Lookup Strategy: An architectural shift in which the ImmutableHashSet<T> (the source) is enumerated and each element is looked up in the other collection when that collection is a HashSet<T>. This leverages the O(1) lookup of HashSet<T> instead of the O(log N) lookup of the immutable tree.

  • Zero-Allocation Execution: Direct iteration over compatible collections, eliminating the costly new HashSet<T>(other) fallback.

  • Deferred fallback: Reserves the expensive allocation solely for general IEnumerable types.
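
The steps above can be sketched as a standalone helper written against public APIs only. This is a minimal illustration of the described strategy, not the code in this PR: the class name SubsetSketch, the method shape, and the use of KeyComparer/Comparer with object.Equals as the comparer guard are all assumptions.

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;

// Hypothetical helper (not the PR's implementation): illustrates the
// pre-scan, comparer guard, reverse lookup, and deferred fallback.
public static class SubsetSketch
{
    public static bool IsSubsetOf<T>(ImmutableHashSet<T> source, IEnumerable<T> other)
    {
        // The empty set is a subset of every collection.
        if (source.IsEmpty)
            return true;

        // O(1) pre-scan: a collection holding fewer items than the source
        // cannot contain every source element, even if it has duplicates.
        if (other is ICollection<T> collection && collection.Count < source.Count)
            return false;

        // Fast path for HashSet<T>: enumerate the source and probe the
        // hash set, but only when both sides use an equal comparer.
        if (other is HashSet<T> hashSet &&
            object.Equals(hashSet.Comparer, source.KeyComparer))
        {
            foreach (T item in source)
            {
                if (!hashSet.Contains(item))
                    return false;
            }
            return true;
        }

        // Fast path for ImmutableHashSet<T> with an equal comparer.
        if (other is ImmutableHashSet<T> immutable &&
            object.Equals(immutable.KeyComparer, source.KeyComparer))
        {
            foreach (T item in source)
            {
                if (!immutable.Contains(item))
                    return false;
            }
            return true;
        }

        // Deferred fallback: only general sequences (or mismatched
        // comparers) pay for the intermediate HashSet<T>.
        var otherSet = new HashSet<T>(other, source.KeyComparer);
        foreach (T item in source)
        {
            if (!otherSet.Contains(item))
                return false;
        }
        return true;
    }
}
```

In this sketch only the last branch allocates; the two fast paths walk the source once and never materialize a copy of other.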

Benchmark Source Code

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Order;
using BenchmarkDotNet.Running;
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

namespace ImmutableHashSetBenchmarks
{
    [InProcess]
    [MemoryDiagnoser]
    [Orderer(SummaryOrderPolicy.FastestToSlowest)]
    [RankColumn]
    public class ImmutableHashSetIsSubsetOfBenchmark
    {
        private ImmutableHashSet<int> _sourceSet = null!;
        private ImmutableHashSet<int> _immutableLarger = null!;
        private HashSet<int> _bclHashSetLarger = null!;
        private List<int> _listLarger = null!;
        private int[] _arrayLarger = null!;

        private ImmutableHashSet<int> _immutableSmaller = null!;
        private ImmutableHashSet<int> _immutableSameCount = null!;

        private HashSet<int> _bclHashSetLargerDiffComparer = null!;

        private List<int> _listWithDuplicatesButSubset = null!;
        private ImmutableHashSet<int> _emptySource = null!;
        private List<int> _listSameElementsWithDuplicates = null!;
        private List<int> _listSmaller = null!;

        [Params(100000)]
        public int Size { get; set; }

        [GlobalSetup]
        public void Setup()
        {
            var elements = Enumerable.Range(0, Size).ToList();
            var largerElements = Enumerable.Range(0, Size + 10).ToList();
            var smallerElements = Enumerable.Range(0, Size - 10).ToList();
            var reverseComparer = new ReverseComparer<int>();

            _sourceSet = ImmutableHashSet.CreateRange(elements);

            _immutableLarger = ImmutableHashSet.CreateRange(largerElements);
            _bclHashSetLarger = new HashSet<int>(largerElements);
            _listLarger = largerElements;
            _arrayLarger = largerElements.ToArray();

            _immutableSmaller = ImmutableHashSet.CreateRange(smallerElements);

            _immutableSameCount = ImmutableHashSet.CreateRange(elements);

            _bclHashSetLargerDiffComparer = new HashSet<int>(largerElements, reverseComparer);

            // Note: despite its name, this list contains no duplicates;
            // it is the source elements plus one extra value.
            _listWithDuplicatesButSubset = elements.Concat(new[] { Size + 1 }).ToList();
            _emptySource = ImmutableHashSet<int>.Empty;

            _listSameElementsWithDuplicates = elements.Concat(elements).ToList();
            _listSmaller = Enumerable.Range(0, Size - 10).ToList();
        }

        #region Fast Path: Same Type and Comparer (Optimized - Zero Alloc)

        [Benchmark(Description = "ImmutableHashSet (Subset - O(N))")]
        public bool Case_ImmutableHashSet_Subset() => _sourceSet.IsSubsetOf(_immutableLarger);

        [Benchmark(Description = "BCL HashSet (Subset - O(N))")]
        public bool Case_BclHashSet_Subset() => _sourceSet.IsSubsetOf(_bclHashSetLarger);

        [Benchmark(Description = "Same Count")]
        public bool Case_SameCount_Subset() => _sourceSet.IsSubsetOf(_immutableSameCount);

        #endregion

        #region Early Exit: Count Check (O(1))

        [Benchmark(Description = "Empty Source (O(1) Check)")]
        public bool Case_EmptySource_Subset() => _emptySource.IsSubsetOf(_bclHashSetLarger);

        [Benchmark(Description = "Early Exit (Other is Smaller)")]
        public bool Case_SmallerCount() => _sourceSet.IsSubsetOf(_immutableSmaller);

        [Benchmark(Description = "Early Exit (List is Smaller - O(1))")]
        public bool Case_List_SmallerCount() => _sourceSet.IsSubsetOf(_listSmaller);

        #endregion

        #region Fallback Path: Non-Set or Different Comparer

        [Benchmark(Description = "List (Subset - Fallback to HashSet)")]
        public bool Case_List_Subset() => _sourceSet.IsSubsetOf(_listLarger);

        [Benchmark(Description = "Array (Subset - Fallback to HashSet)")]
        public bool Case_Array_Subset() => _sourceSet.IsSubsetOf(_arrayLarger);

        [Benchmark(Description = "HashSet (Diff Comparer - Force Fallback)")]
        public bool Case_HashSet_DiffComparer() => _sourceSet.IsSubsetOf(_bclHashSetLargerDiffComparer);

        [Benchmark(Description = "List with Duplicates (Is Subset)")]
        public bool Case_List_Duplicates_Subset() => _sourceSet.IsSubsetOf(_listWithDuplicatesButSubset);

        [Benchmark(Description = "List Same Elements with Duplicates")]
        public bool Case_List_SameElements_Duplicates() => _sourceSet.IsSubsetOf(_listSameElementsWithDuplicates);

        #endregion
    }

    // Despite its name, this comparer does not reverse anything: it only
    // serves as a comparer instance distinct from the default one, so the
    // comparer guard rejects the fast path.
    public class ReverseComparer<T> : IEqualityComparer<T> where T : IComparable<T>
    {
        public bool Equals(T? x, T? y) => x?.CompareTo(y) == 0;
        public int GetHashCode(T? obj) => obj?.GetHashCode() ?? 0;
    }

    public class Program
    {
        public static void Main(string[] args)
        {
            BenchmarkRunner.Run<ImmutableHashSetIsSubsetOfBenchmark>();
        }
    }
}
```
Benchmark Results

Benchmark Results (Before Optimization)

| Method | Size | Mean | Error | StdDev | Rank | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| 'Empty Source (O(1) Check)' | 100000 | 12.01 ns | 0.083 ns | 0.070 ns | 1 | - | - | - | - |
| 'Array (Subset - Fallback to HashSet)' | 100000 | 7,134,426.20 ns | 23,812.618 ns | 22,274.338 ns | 2 | 70.3125 | 70.3125 | 70.3125 | 1738953 B |
| 'BCL HashSet (Subset - O(N))' | 100000 | 7,462,450.03 ns | 25,737.985 ns | 21,492.383 ns | 3 | 70.3125 | 70.3125 | 70.3125 | 1738969 B |
| 'List with Duplicates (Is Subset)' | 100000 | 9,031,590.07 ns | 35,776.352 ns | 31,714.804 ns | 4 | 62.5000 | 62.5000 | 62.5000 | 1738888 B |
| 'HashSet (Diff Comparer - Force Fallback)' | 100000 | 9,106,841.96 ns | 51,923.162 ns | 46,028.531 ns | 4 | 62.5000 | 62.5000 | 62.5000 | 1739012 B |
| 'List (Subset - Fallback to HashSet)' | 100000 | 9,252,212.40 ns | 53,277.854 ns | 49,836.138 ns | 4 | 62.5000 | 62.5000 | 62.5000 | 1738888 B |
| 'Early Exit (List is Smaller - O(1))' | 100000 | 9,709,447.54 ns | 48,541.887 ns | 40,534.674 ns | 5 | 62.5000 | 62.5000 | 62.5000 | 1738890 B |
| 'List Same Elements with Duplicates' | 100000 | 11,652,925.09 ns | 227,241.620 ns | 243,145.987 ns | 6 | 140.6250 | 140.6250 | 140.6250 | 3606729 B |
| 'ImmutableHashSet (Subset - O(N))' | 100000 | 14,247,152.34 ns | 74,510.855 ns | 66,051.932 ns | 7 | 78.1250 | 78.1250 | 78.1250 | 1739192 B |
| 'Same Count' | 100000 | 14,250,459.71 ns | 43,248.747 ns | 38,338.888 ns | 7 | 78.1250 | 78.1250 | 78.1250 | 1739296 B |
| 'Early Exit (Other is Smaller)' | 100000 | 14,593,751.20 ns | 33,233.992 ns | 27,751.888 ns | 7 | 78.1250 | 78.1250 | 78.1250 | 1739296 B |

Benchmark Results (After Optimization)

| Method | Size | Mean | Error | StdDev | Rank | Gen0 | Gen1 | Gen2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|
| 'Empty Source (O(1) Check)' | 100000 | 8.289 ns | 0.0390 ns | 0.0346 ns | 1 | - | - | - | - |
| 'Early Exit (Other is Smaller)' | 100000 | 8.639 ns | 0.0890 ns | 0.0743 ns | 2 | - | - | - | - |
| 'Early Exit (List is Smaller - O(1))' | 100000 | 10.585 ns | 0.0374 ns | 0.0313 ns | 3 | - | - | - | - |
| 'BCL HashSet (Subset - O(N))' | 100000 | 2,802,568.025 ns | 40,421.1393 ns | 35,832.2872 ns | 4 | - | - | - | - |
| 'List with Duplicates (Is Subset)' | 100000 | 4,032,480.190 ns | 36,203.9746 ns | 32,093.8805 ns | 5 | 62.5000 | 62.5000 | 62.5000 | 1738905 B |
| 'List (Subset - Fallback to HashSet)' | 100000 | 4,052,456.821 ns | 19,595.8610 ns | 16,363.4315 ns | 5 | 62.5000 | 62.5000 | 62.5000 | 1738916 B |
| 'Array (Subset - Fallback to HashSet)' | 100000 | 4,301,986.384 ns | 42,518.6955 ns | 37,691.7161 ns | 6 | 70.3125 | 70.3125 | 70.3125 | 1739034 B |
| 'HashSet (Diff Comparer - Force Fallback)' | 100000 | 4,858,005.052 ns | 22,247.9385 ns | 20,810.7355 ns | 7 | 70.3125 | 70.3125 | 70.3125 | 1738917 B |
| 'List Same Elements with Duplicates' | 100000 | 4,930,698.772 ns | 74,495.5126 ns | 66,038.3316 ns | 7 | 125.0000 | 125.0000 | 125.0000 | 3606502 B |
| 'Same Count' | 100000 | 12,737,250.208 ns | 91,624.9344 ns | 85,706.0205 ns | 8 | - | - | - | - |
| 'ImmutableHashSet (Subset - O(N))' | 100000 | 12,976,106.070 ns | 203,105.6376 ns | 169,602.4062 ns | 8 | - | - | - | - |

Performance Analysis Summary (100,000 Elements)

| Case / Method | Before (ns) | After (ns) | Speedup Ratio | Memory Improvement |
|---|---|---|---|---|
| Early Exit (Other is Smaller) | 14,593,751 | 8.639 | ~1,689,287x | -100% (Zero Alloc) |
| Early Exit (List is Smaller) | 9,709,447 | 10.585 | ~917,283x | -100% (Zero Alloc) |
| Empty Source (O(1) Check) | 12.01 | 8.289 | 1.45x | Stable (Zero Alloc) |
| BCL HashSet (Subset Path) | 7,462,450 | 2,802,568 | 2.66x | -100% (Zero Alloc) |
| ImmutableHashSet (Subset Path) | 14,247,152 | 12,976,106 | 1.10x | -100% (Zero Alloc) |
| HashSet (Diff Comparer - Fallback) | 9,106,841 | 4,858,005 | 1.87x | Stable (1.7 MB) |
| Array (Fallback to HashSet) | 7,134,426 | 4,301,986 | 1.66x | Stable (1.7 MB) |
| List (Fallback to HashSet) | 9,252,212 | 4,052,456 | 2.28x | Stable (1.7 MB) |
| List with Duplicates | 9,031,590 | 4,032,480 | 2.24x | Stable (1.7 MB) |
| Same Count | 14,250,459 | 12,737,250 | 1.12x | -100% (Zero Alloc) |
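
The Speedup Ratio column is the mean time before divided by the mean time after. A small helper makes the arithmetic explicit; the class and method names here are hypothetical and exist only for illustration.

```csharp
// Hypothetical helper: reproduces the "Speedup Ratio" column from the
// before/after mean times (both in nanoseconds).
public static class Speedup
{
    public static double Ratio(double beforeNs, double afterNs) => beforeNs / afterNs;
}
```

For example, Speedup.Ratio(7_462_450, 2_802_568) is roughly 2.66, matching the BCL HashSet row.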

Environment

BenchmarkDotNet v0.15.8

  • OS: Windows 11 (10.0.26200.8246/25H2)
  • CPU: Intel Core i5-6300U CPU 2.40GHz (Skylake), 1 CPU, 4 logical and 2 physical cores
  • Runtime: .NET 11.0.0 (11.0.0-preview.3.26170.106), X64 RyuJIT x86-64-v3

✅ Unit Tests Added

Added unit tests for IsSubsetOf to cover various edge cases and ensure the correctness of the new logic:

  • Mismatched Comparers: Validated behavior when comparing sets with different comparers (e.g., Ordinal vs. OrdinalIgnoreCase).
  • Duplicate Elements: Verified ICollection<T> logic to ensure that collections with duplicates are handled correctly by the early exit (Count < origin.Count).
  • Empty Set Scenarios: Confirmed expected behavior when either the origin or the target collection is empty.
  • Equality Logic: Covered cases where logical equality might differ from reference equality in comparers.
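
The scenarios listed above can be exercised against the public API roughly as follows. This is an illustrative sketch, not the tests added in this PR: the class and method names are hypothetical, and the expected results in the comments assume that membership is decided by the source set's comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

// Hypothetical scenario helpers; each returns the result of IsSubsetOf.
public static class IsSubsetOfScenarios
{
    // Mismatched comparers: a case-insensitive source against an ordinal set.
    // Expected: true, because "a" and "A" are equal under the source's comparer.
    public static bool IgnoreCaseSourceOrdinalOther()
    {
        var source = ImmutableHashSet.Create<string>(StringComparer.OrdinalIgnoreCase, "a");
        var other = new HashSet<string>(StringComparer.Ordinal) { "A" };
        return source.IsSubsetOf(other);
    }

    // Mismatched comparers the other way round.
    // Expected: false, because the ordinal source holds both "a" and "A",
    // while the other set only contains "a".
    public static bool OrdinalSourceIgnoreCaseOther()
    {
        var source = ImmutableHashSet.Create<string>(StringComparer.Ordinal, "a", "A");
        var other = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "a" };
        return source.IsSubsetOf(other);
    }

    // Duplicates: the list's Count (4) is not smaller than the source's (3),
    // so a count-based early exit must not apply, yet 3 is missing.
    // Expected: false.
    public static bool DuplicatesHidingMissingElement()
    {
        var source = ImmutableHashSet.Create(1, 2, 3);
        return source.IsSubsetOf(new List<int> { 1, 1, 2, 2 });
    }

    // Empty source against an empty collection. Expected: true.
    public static bool EmptySourceEmptyOther()
        => ImmutableHashSet<int>.Empty.IsSubsetOf(Array.Empty<int>());

    // Non-empty source against an empty collection. Expected: false.
    public static bool NonEmptySourceEmptyOther()
        => ImmutableHashSet.Create(1).IsSubsetOf(Array.Empty<int>());
}
```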

dotnet-policy-service (bot) added the community-contribution label on May 7, 2026
@dotnet-policy-service

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.


Labels

area-System.Collections, community-contribution
