py/map: Make dicts preserve insertion order. #34

Draft
andrewleech wants to merge 3 commits into review/py-map-ordered from py-map-ordered


@andrewleech andrewleech commented Mar 26, 2026

Summary

Some years back @dpgeorge and I had a number of discussions about making dicts ordered (micropython#6170, micropython#6173), with a couple of my earlier attempts (micropython#5514, micropython#5517) being too simple to land. That led to @dpgeorge's micropython#6173 which took the proper CPython/PyPy approach (dense key/value array with separate sparse hash index) but was left as WIP with a few open items. With an aim to close out this earlier work I've tried to build on that PR with fixes for the remaining issues, compaction, OrderedDict simplification, performance testing and flash size reduction (as much as I can).

Dicts now preserve insertion order, matching CPython 3.7+. Hash indices are uint8 for small dicts (<256 entries), uint16 above that, and uint32 via MICROPY_PY_MAP_LARGE for dicts exceeding 65535 elements. Without MICROPY_PY_MAP_LARGE the alloc is capped at 65535; at 8+ bytes per entry that's over 0.5MB of table, well past what most targets have.
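As a rough illustration of the index sizing above, here is a minimal Python sketch. The function name `index_width_bytes` and the `ENTRY_BYTES = 8` figure (one key pointer plus one value pointer on a 32-bit target) are assumptions made for the example, not names or constants taken from the patch.

```python
def index_width_bytes(alloc, large=False):
    """Bytes per hash-index slot for a dict with `alloc` dense slots.

    Hypothetical helper mirroring the thresholds in the description:
    uint8 below 256, uint16 up to 65535, uint32 only with the LARGE option.
    """
    if alloc < 256:
        return 1  # uint8 index
    if alloc <= 65535:
        return 2  # uint16 index
    if not large:
        raise OverflowError("alloc is capped at 65535 without MICROPY_PY_MAP_LARGE")
    return 4  # uint32 index


ENTRY_BYTES = 8  # assumed: key pointer + value pointer on a 32-bit target

# Size of the dense key/value array at the 65535-entry cap.
dense_bytes_at_cap = 65535 * ENTRY_BYTES
print(index_width_bytes(100), index_width_bytes(1000), dense_bytes_at_cap)
```

At the cap the dense array alone comes to 524,280 bytes, which is where the "over 0.5MB" figure comes from.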

Deleted entries become tombstones in the dense array. Compaction runs in place when tombstones exceed 50% of live entries or when the array is full. It needs no allocation, so dict operations work safely under heap lock (exception handling, GC).
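The tombstone and compaction behaviour can be modelled in a few lines of Python. This is only a sketch of the general technique (dense entry array plus sparse index, linear probing, fixed capacity); the class `OrderedMapSketch`, its field names and the 2x index sizing are assumptions for illustration and do not reflect the C implementation in py/map.c.

```python
EMPTY = -1            # sparse-index slot that has never been used
TOMBSTONE = object()  # marks a deleted entry in the dense array


class OrderedMapSketch:
    def __init__(self, alloc=8):
        self.alloc = alloc
        self.entries = [None] * alloc       # dense (key, value) pairs in insertion order
        self.index = [EMPTY] * (2 * alloc)  # sparse hash index -> position in entries
        self.used = 0                       # high-water mark in entries
        self.filled = 0                     # live entries

    def _probe(self, key):
        n = len(self.index)
        i = hash(key) % n
        for _ in range(n):                  # linear probing over the index
            yield i
            i = (i + 1) % n

    def _find(self, key):
        for slot in self._probe(key):
            pos = self.index[slot]
            if pos == EMPTY:
                return None
            entry = self.entries[pos]
            if entry is not TOMBSTONE and entry[0] == key:
                return pos
        return None

    def _link(self, key, pos):
        for slot in self._probe(key):
            if self.index[slot] == EMPTY:
                self.index[slot] = pos
                return

    def _compact(self):
        write = 0
        for read in range(self.used):       # slide live entries down over tombstones
            entry = self.entries[read]
            if entry is not TOMBSTONE:
                self.entries[write] = entry
                write += 1
        for i in range(write, self.used):
            self.entries[i] = None
        self.used = write
        for i in range(len(self.index)):    # rebuild the index, reusing the same list
            self.index[i] = EMPTY
        for pos in range(self.used):
            self._link(self.entries[pos][0], pos)

    def __setitem__(self, key, value):
        pos = self._find(key)
        if pos is not None:
            self.entries[pos] = (key, value)
            return
        if self.used == self.alloc:         # array full: try to reclaim tombstones
            self._compact()
            if self.used == self.alloc:
                raise MemoryError("sketch has fixed capacity and does not grow")
        self.entries[self.used] = (key, value)
        self._link(key, self.used)
        self.used += 1
        self.filled += 1

    def __delitem__(self, key):
        pos = self._find(key)
        if pos is None:
            raise KeyError(key)
        self.entries[pos] = TOMBSTONE
        self.filled -= 1
        if (self.used - self.filled) * 2 > self.filled:  # tombstones > 50% of live
            self._compact()

    def __len__(self):
        return self.filled

    def keys(self):
        return [e[0] for e in self.entries[:self.used] if e is not TOMBSTONE]


m = OrderedMapSketch(alloc=8)
for k in "abcdef":
    m[k] = ord(k)
del m["b"]
del m["d"]
del m["a"]   # third tombstone exceeds 50% of the 3 live entries, so compaction runs
m["g"] = 1
print(m.keys(), m.used, len(m))
```

In this model the third delete triggers compaction, so `used` drops back to the live count before `"g"` is appended and insertion order of the survivors is kept.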

dict.popitem() returns items in LIFO order, matching CPython. OrderedDict now shares the same hash table backing as regular dicts, and the mutable is_ordered linear-scan code path is removed. ROM/const maps keep linear scan via is_fixed.
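For reference, the LIFO behaviour being matched looks like this on CPython 3.7+. This is a small standalone example, not one of the tests in the PR:

```python
d = {}
for k in ("a", "b", "c"):
    d[k] = k.upper()

del d["b"]    # removing a middle entry must not disturb the order of the rest
d["d"] = "D"  # a new key goes to the end

# popitem() removes the most recently inserted live entry each time.
popped = [d.popitem() for _ in range(len(d))]
print(popped)  # [('d', 'D'), ('c', 'C'), ('a', 'A')]
```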

The filled count for O(1) len() is packed into the same bitfield word as used so mp_obj_dict_t stays within one GC block. On 32-bit that gives 15 bits each (max 32767 entries), on 64-bit 31 bits.
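The packing idea can be sketched as two 15-bit fields sharing one word. The field order, the constant names and the use of a plain integer here are assumptions for illustration; the actual struct layout in the patch (including where the existing flag bits sit) is not shown in this description.

```python
FIELD_BITS = 15                      # per-field width quoted for 32-bit targets
FIELD_MASK = (1 << FIELD_BITS) - 1   # 32767, the largest value either field can hold


def pack(used, filled):
    """Pack two counters into one integer (hypothetical layout: filled above used)."""
    if not (0 <= used <= FIELD_MASK and 0 <= filled <= FIELD_MASK):
        raise OverflowError("counter does not fit in %d bits" % FIELD_BITS)
    return (filled << FIELD_BITS) | used


def unpack(word):
    """Return (used, filled) from a packed word."""
    return word & FIELD_MASK, (word >> FIELD_BITS) & FIELD_MASK


word = pack(used=120, filled=100)
print(unpack(word), FIELD_MASK)
```

Two 15-bit fields occupy 30 bits, which leaves room in a 32-bit word for a couple of flag bits.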

The feature sits behind MICROPY_PY_MAP_ORDERED, which defaults on at the EXTRA_FEATURES ROM level. With it off, the original flat open-addressing table is used, with zero overhead.

map->used changes meaning from live-entry count to high-water mark when ordered maps are enabled, so external code reading dict length should use the mp_map_len() macro instead. The core code and port pin files are converted; ~28 kwargs->used instances across extmod/ and ports/ are left unconverted, since kwargs maps are add-only (used == filled always holds). It is an open question whether those should be converted for consistency.
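The difference between the two counters can be shown with a toy model. `MapCounters` and `map_len` are stand-ins invented for this example; `map_len` plays the role that mp_map_len() plays in the C code, under the assumption that it reports the live-entry count.

```python
class MapCounters:
    """Toy model of the two counters only, with no storage behind them."""

    def __init__(self):
        self.used = 0    # high-water mark in the dense array
        self.filled = 0  # live entries

    def add(self):
        self.used += 1
        self.filled += 1

    def delete(self):
        self.filled -= 1  # leaves a tombstone, so `used` does not move


def map_len(m):
    return m.filled  # hypothetical stand-in for mp_map_len()


d = MapCounters()
for _ in range(3):
    d.add()
d.delete()
print(d.used, map_len(d))  # 3 2: reading `used` would over-report the length

kwargs = MapCounters()     # add-only map, like a kwargs map
kwargs.add()
kwargs.add()
print(kwargs.used == map_len(kwargs))  # True
```

This is why the unconverted kwargs->used reads are still correct in practice: without deletes the two counters never diverge.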

Testing

Tested on unix (standard, coverage, minimal, nanbox, longlong), STM32 PYBV10 (cross-compiled), and PYBD-SF6W (on-device). Both MICROPY_PY_MAP_ORDERED=1 and =0 build and pass. Tests cover compaction thresholds, ordering preservation, empty-dict edge cases, non-qstr keys through hash rebuild, LIFO popitem with mixed del/add, dict operations under micropython.heap_lock(), and the alloc overflow boundary (stress/dict_create_max).
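A check in the spirit of the heap-lock and ordering tests might look like the following. It is a sketch rather than a copy of the tests in the PR; on CPython, where the `micropython` module is absent, the lock calls fall back to no-ops so the ordering assertions can still be exercised.

```python
try:
    import micropython
    heap_lock, heap_unlock = micropython.heap_lock, micropython.heap_unlock
except (ImportError, AttributeError):
    # Not running on MicroPython (or heap locking is disabled): use no-op stand-ins.
    def heap_lock():
        pass

    def heap_unlock():
        pass

d = {"k%d" % i: i for i in range(8)}

heap_lock()
del d["k1"]   # leaves a tombstone; must not need the heap
d["k2"] = 0   # update of an existing key; must not need the heap
heap_unlock()

d["new"] = 1  # new keys are appended after the surviving entries
order = list(d)
print(order)
```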

Flash measured by building branch and merge-base with identical toolchain (arm-none-eabi-gcc 14.3.1, -Os), comparing size on the firmware ELF:

| Target | Branch (bytes) | Merge base (bytes) | Delta |
|---|---|---|---|
| STM32 PYBV10 (Thumb-2) | 367,676 | 367,356 | +320 B (+0.09%) |
| Unix x86-64 | 783,774 | 783,062 | +712 B (+0.09%) |

RAM via gc.collect(); gc.mem_alloc() before/after dict creation on unix x86-64:

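| Dict size (entries) | Branch (bytes) | Master (bytes) | Delta (bytes) |
|---|---|---|---|
| 0 (empty) | 32 | 32 | 0 |
| 1 | 96 | 64 | +32 |
| 10 | 224 | 192 | +32 |
| 50 | 1,056 | 992 | +64 |
| 100 | 2,208 | 2,080 | +128 |
| 500 | 9,440 | 8,384 | +1,056 |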

Per-slot overhead is the hash index entry (one byte for dicts under 256 entries, two bytes above that). An empty dict is identical to master.
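The RAM measurement can be reproduced along these lines. The helper names are invented for this sketch, and the `tracemalloc` fallback is an assumption added so the script also runs on CPython, where `gc.mem_alloc()` does not exist; the figures in the table come from MicroPython's `gc.mem_alloc()` only.

```python
import gc


def heap_in_use():
    """Bytes currently allocated, using whichever counter the runtime provides."""
    if hasattr(gc, "mem_alloc"):   # MicroPython
        return gc.mem_alloc()
    import tracemalloc             # CPython stand-in, not used for the PR numbers
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    return tracemalloc.get_traced_memory()[0]


def dict_cost(n):
    """Heap growth caused by building an n-entry dict with small-int keys."""
    gc.collect()
    before = heap_in_use()
    d = {i: i for i in range(n)}
    gc.collect()
    after = heap_in_use()
    assert len(d) == n             # keep the dict alive until after the second reading
    return after - before


for size in (10, 100, 500):
    print(size, dict_cost(size))
```

Running the same script on a branch build and a master build and subtracting the results gives a delta column like the one above.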

Speed on unix x86-64, time.ticks_us() inside functions, 1000 iterations on 100-entry dicts, median of 3 runs. Measured before the latest minor fixes (alloc guard, assert); these don't affect hot paths so numbers should still be representative:

| Operation | Branch (us) | Master (us) | Change |
|---|---|---|---|
| create 100 | 52,630 | 48,825 | +7.8% |
| insert 100 | 61,199 | 55,885 | +9.5% |
| lookup 100 | 48,389 | 49,209 | -1.7% |
| iterate 100 | 30,972 | 30,813 | +0.5% |
| del+add 100 | 275,700 | 835,207 | -66.9% |
| globals rw 1M | 146,271 | 137,767 | +6.2% |

Lookup and iteration within noise. Insertion ~10% slower from hash index writes, delete+add cycles ~3x faster (in-place compact vs full rehash). Globals access ~6% overhead.
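A timing harness of the kind described (timer calls inside the function, a fixed iteration count, median of 3 runs) could look like this. The del+add workload and all function names are assumptions for the sketch, since the PR description does not include the benchmark source; `time.perf_counter_ns()` is used as a stand-in where `time.ticks_us()` is unavailable.

```python
import time

if hasattr(time, "ticks_us"):        # MicroPython
    def now_us():
        return time.ticks_us()

    def elapsed_us(start, end):
        return time.ticks_diff(end, start)
else:                                # CPython stand-in
    def now_us():
        return time.perf_counter_ns() // 1000

    def elapsed_us(start, end):
        return end - start


def bench_del_add(iterations=1000, size=100):
    """Time repeated delete-then-reinsert cycles on a `size`-entry dict."""
    d = {i: i for i in range(size)}
    start = now_us()
    for n in range(iterations):
        k = n % size
        del d[k]   # leaves a tombstone in the ordered layout
        d[k] = n   # re-adds the key at the end
    return elapsed_us(start, now_us())


def median_of(fn, runs=3):
    """Run fn several times and return the median result."""
    results = sorted(fn() for _ in range(runs))
    return results[len(results) // 2]


print("del+add median (us):", median_of(bench_del_add))
```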

On-device speed on PYBD-SF6W (STM32F767, Cortex-M7), time.ticks_us(), 500 iterations on 50-entry dicts, compared branch firmware against master on same board:

| Operation | Branch (us) | Master (us) | Change |
|---|---|---|---|
| create 50 | 423,988 | 431,240 | -1.7% |
| lookup 50 | 1,527,215 | 1,582,656 | -3.5% |
| iterate 50 | 432,025 | 417,258 | +3.5% |
| del+add 50 | 1,126,618 | 1,869,892 | -39.7% |
| globals 100K | 413,621 | 426,807 | -3.1% |

On the MCU everything is within noise except delete+add cycles which are ~40% faster and iteration which is ~3.5% slower.
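To read the Change columns, it helps to see the arithmetic, assuming they are computed as (branch - master) / master. The helper name `change_pct` is made up for this example; the inputs are taken from the tables.

```python
def change_pct(branch_us, master_us):
    """Relative change of branch vs master in percent; negative means faster."""
    return round((branch_us - master_us) / master_us * 100, 1)


print(change_pct(61199, 55885))      # unix insert 100 -> 9.5
print(change_pct(1126618, 1869892))  # PYBD-SF6W del+add 50 -> -39.7
print(round(835207 / 275700, 2))     # unix del+add, master time / branch time -> 3.03
```

The last line is the source of the "~3x faster" figure: a time reduction of about two thirds corresponds to roughly a threefold speed-up.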

Trade-offs and Alternatives

+320 bytes flash on STM32 Thumb-2, mostly the compaction path needed for heap-locked safety. Ports at CORE/BASIC ROM level can disable MICROPY_PY_MAP_ORDERED for zero cost.

OrderedDict no longer accepts unhashable keys (e.g. slices) since it now uses the hash table; rejecting unhashable keys matches CPython behavior, although CPython 3.12 and later make slice objects themselves hashable. MicroPython's old linear-scan OrderedDict accepted them as a side effect of not hashing. The slice_optimise.py test is updated to handle both paths.
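The behaviour change can be probed with a short script. It uses a list as the unhashable key rather than a slice, because slice hashability differs between CPython versions; `accepts_key` is a helper invented for the example.

```python
from collections import OrderedDict


def accepts_key(mapping, key):
    """Return True if `key` can be stored in `mapping`, False if it is rejected."""
    try:
        mapping[key] = None
    except TypeError:
        return False
    return True


od = OrderedDict()
print(accepts_key(od, "name"))   # hashable key is accepted
print(accepts_key(od, [1, 2]))   # list is unhashable, so a hash-table-backed mapping rejects it
```

A test that has to pass on both the old linear-scan OrderedDict and the hash-backed one can branch on the result of such a probe.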

Generative AI

I used generative AI tools when creating this PR, but a human has checked the code and is responsible for the description above.

github-actions bot commented Mar 26, 2026

Code size report:

Reference:  esp32/boards/SEEED_XIAO_ESP32C6: Add new XIAO board definition. [2dc2e30]
Comparison: tests: Add ordered dict tests and benchmarks. [merge of c1310aa]
  mpy-cross:   +64 +0.017% 
   bare-arm:   +36 +0.064% 
minimal x86:  +128 +0.068% 
   unix x64:  +680 +0.079% standard
      stm32:  +336 +0.084% PYBV10
      esp32:  +376 +0.022% ESP32_GENERIC
     mimxrt:  +328 +0.085% TEENSY40
        rp2:  +496 +0.054% RPI_PICO_W
       samd:  +328 +0.119% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:  +612 +0.134% VIRT_RV32

@andrewleech andrewleech force-pushed the py-map-ordered branch 8 times, most recently from 740bcf8 to fb2071d, March 29, 2026 11:40
dpgeorge and others added 3 commits March 31, 2026 11:35
Signed-off-by: Damien George <damien@micropython.org>
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>