Fix #946: Add standard iterator traits to all CharPointer types for STL algorithm compatibility #1562

Open
killerdevildog wants to merge 1 commit into juce-framework:master from killerdevildog:fix-issue-946

killerdevildog commented Aug 4, 2025

Fixes #946: Add standard iterator traits to all CharPointer types for compatibility with std algorithms

Changes

Adds the five standard iterator traits (difference_type, value_type, pointer, reference, iterator_category) to all four CharPointer types:

  • CharPointer_UTF8
  • CharPointer_UTF16
  • CharPointer_UTF32
  • CharPointer_ASCII

This ensures that String::begin() / String::end() work with standard library algorithms (e.g. std::all_of, std::find_if, std::any_of) regardless of the JUCE_STRING_UTF_TYPE setting (8, 16, or 32).
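As a rough illustration of the idea, the sketch below declares the five member typedefs on a small stand-in class and passes it to std::all_of. Everything in it is hypothetical: `AsciiCursor`, `wchar_sketch` and `allDigits` are not JUCE names, the class handles single-byte characters only, and the choice of `pointer` is an assumption because the description does not state it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

using wchar_sketch = char32_t; // stand-in for juce_wchar (assumption)

// Hypothetical stand-in for a CharPointer type; NOT the JUCE class.
// To stay short it reads one byte per character (ASCII only).
class AsciiCursor
{
public:
    // The five member typedefs that std::iterator_traits picks up.
    using difference_type   = std::ptrdiff_t;
    using value_type        = wchar_sketch;
    using pointer           = const wchar_sketch*;      // assumption, not from the PR
    using reference         = wchar_sketch;             // operator* returns by value
    using iterator_category = std::input_iterator_tag;

    explicit AsciiCursor (const char* p) : data (p) {}

    // Decodes the current byte into a wide character, returned by value.
    wchar_sketch operator*() const
    {
        return static_cast<wchar_sketch> (static_cast<unsigned char> (*data));
    }

    AsciiCursor& operator++()     { ++data; return *this; }
    AsciiCursor operator++ (int)  { auto copy = *this; ++data; return copy; }

    bool operator== (const AsciiCursor& other) const { return data == other.data; }
    bool operator!= (const AsciiCursor& other) const { return data != other.data; }

private:
    const char* data;
};

// Returns true if every character in [text, text + length) is a decimal digit.
inline bool allDigits (const char* text, std::size_t length)
{
    return std::all_of (AsciiCursor (text), AsciiCursor (text + length),
                        [] (wchar_sketch c) { return c >= U'0' && c <= U'9'; });
}
```

With the typedefs in place, a call such as `allDigits ("12345", 5)` goes through std::all_of like any other input iterator range.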

Design Decisions

  • std::input_iterator_tag is used for all types. UTF-8 and UTF-16 are variable-width encodings, so operator+(n) is not O(1), and random_access_iterator_tag would misrepresent the performance contract. The forward and bidirectional categories do not fit either: before C++20 they require operator*() to return a true reference, and these types return juce_wchar by value. input_iterator_tag is therefore the most honest category, and it is sufficient for the algorithms users need (the exact scenario from #946, "Some custom iterators are incompatible with standard algorithms").
  • reference = juce_wchar (not juce_wchar&) because operator*() returns by value in all CharPointer types — these are proxy iterators that decode the underlying encoding into juce_wchar.
  • difference_type = std::ptrdiff_t — the standard choice.
  • value_type = juce_wchar — consistent across all CharPointer types since they all decode to juce_wchar.
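The choices above could be checked at compile time along the following lines. This is only a sketch under stated assumptions: `DecodingCursor`, `wchar_sketch` and `isForwardOrStronger` are hypothetical names, `char32_t` stands in for juce_wchar, and the `pointer` typedef is a guess since the list does not specify it.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>

using wchar_sketch = char32_t; // stand-in for juce_wchar (assumption)

// Hypothetical cursor exposing only the member typedefs under discussion.
struct DecodingCursor
{
    using difference_type   = std::ptrdiff_t;
    using value_type        = wchar_sketch;
    using pointer           = const wchar_sketch*; // assumption, not from the PR
    using reference         = wchar_sketch;        // by value, not wchar_sketch&
    using iterator_category = std::input_iterator_tag;
};

using Traits = std::iterator_traits<DecodingCursor>;

static_assert (std::is_same<Traits::value_type, wchar_sketch>::value,
               "value_type is the decoded character type");
static_assert (std::is_same<Traits::reference, wchar_sketch>::value,
               "reference mirrors the by-value return of operator*");
static_assert (! std::is_reference<Traits::reference>::value,
               "reference is deliberately not a true reference");
static_assert (std::is_same<Traits::difference_type, std::ptrdiff_t>::value,
               "difference_type is std::ptrdiff_t");
static_assert (std::is_same<Traits::iterator_category, std::input_iterator_tag>::value,
               "category is input_iterator_tag");

// True when an iterator advertises forward, bidirectional or random access.
template <typename It>
constexpr bool isForwardOrStronger()
{
    return std::is_base_of<std::forward_iterator_tag,
                           typename std::iterator_traits<It>::iterator_category>::value;
}
```

Here `isForwardOrStronger<DecodingCursor>()` is false while `isForwardOrStronger<const char*>()` is true, which is the distinction an algorithm sees when it dispatches on the category tag.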

Testing

  • Verified that the exact repro from #946 ("Some custom iterators are incompatible with standard algorithms") compiles with GCC and Clang
  • Tested with std::all_of, std::find_if, and other STL algorithms
  • Tested edge cases: empty strings, multi-byte UTF-8 characters
  • static_assert verified that std::iterator_traits resolves correctly for all four types
  • Fully backward compatible — no existing APIs changed

Related

jrlanglois commented Dec 29, 2025

This adds iterator traits to CharPointer_UTF8 only. If a user of JUCE were to change JUCE_STRING_UTF_TYPE to 16 or 32, these changes as-is would break the build for them (if they were to try and use STL algos). Also, iterator semantics will differ: UTF-16 is variable-width for some characters, and UTF-32 is fixed-width.

Another practical concern is the use of random_access_iterator_tag. UTF-8 is variable-width, so advancing or indexing doesn’t correspond 1:1 outside pure ASCII. For example, a single visible character like ë or an emoji occupies multiple bytes, so it + n or indexing doesn’t correspond to constant-time character access in the usual STL sense.

Anyway, this is why getAndAdvance exists.
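
To make the variable-width point concrete, here is a minimal sketch that counts code points in a UTF-8 byte string and finds the byte offset of the n-th one. The helper names are hypothetical and unrelated to JUCE, and the code assumes well-formed UTF-8 input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// UTF-8 continuation bytes have the bit pattern 10xxxxxx.
inline bool isContinuationByte (unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

// Counts code points by counting every byte that starts a new sequence.
inline std::size_t countCodePoints (const std::string& utf8)
{
    std::size_t count = 0;

    for (unsigned char b : utf8)
        if (! isContinuationByte (b))
            ++count;

    return count;
}

// Returns the byte offset of the code point at `index`, or utf8.size() if the
// string has fewer code points. Reaching character n means walking the bytes
// in front of it, so this is linear rather than constant time.
inline std::size_t byteOffsetOfCodePoint (const std::string& utf8, std::size_t index)
{
    for (std::size_t offset = 0; offset < utf8.size(); ++offset)
    {
        if (! isContinuationByte (static_cast<unsigned char> (utf8[offset])))
        {
            if (index == 0)
                return offset;

            --index;
        }
    }

    return utf8.size();
}
```

For the string "aëz", encoded as the four bytes 61 C3 AB 7A, the byte length is 4 but the code point count is 3, and the third character sits at byte offset 3 rather than 2.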

killerdevildog changed the title from "Fix #946: Add standard iterator traits to CharPointer_UTF8 for compat…" to "Fix #946: Add standard iterator traits to all CharPointer types for STL algorithm compatibility" Feb 22, 2026
…er types for STL algorithm compatibility

Add the five standard iterator traits (difference_type, value_type, pointer,
reference, iterator_category) to all four CharPointer types:

- CharPointer_UTF8
- CharPointer_UTF16
- CharPointer_UTF32
- CharPointer_ASCII

Uses std::input_iterator_tag, which correctly reflects that operator*()
returns juce_wchar by value (proxy iterator) and that variable-width
encodings (UTF-8, UTF-16) do not support O(1) random access.

This ensures String::begin()/end() work with standard library algorithms
(e.g. std::all_of, std::find_if) regardless of the JUCE_STRING_UTF_TYPE
setting.

Fixes juce-framework#946

@killerdevildog (Author)

@jrlanglois Thanks for the thorough review! I've addressed both of your concerns:

1. Coverage of all CharPointer types
Iterator traits are now added to all four types: CharPointer_UTF8, CharPointer_UTF16, CharPointer_UTF32, and CharPointer_ASCII. Changing JUCE_STRING_UTF_TYPE to 16 or 32 will no longer break STL algorithm usage, since String::begin()/end() returns whichever CharPointerType is configured (see juce_String.h lines 194-199).

2. Iterator category
Switched from random_access_iterator_tag to std::input_iterator_tag for all types. As you correctly pointed out, UTF-8 and UTF-16 are variable-width, so operator+(n) is not O(1) — random_access_iterator_tag would violate the STL performance contract. input_iterator_tag is the most honest choice and is sufficient for the algorithms in the original issue (#946): std::all_of, std::find_if, std::any_of, std::count_if, etc.

Additionally, reference is defined as juce_wchar (not juce_wchar&) since operator*() returns by value in all CharPointer types — these are proxy iterators that decode the underlying encoding.

The branch has been rebased onto the latest upstream master and squashed into a single commit. All changes verified to compile cleanly with GCC and Clang, including static_assert checks that std::iterator_traits resolves correctly for all four types.

Ref: #946

Development

Successfully merging this pull request may close these issues.

Some custom iterators are incompatible with standard algorithms
