seq: improve seq_queue implementation and tests #8

Open
ralphlange wants to merge 1 commit into epics-modules:main from ralphlange:fix-queue

Conversation

@ralphlange
Member

Fix a possible race condition in seq_queue,
where internal state was checked outside of the mutex.
Fixes the unit test failure in macOS builds.

Co-authored-by: google-labs-jules[bot]

@anderslindho
Contributor

Tested locally on macOS/arm64. Without the patch, I reproduced the failure in 2/2 runs of queueTest.t (Bail out! 4589<=4587). With the patch, the full runtests passed 1/1 and queueTest.t specifically passed 3/3.

@anderslindho
Contributor

One follow-up comment: the queue status helpers do not take q->mutex. Most uses seem read-only/status-only, but seqQueueIsEmpty() is used by seq_pvGetQ. Should we lock seqQueueIsEmpty() too, or all of seqQueueFree(), seqQueueUsed(), seqQueueIsEmpty(), and seqQueueIsFull() for consistency?

@simon-ess

I see the same behaviour as @anderslindho, and agree with his assessment of adding mutex locks.

There's also the other question: why does this fail on macOS but not on the various Linuxes?

@ralphlange
Member Author

There's also the other question: why does this fail on macOS but not on the various Linuxes?

Timing, I would think. These multi-core ARM machines are just that much faster.

@ralphlange
Member Author

... agree with his assessment of adding mutex locks.

Let me check a bit further in the commit history.

The original author(s?) were obviously very concerned about lock contention - the extra gymnastics that this PR removes tell a story... In principle, read-only/status-only access could be fine (being off by one is OK in those cases), unless a race would produce crap data.

I agree that locking would increase consistency and beauty - just wondering about the price tag.

@ralphlange
Member Author

Commit ede72542

seq: complete redesign of broken queue adt

  • use size_t for all indices and sizes
  • state and ensure invariants
  • an additional boolean is needed to indicate overflow
  • added a mutex (used only when queue is overflowing)
  • added generic (unsafe) access methods

@ralphlange
Member Author

ralphlange commented Apr 27, 2026

So the idea was to only lock in overflow - interesting.

Nice article on the topic (not on locks, though):
https://www.snellman.net/blog/archive/2016-12-13-ring-buffers

@anderslindho
Contributor

Indeed a nice article!

I just noticed that there is a header comment that will need updating if we accept this fix (seq_queue.h):

The implementation allows one reader and one writer to access the queue
without taking a mutex, except where unavoidable, i.e. when the queue is
full.

Re costs, haven't we taken the brunt of them already with the change in this PR?

@simon-ess

I am sort of surprised at this. There is a mutex to block a concurrent put/get when full, which... fine? But why not concurrent gets or puts? Isn't that also a race condition? I.e., two puts at the same time would write to the same spot in memory, with one of them being overwritten, no?

@ralphlange
Member Author

i.e. two puts at the same time would write to the same spot in memory with one of them being overwritten, no?

See the comment that Anders mentioned: It's an attempt to implement a lock-free "SPSC" (single producer, single consumer) queue.

@simon-ess

Ah, I think I missed that it was a single producer and consumer. This does make more sense then.

@simon-ess

simon-ess commented Apr 29, 2026

As an aside, the test bail out message is wrong, which is sort of funny:

        if (j<i) {
            testAbort("%d<=%d", i, j);
        }

The i/j is backwards in the testAbort message.

(Actually, I guess what they're trying to say is "i should be less than or equal to j, but it is not", but it doesn't look that way in the log at all.)

@ralphlange
Member Author

:-) Right.

Stay tuned - I will have an updated PR in a sec or two (or 14400).

- use epicsAtomic and memory barriers to allow concurrent get/put
  without a mutex when the queue is neither full nor empty
- ensure all accesses to shared indices (rd, wr) are atomic
  to prevent data races and ensure consistency (fix MacOS issues)
- extend queueTest with a high-iteration concurrent test
  (verify data integrity using bit negation and sequence tracking)

Co-authored-by: google-labs-jules[bot]
@ralphlange
Member Author

@simon-ess, @anderslindho, @anjohnson:
Here is - cleaned-up - what Jules eventually came up with. Not bad at all.

@ralphlange ralphlange changed the title from "seq: fix possible race in seq_queue" to "seq: improve seq_queue implementation and tests" Apr 29, 2026
@ralphlange ralphlange removed the "help wanted" (Extra attention is needed) label Apr 29, 2026
@ralphlange ralphlange linked an issue Apr 29, 2026 that may be closed by this pull request
@anderslindho
Contributor

Tested on arm64 macOS, and all 252 queueTest assertions pass.

One question: is the queue still SPSC, or are we changing the contract to MPSC? The Put comment says "to support Multi-Producer" but the Get comment says "Single-Consumer", and the header still describes a one-reader-one-writer design.

Honestly not sure this is better than your first revision... I'm not sure what we're gaining (or introducing) over just always locking.

@ralphlange
Member Author

The AI insisted that there are multiple producers, and I don't understand the sequencer code well enough to counterargue. What do you think?

I agree that the main gain still is pulling the lock in front of the index check (https://github.com/epics-modules/sequencer/pull/8/changes#diff-2d908f295da82c41ba4e921917956f1fe83821391cca5703230173a1f4f96a3bR136).
However, I like the use of epicsAtomic for things that need to be atomic (not just by chance). Also, as far as I understand, the memory barriers are actually needed for a lock-free get, and we might just have been lucky that no serious code reordering has happened yet.

@anderslindho
Contributor

The AI insisted that there are multiple producers, and I don't understand the sequencer code well enough to counterargue. What do you think?

We can probably instruct another AI to argue against yours, if that's what you want. :)

TBH I don't feel confident speaking about single or multiple producers for this code base as it stands, BUT seq_ca.c has an explicit comment suggesting that we needn't worry: "no need to lock against other writers, because named and anonymous PVs are disjoint". I would practice some extra caution when it comes to the confidence of an AI's statements; I will generally get a second (AI) opinion.

I agree that the main gain is on pulling the lock in front of the index check (but I also think this failure on macOS+arm64 might not even matter), and I agree that using atomics where atomics are wanted always is a good idea.

@simon-ess

TBH, if the sequencer code is not understood well enough by us, I'm not comfortable trusting an LLM to suggest changes like this.

If we have to make a choice I would prefer the mutex in front of the index check. If the issue is the additional cost of having the mutex, is this something we can run some benchmarks against?

@simon-ess

As an example of this: If you remove all the epicsAtomics except the memory barriers the tests pass for me. Removing those as well makes it fail. So do we actually need all of this change?

I think that without one of us clearly understanding the issue and the correct fix, this seems unwise to merge in.

@ralphlange
Member Author

ralphlange commented May 6, 2026

Well, there are two separate issues:

  1. On ARM, code is reordered much more aggressively at execution time than on x86 processors. The memory barriers are correct, necessary, and common for lock-free queue designs. (This is what broke the macOS builds.)
  2. The original code (checking the index, then taking the lock if deemed necessary) only works for single-producer queues. Gemini spotted the access to the index outside the lock, which is usually a problem, "usually" meaning: when there are multiple producers. The code comment claims this queue is single-producer. (And I tend to believe that, as there would have been problems much earlier if there were multiple producers.)

The indices in question are of data types that happen to be accessed atomically on the usual processor architectures. For lock-free use, however, the accesses must be guaranteed atomic, so using epicsAtomic is more explicit and, IMHO, better.

I'm leaning towards taking epicsAtomic for accessing indices and for the memory barriers, and leaving the lock as it was.



Development

Successfully merging this pull request may close these issues.

Unit test queueTest fails consistently on MacOS

3 participants