Summary
The wq selftest (serial_test_wq) flakes on s390x because it waits only 50 microseconds (usleep(50)) for a workqueue callback to complete. The workqueue callback is scheduled via schedule_work() on system_wq (per-CPU bound), and the kworker thread may not be scheduled quickly enough on s390x to complete within 50 microseconds.
Failure Details
- Test / Component: wq (serial_test_wq) in test_progs_no_alu32
- Frequency: Rare. Observed in 1 of 8 examined runs, and only on s390x test_progs_no_alu32. The same test passed on s390x test_progs in the same CI run.
- Failure mode: Flaky. ok_sleepable reads 0 instead of the expected 2, meaning the workqueue callback had not yet run when the check was made.
- Affected architectures: s390x (observed), potentially any architecture under load
- CI runs observed:
Root Cause Analysis
The test serial_test_wq (tools/testing/selftests/bpf/prog_tests/wq.c:7) opens a BPF skeleton, runs test_syscall_array_sleepable (which calls bpf_wq_start to schedule a workqueue callback), then sleeps 50 microseconds and checks ok_sleepable.
The call chain is:
test_syscall_array_sleepable → test_elem_callback(&array, &key, wq_cb_sleepable) — initializes and starts the workqueue
bpf_wq_start (kernel/bpf/helpers.c:3177) → schedule_work(&w->work) — queues bpf_wq_work on system_wq
bpf_wq_work (kernel/bpf/helpers.c:1200) → runs wq_cb_sleepable → sets ok_sleepable |= (1 << 1)
The BPF program runs under migrate_disable() (from bpf_prog_run_pin_on_cpu), pinning execution to one CPU. The work is queued on that same CPU's system_wq worker pool. After the syscall returns, the kworker thread must be scheduled to process the work item.
On s390x, workqueue scheduling latency can exceed 50 microseconds, so the test can read ok_sleepable before the callback has fired. The comment in the test says "10 usecs should be enough, but give it extra", but the observed failure shows that 50 usecs is not enough margin.
The issue is likely exacerbated by the refactoring in 1bfbc267ec91 ("bpf: Enable bpf_timer and bpf_wq in any context"), which added atomic refcount operations (refcount_inc_not_zero, bpf_async_refcount_put) to the bpf_wq_start path, adding marginal overhead.
Proposed Fix
Replace usleep(50) with a polling loop that checks ok_sleepable every 1ms, up to 100ms total. This gives the workqueue callback ample time to complete while still exiting quickly in the common case (typically 1-2 iterations). See attached patch.
Impact
Without the fix, the wq test will continue to flake intermittently on s390x, causing false CI failures that waste developer time investigating unrelated test breakage.
References
tools/testing/selftests/bpf/prog_tests/wq.c:31 — the usleep(50) that is too short
kernel/bpf/helpers.c:3177 — bpf_wq_start function
kernel/bpf/helpers.c:1200 — bpf_wq_work workqueue callback
82e38a505c98 ("selftests/bpf: Fix wq test.") — original fix that acknowledged delayed callbacks
1bfbc267ec91 ("bpf: Enable bpf_timer and bpf_wq in any context") — refactoring that added overhead