[bpf-ci-bot] sock_iter_batch/udp test flakes on s390x cpuv4  #454

@kernel-patches-review-bot

The sock_iter_batch/udp test flakes on s390x cpuv4 (and potentially other architectures) because the kernel's UDP auto-port assignment can assign the same port to both SO_REUSEPORT socket groups. When ports[0] == ports[1], the BPF program assigns idx=0 to all sockets, causing the second_idx assertion to fail in the second read loop.

Failure Details

  • Test / Component: sock_iter_batch/udp (prog_tests/sock_iter_batch.c do_test())
  • Frequency: Observed in baseline CI (not PR-specific); likely rare, but it reproduces whenever the two auto-assigned ports collide
  • Failure mode: Wrong result (flaky) — second_idx: actual 0 != expected 1
  • Affected architectures: s390x cpuv4 observed; any architecture possible depending on port randomization
  • CI runs observed:

Root Cause Analysis

The test creates two groups of 4 SO_REUSEPORT UDP sockets (do_test at prog_tests/sock_iter_batch.c:866). Each group is created by start_reuseport_server() with port 0, so the kernel auto-assigns the port. The two auto-assigned ports are stored in skel->rodata->ports[0] and skel->rodata->ports[1] (line 890).

The kernel's udp_lib_lport_inuse() (net/ipv4/udp.c:141) builds a bitmap of "in use" ports for auto-assignment. However, when both the existing and new sockets have SO_REUSEPORT set and share the same UID, the existing port's bit is not set in the bitmap (lines 158-163):

if (sk2->sk_reuseport && sk->sk_reuseport &&
    !rcu_access_pointer(sk->sk_reuseport_cb) &&
    uid_eq(uid, sk_uid(sk2))) {
    if (!bitmap)
        return 0;
    /* bit NOT set — port appears available */
}

This allows the second group's bind(port=0) to receive the same port as the first group.

When ports[0] == ports[1], the BPF program iter_udp_soreuse (progs/sock_iter_batch.c:115-118) always takes the idx=0 branch (since sk->sk_num == ports[0] is checked first and is true for ALL sockets). The test then:

  1. Reads 3 sockets with idx=0 in the first loop → first_idx=0
  2. Closes fds[first_idx] (first group)
  3. Sets second_idx = !first_idx = 1
  4. Reads remaining sockets — still idx=0 (since ports[0]==ports[1])
  5. Assertion ASSERT_EQ(outputs[i].idx, second_idx) fails: 0 != 1

The existing bucket collision guard at line 957 only protects the total_read assertion, not the second_idx assertion at line 947.
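
Steps 1 to 5 can be modelled in plain C. This is a simplified sketch, not the BPF program or the test itself: idx_for_port() and second_loop_passes() are hypothetical names, the model assumes the first read loop sees sockets from group 0, and it only reproduces the port comparison order described above (ports[0] checked first).

```c
#include <stdbool.h>

#define GROUP_SIZE 4 /* sockets per SO_REUSEPORT group, as in the test */

/* Model of the port comparison: ports[0] is checked before ports[1]. */
static int idx_for_port(unsigned short sk_num, const unsigned short ports[2])
{
    if (sk_num == ports[0])
        return 0;
    if (sk_num == ports[1])
        return 1;
    return -1;
}

/* Returns true when every socket seen in the second read loop reports
 * second_idx, i.e. when the second_idx assertion would hold. */
static bool second_loop_passes(const unsigned short ports[2])
{
    /* Assumption: the first loop reads sockets from group 0. */
    int first_idx = idx_for_port(ports[0], ports);
    int second_idx = !first_idx;
    unsigned short remaining[GROUP_SIZE];

    /* After fds[first_idx] is closed, the sockets left over are the
     * other group's, all bound to ports[second_idx]. */
    for (int i = 0; i < GROUP_SIZE; i++)
        remaining[i] = ports[second_idx];

    for (int i = 0; i < GROUP_SIZE; i++) {
        if (idx_for_port(remaining[i], ports) != second_idx)
            return false; /* mirrors "second_idx: actual 0 != expected 1" */
    }
    return true;
}
```

With distinct ports the model passes; with ports[0] == ports[1] every remaining socket maps to idx 0 while second_idx is 1, so the check fails in the same way as the CI run.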

Proposed Fix

Add an assertion that the two auto-assigned ports are distinct, immediately after socket creation (prog_tests/sock_iter_batch.c:891). See attached patch: 0001-selftests-bpf-fix-sock_iter_batch-flake-due-to-SO_RE.patch.

This converts the cryptic second_idx failure into a clear distinct_ports assertion, making the root cause immediately obvious. The collision is inherent to SO_REUSEPORT + auto-port-assignment and cannot be prevented at the test level without using explicit ports.
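
The shape of such a check can be sketched in standalone C. The attached patch is not reproduced here; check_distinct_ports() is a hypothetical stand-in, and in the selftest itself the equivalent check would presumably use one of the existing assertion macros from test_progs.h (for example ASSERT_NEQ) with a name such as distinct_ports.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a selftest assertion: returns true when the
 * two auto-assigned ports differ, otherwise logs a named failure so the
 * cause is visible before any iterator output is examined. */
static bool check_distinct_ports(const unsigned short ports[2])
{
    if (ports[0] != ports[1])
        return true;

    fprintf(stderr,
            "distinct_ports: both reuseport groups were assigned port %u\n",
            (unsigned int)ports[0]);
    return false;
}
```

A caller would run this right after both socket groups are created and bail out of the subtest when it returns false, so the later second_idx comparison is never reached with colliding ports.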

Impact

Without the fix, the sock_iter_batch/udp subtest intermittently fails in CI with a misleading second_idx error that looks like a BPF iterator bug. This creates noise in CI results and wastes developer time investigating iterator logic when the actual issue is port auto-assignment behavior.

References

  • net/ipv4/udp.c:141 - udp_lib_lport_inuse() SO_REUSEPORT bitmap skip
  • tools/testing/selftests/bpf/prog_tests/sock_iter_batch.c:866 - do_test()
  • tools/testing/selftests/bpf/progs/sock_iter_batch.c:115 - BPF program port comparison
  • Commit dbd7db7787ba ("selftests/bpf: Test udp and tcp iter batching") — original test
