Bump llama.cpp to b9310 to fix Metal deadlock on macOS 26#2

Merged
FuJacob merged 1 commit into main from bump/llama-b9310 on May 28, 2026

@FuJacob FuJacob commented May 28, 2026

Summary

Bumps the bundled llama-cpp binary target from b8665 to b9310 (URL + checksum update in Package.swift).

Fixes the upstream deadlock that surfaces as Cotabby issue #262: launching Cotabby with Qwen3.5-2B-Q4_K_M.gguf selected hangs the main thread before the menu bar icon appears. The hang reproduces deterministically on Apple Silicon under macOS 26 (Tahoe).

Root cause

The bundled b8665 Metal backend hits a Tahoe-era IOGPU command-buffer timeout regression. Sampled stack from the affected user:

  • Main thread blocked on a pthread mutex waiting for loadModel to finish.
  • com.apple.root.default-qos worker spinning forever inside __ggml_metal_rsets_init_block_invoke → usleep → nanosleep → __semwait_signal. That is ggml-metal's residency-set keep-alive heartbeat (introduced in ggml-org/llama.cpp#17766); it spins by design, so the actual hang is that the command buffer the loader is waiting on never completes.
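
To illustrate why the looping keep-alive thread is a symptom rather than the cause, here is a minimal C++ sketch of the same shape: a heartbeat thread that sleeps and wakes in a loop, and a loader waiting on a completion signal that may never arrive. Everything in it (simulate_load, LoadResult, the std::promise standing in for a Metal command buffer) is hypothetical and is not ggml-metal code. The bounded wait exists only so the sketch terminates; the sampled stack suggests the real loader waits indefinitely.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Outcome of one simulated load attempt (hypothetical type).
struct LoadResult {
    bool completed;       // did the simulated command buffer finish?
    int heartbeat_ticks;  // how many times the keep-alive loop woke up
};

// Simulates a loader waiting on a command buffer while a keep-alive thread
// sleeps and wakes in a loop. `buffer_completes` selects whether the
// simulated GPU work ever signals completion.
LoadResult simulate_load(bool buffer_completes, std::chrono::milliseconds timeout) {
    std::atomic<bool> stop{false};
    std::atomic<int> ticks{0};

    // Keep-alive loop: it shows up in every sample because it runs for as
    // long as the backend is alive, whether or not anything is stuck.
    std::thread heartbeat([&] {
        while (!stop.load()) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            ticks.fetch_add(1);
        }
    });

    // Stand-in for the command buffer the loader is waiting on.
    std::promise<void> command_buffer;
    std::future<void> done = command_buffer.get_future();
    if (buffer_completes) {
        command_buffer.set_value();  // simulated GPU work finished
    }

    // Bounded wait so the sketch terminates even in the "hung" case.
    bool completed = done.wait_for(timeout) == std::future_status::ready;

    stop.store(true);
    heartbeat.join();
    return {completed, ticks.load()};
}
```

Calling simulate_load(false, std::chrono::milliseconds(50)) models the hang: the heartbeat keeps ticking while the wait never becomes ready, which matches the sampled stack described above.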

Upstream tracked the symptom in ggml-org/llama.cpp#20141 (identical hardware: M4 / macOS 26). The fix was to set AGX_RELAX_CDM_CTXSTORE_TIMEOUT=1 unconditionally inside ggml-metal, which landed around build b8882. b8665 predates that fix. The issue reporter validated that b9310 (e2ef8fe42) works on their machine via standalone llama-cli, so that is the target chosen here.
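
As a rough picture of what "set unconditionally" means, the sketch below sets the variable for the current process with POSIX setenv, overwriting any existing value. The helper names relax_agx_timeout and agx_timeout_setting are hypothetical; this is not the upstream ggml-metal change, and whether setting the variable from a host application (rather than inside ggml-metal) has the same effect is an untested assumption.

```cpp
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // setenv (POSIX)
#include <string>

// Hypothetical helper: set the variable for the current process.
// The third argument (1) overwrites any value already present, which is
// what makes the setting unconditional.
bool relax_agx_timeout() {
    return setenv("AGX_RELAX_CDM_CTXSTORE_TIMEOUT", "1", /*overwrite=*/1) == 0;
}

// Hypothetical helper: read the variable back; empty string when unset.
std::string agx_timeout_setting() {
    const char* value = std::getenv("AGX_RELAX_CDM_CTXSTORE_TIMEOUT");
    return value ? std::string(value) : std::string();
}
```

Environment variables like this are generally read when the consuming component initializes, so a setting of this kind would need to happen before the Metal backend is created.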

Validation

  • swift build (Xcode 26 toolchain): clean.
  • swift test: 9 passing, 1 skipped (testEndToEndWithModel, requires COTABBY_TEST_MODEL_PATH). The skipped test is the only one that exercises real model load; reviewers with a local GGUF can run it via COTABBY_TEST_MODEL_PATH=/path/to/model.gguf swift test.
  • The C++ wrapper (CotabbyInferenceEngine.cpp) compiles unchanged against b9310, so CotabbyInferenceEngine.h's ABI is unchanged. Downstream Cotabby picks this up on the next package resolve with no source-side edits.
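
The skip behaviour of testEndToEndWithModel comes down to an environment-variable gate. The package's actual test is written in Swift and is not shown in this PR; the C++ sketch below only illustrates the gating logic, and the helper names test_model_path and should_run_model_test are hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the model path from COTABBY_TEST_MODEL_PATH,
// or an empty string when the variable is unset or empty.
std::string test_model_path() {
    const char* path = std::getenv("COTABBY_TEST_MODEL_PATH");
    return (path != nullptr) ? std::string(path) : std::string();
}

// Hypothetical helper: the end-to-end test only makes sense when a model
// path was supplied; otherwise it should be reported as skipped.
bool should_run_model_test() {
    return !test_model_path().empty();
}
```

With a gate like this, a run without the variable skips the model-load test, which is why a green test run alone says nothing about the deadlock.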

Risk / rollout

  • Single-line URL + checksum bump. Reversion is trivial.
  • ~645 upstream build numbers (~6 weeks) between b8665 and b9310. No public API that the wrapper touches was removed in that window (verified by clean compile of CotabbyInferenceEngine.cpp and full unit test pass).
  • Cotabby pins CotabbyInference to branch: main in its project.yml, so merging this PR ships the fix to all Cotabby users on the next build/release without a Cotabby-side change.

Linked

The bundled llama.framework at b8665 deadlocks at model load on Apple
Silicon under macOS 26 (Tahoe): the main thread blocks on a pthread
mutex while ggml-metal's residency-set keep-alive thread spins in
__semwait_signal. Reproduces deterministically with Qwen3.5-2B-Q4_K_M
on an M4 / macOS 26.4.1.

The underlying fix landed upstream around b8882
(ggml-org/llama.cpp#20141): AGX_RELAX_CDM_CTXSTORE_TIMEOUT is set
unconditionally inside ggml-metal, preventing the IOGPU command-buffer
timeout that left the loader stuck. b8665 predates that fix. b9310 was
validated working on the reporter's hardware.

swift build + swift test pass against b9310; the C++ wrapper compiles
without changes, so CotabbyInferenceEngine.h ABI is unchanged and
downstream Cotabby picks this up with no further edits.
@FuJacob FuJacob merged commit facf5b3 into main May 28, 2026
1 check passed
@FuJacob FuJacob deleted the bump/llama-b9310 branch May 28, 2026 08:38