Bump llama.cpp to b9310 to fix Metal deadlock on macOS 26 #2
Merged
The bundled llama.framework at b8665 deadlocks at model load on Apple Silicon under macOS 26 (Tahoe): the main thread blocks on a pthread mutex while ggml-metal's residency-set keep-alive thread spins in __semwait_signal. Reproduces deterministically with Qwen3.5-2B-Q4_K_M on an M4 / macOS 26.4.1. The underlying fix landed upstream around b8882 (ggml-org/llama.cpp#20141): AGX_RELAX_CDM_CTXSTORE_TIMEOUT is set unconditionally inside ggml-metal, preventing the IOGPU command-buffer timeout that left the loader stuck. b8665 predates that fix. b9310 was validated working on the reporter's hardware. swift build + swift test pass against b9310; the C++ wrapper compiles without changes, so CotabbyInferenceEngine.h ABI is unchanged and downstream Cotabby picks this up with no further edits.
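The stack shape described above (a waiting main thread next to a sleeping keep-alive worker) can be illustrated with a minimal C++ sketch. This is not the ggml-metal code: `KeepAlive` and `simulate_load` are hypothetical names, the 5 ms period is arbitrary, and the sleep in `simulate_load` is only a stand-in for the loader waiting on work to complete. The point is that a heartbeat thread spends nearly all of its time in a sleep call, so finding it in a sleep frame says nothing about whether the thing it keeps alive is making progress.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a keep-alive heartbeat (not the ggml-metal code):
// it ticks, sleeps, and repeats until asked to stop.
class KeepAlive {
public:
    explicit KeepAlive(std::chrono::milliseconds period)
        : period_(period), worker_([this] { run(); }) {}

    ~KeepAlive() { stop(); }

    // Ask the worker to exit and wait for it; safe to call more than once.
    void stop() {
        stop_.store(true);
        if (worker_.joinable()) worker_.join();
    }

    int ticks() const { return ticks_.load(); }

private:
    void run() {
        for (;;) {
            ticks_.fetch_add(1);
            // A thread sampled here typically shows sleep-related frames.
            std::this_thread::sleep_for(period_);
            if (stop_.load()) break;
        }
    }

    std::chrono::milliseconds period_;
    std::atomic<bool> stop_{false};
    std::atomic<int> ticks_{0};
    std::thread worker_;  // declared last so the flags exist before it starts
};

// Assumption: the sleep below stands in for the caller waiting on a load.
// If that wait never ended, the heartbeat would keep ticking indefinitely,
// which is the pattern seen in the sampled stack.
int simulate_load(std::chrono::milliseconds load_time) {
    KeepAlive heartbeat(std::chrono::milliseconds(5));
    std::this_thread::sleep_for(load_time);
    heartbeat.stop();
    return heartbeat.ticks();
}
```

Calling `simulate_load(std::chrono::milliseconds(30))` returns the number of heartbeat ticks observed while the simulated load was in flight. In a hang like the one reported here, the useful question is what the waiting thread is blocked on, not why the heartbeat is asleep.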
Summary
Bumps the bundled `llama-cpp` binary target from b8665 to b9310 (URL + checksum update in `Package.swift`).

Closes the upstream deadlock that surfaces as Cotabby issue #262: launching Cotabby with `Qwen3.5-2B-Q4_K_M.gguf` selected hangs the main thread before the menu bar icon appears. Reproduces deterministically on Apple Silicon under macOS 26 (Tahoe).

Root cause
The bundled `b8665` Metal backend hits a Tahoe-era IOGPU command-buffer timeout regression. Sampled stack from the affected user:

- Main thread blocked on a pthread mutex, waiting for `loadModel` to finish.
- A `com.apple.root.default-qos` worker spinning forever inside `__ggml_metal_rsets_init_block_invoke` → `usleep` → `nanosleep` → `__semwait_signal`. That is ggml-metal's residency-set keep-alive heartbeat (introduced in ggml-org/llama.cpp#17766); it spins by design, so the actual hang is that the command buffer the loader is waiting on never completes.

Upstream tracked the symptom in ggml-org/llama.cpp#20141 (identical hardware: M4 / macOS 26). The fix was to set `AGX_RELAX_CDM_CTXSTORE_TIMEOUT=1` unconditionally inside ggml-metal, which landed around build b8882. `b8665` predates that fix. The issue reporter validated b9310 (e2ef8fe42) working on their machine via standalone `llama-cli`, so that is the target chosen here.

Validation
- `swift build` (Xcode 26 toolchain): clean.
- `swift test`: 9 passing, 1 skipped (`testEndToEndWithModel`, requires `COTABBY_TEST_MODEL_PATH`). The skipped test is the only one that exercises real model load; reviewers with a local GGUF can run it via `COTABBY_TEST_MODEL_PATH=/path/to/model.gguf swift test`.
- The C++ wrapper (`CotabbyInferenceEngine.cpp`) compiles unchanged against b9310, so `CotabbyInferenceEngine.h`'s ABI is unchanged. Downstream Cotabby picks this up on the next package resolve with no source-side edits.

Risk / rollout
- ~645 upstream build numbers (~6 weeks) between b8665 and b9310. No public API removals in that window that the wrapper touches (verified by clean compile of `CotabbyInferenceEngine.cpp` and full unit test pass).
- Cotabby pins `CotabbyInference` to `branch: main` in its `project.yml`, so merging this PR ships the fix to all Cotabby users on the next build/release without a Cotabby-side change.