App deadlocks at launch when a Qwen3.5 GGUF is selected (bundled llama.framework hangs in ggml_metal_rsets_init) #262

@Sewagewaste

Summary

On a fresh launch with cotabbySelectedModelFilename = Qwen3.5-2B-Q4_K_M.gguf, Cotabby's main thread deadlocks before the menu bar icon is created. No suggestions ever appear. With the model unselected (or the file moved aside), Cotabby launches normally — so the hang is triggered by the bundled llama.framework trying to load this specific model.

The same GGUF loads in ~2 seconds on the same machine with standalone llama.cpp (Homebrew build 9310 (e2ef8fe42)) and with LM Studio's bundled llama-server (v2.16.0), Metal enabled in both. So the model file is fine; the bug is in Cotabby's bundled llama build.

Environment

  • Cotabby 0.1.1-beta (build 30), bundle id com.jacobfu.tabby
  • macOS 26.4.1 (25E253), Apple Silicon (M4), 16 GB
  • Bundled llama.framework/Versions/A/llama is 9.7 MB, fat (x86_64 + arm64). Only version string I can find in the binary is b8635075f — please confirm which upstream commit this maps to.
  • Engine: llamaOpenSource

Model

  • Repo: lmstudio-community/Qwen3.5-2B-GGUF
  • File: Qwen3.5-2B-Q4_K_M.gguf (1.27 GB)
  • GGUF v3, general.architecture = qwen35, 320 tensors, qwen35.context_length = 262144
  • The bundled llama binary already contains qwen35.cpp symbols, so model-arch dispatch isn't the blocker.
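
The header facts quoted above (GGUF v3, 320 tensors) can be re-checked without involving any llama build. The following is a minimal sketch, assuming the standard little-endian GGUF header used since v2 (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key/value count). `read_gguf_header` is a hypothetical helper name, not part of llama.cpp or Cotabby.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) for a GGUF file.

    Assumes a little-endian file with the v2+ header layout; GGUF v1 and
    big-endian variants are not handled in this sketch.
    """
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE or header[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic or truncated header)")
    # "<" disables padding, so I + Q + Q is exactly the remaining 20 bytes.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:])
    return version, tensor_count, kv_count
```

For the file in this report, the first two values would be expected to match the version and tensor count listed above. Reading `general.architecture` or `qwen35.context_length` needs a full metadata parser, which this sketch deliberately leaves out.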

Reproduction

  1. Place Qwen3.5-2B-Q4_K_M.gguf in the Cotabby model folder.
  2. Open Cotabby → settings → Engine: Open Source, pick that model.
  3. Quit Cotabby. Relaunch.
  4. Menu bar icon never appears; app sits at ~248 MB RSS with all dispatch queues blocked.

If the file is moved out of the model folder before launch, Cotabby starts cleanly (≈77 MB RSS, menu bar icon present).
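
Moving the file aside can be scripted for anyone who is already stuck in the hang. This is a minimal sketch only: the report does not state where the Cotabby model folder lives, so both directories are caller-supplied parameters, and `move_model_aside` is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def move_model_aside(model_dir, filename, aside_dir):
    """Move one model file out of model_dir into aside_dir.

    Returns the new path, or None if the file was not in model_dir.
    Both directories are supplied by the caller; no Cotabby path is assumed.
    """
    src = Path(model_dir) / filename
    if not src.is_file():
        return None
    dest_dir = Path(aside_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / filename
    shutil.move(str(src), str(dest))
    return dest
```

Run it with Cotabby fully quit, then relaunch; moving the file back afterwards restores the original state.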

Stack (captured with the macOS sample tool)

Main thread and every cooperative-queue task are blocked on a pthread mutex:

1653 Thread … DispatchQueue_1: com.apple.main-thread  (serial)
  completeTaskWithClosure → … (Cotabby) … →
  _pthread_mutex_firstfit_lock_slow → _pthread_mutex_firstfit_lock_wait → __psynch_mutexwait

The thread that appears to hold the lock never leaves the block dispatched by ggml_metal_rsets_init, where it loops on usleep:

1661 Thread … DispatchQueue_13: com.apple.root.default-qos
  start_wqthread → _pthread_wqthread → _dispatch_worker_thread2 → _dispatch_root_queue_drain
  → _dispatch_client_callout → _dispatch_call_block_and_release
  → __ggml_metal_rsets_init_block_invoke  (in llama) + 116
  → usleep → nanosleep → __semwait_signal

(Full sample available on request.)

Side-by-side check that the model is fine

LM Studio's llama-server (v2.16.0, 5306f4b), same machine, Metal on, default args:

load_model: model loaded                      … in 0.84 s
prompt eval: 35.77 tok/s · eval: 60.30 tok/s

Stock Homebrew llama-cli 9310 (e2ef8fe42) also loads it without issue.

Likely cause / suggestion

The hang is in __ggml_metal_rsets_init_block_invoke, which sits in a usleep polling loop and never exits (the __semwait_signal frame is just how usleep/nanosleep sleeps on macOS, not a separate semaphore wait), apparently while holding the lock that the main thread and the cooperative-queue tasks are waiting on. This pattern shows up in older ggml-metal residency set (rsets) init paths when the Metal device init races with the loader thread. Most fixes I've seen for this came in much newer llama.cpp builds, so Cotabby's bundled b8635075f may simply predate the working fix.
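
To make the suspected shape of the hang concrete, here is a minimal illustrative sketch. It is not ggml or Cotabby code, and every name in it is hypothetical. A worker takes a lock and then sleep-polls for a condition that never becomes true, while the caller, standing in for the main thread, tries to take the same lock. The caller uses a timeout and the worker has a stop flag only so that the demo terminates; the real hang has neither.

```python
import threading
import time


def simulate_stuck_init(poll_interval=0.01, main_timeout=0.2):
    """Return True if the caller got the lock, False if it timed out."""
    lock = threading.Lock()
    ready = threading.Event()    # the condition the worker waits for; never set
    stop = threading.Event()     # escape hatch so the demo can end
    holding = threading.Event()  # tells the caller the worker owns the lock

    def worker():
        with lock:
            holding.set()
            # Sleep-poll loop, analogous to the usleep frame in the sample.
            while not ready.is_set() and not stop.is_set():
                time.sleep(poll_interval)

    t = threading.Thread(target=worker)
    t.start()
    holding.wait()

    # The "main thread" side: blocks on the lock the worker still holds.
    acquired = lock.acquire(timeout=main_timeout)
    if acquired:
        lock.release()

    stop.set()
    t.join()
    return acquired
```

With the defaults the call returns False: the caller gives up after `main_timeout` because the worker never releases the lock. Without the timeout, the caller would block indefinitely, which matches the main-thread frames ending in `__psynch_mutexwait`.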

Suggested fix: bump the bundled llama.framework to current upstream (the build I tested, 9310, is fine). Ideally, also clamp the loaded context length to a sane default: the model declares qwen35.context_length = 262144, and if Cotabby preallocates that, it adds memory pressure on top of the hang, which remains the primary bug.
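
The context clamp could be as small as the sketch below. The function name, the optional user override, and the 4096-token default cap are all assumptions made for illustration; nothing here reflects how Cotabby currently chooses its context size.

```python
def effective_context_length(declared, requested=None, cap=4096):
    """Pick the context length to allocate for a model.

    declared  -- value from the model metadata (e.g. qwen35.context_length)
    requested -- optional user override; None means "use the default cap"
    cap       -- hypothetical default ceiling, not a Cotabby setting
    """
    if declared <= 0:
        raise ValueError("declared context length must be positive")
    if requested is not None and requested <= 0:
        raise ValueError("requested context length must be positive")
    wanted = cap if requested is None else requested
    # Never allocate more than the model supports.
    return min(declared, wanted)
```

With the assumed cap, the 262144 declared by this model would be reduced to 4096, while a model declaring 2048 would keep its 2048.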

Workaround for users on the current build

  • Do not select a Qwen3.5 GGUF (or, to be safe, any architecture added upstream after b8635075f) until the framework is bumped.
  • Cotabby's recommended Qwen3-0.6B-Q4_K_M and Gemma-3-E2B-Q4_K_M load and run fine.

Labels: area:perf (Crashes, hangs, CPU pinning, slow response)

Status: Backlog
