Add LTXAV model detection monkey-patch for checkpoint compatibility #482

Open
Marcel (goodguy1963) wants to merge 1 commit into Lightricks:master from goodguy1963:fix/ltxav-model-detection-patch

@goodguy1963

Problem

Loading LTXAV GGUF checkpoints fails with RuntimeError: size mismatch because the ComfyUI core model detection doesn't auto-detect these parameters from the checkpoint:

  • cross_attention_adaln — checkpoint uses 9 ada params per block, core defaults to 6
  • Audio cross-attention dim & head dim — not detected from audio_attn2 weights
  • Embeddings connector dimensions — defaults to 30 heads × 128 dim = 3840, but checkpoints use 32 heads with video=4096 and audio=2048
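
The connector mismatch in the last bullet comes down to simple arithmetic. A minimal sketch using only the numbers quoted above (the variable names are illustrative, not taken from ComfyUI):

```python
# Core default per the description: 30 heads x 128 dims per head.
HEADS_DEFAULT, HEAD_DIM_DEFAULT = 30, 128
default_width = HEADS_DEFAULT * HEAD_DIM_DEFAULT

# The checkpoints use 32 heads, with these connector widths.
HEADS_CKPT = 32
video_width, audio_width = 4096, 2048

# Per-head dimension implied by each checkpoint width.
video_head_dim = video_width // HEADS_CKPT
audio_head_dim = audio_width // HEADS_CKPT

print(default_width, video_head_dim, audio_head_dim)  # 3840 128 64
```

A model built with the default width of 3840 cannot accept a 4096-wide or 2048-wide tensor, which is where the size mismatch comes from.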

Fix

model_detection_patch.py — a zero-touch monkey-patch applied at ComfyUI-LTXVideo import time that wraps comfy.model_detection.detect_unet_config. When an LTXAV model is detected, it reads the missing parameters from the state dict keys:

| Parameter | Detection method |
| --- | --- |
| cross_attention_adaln | prompt_scale_shift_table key existence |
| audio_cross_attention_dim | audio_attn2.to_k.weight shape[1] |
| audio_attention_head_dim | audio_attn2.to_k.weight shape[0] ÷ 32 |
| Connector dims | learnable_registers shape[1] ÷ 32 |
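
The detection rules in the table above could be sketched roughly as follows. This is not the code from model_detection_patch.py: the function name, the full key paths in the example, and the layout of the returned dict are assumptions made for illustration, and shapes are passed as plain tuples so the sketch runs without PyTorch.

```python
from typing import Dict, Tuple

NUM_HEADS = 32  # divisor used in the detection table


def detect_ltxav_params(shapes: Dict[str, Tuple[int, ...]]) -> dict:
    """Derive the missing parameters from a mapping of key -> tensor shape."""
    params: dict = {}

    # Rule 1: the flag is on if any prompt_scale_shift_table key exists.
    params["cross_attention_adaln"] = any(
        "prompt_scale_shift_table" in key for key in shapes
    )

    # Rules 2 and 3: a Linear weight is stored as [out_features, in_features].
    for key, shape in shapes.items():
        if key.endswith("audio_attn2.to_k.weight"):
            params["audio_cross_attention_dim"] = shape[1]
            params["audio_attention_head_dim"] = shape[0] // NUM_HEADS
            break

    # Rule 4: one head dim per connector, keyed by the state-dict key.
    params["connector_head_dims"] = {
        key: shape[1] // NUM_HEADS
        for key, shape in shapes.items()
        if key.endswith("learnable_registers")
    }
    return params


# Hypothetical key paths; only the suffixes come from the PR description.
example = {
    "blocks.0.prompt_scale_shift_table": (1, 1),  # shape is irrelevant here
    "blocks.0.audio_attn2.to_k.weight": (2048, 2048),
    "video_connector.learnable_registers": (128, 4096),
    "audio_connector.learnable_registers": (128, 2048),
}
result = detect_ltxav_params(example)
print(result["audio_attention_head_dim"], result["connector_head_dims"])
```

With a real checkpoint, the shapes mapping could be built from the state dict with something like `{k: tuple(v.shape) for k, v in state_dict.items()}`.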

No core ComfyUI files are modified — the patch lives entirely within the custom node and activates via a one-line import in __init__.py.
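
For readers unfamiliar with the pattern, a wrap-on-import monkey-patch might look like the sketch below. It is an illustration under stated assumptions, not the PR's implementation: a stand-in namespace takes the place of comfy.model_detection so the example runs on its own, the `image_model == "ltxav"` check is an assumed way of recognizing an LTXAV config, and `install_patch` and `detect_extra` are hypothetical names.

```python
import functools
from types import SimpleNamespace


def install_patch(module, detect_extra):
    """Wrap module.detect_unet_config so extra params are merged into its result."""
    original = module.detect_unet_config
    if getattr(original, "_ltxav_patched", False):
        return  # already wrapped; importing twice must not double-wrap

    @functools.wraps(original)
    def patched(state_dict, *args, **kwargs):
        config = original(state_dict, *args, **kwargs)
        # Assumed marker for an LTXAV model; the real check may differ.
        if config is not None and config.get("image_model") == "ltxav":
            config.update(detect_extra(state_dict))
        return config

    patched._ltxav_patched = True
    module.detect_unet_config = patched


# Stand-in for comfy.model_detection (assumption, for demonstration only).
fake_module = SimpleNamespace(
    detect_unet_config=lambda sd, prefix: {
        "image_model": "ltxav",
        "cross_attention_adaln": False,
    }
)


def detect_extra(state_dict):
    # Hypothetical helper: key existence decides the flag.
    return {"cross_attention_adaln": "prompt_scale_shift_table" in state_dict}


install_patch(fake_module, detect_extra)
install_patch(fake_module, detect_extra)  # second call is a no-op

config = fake_module.detect_unet_config({"prompt_scale_shift_table": None}, "")
print(config["cross_attention_adaln"])
```

Because the wrapper calls the original function first and only adds keys afterwards, configs for other model types pass through unchanged.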

Testing

Verified model creation produces exact checkpoint-matching shapes:

Block 0: scale_shift=[9, 4096] ✓, audio_scale_shift=[9, 2048] ✓
Video connector register: [128, 4096] ✓
Audio connector register: [128, 2048] ✓

Monkey-patches comfy.model_detection.detect_unet_config to auto-detect missing LTXAV parameters from the checkpoint:

- cross_attention_adaln (from prompt_scale_shift_table existence)
- audio_cross_attention_dim & audio_attention_head_dim (from audio_attn2 weights)
- Embeddings connector dimensions (from learnable_registers shapes)

Fixes 'size mismatch' RuntimeError when loading LTXAV GGUF checkpoints that were trained with cross_attention_adaln=True and different connector head dimensions than the core defaults.