
Fix Kimi-k2 checkpoint conversion #3528

Merged

copybara-service[bot] merged 1 commit into main from agagik-kimi-checkpoint on Apr 6, 2026

Conversation

@gagika (Collaborator) commented Mar 31, 2026

Description

This PR updates the DeepSeek checkpoint conversion script to safely ignore HuggingFace weights that lack a corresponding MaxText mapping, preventing crashes during conversion.

Context & Problem: The conversion script was failing with KeyError: 'model.layers.0.self_attn.rotary_emb.inv_freq'. Because MaxText computes certain weights (like RoPE frequencies) on the fly, they are intentionally excluded from the HF-to-MaxText mapping dictionary. The previous strict dictionary indexing crashed when it encountered these expected missing keys.

Implementation & Solution:

  • Replaced strict dictionary access with .get(key) so unmapped weights return None instead of throwing an exception.
  • Wrapped the tensor assignment in an if mapped_key: block to safely bypass weights that don't need conversion.

This solution makes the conversion script more robust by gracefully handling dynamically computed weights and any unexpected extra variables in the checkpoint.
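The pattern described above can be sketched as follows. This is a minimal, self-contained illustration of the defensive `.get()` lookup, not the actual MaxText code: the mapping entries, weight names, and the `tensor(...)` placeholder are simplified stand-ins for `hf_to_maxtext_mapping` and `f.get_tensor(key)` in the real conversion script.

```python
# Simplified stand-in for the HF-to-MaxText mapping dictionary.
hf_to_maxtext = {
    "model.layers.0.self_attn.q_proj.weight": "decoder.layers_0.attention.query",
}

# Keys as they might appear in a HuggingFace checkpoint; inv_freq is
# computed on the fly by MaxText and intentionally has no mapping.
checkpoint_keys = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.rotary_emb.inv_freq",
]

converted = {}
skipped = []
for key in checkpoint_keys:
    # .get() returns None for unmapped keys instead of raising KeyError.
    mapped_key = hf_to_maxtext.get(key)
    if mapped_key:
        converted[mapped_key] = f"tensor({key})"  # placeholder for f.get_tensor(key)
    else:
        skipped.append(key)
        print(f"[DEBUG] Key allowed but no mapping found: {key}")
```

With the strict form `hf_to_maxtext[key]`, the second key would raise `KeyError`; with `.get()`, it is logged and skipped while the mapped weight is still converted.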

Tests

  • Ran convert_deepseek_family_unscanned_ckpt.py on the target Kimi-k2 HuggingFace checkpoint.
  • Verified that the script completes successfully and bypasses rotary_emb.inv_freq without throwing a KeyError.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.


codecov bot commented Mar 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.



github-actions bot commented Apr 2, 2026

🤖 Hi @gagika, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

gagika force-pushed the agagik-kimi-checkpoint branch from 0239f6b to 2ff39d3 on April 2, 2026 at 21:59

github-actions bot left a comment

📋 Review Summary

This PR addresses a KeyError during Kimi-k2 checkpoint conversion by gracefully handling keys that are present in the checkpoint but not in the MaxText mapping. This change improves robustness for model variants with unexpected weight names or metadata.

🔍 General Feedback

  • The use of .get(key) and checking for None is a standard defensive pattern for weight mapping scripts.
  • This fix aligns the unscanned script with the main DeepSeek conversion script which already includes this pattern.

@Perseus14 (Contributor)

LGTM!

Tested on Kimi-K2-1t-Instruct-0905 model on m4-ultramem-224 machine.

mapped_key = ds_ckpt.hf_to_maxtext_mapping(
    layer, num_experts, first_num_dense_layers, base_num_decoder_layers
).get(key)
if mapped_key:
@RissyRan (Collaborator) commented Apr 6, 2026


Could we add an else branch for debugging in the future?

if mapped_key:
    chkpt_vars[mapped_key] = f.get_tensor(key)
else:
    # This catches keys that are allowed but missing from the mapping dictionary
    print(f"[DEBUG] Key allowed but no mapping found: {key}")

@gagika (Collaborator, Author)

thanks, done

@RissyRan (Collaborator) left a comment

One nit comment, LGTM!

gagika force-pushed the agagik-kimi-checkpoint branch from 2ff39d3 to fb99a79 on April 6, 2026 at 17:19
copybara-service bot merged commit 5d16abb into main on Apr 6, 2026
43 checks passed
copybara-service bot deleted the agagik-kimi-checkpoint branch on April 6, 2026 at 18:07