Add tutorial for KV cache compression with TurboQuant by kacperlukawski · Pull Request #438 · deepset-ai/haystack-tutorials

kacperlukawski · 2026-03-30T15:58:12Z

This tutorial shows how to enable TurboQuant KV cache compression for models run with HuggingFaceLocalChatGenerator. It is based on turboquant-vllm, an unofficial implementation, because Google hasn't released an official one yet.

Solves #437
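To show why KV cache compression matters, here is a back-of-the-envelope estimate of the cache size. This is a minimal sketch and is not part of the tutorial or of turboquant-vllm. The model dimensions are hypothetical, and the 4-bit setting is only illustrative, not a TurboQuant default. The estimate also ignores per-block metadata such as quantization scales, so it slightly understates the size of the compressed cache.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVCacheShape:
    """Hypothetical attention layout of a decoder-only model."""
    num_layers: int
    num_kv_heads: int  # number of key/value heads (fewer than query heads under GQA)
    head_dim: int


def kv_cache_bytes(shape: KVCacheShape, seq_len: int, batch_size: int = 1,
                   bits_per_value: float = 16.0) -> float:
    """Approximate KV cache size in bytes.

    Every layer stores one key vector and one value vector per KV head,
    per token and per sequence, hence the factor of 2. Quantization
    metadata (scales, zero points) is ignored.
    """
    values = 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * seq_len * batch_size
    return values * bits_per_value / 8


# Hypothetical config: 32 layers, 8 KV heads, head_dim 128, 32k-token context.
shape = KVCacheShape(num_layers=32, num_kv_heads=8, head_dim=128)
fp16 = kv_cache_bytes(shape, seq_len=32_768)
q4 = kv_cache_bytes(shape, seq_len=32_768, bits_per_value=4)
print(f"fp16: {fp16 / 2**30:.2f} GiB, 4-bit: {q4 / 2**30:.2f} GiB, ratio: {fp16 / q4:.1f}x")
# prints: fp16: 4.00 GiB, 4-bit: 1.00 GiB, ratio: 4.0x
```

The cache grows linearly with context length and batch size. Reducing the bits stored per value is therefore the main lever for fitting longer contexts on the same GPU.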

review-notebook-app · 2026-03-30T15:58:18Z

Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.

bilgeyucel · 2026-04-02T15:39:41Z

tutorials/49_TurboQuant_Quantization_with_HuggingFace.ipynb

@@ -0,0 +1,242 @@
+{


Components Used:..
Goal: After completing this tutorial, you will have learned how to apply TurboQuant KV cache compression to a local LLM and measure its memory and throughput impact with Haystack.

kacperlukawski · 2026-04-03

Right, good catch! I created an issue to update the template so that it uses the proper terminology: #443.

bilgeyucel · 2026-04-02T15:39:41Z

tutorials/49_TurboQuant_Quantization_with_HuggingFace.ipynb

@@ -0,0 +1,242 @@
+{


Could you keep the outputs, especially where we print a result? I find them very useful.

bilgeyucel

Left a few small comments; other than those, LGTM!

kacperlukawski requested a review from a team as a code owner March 30, 2026 15:58

kacperlukawski requested a review from bilgeyucel March 30, 2026 15:58

kacperlukawski added 3 commits March 31, 2026 11:56

Add tutorial for KV cache compression with TurboQuant · 65416cd

Make it clear that we use unofficial turboquant implementation · 0b86083

Specify Python version in tutorial configuration · abc01d4

kacperlukawski force-pushed the turboquant-tutorial branch from d6ca261 to abc01d4 on March 31, 2026 09:57

Remove HF_TOKEN ref · 47a33b0

bilgeyucel reviewed Apr 2, 2026

bilgeyucel approved these changes Apr 2, 2026

kacperlukawski wants to merge 4 commits into main from turboquant-tutorial
