Add tutorial for KV cache compression with TurboQuant #438

Open
kacperlukawski wants to merge 4 commits into main from turboquant-tutorial

@kacperlukawski (Member) commented Mar 30, 2026

This tutorial shows how to enable the TurboQuant KV cache for HuggingFaceLocalChatGenerator models. It is based on turboquant-vllm, an unofficial implementation, since Google hasn't released an official one yet.

Solves #437

@kacperlukawski kacperlukawski requested a review from a team as a code owner March 30, 2026 15:58
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

@@ -0,0 +1,242 @@
{
@bilgeyucel (Contributor) commented Apr 2, 2026

Components Used:..

Goal: After completing this tutorial, you will have learned how to apply TurboQuant KV cache compression to a local LLM and measure its memory and throughput impact with Haystack.


@kacperlukawski (Member, Author) replied

Right, good catch! I created an issue to update the template so that we use the proper terminology: #443.

@@ -0,0 +1,242 @@
{
@bilgeyucel (Contributor) commented Apr 2, 2026

Can you leave the outputs in, especially when we print a result? I find these very useful.


@bilgeyucel (Contributor) left a comment

Left some small comments; other than those, LGTM!
