This repository was archived by the owner on Mar 4, 2026. It is now read-only.

Commit 2c4268b

munimx and Copilot committed
deprecate: add deprecation notice, redirect to llm-semantic-cache
This project is superseded by munimx/llm-semantic-cache. The vLLM proxy direction has been retired — see PROJECT_DIRECTION2.md. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent c341f53 commit 2c4268b

1 file changed

README.md

Lines changed: 15 additions & 2 deletions
@@ -1,8 +1,21 @@
 # LLM Inference Optimization Engine
 
-OpenAI-compatible inference middleware for [vLLM](https://github.com/vllm-project/vllm).
+> **⚠️ DEPRECATED** — This project is archived and no longer maintained.
+> Development has moved to **[munimx/llm-semantic-cache](https://github.com/munimx/llm-semantic-cache)**.
 
-Sits between your application and one or more vLLM instances. Each request flows through Redis-backed response caching, cross-worker request coalescing, token-count-based model routing, and KV-cache-pressure-aware admission control before being dispatched to the backend pool.
+---
+
+## What This Was
+
+An OpenAI-compatible proxy middleware for [vLLM](https://github.com/vllm-project/vllm). It sat between an application and one or more vLLM instances, adding Redis-backed response caching, cross-worker request coalescing, token-count-based model routing, and KV-cache-pressure-aware admission control.
+
+## Why It Was Retired
+
+This project went through two iterations. The first was Ollama middleware — a foundation that turned out to be wrong. The second (this repo) was a vLLM proxy, which was cleaner but ultimately a worse version of [LiteLLM](https://github.com/BerriAI/litellm). LiteLLM already exists, is production-grade, has enterprise backing, and covers every feature here plus hundreds more. There is no credible answer to "why not just use LiteLLM?" for a generic proxy layer.
+
+## Where Development Continues
+
+**[munimx/llm-semantic-cache](https://github.com/munimx/llm-semantic-cache)** — a focused Python library that adds semantic caching in front of any OpenAI-compatible LLM API. One thing done well: understand whether two prompts are asking the same thing, and skip the redundant LLM call if they are.
 
 ## Quick Start
 
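The "Where Development Continues" section of the new README describes the core idea behind the successor library: decide whether a new prompt is asking the same thing as an earlier one, and reuse the stored answer if so. The sketch below illustrates that idea in minimal Python; it is not the llm-semantic-cache API. The `embed` callable, the `SemanticCache` class, the 0.9 similarity threshold, and `call_llm` are all illustrative assumptions.

```python
# Illustrative sketch of semantic caching; not the llm-semantic-cache API.
# Assumes an embedding function `embed(text) -> list[float]` is available
# (e.g. an OpenAI-compatible /embeddings endpoint or a local model).
import math


class SemanticCache:
    """Reuse a stored LLM response when a new prompt is semantically
    close to one that was already answered."""

    def __init__(self, embed, threshold: float = 0.9):
        self.embed = embed          # callable: str -> list[float] (assumed)
        self.threshold = threshold  # cosine-similarity cutoff (assumed value)
        self.entries = []           # list of (embedding, response) pairs

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get(self, prompt: str):
        """Return a cached response if an earlier prompt is similar enough, else None."""
        if not self.entries:
            return None
        query = self.embed(prompt)
        embedding, response = max(self.entries, key=lambda e: self._cosine(query, e[0]))
        if self._cosine(query, embedding) >= self.threshold:
            return response         # cache hit: skip the redundant LLM call
        return None

    def put(self, prompt: str, response: str):
        self.entries.append((self.embed(prompt), response))


# Typical flow: check the cache first, call the model only on a miss.
# cache = SemanticCache(embed=my_embedding_fn)              # hypothetical embedding fn
# answer = cache.get("What is the capital of France?")
# if answer is None:
#     answer = call_llm("What is the capital of France?")   # hypothetical LLM call
#     cache.put("What is the capital of France?", answer)
```

A real implementation would also need eviction, persistence, and guards against reusing answers across different users or contexts; the point here is only the similarity check that replaces an exact-match cache key.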
0 commit comments
