An LLM semantic caching system that improves user experience by serving cached query-result pairs to reduce response time.
Redis Vector Library (RedisVL) -- the AI-native Python client for Redis.
mimir is a drop-in proxy that caches LLM API responses using semantic similarity, reducing costs and latency for repeated or similar queries.
SmarterRouter: An intelligent LLM gateway and VRAM-aware router for Ollama, llama.cpp, and OpenAI. Features semantic caching, model profiling, and automatic failover for local AI labs.
Reliable and Efficient Semantic Prompt Caching with vCache
One API for 25+ LLM providers, including OpenAI, Anthropic, Bedrock, and Azure. Caching, guardrails, and cost controls. A Go-native alternative to LiteLLM and Kong AI Gateway.
Redis Vector Library (RedisVL) -- the AI-native Java client for Redis.
A RAG-based chatbot that incorporates a semantic cache and guardrails.
This repository contains sample code demonstrating how to implement a verified semantic cache using Amazon Bedrock Knowledge Bases to prevent hallucinations in Large Language Model (LLM) responses while improving latency and reducing costs.
High-performance LLM query cache with semantic search. Reduces API costs by 80% and cuts latency from 8.5s to 1ms using Redis plus a Qdrant vector DB. Multi-provider support (OpenAI, Anthropic).
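The Redis + Qdrant pairing described above usually means an exact-match key-value tier in front of a vector-similarity tier. Below is a rough sketch of that two-tier flow, not that project's actual code: the collection name, 0.9 similarity threshold, embedding model, and call_llm() stub are all assumptions for illustration.

```python
# Sketch of a two-tier cache: exact hits in Redis, semantic near-matches in Qdrant.
# Collection name, threshold, model, and call_llm() are illustrative assumptions.
import hashlib
import uuid

import redis
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams
from sentence_transformers import SentenceTransformer

r = redis.Redis()                      # tier 1: exact-match key-value store
qdrant = QdrantClient(":memory:")      # tier 2: vector store for near-matches
encoder = SentenceTransformer("all-MiniLM-L6-v2")
qdrant.create_collection(
    collection_name="llm_cache",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

def call_llm(prompt: str) -> str:
    return f"LLM answer for: {prompt}"  # placeholder for a real provider call

def complete(prompt: str, threshold: float = 0.9) -> str:
    key = "llm:" + hashlib.sha256(prompt.encode()).hexdigest()
    if (hit := r.get(key)) is not None:            # exact repeat of a prior prompt
        return hit.decode()
    vec = encoder.encode(prompt, normalize_embeddings=True).tolist()
    found = qdrant.search(collection_name="llm_cache", query_vector=vec, limit=1)
    if found and found[0].score >= threshold:      # semantically similar prompt
        return found[0].payload["response"]
    response = call_llm(prompt)                    # miss on both tiers
    r.set(key, response)
    qdrant.upsert(
        collection_name="llm_cache",
        points=[PointStruct(
            id=str(uuid.uuid5(uuid.NAMESPACE_URL, prompt)),
            vector=vec,
            payload={"response": response},
        )],
    )
    return response
```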
Enhance LLM retrieval performance with Azure Cosmos DB Semantic Cache. Learn how to integrate and optimize caching strategies in real-world web applications.
Ultra-fast Semantic Cache Proxy written in pure C
Redis Vector Similarity Search, Semantic Caching, Recommendation Systems and RAG
VCAL Core — high-performance semantic cache and vector cache library for LLM applications.
A chatbot using Redis Vector Similarity Search that recommends blogs based on user prompts.
Optimized RAG Retrieval with Indexing, Quantization, Hybrid Search and Caching
Adaptive semantic cache for LLMs with streaming support, ML-based thresholds, and real-time cost tracking. Built in Rust for sub-millisecond performance.
An operating system for autonomous AI agents — 5-tier cache-first routing (97.5% cost reduction), Ed25519 constitution enforcement, 130 agents, 106 plugins. Rust.
Semantic cache layer for LLM APIs — embed prompts locally, find near-matches, skip redundant LLM calls.
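That last description captures the core loop most of these projects share: embed the prompt locally, look for a near-match among cached prompts, and only call the LLM on a miss. A minimal, self-contained sketch of that loop follows; the model name, 0.9 threshold, and call_llm() stub are assumptions for illustration, not any listed project's code.

```python
# Minimal semantic cache: embed locally, dot-product near-match, skip the LLM on a hit.
# Model name, threshold, and call_llm() are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model
cache: list[tuple[np.ndarray, str]] = []         # (prompt embedding, cached response)

def call_llm(prompt: str) -> str:
    return f"LLM answer for: {prompt}"           # placeholder for a real provider call

def cached_completion(prompt: str, threshold: float = 0.9) -> str:
    query = model.encode(prompt, normalize_embeddings=True)
    # With normalized vectors, cosine similarity reduces to a dot product.
    for vec, response in cache:
        if float(np.dot(query, vec)) >= threshold:
            return response                      # semantic hit: skip the LLM call
    response = call_llm(prompt)                  # miss: call the model, store the pair
    cache.append((query, response))
    return response

print(cached_completion("How do I reset my password?"))
print(cached_completion("What's the way to reset my password?"))  # likely served from cache
```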