Product Launch
anonymous
2 points
1 comment
Posted about 2 months ago · Active about 2 months ago
Show HN: Add semantic caching to LLM APIs with one line of code
kentocloud.com · LLM · caching · AI optimization
Discussion (1 comment)
about 2 months ago
My AI bill was getting out of control from user queries and RAG workflows; a third of them generated the same output. I built Kento to fix that. It's a semantic cache that you add by changing just the base_url parameter in your OpenAI, Anthropic, or Google client. It catches duplicate requests (exact or semantic), serves instant responses, and gives you a dashboard to track hit rates and costs. A sketch of the integration is below.
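For the OpenAI Python SDK, the change would look roughly like this. The endpoint URL and key are placeholders I'm assuming for illustration, not Kento's actual values:

    # Minimal sketch: route OpenAI calls through a semantic-cache proxy
    # by swapping base_url. Endpoint and key below are assumed, not real.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.kentocloud.example/v1",  # assumed proxy URL
        api_key="YOUR_KENTO_KEY",                      # assumed cache-issued key
    )

    # Requests go out as usual; exact or semantically duplicate prompts
    # can be answered from the cache instead of hitting the upstream model.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is semantic caching?"}],
    )
    print(resp.choices[0].message.content)

The rest of the application code stays unchanged, which is what makes the base_url swap effectively a one-line integration.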
Faster for your users, cheaper for you.
We have a free dev tier and would love your feedback.