Product Launch
anonymous
2 points
1 comment
Posted about 2 months ago · Active about 2 months ago
Show HN: Add semantic caching to LLM APIs with one line of code
kentocloud.com · LLM · caching · AI optimization
Discussion (1 comment)
about 2 months ago
My AI bill was getting out of control from user queries and RAG workflows; a third of them generated the same output. I built Kento to fix that. It's a semantic cache that you add by changing just the base_url parameter in your OpenAI, Anthropic, or Google client. It catches duplicate requests (exact or semantic), serves instant responses, and gives you a dashboard to track hit rates and costs. A sketch of the integration is below.
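For the OpenAI Python SDK, the change would look roughly like this. The endpoint URL and key are placeholders I'm assuming for illustration, not Kento's actual values:

    # Minimal sketch: route OpenAI calls through a semantic-cache proxy
    # by swapping base_url. Endpoint and key below are assumed, not real.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.kentocloud.example/v1",  # assumed proxy URL
        api_key="YOUR_KENTO_KEY",                      # assumed cache-issued key
    )

    # Requests go out as usual; exact or semantically duplicate prompts
    # can be answered from the cache instead of hitting the upstream model.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is semantic caching?"}],
    )
    print(resp.choices[0].message.content)

The rest of the application code stays unchanged, which is what makes the base_url swap effectively a one-line integration.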
Faster for your users, cheaper for you.
We have a free dev tier and would love your feedback.