Asterisk AI
github.com
What is the difference between this and Indian support staff pretending to be in your vicinity by telling you about the local weather? Your version is arguably even worse, because it can fool people more convincingly.
So you're telling the caller that it is an AI, and yet you can have a pleasant background audio experience.
Example of legit calls: the pizza delivery guy decided to call my phone instead of ringing the bell, for whatever reason.
No, it does not cost over thirty dollars to allow someone accused to call their loved ones. We pay taxes. I want my government to use the taxes and provide these calls for free.
The interface is so inconsistent between different implementations that they're always terribly awkward to navigate at best, and completely infuriating at worst. I don't like presenting the image of a progressively angrier man who is standing around speaking incongruous short phrases that are clearly directed at nobody at all.
But I've found that many of them still accept DTMF. Just mash a button instead of uttering a response, and a more traditional IVR tree shows up with a spoken list of enumerated options. Things get a lot better after that.
Like pushing buttons at the gas pump to try to silence the ad-roll, it's pretty low-cost to try.
When you're committed to phone intent complexity (hell), the AI-assisted options are sort of less bad, since you don't have to explain the menu to callers; they just make demands.
Sort of like how Jira can be a streamlined tool or a prison of 50-step workflows, it's all up to the designer.
I'm in this business, and I used to think the same. It turns out this is a minority of callers. Some examples:
- a client we're working with does advertising in TV commercials, and a few percent of their calls are people trying to cancel their TV subscriptions, even though the client is in healthcare
I guess these are probably desperate people who are trying to get to someone, anyone. In my opinion, the best thing people can do is get a really good credit card and do a charge back for things like this.
- in the troubleshooting flow for a client with a physical product, 40% of calls are resolved after the “did you try turning it off and on again” step.
I bought a Chinese wifi mesh router and it literally finds a time between two am and five am and reboots itself every night, by default. You can turn this behavior off but it was interesting that it does this by default.
- a health insurance client has 25% of call volume for something that is available self-service (and very visible as well), yet people still call.
In my defense, I've been on the other side of this. I try to avoid calling, but whenever I use self-service, it feels like my settings never stick and always switch back to what they want the next billing cycle. If I have to waste time each month, you have to waste time each month.
- a client in the travel space gets a lot of calls about: “does my accommodation include X”, and employees just use their public website to answer those questions. (I.e., it’s clearly available for self-service)
These public websites are regularly out of date. Someone who is actually on site confirming that yes, they have non-smoking rooms, or ice machines that aren't broken, is valuable.
One of the things we tend to prioritize in the initial conversation is to determine in which segment you fall and route accordingly.
(If you do need SIP, this Asterisk project looks really great.)
Pipecat has 90 or so integrations with all the models/services people use for voice AI these days. NVIDIA, AWS, all the foundation labs, all the voice AI labs, most of the video AI labs, and lots of other people use/contribute to Pipecat. And there's lots of interesting stuff in the ecosystem, like the open source, open data, open training code Smart Turn audio turn detection model [2], and the Pipecat Flows state machine library [3].
[1] https://docs.pipecat.ai/guides/telephony/twilio-websockets
[2] https://github.com/pipecat-ai/smart-turn
[3] https://github.com/pipecat-ai/pipecat-flows/
Disclaimer: I spend a lot of my time working on Pipecat. Also writing about both voice AI in general and Pipecat in particular. For example: https://voiceaiandvoiceagents.com/
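For a concrete sense of the shape, here is a minimal sketch of the STT -> LLM -> TTS pipeline described above. The class names and module paths track the Pipecat docs but move between releases, and the Daily transport plus Deepgram/OpenAI/Cartesia services and environment variables are just one possible combination; treat it as a sketch, not the canonical example.

```python
# Minimal sketch of a Pipecat STT -> LLM -> TTS voice pipeline. Module paths
# and service signatures follow the Pipecat docs but change between releases;
# verify against https://docs.pipecat.ai before running.
import asyncio
import os

from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.cartesia.tts import CartesiaTTSService
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.services.daily import DailyParams, DailyTransport


async def main():
    # WebRTC transport for a demo; the Twilio websocket transport from [1]
    # drops in here instead when bridging a phone call.
    transport = DailyTransport(
        os.environ["DAILY_ROOM_URL"], None, "Voice bot",
        DailyParams(audio_in_enabled=True, audio_out_enabled=True),
    )
    stt = DeepgramSTTService(api_key=os.environ["DEEPGRAM_API_KEY"])
    llm = OpenAILLMService(api_key=os.environ["OPENAI_API_KEY"], model="gpt-4o")
    tts = CartesiaTTSService(
        api_key=os.environ["CARTESIA_API_KEY"],
        voice_id=os.environ["CARTESIA_VOICE_ID"],
    )

    # The context aggregator turns transcripts into LLM messages and appends
    # the assistant's replies, so the conversation has memory.
    context = OpenAILLMContext(
        [{"role": "system", "content": "You are a helpful phone agent. Be brief."}]
    )
    agg = llm.create_context_aggregator(context)

    # Frames flow top to bottom: audio in -> transcript -> tokens -> audio out.
    pipeline = Pipeline([
        transport.input(),
        stt,
        agg.user(),
        llm,
        tts,
        transport.output(),
        agg.assistant(),
    ])
    await PipelineRunner().run(PipelineTask(pipeline))


if __name__ == "__main__":
    asyncio.run(main())
```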
In your opinion, how close is Pipecat + OSS to replacing proprietary infra from Vapi, Retell, Sierra, etc?
PS: did you write this web guide?
The integrated developer experience is much better on Vapi, etc.
The goal of the Pipecat project is to provide state of the art building blocks if you want to control every part of the multimodal, realtime agent processing flow and tech stack. There are thousands of companies with Pipecat voice agents deployed at scale in production, including some of the world's largest e-commerce, financial services, and healthtech companies. The Smart Turn model benchmarks better than any of the proprietary turn detection models. Companies like Modal have great info about how to build agents with sub-second voice-to-voice latency.[1] Most of the next-generation video avatar companies are building on Pipecat.[2] NVIDIA built the ACE Controller robot operating system on Pipecat.[3]
[1] https://modal.com/blog/low-latency-voice-bot
[2] https://lemonslice.com/
[3] https://github.com/NVIDIA/ace-controller/
I just want to provide:
- business logic
- tools
- configuration metadata (e.g. which voice to use)
I don't like Vapi due to 1) the heavily GUI-driven experience, and 2) cost.
Or Pipecat Cloud / LiveKit Cloud (I think they charge 1 cent per minute?)
That's why I created a stack entirely in Cloudflare Workers and Durable Objects in JavaScript.
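That wish list is essentially a three-field contract. Here is a hypothetical sketch of what it could look like; `AgentConfig` and `run_agent` are invented for illustration (no real platform exposes exactly this), with the telephony and STT/TTS plumbing assumed to live behind `run_agent`.

```python
# Hypothetical sketch of the "business logic + tools + config" contract the
# parent comment asks for. Nothing here is a real library API.
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentConfig:
    system_prompt: str                           # the business logic, in prose
    tools: dict[str, Callable] = field(default_factory=dict)
    voice: str = "default"                       # configuration metadata
    stt_provider: str = "deepgram"
    tts_provider: str = "inworld"


def check_order_status(order_id: str) -> str:
    # A tool is just a plain function the LLM is allowed to call.
    return f"Order {order_id} ships tomorrow."


config = AgentConfig(
    system_prompt="You answer order questions. Be brief and confirm order IDs.",
    tools={"check_order_status": check_order_status},
    voice="en-US-natural-1",
)
# run_agent(config)  # provided by the hosting platform, not defined here
```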
Providers like AssemblyAI and Deepgram now integrate VAD in their realtime APIs, so our voice AI only needs networking (no CPU spent on local VAD anymore).
e.g. Deepgram (STT) via websocket -> DO -> LLM API -> TTS?
Same with TTS: some like Deepgram and ElevenLabs let you stream the LLM text (or chunks per sentence) over their websocket API, making your Voice AI bot really really low latency.
Runs at around 50 cents per hour using AssemblyAI or Deepgram as the STT, Gemini Flash as LLM and InWorld.ai as the TTS (for me it’s on par with ElevenLabs and super fast)
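Here is a minimal sketch of the STT leg of that flow, assuming Deepgram's realtime websocket endpoint and response shape as publicly documented (verify against their current docs); the LLM and TTS legs are left as comments.

```python
# Sketch of streaming caller audio to Deepgram realtime STT and yielding
# final transcripts for the LLM/TTS legs. URL, auth header, and response
# shape follow Deepgram's public docs; verify before relying on them.
import asyncio
import json
import os

import websockets  # pip install websockets

# For raw PCM from a phone leg, add e.g. &encoding=linear16&sample_rate=8000.
DG_URL = "wss://api.deepgram.com/v1/listen?model=nova-2&interim_results=false"


async def transcripts(audio_chunks):
    """Yield final transcripts for an async iterable of raw audio chunks."""
    headers = {"Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}"}
    # Newer `websockets` releases rename this kwarg to `additional_headers`.
    async with websockets.connect(DG_URL, extra_headers=headers) as ws:

        async def feed():
            async for chunk in audio_chunks:
                await ws.send(chunk)
            await ws.send(json.dumps({"type": "CloseStream"}))

        feeder = asyncio.create_task(feed())
        async for message in ws:
            result = json.loads(message)
            alt = result.get("channel", {}).get("alternatives", [{}])[0]
            if result.get("is_final") and alt.get("transcript"):
                # From here: append to the LLM context, then stream the reply
                # sentence-by-sentence into the TTS provider's websocket.
                yield alt["transcript"]
        await feeder
```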
OpenAI realtime voices are really bad though, so you can also configure your session to accept AUDIO and output TEXT, and then use any TTS provider (like ElevenLabs or InWorld.ai, my favorite for cost) to generate the audio.
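As a sketch of that audio-in/text-out configuration, assuming the Realtime API's documented `session.update` and `response.text.delta` events (the model name is just an example, and the event schema may have changed since; check the current docs):

```python
# Sketch of an OpenAI Realtime session configured for text-only output, so an
# external TTS can speak the replies instead of an OpenAI voice.
import asyncio
import json
import os

import websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"


async def main():
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(URL, extra_headers=headers) as ws:
        # Accept audio in, emit only text; no OpenAI voice is ever used.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"modalities": ["text"]},
        }))
        # Caller audio goes up as base64 "input_audio_buffer.append" events
        # (omitted here); text comes back as deltas we can hand to any TTS.
        async for message in ws:
            event = json.loads(message)
            if event.get("type") == "response.text.delta":
                print(event["delta"], end="", flush=True)  # -> TTS websocket


asyncio.run(main())
```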
Is that really where SOTA is right now?
500-1000ms is borderline acceptable.
Sub-300ms is closer to SOTA.
2000ms or more means people will hang up.
The ChatGPT app has an audio version of the spinner icon for when you ask it a question and it needs a second before answering.
play "ehh".wav
I attended VAPI Con earlier this year, and a lot of the discussion centered on how interruptions and turn detection are the next frontier in making voice agents smoother conversationalists. Knowing when to speak is a hard problem even for humans, but when you listen to a lot of voice agent calls, the friction point right now tends to be either interrupting too often or waiting too long to respond.
The major players are clearly working on this. Deepgram announced a new SOTA (Flux) for turn detection at the conference. Feels like an area where we'll see even more progress in the next year.
How does their golden nature not dissuade these concerns for you?
If you really believe that the support can be good, then use a robotic text-to-speech voice; don't pretend it's a human. And make it clear to users that they are not talking to a human. The phone is a protocol with the semantics that you are speaking to a human. Use something else.
The bottom line is that you have clients who signed up under the belief that they could call a phone number and speak to a human. Businesses are performing a short-term switcheroo at the expense of their clients; it's a scam.
Not really. The expectation is to be able to express their need in a natural language, maybe because their issue is not covered by a fixed-form web form (pun not intended).
So yeah AI might be a good fit in that scenario.
If that isn't the channel to speak to a human, nothing is. You can speak to a bot with an app or whatever.
At least make it sound robotty instead of pretending to be a human.
I wonder what Amazon's goals are, as an example. Currently, at least on the .ca website, there is no way to even get to a chat to fix problems. Their whole spiderweb of help options now always leads back to the return page.
So it's call them (and you can only find the number via Google).
I suspect they're so dysfunctional that they don't understand why there's been a massive uptick in calls, so they slap AI onto the phone too.
And now that's slow and AI drivel. I guess soon I'll just have to do chargebacks!? E.g., if a package is missing or whatever.
Granted, it's been 1-2 weeks since I had an issue, so it may have changed since then, or it could be only released to a subset of users.
For narrow use cases like this I personally don't mind these tools.
Actually, as someone who works in this area: no, it's not. It's designed to help people do things, and metrics of success are closely monitored.
For example, even at the (digital-only) SaaS company I work at, we have a non-trivial number of customers with strong preferences to talk on the phone, e.g. to provide their credit card number rather than enter it in the product. This is likely more pronounced if your product serves less tech-savvy niches.
That said, a strong preference for human call > website use doesn't necessarily imply even a weak preference for AI call > website use (likely customer-dependent, but I'd be surprised if the number with that preference was exactly 0)
You and I certainly do, but a ton of people prefer calling on the phone.
Seriously, what is the point of all this?
Situational context matters though, sometimes you get in the vehicle and get the alert. Just say "Hey Siri, call dealership" and away you go hands free. No messing with apps.
spammers, scammers and horrible customer support lines.
Even if the focus is now on hosted telephony, my experience is that you can still hear the default music-on-hold everywhere.
exten => s,n,Set(VM_UNIQUEID=${UNIQUEID})  ; stash the channel's unique ID before voicemail
exten => s,n,VoiceMail(${EXTEN}@default)   ; then drop the caller into the default voicemail context
If you are using AGI or ARI, you can log it somewhere useful so you can correlate later.
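If you go the AGI route, the protocol is simple enough to do by hand: Asterisk writes `agi_*` variables to the script's stdin, one per line, terminated by a blank line. A minimal sketch, invoked from the dialplan with something like `exten => s,n,AGI(vm_correlate.py)` (script name, log path, and line format are my own choices):

```python
#!/usr/bin/env python3
# Minimal AGI sketch: read the agi_* variables Asterisk sends on stdin and
# log the uniqueid next to the mailbox for later CDR correlation.
import sys

env = {}
for line in sys.stdin:
    line = line.strip()
    if not line:
        break  # a blank line ends the AGI variable block
    key, _, value = line.partition(": ")
    env[key] = value

with open("/var/log/asterisk/vm_correlate.log", "a") as log:
    log.write(f"{env.get('agi_uniqueid')} {env.get('agi_extension')} "
              f"{env.get('agi_callerid')}\n")

# Tell Asterisk we did something, then consume its "200 result=..." reply.
sys.stdout.write('VERBOSE "logged uniqueid for voicemail correlation" 1\n')
sys.stdout.flush()
sys.stdin.readline()
```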
If you are using a more vanilla configuration, I'd say use the voicemail metadata .txt file that will be in the same folder as the recording to get the info to find the CDR. It has things like callerid, origmailbox, origdate (or maybe it's origtime), and duration. origmailbox should match the CDR destination, and the origtime should also match. I haven't done this specifically, but I'm hoping I'm pointing you in the right direction.
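A sketch of mining that metadata file, assuming the INI-style `[message]` section and field names app_voicemail typically writes (check a real msgXXXX.txt from your system before trusting the exact keys):

```python
# Sketch of pulling CDR-correlation fields out of the voicemail metadata
# .txt that sits next to the recording.
import configparser
import sys


def voicemail_meta(path):
    cp = configparser.ConfigParser(interpolation=None)  # callerid may contain '%'
    cp.read(path)
    msg = cp["message"]
    return {
        "mailbox": msg.get("origmailbox"),  # should match the CDR destination
        "callerid": msg.get("callerid"),
        "origdate": msg.get("origdate"),
        "origtime": msg.get("origtime"),    # epoch seconds; compare to CDR start
        "duration": msg.get("duration"),
    }


if __name__ == "__main__":
    print(voicemail_meta(sys.argv[1]))
```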
I work with Freeswitch almost exclusively these days. But my first experience with VoIP was Asterisk and a huge Perl AGI file keeping everyone talking to each other. Those were good times!
That's the first thing that I noticed too.
It's gotten to the point that my body subconsciously rejects bullet lists and headings that start with emojis.