
Vapi

Developer-focused voice AI platform for building production-grade voice agents via API

Vapi is a developer-first API platform for building production-grade voice agents. You bring your own LLM, STT provider, and TTS provider, and Vapi handles the real-time orchestration layer that makes them talk to each other with low latency. Pricing is Pay-As-You-Go at around $0.05 per minute, and new accounts start with $200 in free credits. It is used by YC startups and mid-market companies that want control over every layer of their voice stack.

Vapi launched in 2022 with a clear bet: developers building voice agents don't want an opinionated black box. They want infrastructure that handles the hard real-time coordination work while letting them choose every other layer themselves. A few years in, that bet looks correct. The platform sits at the center of a growing ecosystem of YC-backed companies and enterprise teams who treat voice as core product infrastructure rather than an afterthought.

The right way to think about Vapi is as a coordination layer, not a complete product. It connects a speech-to-text provider, a language model, and a text-to-speech provider into a pipeline that can hold a real-time phone conversation, handle interruptions, manage turn-taking, call external APIs mid-conversation, and stream audio back to the caller with sub-500ms latency on well-configured setups. What Vapi doesn't do is make decisions about which providers you should use. That modularity is the point.

Quick verdict

If you're a developer who wants control over the full voice stack and is willing to do the component selection work, Vapi is probably where you should start. The $200 in free credits gives you real runway. The TypeScript SDK is well-documented. The Pay-As-You-Go platform fee of around $0.05 per minute is one of the lowest entry points in the category, though provider costs come on top of it. The trade-off is that "modularity" in practice means you're managing three provider relationships simultaneously and debugging issues that might originate in any one of them.

What Vapi actually does

The real-time voice pipeline problem is harder than it sounds. When a user speaks, you need to detect the end of their utterance, transcribe what they said, pass it to a language model, get a response, convert that response to audio, and start streaming the audio back before the user starts to wonder if the call dropped. That whole sequence needs to complete in under a second for the conversation to feel natural. Building that coordination layer from scratch, with proper streaming, turn-taking logic, and interruption handling, is weeks of engineering work even for an experienced team.
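One way to reason about that sequence is as a latency budget per conversational turn. The sketch below sums per-stage timings and flags the slowest stage. The stage names and millisecond values are illustrative placeholders, not measurements of Vapi or any provider; only the one-second budget comes from the paragraph above.

```typescript
// Illustrative per-turn latency budget. Stage names and timings are
// made-up placeholders, not measurements of any provider.
type Stage = { stage: string; ms: number };

// Sum the time spent across all stages of one conversational turn.
function totalLatencyMs(stages: Stage[]): number {
  return stages.reduce((sum, s) => sum + s.ms, 0);
}

// Find the stage that contributes the most latency.
// Assumes a non-empty list; reduce without a seed throws on an empty array.
function slowestStage(stages: Stage[]): Stage {
  return stages.reduce((worst, s) => (s.ms > worst.ms ? s : worst));
}

const turnStages: Stage[] = [
  { stage: "end-of-utterance detection", ms: 200 },
  { stage: "speech-to-text", ms: 150 },
  { stage: "LLM first token", ms: 300 },
  { stage: "text-to-speech first audio", ms: 150 },
  { stage: "network transit", ms: 100 },
];

const turnBudgetMs = 1000; // "under a second", per the text above
const turnTotalMs = totalLatencyMs(turnStages);

console.log(`total: ${turnTotalMs} ms, within budget: ${turnTotalMs <= turnBudgetMs}`);
console.log(`slowest stage: ${slowestStage(turnStages).stage}`);
```

Swapping in timings you have measured yourself shows quickly which layer is worth optimizing first.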

Vapi solves that specific problem. You configure which STT provider to use (Deepgram is popular for latency, AssemblyAI for accuracy on noisy audio), which LLM to call (OpenAI GPT-4o for quality, Groq's Llama inference for speed), and which TTS provider to use for voice output (ElevenLabs for quality, Play.ht for variety). Vapi handles the routing, the streaming, the phone infrastructure, and the webhook delivery.

The result is that a developer can go from zero to a working inbound phone agent in a few hours rather than a few weeks.

The provider modularity argument

Vapi's architecture means your total cost per conversation minute includes four separate bills: Vapi's own $0.05/min, plus your STT provider's per-minute rate, plus your LLM's token cost, plus your TTS provider's character cost. This sounds complicated, but it has a real advantage: you can optimize each layer independently.
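A small cost model makes the stacked bills easier to compare across configurations. In this sketch, only the $0.05 Vapi rate comes from this review; the STT, LLM, and TTS figures are hypothetical placeholders you would replace with your providers' actual rates, converted to a per-minute estimate.

```typescript
// Per-minute cost model. Only vapiPerMin ($0.05) comes from this review;
// the other rates are hypothetical placeholders.
interface MinuteCosts {
  vapiPerMin: number;
  sttPerMin: number;
  llmPerMin: number; // token cost, estimated per conversation minute
  ttsPerMin: number; // character cost, estimated per conversation minute
}

// Sum in tenths of a cent (integers) to avoid floating-point drift.
function allInPerMinute(costs: MinuteCosts): number {
  const toMils = (usd: number): number => Math.round(usd * 1000);
  const mils =
    toMils(costs.vapiPerMin) +
    toMils(costs.sttPerMin) +
    toMils(costs.llmPerMin) +
    toMils(costs.ttsPerMin);
  return mils / 1000;
}

const exampleRates: MinuteCosts = {
  vapiPerMin: 0.05,
  sttPerMin: 0.01, // placeholder
  llmPerMin: 0.04, // placeholder
  ttsPerMin: 0.08, // placeholder
};

console.log(`all-in: $${allInPerMinute(exampleRates).toFixed(2)} per minute`);
```

Changing one field at a time shows how much each layer contributes to the total.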

If you're building an application where voice naturalness matters, you use ElevenLabs for TTS and accept the higher character cost. If you're building an internal tool where speed matters more than voice quality, you use a faster, cheaper TTS provider. If you have existing OpenAI credits, you point Vapi at GPT-4o. If you need faster inference for better conversation feel, you switch the LLM to Groq without changing anything else.
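The "switch one layer without changing anything else" idea can be sketched as a typed configuration object. The shape below is hypothetical and for illustration only; it is not Vapi's actual assistant schema, and the Groq model name is a placeholder.

```typescript
// Hypothetical config shape for illustration; not Vapi's actual schema.
interface VoiceStackConfig {
  stt: { provider: string };
  llm: { provider: string; model: string };
  tts: { provider: string };
}

// Return a copy of the config with only the LLM layer replaced.
// The STT and TTS layers are carried over untouched.
function withLlm(
  config: VoiceStackConfig,
  llm: VoiceStackConfig["llm"],
): VoiceStackConfig {
  return { ...config, llm };
}

const baseStack: VoiceStackConfig = {
  stt: { provider: "deepgram" },
  llm: { provider: "openai", model: "gpt-4o" },
  tts: { provider: "elevenlabs" },
};

// Swap to a faster inference provider; the model name is a placeholder.
const fasterStack = withLlm(baseStack, {
  provider: "groq",
  model: "example-llama-model",
});

console.log(fasterStack.llm.provider, fasterStack.stt.provider, fasterStack.tts.provider);
```

Keeping each layer as its own field is what makes it cheap to A/B one provider against another.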

That granularity of control is rare in the voice agent space. Most competitors bake in their choices and charge a single per-minute rate that bundles everything. Vapi's approach serves developers who have already formed opinions about which AI providers they trust.

Developer experience

The TypeScript server SDK is the primary integration path. The documentation is reasonably thorough, with working examples for the common patterns: inbound call handling, outbound call triggering, function calling during a conversation, and webhook processing. The community Discord is active enough that most common questions have been asked and answered.

The web dashboard handles phone number provisioning, call logs, and basic monitoring. In early 2026 it's functional but not sophisticated. You can see what happened on a call, pull the transcript, and check the recording. You can't yet do complex analytics or A/B testing of prompts directly from the dashboard without external tooling. Teams building at scale tend to funnel call data into their own analytics infrastructure via webhooks rather than relying on what the dashboard surfaces.
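As a rough idea of what that external analytics step can look like, the sketch below aggregates end-of-call events into simple totals. The event shape and field names are hypothetical; real webhook payloads are defined in Vapi's documentation and will differ.

```typescript
// Hypothetical end-of-call event; real webhook payloads will differ.
interface CallEndedEvent {
  callId: string;
  durationSec: number;
  endedReason: string;
}

interface CallStats {
  calls: number;
  totalMinutes: number;
  byReason: Map<string, number>;
}

// Roll a batch of call events up into counts and total talk time.
function summarizeCalls(events: CallEndedEvent[]): CallStats {
  const byReason = new Map<string, number>();
  let totalSec = 0;
  for (const e of events) {
    totalSec += e.durationSec;
    byReason.set(e.endedReason, (byReason.get(e.endedReason) ?? 0) + 1);
  }
  return { calls: events.length, totalMinutes: totalSec / 60, byReason };
}

// Made-up sample data for illustration.
const sampleEvents: CallEndedEvent[] = [
  { callId: "c1", durationSec: 120, endedReason: "customer-hangup" },
  { callId: "c2", durationSec: 90, endedReason: "transferred" },
  { callId: "c3", durationSec: 30, endedReason: "customer-hangup" },
];

const callStats = summarizeCalls(sampleEvents);
console.log(`${callStats.calls} calls, ${callStats.totalMinutes} minutes`);
```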

Function calling during conversations is one of the more powerful features and worth understanding. You can define tools that Vapi will call when the language model decides it needs external data: check a user's account status, look up appointment availability, submit a form. The tool call happens mid-conversation, the result comes back to the LLM context, and the conversation continues. This is how voice agents go from "talking FAQ bot" to "agent that actually does things."
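On your side of the integration, this usually comes down to a dispatcher that maps a tool name to a handler and returns a result the LLM can use. The sketch below shows that pattern with hypothetical types; the actual tool-call payload and response format are defined by Vapi's documentation, and `checkAccountStatus` is a stand-in for a real backend lookup.

```typescript
// Hypothetical shapes for illustration; not Vapi's actual payload format.
type ToolCall = { tool: string; args: Record<string, unknown> };
type ToolResult = { ok: boolean; data?: unknown; error?: string };
type ToolHandler = (args: Record<string, unknown>) => ToolResult;

// Registry of tools the agent may invoke mid-conversation.
const toolHandlers = new Map<string, ToolHandler>([
  [
    "checkAccountStatus",
    (args) => {
      const accountId = args["accountId"];
      if (typeof accountId !== "string") {
        return { ok: false, error: "accountId must be a string" };
      }
      // Stand-in for a real lookup against your own systems.
      return { ok: true, data: { accountId, accountState: "active" } };
    },
  ],
]);

// Route a tool call to its handler, failing safely on unknown tools.
function dispatchToolCall(call: ToolCall): ToolResult {
  const handler = toolHandlers.get(call.tool);
  if (!handler) {
    return { ok: false, error: `unknown tool: ${call.tool}` };
  }
  return handler(call.args);
}

console.log(dispatchToolCall({ tool: "checkAccountStatus", args: { accountId: "a-1" } }));
```

Returning a structured error instead of throwing lets the LLM recover gracefully, for example by asking the caller to repeat an account number.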

Pricing in practice

At $0.05 per minute from Vapi, plus typical component costs, a realistic all-in cost for a well-configured agent using ElevenLabs TTS and a capable LLM comes to $0.15 to $0.25 per minute. That's competitive with Retell AI's $0.07/min base rate once you account for the fact that Retell's rate includes their proprietary speech layers.

The $200 in free credits is meaningful. At Vapi's $0.05/min plus reasonable component costs, you can run hundreds of test conversations before spending your own money. That's a real evaluation budget, not a token gesture.
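To sanity-check the "hundreds of test conversations" figure, here is a back-of-the-envelope runway estimate. It assumes the whole budget is spent at the all-in rate of $0.15 to $0.25 per minute quoted above, which is a conservative simplification, and the three-minute average call length is a hypothetical figure.

```typescript
// Rough runway estimate. Amounts are integer cents so the division is exact.
// Assumes the entire budget is spent at the all-in per-minute rate.
function minutesCovered(budgetCents: number, perMinuteCents: number): number {
  return Math.floor(budgetCents / perMinuteCents);
}

function callsCovered(
  budgetCents: number,
  perMinuteCents: number,
  avgCallMinutes: number,
): number {
  return Math.floor(budgetCents / (perMinuteCents * avgCallMinutes));
}

const creditCents = 20_000; // $200 in free credits
const avgTestCallMinutes = 3; // hypothetical average test call length

// All-in range from this review: $0.15 to $0.25 per minute.
for (const rateCents of [15, 25]) {
  const minutes = minutesCovered(creditCents, rateCents);
  const calls = callsCovered(creditCents, rateCents, avgTestCallMinutes);
  console.log(`at ${rateCents} cents/min: ${minutes} minutes, about ${calls} calls`);
}
```

Under these assumptions the credits cover roughly 800 to 1,333 minutes, or about 266 to 444 three-minute calls.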

For teams running significant call volume, the enterprise pricing conversation is worth having. Vapi has custom arrangements for teams with predictable volume, and the per-minute rate typically comes down from the Pay-As-You-Go price at scale.

Where Vapi fits and where it doesn't

Vapi is a strong fit for development teams building voice as a product feature. If you have engineers who can work with APIs, you want control over your speech and language model choices, and you're building something where the voice experience matters to your users, Vapi gets you there faster than building the orchestration layer yourself.

It's a weaker fit for non-technical teams or businesses that want to configure a voice agent through a UI rather than code. There's no meaningful no-code interface. You need to be comfortable with API configuration, webhooks, and provider management to use it effectively. For teams in that position, Synthflow and similar no-code platforms are more appropriate starting points.

It's also a weaker fit if you need enterprise-grade support SLAs from day one. The base tier's support response times are adequate for development but can be frustrating when debugging production issues under time pressure.

Vapi vs the field

Vapi vs Retell AI

Retell AI is the most direct comparable. Both are developer-focused, API-first platforms with similar use cases. Retell is more opinionated, bundles its own speech processing, and claims sub-800ms latency with emotion-adaptive dialogue that adjusts based on caller state. Vapi gives you more component flexibility. If you want a single vendor and a tuned out-of-the-box experience, Retell is worth evaluating alongside Vapi. If you have specific provider preferences or existing contracts, Vapi's modularity wins.

Vapi vs Bland AI

Bland AI focuses heavily on outbound calling infrastructure. It has phone number management and dialing automation baked in, which makes it a faster path to outbound call campaigns. Vapi can do outbound calling but it requires more configuration. If your primary use case is high-volume outbound dialing, Bland's specialized focus gives it an edge on that specific workflow.

Vapi vs ElevenLabs Conversational AI

ElevenLabs has its own Conversational AI platform that's an end-to-end solution with ElevenLabs' voice quality baked in. If voice naturalness is your primary concern and you're happy with a more integrated stack, ElevenLabs Conversational AI is worth comparing. Vapi lets you use ElevenLabs as the TTS layer while keeping flexibility elsewhere, so the two aren't mutually exclusive.

Getting started

The fastest path is: create an account at vapi.ai, claim the free credits, and work through the quickstart documentation for inbound call handling. That gets a functional agent on a real phone number in a few hours. The TypeScript SDK is the cleanest integration path for teams working in Node.js. The REST API works for any other language.

Before you pick your component providers, spend an hour with Deepgram's free tier to understand STT options, and run your intended script through ElevenLabs or Play.ht to hear the TTS output before you commit. Voice quality has a bigger impact on how users perceive your agent than almost any other variable, and making that choice early saves painful migration later.

The function calling system is where the real power is. Getting a basic agent running is a few hours of work. Getting an agent that calls your CRM, checks inventory, and routes edge cases to human agents is the interesting engineering problem, and Vapi's webhook architecture is designed for exactly that kind of integration.

For teams comparing the full voice agent landscape, the profiles on Retell AI, Bland AI, and Deepgram cover the closest adjacent tools in detail.

Key features

Real-time streaming voice with sub-500ms response latency on well-configured setups
Bring your own LLM: works with OpenAI, Anthropic, Groq, Together, and local models
Bring your own STT and TTS providers including Deepgram, ElevenLabs, and Play.ht
Phone number provisioning and outbound/inbound call management via API
Function calling and tool use for external integrations mid-conversation
Server-side webhooks for call events, transcripts, and custom business logic
Multi-language support via swappable STT and TTS providers
Call recording and real-time transcription with speaker diarization

Pros and cons

Pros

+ Full modularity: swap any LLM, STT, or TTS provider independently without changing the rest of your stack
+ Pay-As-You-Go pricing with no monthly minimums makes it accessible for early-stage projects
+ $200 in free credits gives real runway to build and test before spending money
+ Server-side webhooks give you clean hooks into business logic without awkward polling
+ Active developer community and TypeScript SDK make integration faster than building from scratch
+ Handles both inbound and outbound phone calls through the same API

Cons

− The modularity is also the complexity: you need to understand each provider's pricing separately
− No built-in voice quality guarantee since voice quality depends on which TTS provider you choose
− Dashboard and monitoring tooling is less mature than some enterprise-focused competitors
− Support response times on the base tier can be slow for debugging production issues

Who is Vapi for?

Customer support bots handling inbound calls with full CRM integration via function calling
Outbound appointment reminder and confirmation calls with natural conversation flow
Lead qualification calls that hand off to human agents when intent score crosses a threshold
Voice-driven internal tools where employees interact with business systems by phone

Alternatives to Vapi

If Vapi isn't quite the right fit, the closest alternatives are Retell AI, Bland AI, ElevenLabs, and Deepgram. See our full Vapi alternatives page for side-by-side comparisons.

Frequently Asked Questions

What is Vapi AI?

Vapi is an API platform for building voice agents. It provides the real-time orchestration layer that connects a language model, speech-to-text, and text-to-speech into a functioning voice agent capable of holding a phone conversation. You bring your preferred providers for each component; Vapi handles the timing, streaming, and call infrastructure. It's aimed at developers who want control over each part of the stack rather than an opinionated all-in-one solution.

How much does Vapi cost?

Vapi charges roughly $0.05 per minute of conversation on its Pay-As-You-Go plan. That's on top of whatever your chosen LLM, STT, and TTS providers charge, so actual cost per minute is higher depending on your component choices. New accounts receive $200 in free credits. Enterprise pricing is custom and available for teams with predictable high volume. There's no monthly subscription fee on the base plan.

How does Vapi compare to Retell AI?

Both are developer-focused voice agent platforms with similar API-first approaches. Vapi gives you more freedom to mix and match underlying providers, which is an advantage if you have strong preferences for specific models or if you're already paying for other providers. Retell AI offers a more opinionated stack with emotion-adaptive dialogue tuned out of the box, which can get you to a better-sounding result faster if you don't have specific provider preferences. Vapi's pricing at $0.05/min undercuts Retell's $0.07/min standard rate, though final cost depends on component choices for both.

Does Vapi support outbound calls?

Yes. Vapi handles both inbound and outbound phone calls. For outbound, you can provision phone numbers through the platform or bring your own via SIP trunking. Outbound calls can be triggered programmatically via API, which makes it straightforward to build automated dialing workflows, follow-up sequences, or batch outreach campaigns. The same webhook system that handles inbound call events works for outbound calls.

Which LLMs work with Vapi?

Vapi supports OpenAI models, Anthropic Claude, Groq's hosted inference, Together AI, and several other providers through its LLM provider configuration. You can also point it at a custom OpenAI-compatible endpoint, which means locally hosted models running on your own infrastructure work as well. The choice of LLM affects response quality and latency, with faster inference providers like Groq improving the overall conversation feel noticeably.
