livekit.com (AI tool)

Kitt

Site: https://livekit.com/blog/meet-kitt


AI integration tool, AI Platform


Pricing plans

No detailed pricing plans are available for this tool yet.

Detailed overview

Blog / Engineering

Date: 04.12.2023
Authors: Russ d'Sa, Théo Monnom
Reading time: 9 min read
Tags: Engineering

Jarvis. Samantha. Joi. HAL. Science fiction has long dreamt of anthropomorphized AI. Between GPT, Claude, Bard, and other LLMs, it seems like we're on the precipice of this becoming reality. While we've enjoyed exchanging texts with ChatGPT, the LiveKit team thought it would be more fun to see and speak with it. Conveniently, we work on a suite of developer tools for building real-time video and audio applications, which made building this much easier.

Meet KITT (Kaleidoscopic, Interconnected Talking Transformer): an AI that you or your group can have live conversations with. KITT can do a lot of neat things, including:

- Answer questions like Siri, Alexa, or Google Assistant
- Take notes on or summarize what was discussed in a meeting
- Speak multiple languages and even act like a third-party translator

The rest of this post will dive into how KITT works under the hood, but if you're eager to jump straight to the code, that's here: https://github.com/livekit-examples/kitt

Building KITT

We wanted to keep the client as "thin" as possible. Ideally, KITT, or any other bot built by another developer, could hook into a LiveKit session and publish a video and/or audio track, analogous to a human user sharing their camera or microphone streams. KITT also needs to pull down audio streams from every user in the session in order to convert that speech to text and potentially dispatch a prompt to GPT. We used LiveKit's Go SDK, which packages in Pion, allowing us to behave like a WebRTC client and join sessions from the backend.
Overall, the application architecture looks like this:

[Figure: application architecture]

Whenever a new session starts and the first user joins, we use a webhook to have KITT join it too:

```go
func (s *LiveGPT) webhookHandler(w http.ResponseWriter, req *http.Request) {
	event, err := webhook.ReceiveWebhookEvent(req, s.keyProvider)
	if err != nil {
		// the event could not be validated or parsed, so there is nothing to act on
		return
	}
	if event.Event == webhook.EventParticipantJoined {
		// if the GPT participant is not connected, connect it
		p, err := ConnectGPTParticipant(s.config.LiveKit.Url, jwt, language, s.sttClient, s.ttsClient, s.gptClient)
		// ...
	}
	// ...
}
```

Now that KITT is connected to the session, they need to subscribe to all audio tracks from each user:

```go
func (p *GPTParticipant) trackPublished(publication *lksdk.RemoteTrackPublication, rp *lksdk.RemoteParticipant) {
	// only microphone tracks are of interest
	if publication.Source() != livekit.TrackSource_MICROPHONE || ... {
		return
	}

	err := publication.SetSubscribed(true)
	// ...
}
```

With audio streaming in, this is where things get interesting…

Optimizing for latency

We wanted conversations with KITT to feel human, so our primary concern before writing any code was latency. In particular, there needed to be as little delay as possible between when a user spoke and when KITT responded. Until someone (OpenAI?) builds an audio-to-audio model, there are three spots in our pipeline where latency might creep in: 1) speech-to-text (STT), 2) GPT, and 3) text-to-speech (TTS).

STT

We evaluated a few services for STT, including Google Cloud, DeepGram, Web Speech, and Whisper. In service of making interactions with KITT feel more human, we were willing to sacrifice transcription accuracy for lower latency.

DeepGram's model seems to have good accuracy but in our trials was much slower than Google's service. OpenAI's cloud-hosted Whisper API doesn't currently support streaming recognition, so that was a non-starter: while GPT needs to be fed full-text prompts, capturing incremental transcriptions is faster than sending one long speech segment.
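To make the trade-off between incremental transcription and full-text prompts concrete, here is a minimal sketch using only the Go standard library. It assumes a streaming recognizer that emits interim guesses followed by finalized segments; the `Result` and `Utterance` types are hypothetical stand-ins for illustration, not types from KITT or from any STT client library.

```go
package main

import (
	"fmt"
	"strings"
)

// Result is a hypothetical stand-in for one streaming recognition result:
// interim results may still change, final results will not.
type Result struct {
	Text  string
	Final bool
}

// Utterance accumulates finalized segments and tracks the latest interim guess.
type Utterance struct {
	finals  []string
	interim string
}

// Add records a result. A final result replaces the pending interim guess.
func (u *Utterance) Add(r Result) {
	text := strings.TrimSpace(r.Text)
	if !r.Final {
		u.interim = text
		return
	}
	u.interim = ""
	if text != "" {
		u.finals = append(u.finals, text)
	}
}

// Preview returns finalized text plus the current interim guess,
// suitable for a live transcription display.
func (u *Utterance) Preview() string {
	parts := append([]string{}, u.finals...)
	if u.interim != "" {
		parts = append(parts, u.interim)
	}
	return strings.Join(parts, " ")
}

// Prompt returns only finalized text, which is what a full-text prompt needs.
func (u *Utterance) Prompt() string {
	return strings.Join(u.finals, " ")
}

func main() {
	var u Utterance
	stream := []Result{
		{Text: "what's", Final: false},
		{Text: "what's the weather", Final: false},
		{Text: "What's the weather", Final: true},
		{Text: "in Paris", Final: false},
		{Text: "in Paris today?", Final: true},
	}
	for _, r := range stream {
		u.Add(r)
		fmt.Printf("preview: %q\n", u.Preview())
	}
	fmt.Printf("prompt:  %q\n", u.Prompt())
}
```

Because segments are finalized while the user is still speaking, the prompt is essentially ready the moment the last segment arrives, rather than after a single long recognition pass.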
If everyone were accessing KITT via Chrome, the Web Speech API would be a decent choice, but it's not standardized across browsers. Some browsers use an on-device model, which is faster than processing speech on the server, but accuracy suffers disproportionately. Client-side STT also goes against our modular design goal (i.e. fully server-side bots).

Ultimately, Google Cloud's STT was fast, accurate, and supported streaming recognition. When sending audio to the STT service, Google recommends a 100ms frame size for balanced latency and accuracy. In practice, we found that a 20ms frame size (WebRTC's default encoding) was sufficient for our use case. You can check out our implementation here: https://github.com/livekit-examples/kitt/blob/main/lkgpt-service/pkg/service/transcriber.go

GPT

GPT takes in text prompts and spits out text responses, but each model has different attributes:

[Figure: comparison of GPT model attributes]

We went with GPT-3.5 because it favors speed, but we had to make some tweaks. Tokens output by GPT are streamed, and, as with STT, text-to-speech is faster when performed incrementally than when generating speech from one large blob of text. However, unlike the STT phase, this is the final stage in our pipeline and we don't have to block on the full TTS output; we can stream audio segments to each user in real time as they're generated. In order to avoid choppy or uneven speech, we chose to delimit audio segments by sentence:

```go
func (c *ChatStream) Recv() (string, error) {
	sb := strings.Builder{}
	for {
		response, err := c.stream.Recv()
		if err != nil {
			// ...
		}

		// accumulate streamed tokens until the latest one ends a sentence
		delta := response.Choices[0].Delta.Content
		sb.WriteString(delta)

		if strings.HasSuffix(strings.TrimSpace(delta), ".") {
			return sb.String(), nil
		}
	}
}
```

The problem with delimiting by sentence is that the model we're using isn't very concise in its responses, which increases the TTS latency.
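The sentence-delimiting idea can be tried in isolation with a small stdlib-only sketch. This is a variation on the loop above rather than KITT's code: the `SentenceBuffer` type is hypothetical, and it also treats "!" and "?" as sentence endings, which the original snippet does not.

```go
package main

import (
	"fmt"
	"strings"
)

// SentenceBuffer is a hypothetical helper that accumulates streamed text
// deltas and releases them one sentence at a time.
type SentenceBuffer struct {
	sb strings.Builder
}

// Push appends a delta. It returns the buffered sentence and true once the
// buffered text ends in '.', '!' or '?'; otherwise it returns "" and false.
func (b *SentenceBuffer) Push(delta string) (string, bool) {
	b.sb.WriteString(delta)
	text := strings.TrimSpace(b.sb.String())
	if text == "" {
		return "", false
	}
	switch text[len(text)-1] {
	case '.', '!', '?':
		b.sb.Reset()
		return text, true
	}
	return "", false
}

// Flush returns whatever is left when the stream ends without punctuation.
func (b *SentenceBuffer) Flush() string {
	text := strings.TrimSpace(b.sb.String())
	b.sb.Reset()
	return text
}

func main() {
	deltas := []string{"Sure", "!", " The", " meeting", " starts", " at", " noon", ".", " Anything", " else"}
	var buf SentenceBuffer
	for _, d := range deltas {
		if sentence, ok := buf.Push(d); ok {
			fmt.Printf("to TTS: %q\n", sentence)
		}
	}
	if rest := buf.Flush(); rest != "" {
		fmt.Printf("to TTS: %q\n", rest)
	}
}
```

A punctuation check this simple will also split on a token such as "3." in the middle of a decimal number, so treat it as a starting point only.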
To work around this verbosity, we prime GPT like so:

```go
stream, err := c.client.CreateChatCompletionStream(ctx, openai.ChatCompletionRequest{
	Model: openai.GPT3Dot5Turbo,
	Messages: []openai.ChatCompletionMessage{
		{
			// the system message asks for short answers to keep TTS latency low
			Role:    openai.ChatMessageRoleSystem,
			Content: "You are a voice assistant in a meeting named KITT, make concise/short answers. " ...,
		},
		...
	},
	...
})
```

There are probably better prompts to elicit shorter sentences from GPT, but the above performs well in practice. From here, it's relatively straightforward to run each sentence through Google's TTS and transmit audio responses to each user in the session:

```go
go func() {
	// ...
	// synthesize one sentence, then queue the audio on KITT's published track
	resp, err := p.synthesizer.Synthesize(p.ctx, sentence, tmpLang)
	// ...
	err = p.gptTrack.QueueReader(bytes.NewReader(resp.AudioContent))
	// ...
}()
```

The general principle we followed to optimize for latency was "stream all the things": by minimizing the amount of time it takes to receive, process, and transmit data at each step, we were able to keep latency to a minimum and create a seamless conversational experience.

Designing the client interface

With a working backend for KITT that could plug into any LiveKit session, we saved a lot of time by using LiveKit Meet for this project instead of building a completely new UI. Meet is a Zoom-inspired sample application we built to show developers how to use LiveKit and to dogfood our infrastructure internally.

In this demo, when you start a meeting, KITT will automatically join it too. If it's a 1:1 meeting (i.e. only one human user), KITT will assume anything you say is directed at them and respond appropriately. If there are multiple human users in the meeting, saying "KITT" or "Hey KITT" will let KITT know your subsequent prompt is intended for them; we also play a sound to let you know KITT's listening.
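The activation rule described above (always respond in a 1:1 meeting, require a wake phrase otherwise) can be sketched as a small pure function. This is an illustration under assumptions, not KITT's implementation: `ExtractPrompt`, `wakePhrases`, and the word-boundary check are hypothetical, and the sketch only handles a wake phrase at the start of a transcript.

```go
package main

import (
	"fmt"
	"strings"
)

// wakePhrases are hypothetical activation phrases, longest first so that
// "hey kitt" is preferred over the shorter "kitt".
var wakePhrases = []string{"hey kitt", "kitt"}

// isWordByte reports whether c is an ASCII letter or digit.
func isWordByte(c byte) bool {
	return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9'
}

// ExtractPrompt decides whether a transcript is addressed to the assistant.
// With a single human in the room everything is addressed to it; otherwise
// the transcript must start with a wake phrase. It returns the prompt with
// the wake phrase removed, and whether the assistant should respond.
func ExtractPrompt(transcript string, humans int) (string, bool) {
	text := strings.TrimSpace(transcript)
	if text == "" {
		return "", false
	}
	if humans <= 1 {
		return text, true
	}
	for _, phrase := range wakePhrases {
		if len(text) < len(phrase) || !strings.EqualFold(text[:len(phrase)], phrase) {
			continue
		}
		rest := text[len(phrase):]
		// Require a word boundary so that "kittens" does not activate the assistant.
		if rest != "" && isWordByte(rest[0]) {
			continue
		}
		return strings.TrimLeft(rest, " ,.!?:"), true
	}
	return "", false
}

func main() {
	samples := []struct {
		text   string
		humans int
	}{
		{"What time is it in Tokyo?", 1},
		{"Hey KITT, summarize the meeting so far", 3},
		{"Kittens are cute", 2},
	}
	for _, s := range samples {
		prompt, ok := ExtractPrompt(s.text, s.humans)
		fmt.Printf("humans=%d addressed=%v prompt=%q\n", s.humans, ok, prompt)
	}
}
```

When the transcript contains only the wake phrase, the function returns an empty prompt with `true`, which a caller could treat as "start listening for the next utterance".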
Borrowing from assistant interfaces like Siri and Google Assistant, whenever you speak to KITT, we show a live transcription of your prompt, which helps you understand the input KITT received and contextualize their response:

[Figure: live transcription of a prompt]

For KITT's visual identity, ultimately we'd like to dynamically generate video frames on the backend, but for this initial version we opted to build a client-side React component. KITT cycles through a few different states, depending on the state of the conversation and whether they're engaged:

[Figure: KITT's visual states]

Both live transcription and KITT's states are transmitted to each user using LiveKit's data messages.

Putting it all together

With everything working end-to-end, the result is pretty magical. ✨

What's next?

Below are some areas that we'd like to explore as improvements to KITT's implementation. If you're interested in contributing towards any of these, we accept PRs! 🙂

STT

Google's STT is fast and accurate, but there are two downsides: 1) it's an external service call, and 2) it's not cheap. We want to explore running the smallest (i.e. fastest) Whisper model ourselves to see if there's a significant reduction in latency. The limitation of Whisper is that its language support isn't as broad as Google's.

TTS

Google's TTS is also fast, but the voice can sound a bit robotic. We explored Tortoise, which sounds amazing, but it takes ~20s to generate a single sentence! It's worth testing other real-time TTS models like Rime, or even ones supporting custom voices, which would be fun for end-users to interact with.

GPT-4 and other models

GPT-4 is a more powerful model, capable of producing more humanlike responses at the cost of speed. It would be interesting to see if pre-prompting it (to be extra concise) could help offset the increased latency.
Additionally, really fast models like Claude, or LLaMA or Alpaca running locally, could potentially achieve even lower latency without a significant impact on response quality.

More and better prompting

In this demo, we've barely scratched the surface of what pre-prompting GPT can do. We added the ability for every user to specify their spoken language, and we prepend the language code (e.g. en-US) to every user-initiated prompt. This allows KITT to respond to each user in the appropriate language and even act as a live translator between two users. For every user query, we also pass GPT the entire conversation history (e.g. russ: ...\ntheo: ...) so KITT can respond to or reference any prompts which rely on historical context.

Some ideas for taking this further include framing the session to GPT as a meeting, including any notes from emails or calendar entries, and adding real-time events to the history, like "theo left the meeting at ..." or live chat messages.

Avatars

Giving KITT or another AI an expressive (and perhaps more human) visual representation will completely change how it feels to interact with them. The tricky part on the backend is setting up a pipeline which can take in an audio stream, supports compositing/animation/effects, and outputs video frames. One option is to record a browser instance like we do with LiveKit Egress. Another possibility is to use Unity or Unreal.

Video processing

Right now we aren't doing anything with a user's video stream. While transcriptions help with accessibility, imagine running a separate model that performs ASL recognition! Other things like sentiment analysis or scene understanding could provide GPT additional context, too.

Screen sharing

Some GPT responses include multimedia like videos, images, or code snippets. A neat feature would be to initiate a screen share or add some type of canvas to LiveKit Meet that KITT can use to display these types of assets to users.
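Of the ideas above, the prompting scheme (a language code on each user prompt plus a speaker-labelled history) is the easiest to sketch. The exact layout KITT sends to GPT is not shown in this post, so the `Turn` type, the `BuildPrompt` function, and the "speaker: lang text" format below are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// Turn is a hypothetical record of one utterance in the session.
// Lang is the speaker's language code (e.g. "en-US"); it is left empty for
// turns that did not come from a human user.
type Turn struct {
	Speaker string
	Lang    string
	Text    string
}

// formatTurn renders a turn as "speaker: lang text", omitting the language
// code when it is empty. The layout is an assumption, not KITT's format.
func formatTurn(t Turn) string {
	if t.Lang == "" {
		return fmt.Sprintf("%s: %s", t.Speaker, t.Text)
	}
	return fmt.Sprintf("%s: %s %s", t.Speaker, t.Lang, t.Text)
}

// BuildPrompt joins the full history and the new user turn, one per line,
// so the model can resolve references to earlier parts of the conversation.
func BuildPrompt(history []Turn, next Turn) string {
	lines := make([]string, 0, len(history)+1)
	for _, t := range history {
		lines = append(lines, formatTurn(t))
	}
	lines = append(lines, formatTurn(next))
	return strings.Join(lines, "\n")
}

func main() {
	history := []Turn{
		{Speaker: "russ", Lang: "en-US", Text: "What did we decide about the launch?"},
		{Speaker: "kitt", Text: "You agreed to ship on Friday."},
	}
	next := Turn{Speaker: "theo", Lang: "fr-FR", Text: "Peux-tu traduire ça ?"}
	fmt.Println(BuildPrompt(history, next))
}
```

Real-time events such as a participant leaving could be appended to the same history as additional turns.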
It was insanely fun building KITT and the very first conversation with them gave us goosebumps. There's definitely potential for a few standalone products to be built on top of this foundation. Consider this an open invitation to take our code and build something amazing with it. If you have any questions along the way, want to jam on ideas, or just share what you've built, hit us up in the LiveKit Community Slack! 🤖

---

Plans designed to scale with your projects

From building your first AI voice or video agent to realtime applications with millions of users and everything in between.

Build: Everything you need to start a project. $0/mo, no credit card required. Start with:
- Agent deployment
- Agent observability
- Inference credits
- Global edge network
- Telephony (1 free number)
- Session metrics and analytics
- Community support

Ship: For shipping your project to real users. Starting at $50/mo. Everything in Build, plus:
- Team collaboration
- Instant rollback to a previous agent deployment
- Email support

Scale: For scaling applications and global reach. Starting at $500/mo. Everything in Ship, plus:
- Role-based access
- Metrics export APIs
- Region pinning
- Security reports / HIPAA
- Inference discounts

Enterprise: For teams interested in the white-glove treatment. Custom pricing. Everything in Scale, plus:
- Volume pricing, including inference
- Shared Slack channel
- SSO
- Support SLA

Pricing Calculator

Estimate costs for AI voice and video agents. Preview the per-minute cost to run an agent on LiveKit Cloud.
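For a rough monthly figure rather than a per-minute one, the arithmetic is a base price plus overage beyond the included allotment. The sketch below covers agent session minutes only and uses the Ship plan figures quoted on this page ($50/mo, 5,000 agent session minutes included, then $0.01 per minute); the `Plan` type and function names are hypothetical, and a real bill would add the other meters (telephony, inference, recordings, and so on).

```go
package main

import "fmt"

// Plan is a hypothetical description of one metered allotment.
// Amounts are in micro-dollars (1 USD = 1,000,000) to avoid float rounding.
type Plan struct {
	BaseMicros          int64 // monthly base price
	IncludedMinutes     int64 // agent session minutes included per month
	OverageMicrosPerMin int64 // price per minute beyond the allotment
}

// MonthlyCost returns the base price plus any overage for the minutes used.
func MonthlyCost(p Plan, usedMinutes int64) int64 {
	cost := p.BaseMicros
	if usedMinutes > p.IncludedMinutes {
		cost += (usedMinutes - p.IncludedMinutes) * p.OverageMicrosPerMin
	}
	return cost
}

// Dollars formats a non-negative micro-dollar amount as dollars and cents.
func Dollars(micros int64) string {
	return fmt.Sprintf("$%d.%02d", micros/1_000_000, (micros%1_000_000)/10_000)
}

func main() {
	// Ship plan figures as quoted on this page; agent session minutes only.
	ship := Plan{BaseMicros: 50_000_000, IncludedMinutes: 5_000, OverageMicrosPerMin: 10_000}
	for _, used := range []int64{3_000, 5_000, 12_000} {
		fmt.Printf("%6d min -> %s\n", used, Dollars(MonthlyCost(ship, used)))
	}
}
```

With 12,000 minutes, the 7,000 minutes beyond the allotment cost $70 on top of the $50 base.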
Our plans include monthly allotments for agent session minutes, inbound calling minutes (for US local phone numbers), and inference credits to call the most popular AI models. For detailed LLM, STT, and TTS model pricing, see Inference pricing. For detailed provider and model API support, see Documentation.

Calculator options: how users connect to your agent (phone call or web/mobile) and plan (Build/Ship or Scale). Example estimate:
- Agent session: $0.0100/min
- Telephony: $0.0100/min
- WebRTC connection (connection and data transfer): --
- LLM (large language model): -- (choose a model)
- STT (speech-to-text): -- (choose a model)
- TTS (text-to-speech): -- (choose a model)
- Observability: $0.0100/min
- Total estimated cost: $0.0300/min

Plan comparison

Plans: Build ($0/mo), Ship ($50/mo), Scale ($500/mo), Enterprise (custom).

AI voice and video agents: deploy and host agents on LiveKit Cloud infrastructure
- Agent session minutes (time spent in a session by an agent deployed on LiveKit Cloud). Build: 1,000 minutes included. Ship: 5,000 minutes included, then $0.01 per min. Scale: 50,000 minutes included, then $0.01 per min. Enterprise: custom.
- Concurrent agent sessions (number of concurrent sessions across all agents deployed on LiveKit Cloud). Build: 5. Ship: 20. Scale: up to 600 (starts at 50, request more via dashboard). Enterprise: custom.
- Agent deployments (number of agents deployed on LiveKit Cloud). Build: 1. Ship: 2. Scale: 4. Enterprise: custom.
- Deployment metrics: view resource allocation, latency, errors, and process metrics.
- Cold start prevention (keep agents always-on for instant responses). Build: not included.
- Instant rollback (revert a deployment back to a previous version). Build: not included.
- Audio enhancement: improve STT accuracy and VAD precision.
- Speaker isolation (elevate the foreground speaker while suppressing background voices with ai-coustics' Voice Focus model). Build: 100 minutes included. Ship: 1,000 minutes included, then $0.0012/min. Scale: 10,000 minutes included, then $0.0012/min. Enterprise: custom.
- Conversational intelligence: built-in models for end-of-turn detection and interruption handling.

LiveKit Inference: access LLM, STT, and TTS models with a single API key (see Inference pricing)
- LiveKit Inference credits (call popular models with LiveKit's inference service). Build: $2.50 in credits (~50 minutes, based on model prices). Ship: $5 in credits (~100 minutes), then billed based on model prices. Scale: $50 in credits (~1,000 minutes), then billed based on discounted model prices. Enterprise: custom.
- LiveKit Inference concurrency (number of concurrent sessions connected to LiveKit's inference service). Build: 5. Ship: 20 (request more via dashboard). Scale: 50 (request more via dashboard). Enterprise: custom.

Agent observability: gather insights into your agent's behavior and performance
- Agent session recordings (download and play back audio from recorded agent sessions). Build: 1,000 minutes included. Ship: 5,000 minutes included, then $0.005 per min. Scale: 50,000 minutes included, then $0.005 per min. Enterprise: custom.
- Agent observability events (review turn-by-turn details for recorded agent sessions, including transcripts, trace spans, and logs). Build: 100,000 entries included. Ship: 500,000 entries included, then $0.00003 per entry. Scale: 5,000,000 entries included, then $0.00003 per entry. Enterprise: custom.
- Export to cloud storage (send session recordings, transcripts, traces, and logs to cloud storage). Build: not included. Ship, Scale, Enterprise: coming soon.

Telephony: connect with your users over regular phone calls
- US local phone numbers (monthly rental of a US local phone number). Build: 1 free number. Ship: 1 free number, then $1.00/month per number. Scale: 1 free number, then $1.00/month per number. Enterprise: custom.
- US local inbound minutes (inbound minutes to a US local number). Build: 50 minutes included. Ship: 100 minutes included, then $0.01 per min. Scale: 1,000 minutes included, then $0.01 per min. Enterprise: custom.
- US toll-free phone numbers (monthly rental of a US toll-free phone number). Build: not included. Ship: $2.00/month per number. Scale: $2.00/month per number. Enterprise: custom.
- US toll-free inbound minutes (inbound minutes to a US toll-free number). Build: not included. Ship: $0.02 per minute. Scale: $0.02 per minute. Enterprise: custom.
- Third-party SIP minutes (inbound and outbound minutes using a third-party SIP trunk). Build: 1,000 minutes included. Ship: 5,000 minutes included, then $0.004 per min. Scale: 50,000 minutes included, then $0.003 per min. Enterprise: custom.
- Custom SIP domains (use your own domain for inbound SIP endpoints instead of livekit.cloud). Build, Ship, Scale: not included.

Participants: allow end users to connect to realtime sessions
- WebRTC minutes (time an end user spends connected to our network via WebRTC). Build: 5,000 minutes included. Ship: 150,000 minutes included, then $0.0005 per min. Scale: 1.5M minutes included, then $0.0004 per min. Enterprise: custom.
- Concurrent connections (number of end users and agents connected to our network). Build: 100. Ship: 1,000. Scale: 5,000. Enterprise: custom.

Media transport: deliver voice and video worldwide in under 250ms
- Uptime (global availability of LiveKit's realtime network). 99.99% on all plans.
- Enhanced noise cancellation: automatic background noise reduction for voice streams with Krisp.
- Downstream data transfer (data transfer from our network to participants). Build: 50GB included. Ship: 250GB included, then $0.12 per GB. Scale: 3TB included, then $0.10 per GB. Enterprise: custom.

Stream import: ingest media encoded in another format and deliver it as a realtime stream
- Transcode minutes (time spent converting media from source format to RTP; transcode-less imports, e.g. WHIP without transcode, are free). Build: 60 minutes included, shared with recording and export. Ship: 600 minutes included, shared with recording and export, then $0.02 per minute (video) or $0.005 per minute (audio-only). Scale: 8,000 minutes included, shared with recording and export, then $0.015 per minute (video) or $0.004 per minute (audio-only). Enterprise: custom.
- Concurrent imports (number of Ingresses running concurrently). Build: 2. Ship: 100. Scale: 500. Enterprise: custom.

Recording and export: capture realtime media and encode it in another format for recording or multistreaming
- Transcode minutes (time spent running RoomComposite and Participant Egresses; supports multiple export destinations per target format). Build: 60 minutes included, shared with stream import. Ship: 600 minutes included, shared with stream import, then $0.02 per minute (video) or $0.005 per minute (audio-only). Scale: 8,000 minutes included, shared with stream import, then $0.015 per minute (video) or $0.004 per minute (audio-only). Enterprise: custom.
- Track egress (raw single stream exports). Build: 60 minutes included. Ship: 600 minutes included, then $0.001 per minute. Scale: 8,000 minutes included, then $0.001 per minute. Enterprise: custom.
- Concurrent exports (number of Egresses running concurrently). Build: 2. Ship: 100. Scale: 500. Enterprise: custom.

Platform: build, ship, and manage your applications with additional tools and features
- Dashboard: view sessions and detailed usage metrics, spin up sandboxes, and manage project settings.
- CLI: manage and interact with your application from the command line.
- Team collaboration (build applications together with your team). Build: not included.
- Metrics export APIs (query and export analytics or telemetry data to external systems). Build, Ship: not included.
- Shared plan across projects (share a single plan across multiple projects for unified billing). Build, Ship, Scale: not included.
- Non-credit card billing (alternative payment options including invoicing and wire transfers). Build, Ship, Scale: not included.

Security and compliance: protect your applications through access, application, and operational security
- End-to-end encryption: fully encrypt streams between sender and receiving clients.
- DPA (data processing addendum). Build, Ship, Scale: standard. Enterprise: custom.
- Role-based access (assign different roles and capabilities to project collaborators). Build, Ship: not included.
- Region pinning (restrict which data center regions may route or process media streams). Build, Ship: not included.
- Security reports (access third-party audit reports of our network infrastructure and operational security). Build, Ship: not included. Scale, Enterprise: includes SOC 2 Type II and network pentest.
- HIPAA compliance (signed BAA). Build, Ship: not included.
- Single sign-on (SSO) (authenticate with your enterprise identity). Build, Ship, Scale: not included.
- AWS Assume Role for S3 egress (use IAM roles with temporary credentials for S3 exports instead of long-lived access keys). Build, Ship, Scale: not included.

Support: get help and technical assistance for building your applications
- Community support: help and advice for building your application from the LiveKit community.
- Email support (reach out to the LiveKit team for help via email). Build: not included.
- Shared Slack channel (build and collaborate with the LiveKit team in a private Slack channel). Build, Ship, Scale: not included.
- Designated solutions engineer (a specific LiveKit team member focused on helping you build your application). Build, Ship, Scale: not included.
- Support SLA (escalation privileges and guaranteed response times for support tickets). Build, Ship, Scale: not included.

FAQs

What's the difference between agent deployments, concurrent agent sessions, and LiveKit Inference concurrency?

An agent deployment is a running version of your agent backend hosted on LiveKit Cloud, typically with a unique prompt, set of voice AI models, and function calls. You can configure your agent to complete different tasks or workflows. Deploy separate agents when you need distinct reasoning behavior or tool access (e.g., a front-office receptionist agent to handle inbound phone calls for appointment scheduling and triage vs. a back-office agent to make outbound calls to insurance providers to verify patient coverage).

A concurrent agent session is a live interaction between your agent and an end user. If your agent is handling 10 calls or conversations at the same time, that counts as 10 concurrent sessions, regardless of how many agent deployments you have on LiveKit Cloud.

LiveKit Inference concurrency refers specifically to how many AI inference requests across LLM, STT, and TTS can run at the same time through LiveKit Inference. It limits how many model calls can be processed concurrently, independent of how many agent sessions or deployments you have. The LiveKit Inference concurrency limit for each plan applies to your aggregate usage of a model type (e.g., total connections to any LiveKit Inference STT).
For example, if there are 10 concurrent agent sessions running and the agent is configured to use LiveKit Inference for STT, then there are 10 concurrent STT connections. For more information on LiveKit Cloud quotas and limits, refer to our docs.

Can I self-host LiveKit?

The LiveKit Agents framework and LiveKit media server are both completely open source and available to run locally or host on your own infrastructure. LiveKit Cloud is the best way to run LiveKit in production, with fully managed agent deployments, built-in observability and dashboards, and ultra low-latency global media transport. Sign up for LiveKit Cloud here, or refer to our docs on how to run LiveKit's media server locally or deploy LiveKit Agents in a custom environment.

Do you offer on-premise or private deployments?

Yes. Contact sales so we can better understand your needs.

Powering billions of calls in production.

Ready to build? Start building a voice AI agent with a free account. Reach out to us if you're interested in custom pricing. No credit card required • 1,000 free agent session minutes monthly

---

We're building the infrastructure for the voice-driven era of computing

Forty years ago, Steve Jobs unveiled a computer that could speak to you. The idea of a computer that adapts to us is not new, but the technology that makes it possible is. We started LiveKit to bring that technology to every developer in the world.

We began as an open source project for building livestreaming and video conferencing applications using WebRTC. Over time, we've evolved into a developer platform for building voice, video, and physical AI agents.
What started with just a media server and some SDKs is now a full ecosystem of APIs and tools for multimodal computing.

Over 200,000 developers and teams, ranging from leading AI and robotics labs to Fortune 500 companies, use LiveKit as the default infrastructure layer for building AI that can interact with the world in real time.

Here's to the crazy ones

Some of the best investors, founders, and builders in the world share our belief in how we'll use computers to do more than ever before. We're grateful to have their support.

Our Investors
- Jeff Dean, Chief Scientist, Google
- Elad Gil, CEO, Gil Capital
- Aravind Srinivas, CEO, Perplexity
- Amjad Masad, CEO, Replit
- Guillermo Rauch, CEO, Vercel
- Logan Kilpatrick, Product Lead, Google Deepmind
- Mati Staniszewski, CEO, ElevenLabs
- Erik Bernhardsson, CEO, Modal
- Rohan Anil, Researcher, Anthropic
- Garrett Camp, Founder, Uber
- Ev Williams, Founder, Twitter
- Justin Kan, Founder, Twitch

---

Platform

Build, run, and observe AI agents

LiveKit is a developer platform for voice, video, and physical AI. Build agents with our open source SDKs, deploy them across a global network of data centers, and monitor them in production with realtime observability.

- 300,000+ developers
- Billions of calls annually
- 300+ AI model integrations

Our approach

Your agent, your code. With LiveKit, you control how your AI agent sounds, how it behaves, and what actions it takes. When you're ready to deploy to production, LiveKit Cloud takes care of the runtime infrastructure, from version control and dispatching to autoscaling and turn-by-turn telemetry.

Built for the full agent development lifecycle

Voice agents are fundamentally different from web and mobile apps.
They listen, think, and respond in real time, maintaining context throughout the conversation. LiveKit's infrastructure is designed to facilitate multimodal, low-latency, high-volume conversations between agents and humans.

agent.py

```python
from dotenv import load_dotenv

from livekit import agents, rtc
from livekit.agents import AgentServer, AgentSession, Agent, room_io
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(".env.local")

class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""You are a helpful voice AI assistant.
            You eagerly assist users with their questions by providing information from your extensive knowledge.
            Your responses are concise, to the point, and without any complex formatting or punctuation including emojis, asterisks, or other symbols.
            You are curious, friendly, and have a sense of humor.""",
        )

server = AgentServer()

@server.rtc_session(agent_name="my-agent")
async def my_agent(ctx: agents.JobContext):
    session = AgentSession(
        stt="deepgram/nova-3:multi",
        llm="openai/gpt-4.1-mini",
        tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Assistant(),
        room_options=room_io.RoomOptions(
            audio_input=room_io.AudioInputOptions(
                noise_cancellation=lambda params: noise_cancellation.BVCTelephony() if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP else noise_cancellation.BVC(),
            ),
        ),
    )

    await session.generate_reply(
        instructions="Greet the user and offer your assistance."
    )


if __name__ == "__main__":
    agents.cli.run_app(server)
```

Build

Build voice agents with open source SDKs and complete developer control.
Write agent logic directly in code, test it against real scenarios, and iterate without vendor lock-in.
- Agent backend: build agent logic in Python or TypeScript with server-side SDKs.
- Agent frontend: create polished agent interfaces with customizable UI components.
- Testing: measure and stress-test real scenarios before shipping to production.

Run

Deploy agents to a global network built for production-grade reliability and performance. LiveKit Cloud handles the realtime infrastructure, routing, and scaling so your team can focus on the product, not the ops.
- Agent deployment: ship agents to production on LiveKit Cloud with zero infrastructure management.
- Inference: access STT, LLM, and TTS models with optimized and low-latency routing.
- Telephony: connect agents to phone systems with native SIP, no trunk configuration required.

Observe

Gain end-to-end visibility into every conversation. Track latency, quality, and outcomes in real time, then use those insights to continuously improve your agents.
- Agent insights: replay sessions, review transcripts, and inspect trace spans.
- Agent logs: monitor performance and debug issues with runtime and build logs.
- Session reports: review conversation history, events, metadata, and agent configuration details.

Features
- Conversation handling: built-in models for noise cancellation, end-of-turn detection, and interruption handling
- Native client SDKs: ship voice agents to any platform, including web, iOS, Android, and microcontrollers
- Coding agent resources: turn Claude Code, Cursor, Codex, or Gemini into a LiveKit expert
- Global cloud network: connect to users with ultra-low latency on a distributed mesh of media servers
- Elastic scaling: serve thousands of concurrent voice agent sessions at any given time
- SIP support: connect agents to any phone number for inbound and outbound calling
- Conversation quality: built-in models for noise cancellation, turn detection, and interruption handling
- Realtime metrics: subscribe to agent metric events or forward them to external services
- Recording exports: send audio and video recordings directly to a storage provider

Get started with LiveKit Cloud

Build your first voice agent on LiveKit with a coding assistant, Agent Builder, or Agents SDKs.

Enterprise ready

Built for production workloads at scale

LiveKit Cloud is architected to meet the security, compliance, and operational requirements of enterprise teams, including end-to-end encryption, role-based access controls, and support plans.

"LiveKit enables us to very granularly fine-tune our agent code for every workflow and use any model provider. No other platform gives us this level of control."
Jeffery Liu, Founder & Co-CEO, Assort Health

Resources to help you learn and ship
- Read the docs
- Install coding assistant starter pack (guide)
- Start building with our Voice AI quickstart (guide)
- Watch our LiveKit 101 workshop series (video)

FAQs

What is LiveKit Cloud?

LiveKit Cloud is a distributed cloud platform for running voice, video, and physical AI agents. It includes a global mesh network for ultra low-latency media transport, fully managed agent deployments, native telephony support, and full-stack session and agent observability. To learn more, visit our docs.

How do I get started with LiveKit Cloud?

Sign up for a LiveKit Cloud account, use the LiveKit Agents SDKs or Agent Builder to build an agent, then deploy to LiveKit Cloud using the CLI or with a single click in the dashboard. To learn more, visit our docs.

Is LiveKit secure and compliant?

Yes, LiveKit Cloud is secure and compliant, with end-to-end encryption, strict security policies, identity and access management, and more. LiveKit Cloud complies with SOC 2 Type II, GDPR, CCPA, and HIPAA. To learn more, visit our Trust Center.

Can I use my own AI models with LiveKit?

Yes. LiveKit Inference supports over 50 AI models out-of-the-box, no API keys or additional configuration required.
In addition, the LiveKit Agents framework supports another 200+ models via plugins, including an OpenAI plugin that supports most models compatible with the OpenAI API format.

For models not currently available via LiveKit Inference or a plugin, the LiveKit Agents plugin framework is extensible and community-driven. We welcome contributions for new STT, LLM, and TTS plugins. To learn more about using AI models with LiveKit, visit our docs.

Do I need to change my agent code to use LiveKit Cloud?

No, all agents built with the LiveKit Agents SDKs can be deployed to LiveKit Cloud without any code changes. To learn more about deploying agents to LiveKit Cloud, visit our docs.

Is there a free tier?

Yes, the Build plan is completely free on LiveKit Cloud. It offers generous monthly quotas to get you started, including 1,000 agent session minutes, LiveKit Inference credits, and one free US local phone number for inbound calling. To learn more, visit our pricing page.

Tools in the same category