boundaryml.com
Pricing plans
No detailed pricing plan is available yet for this tool.
Detailed overview
TH JUL 31 @ 9 AM PT
The First Language for Building Agents
TypeScript made JavaScript 10x more reliable. BAML makes your AI pipelines 10x more reliable.

    uv add baml-py && uv run baml-cli init

Works with every LLM provider, and every language.
Baaaaaaaaaaml: Basically A Made-Up Language

Complete Development Workflow
Discover how BAML transforms AI development in four easy steps.
Define your prompts as functions. Yes, Cursor and Claude already know BAML. Yes, we made a whole VS Code extension for BAML.
Test your functions. Do it in VS Code or the editor of your choice, or in CI/CD with baml-cli test.
Call your functions from any programming language you love. baml-cli generate converts BAML functions into native functions in Python (main.py):

    # baml_client is the package produced by baml-cli generate
    from baml_client import b

    result = b.AnalyzeCodebase("...")
    print(result)
Deploy your agent. Do nothing special for BAML: since BAML generates native code in your language of choice, you can deploy it any way you want.
Multi-cloud deployment: AWS Lambda, Vercel, Google Cloud, Azure Functions, Railway.

Empower Your AI Development
Build AI applications with type safety, generate TypeScript types, and validate your schemas.
resume.baml:

    class Resume {
      name string
      title string
    }
    function ExtractResume(resume: string) -> Resume

In main.py, accessing a field that the class does not define is caught by the type checker:

    resume = b.ExtractResume(resume)
    print(resume.education)  # Property 'education' does not exist on type 'Resume'.

Type-Safe AI Interfaces
Define AI interfaces with confidence. Write BAML schemas that generate TypeScript types automatically.
Example prompt: "Extract the person's name and job title from this resume text..."

Structured Outputs
Get type-safe, validated responses from any LLM, with support for JSON, XML, YAML, and other output formats.

Test Your Agents in CI/CD
Test your agents in CI/CD pipelines to ensure they work as expected:

    ✅ Testing ResumeParser...
    ✅ Testing SentimentAnalyzer...
    ✅ Testing CodeReviewer...
    ✅ All Agents tested successfully

Automatic Retry and Fallback
Automatically retry failed requests and provide fallback responses when errors occur.

People love BAML. And so do agents.
"BAML is amazing. I've used it in Python and Typescript. It's a game changer." (Adam Gitzes, Amazon)
"Just set up baml for my project, 10/10 experience and much faster than langchain." (Jason Fan, Finic.ai)
"It's amazing!! Was able to cut down my tokens and time-to-first-token significantly without compromising results." (Ray del Vecchio, Cerebral Valley)
"BAML is definitely a must have if you want any structured data from LLM; no more BS/long paragraphs describing what the output should be like, it just works!!!" (Hankel Bao, Coldreach.ai)
"The test case and playground is quite literally the BEST feature. It has improved the iteration speed and quality by an order of magnitude." (Joseph Tutera, Docucare AI)
"I really really like what Baml offers [...] I think it's a step-wise improvement over Marvin. Having complete control over the prompt WITH strong type guarantees is fantastic. I also think the dedicated testing playground is awesome." (Gabe, Zenfetch)
"Code is hella clean now. Look at [the] folder structure, and each folder for a respective pipeline. Each file just a prompt. Clean, elegant, beautiful." (Paulo Rossi, Magnaplay)
"Just got the categorizer to work first try. Felt like landing a kickflip" (Eitan Borgnia, Squack)

Build AI Applications with Confidence
Start building type-safe AI applications in minutes. Book a meeting with us.
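"Automatic Retry and Fallback" is a general reliability pattern. The Python sketch below shows the idea only. It is not BAML's client configuration syntax, and `call_with_retry_and_fallback`, `flaky_primary`, and `stable_backup` are hypothetical names.

The sketch retries each provider with exponential backoff before falling back to the next one. Only when every provider has failed does it raise.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_retry_and_fallback(
    providers: Sequence[Callable[[str], T]],
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Try providers in order; retry each one with exponential backoff before falling back."""
    if not providers or max_attempts < 1:
        raise ValueError("need at least one provider and at least one attempt")
    errors: list[Exception] = []
    for provider in providers:
        for attempt in range(max_attempts):
            try:
                return provider(prompt)
            except Exception as exc:  # sketch only: a real client would retry transient errors only
                errors.append(exc)
                if attempt < max_attempts - 1:
                    time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ... with the default
    raise RuntimeError(f"all providers failed ({len(errors)} attempts)") from errors[-1]


# Hypothetical providers standing in for two LLM clients.
calls = {"primary": 0}


def flaky_primary(prompt: str) -> str:
    calls["primary"] += 1
    raise TimeoutError("primary provider timed out")


def stable_backup(prompt: str) -> str:
    return f"backup answered: {prompt}"


result = call_with_retry_and_fallback([flaky_primary, stable_backup], "hi", base_delay=0.0)
print(result, calls["primary"])  # backup answered: hi 3
```

Catching every `Exception` keeps the sketch short. In practice you would retry only failures that can succeed on a second try, such as timeouts or rate limits, and fail fast on errors like invalid credentials.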
---
BAML Thoughts
Insights, tutorials, and updates from the BAML team. Stay ahead with the latest in AI development.
- Prompting vs JSON Mode vs Function Calling vs Constrained Generation vs SAP (Research, 9 min read): A technical explanation of every way to extract structured data from an LLM. Vaibhav Gupta, over 1 year ago.
- 1 co-founder, 5 years, 12 pivots, still not dead (Company, 12 min read): How we navigated 12 pivots without hating each other. Vaibhav Gupta, about 18 hours ago.
- Structured Outputs Create False Confidence (Engineering, 9 min read): Constrained decoding seems like the greatest thing since sliced bread, but it often forces models to prioritize output conformance over output quality. Sam Lijin, 17 days ago.
- LLMs do not understand numbers (Engineering, 2 min read): Don't ask it to add a confidence score. Don't ask it to sum up items on a receipt. Don't ask it to confirm how many rows there are in a PDF. Sam Lijin, about 1 month ago.
- Beware When Using TOON (Engineering, 3 min read): TOON is a new serialization format, but it has many pitfalls. Sam Lijin, about 1 month ago.
- A cautionary tale on vibes (Engineering, 4 min read): Post-mortem of the 0.212.0 timeouts incident. Greg Hale, about 2 months ago.
- You could invent the next coding agent! (Tutorials, 5 min read): How BAML makes tool calls easy to integrate into apps. Greg Hale, 2 months ago.
- Using UUIDs in prompts is bad (Research, 5 min read): Guidance on dealing with entity IDs in LLM functions. Greg Hale, 3 months ago.
- Advanced Prompting Workshop Notes (Engineering, 1 min read): Notes from our 2024 Prompting Workshop. Vaibhav Gupta, 5 months ago.
- How to write a Zed extension for a made up language (Engineering, 8 min read): Exploring the fascinating world of Wasm, Zed extensions and LSP. Egor Lukiyanov, 6 months ago.
- The curious case of environment variables (Engineering, 2 min read): Environment variables in BAML are not as simple as they seem. It's tricky to pass them to, and read them from, the generated language's runtime. This post explains how we solved the problem by loading them lazily. Rahul Tiwari, 7 months ago.
- Tech Preview: Workflows (Announcements, 5 min read): Specify complex workflows directly in BAML. Greg Hale, 7 months ago.
- Lambda the Ultimate AI Agent (Research, 8 min read): A new take on agentic frameworks. Greg Hale, 7 months ago.
- Tutorial - An Agentic AI App with Streaming (Tutorials, 12 min read): An end-to-end agentic chatbot tutorial for React devs. Greg Hale, 8 months ago.
- Tool use with Llama API (and reasoning) (Tutorials, 1 min read): How to do tool-calling or function-calling with Llama API (with reasoning). Vaibhav Gupta, 8 months ago.
- Structured outputs with Llama 4 (Tutorials, 1 min read): How to do tool-calling or function-calling with Llama 4. Vaibhav Gupta, 9 months ago.
- Structured outputs with QwQ 32B (Tutorials, 1 min read): How to use QwQ 32B to do function calling or tool calls. Aaron Villalpando, 10 months ago.
- Full stack BAML with React/Next.js (Tutorials, 1 min read): Auto-generated React hooks for your BAML functions. Chris Watts, 11 months ago.
- Structured outputs with Gemini 2.0 (Tutorials, 1 min read): How to do tool-calling or function-calling with Gemini 2.0. Aaron Villalpando, 11 months ago.
- Structured outputs with o3-mini (Tutorials, 1 min read): How to do tool-calling or function-calling with o3-mini. Vaibhav Gupta, 11 months ago.
- BAML Launch Week Day 5 (Launch Week, 4 min read): Roadmap to BAML 1.0. Vaibhav Gupta, 11 months ago.
- BAML Launch Week Day 4 (Launch Week, 5 min read): Semantic Streaming. Greg Hale, 11 months ago.
- BAML Launch Week Day 3 (Launch Week, 3 min read): Type System. Antonio Sarosi, 11 months ago.
- BAML Launch Week Day 2 (Launch Week, 1 min read): BAML Chat. Sam Lijin, 11 months ago.
- BAML Launch Week Day 1 (Launch Week, 2 min read): VS Code LLM Playground 2.0. Aaron Villalpando, 11 months ago.
- AI Agents Need a New Syntax (Research, 6 min read): A proposal for a new way to build AI applications and agents. Vaibhav Gupta, 11 months ago.
- BAML Launch Week Announcement (Launch Week, 1 min read): Join us on January 27-31. Vaibhav Gupta, 11 months ago.
- Structured outputs with Deepseek R1 (Tutorials, 1 min read): How to do tool-calling or function-calling with Deepseek R1. Vaibhav Gupta, 11 months ago.
- Cursor support for BAML (Announcements, 1 min read): BAML is now supported in Cursor. Sam Lijin, about 1 year ago.
- Structured outputs with OpenAI o1 (Tutorials, 1 min read): How to use OpenAI o1 to do function calling or tool calls. Vaibhav Gupta, about 1 year ago.
- A new trick for generating code in JSON (Announcements, 1 min read): BAML now supports parsing triple-backtick code blocks in LLM outputs. Sam Lijin, about 1 year ago.
- Announcing LLM Eval Support for Python, Ruby, Typescript, Go, and more (Announcements, 2 min read): Use BAML to evaluate your LLM applications regardless of the language you use to call them. Greg Hale, about 1 year ago.
- Generating Structured Output from a Dynamic JSON schema (Tutorials, 3 min read): Modify LLM response models at runtime. Aaron Villalpando, about 1 year ago.
- Every Way To Get Structured Output From LLMs (Research, 7 min read): A survey of every framework for extracting structured output from LLMs, and how they compare. Sam Lijin, about 1 year ago.
- Semantic Streaming vs Token-based Streaming (Engineering, 3 min read): A new technique for streaming structured output from LLMs. Aaron Villalpando, about 1 year ago.
- Bringing Structured Outputs and Schema-Aligned Parsing to Golang, Java, PHP, Ruby, Rust, and More (Announcements, 2 min read): BAML now integrates with OpenAPI, allowing you to call BAML functions from any language. Sam Lijin, over 1 year ago.
- Structured Output with Ollama (Tutorials, 2 min read): Getting structured output out of Ollama, using novel parsing techniques. Sam Lijin, over 1 year ago.
- Beating OpenAI's structured outputs on cost, accuracy and speed: An interactive deep-dive (Research, 3 min read): We leveraged a novel technique, schema-aligned parsing, to achieve SOTA on BFCL with every LLM. Vaibhav Gupta, over 1 year ago.
- Transparency as a Tenet (Engineering, 1 min read): Exposing the inner workings of BAML. Anish Palakurthi, over 1 year ago.
- Building a New Programming Language in 2024, pt. 1 (Engineering, 6 min read): An overview of the work that goes into building a new programming language. Sam Lijin, over 1 year ago.
- Use Audio with your LLMs! (Announcements, 2 min read): Capturing non-text information and richer context with LLMs. Anish Palakurthi, over 1 year ago.
- Announcing Gemini Support! (Announcements, 1 min read): Applying structure to Gemini output with BAML. Anish Palakurthi, over 1 year ago.
- Building RAG in Ruby, using BAML, with streaming! (Tutorials, 4 min read): How to do RAG with Ruby streaming AI APIs. Sam Lijin, over 1 year ago.
- Build RAG with citations in NextJS (with streaming!) (Tutorials, 4 min read): How to do RAG with NextJS streaming AI APIs. Aaron Villalpando, over 1 year ago.
- Your prompts are using 4x more tokens than you need (Research, 8 min read): A deep-dive into using type definitions instead of JSON schemas in prompt engineering to improve accuracy and reduce costs. Aaron Villalpando, over 1 year ago.
- Announcing BAML - The typesafe interface to LLMs, with built-in testing, guardrails and observability (Announcements, 1 min read): BAML is a lightweight programming language to help perform structured prompting in a typesafe way. Vaibhav Gupta, about 2 years ago.
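The "LLMs do not understand numbers" post advises against asking a model to sum the items on a receipt. Below is a minimal sketch of the alternative: the model extracts the numbers, and the code does the arithmetic. It assumes an extraction step has already returned line items; the `LineItem` shape, `check_receipt`, and the sample data are hypothetical. `Decimal` avoids binary floating-point rounding on currency values.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class LineItem:
    # Hypothetical shape of one receipt line returned by an LLM extraction step.
    description: str
    amount: Decimal


def check_receipt(items: list[LineItem], stated_total: Decimal) -> tuple[Decimal, bool]:
    """Sum extracted amounts in code and compare them with the total printed on the receipt."""
    computed = sum((item.amount for item in items), Decimal("0"))
    return computed, computed == stated_total


# Hypothetical extraction output: the model copies numbers, the code does the math.
items = [
    LineItem("coffee", Decimal("3.50")),
    LineItem("bagel", Decimal("2.25")),
    LineItem("juice", Decimal("4.10")),
]
total, matches = check_receipt(items, stated_total=Decimal("9.85"))
print(total, matches)  # 9.85 True
```

A mismatch between the computed and stated totals is a useful signal: it means either a line item was misread or missed, so the receipt can be flagged for review instead of trusting either number.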
---
🦄 ai that works
A weekly conversation with @hellovai & @dexhorthy about how we can all get the most juice out of today's models.
Every Tuesday at 10 AM PST: 1 hour of live code and Q&A, with some prepped content to help you take your AI app from demo to production.
#50 (1 day ago): MCP is Dead?
MCP isn't dead... or is it? This week on the podcast, we'll dive into this debate. What is the state of MCP today?
#49 (9 days ago): Prompt Injections Guardrails
A major risk factor in agentic coding is prompt injection. Tool output, retrieved documents, and system prompts all get fed into the LLM, and all of them are at risk of prompt injection.
This week on the podcast, we're going to cover how to handle this risk. We will discuss how to protect system prompts, avoid hijacking, and implement ethical guardrails.
#48 (16 days ago): Claude Agent Skills Deep Dive
Claude Code has exploded in its abilities over the past 8 months, and it can be hard to keep up. Seemingly overnight, everyone is discussing Claude's skills, commands, agents, and subagents, and a lot of the literature out there already assumes you know what these are. This week on the podcast, we're going to go over all of them: what each one is, how and when to use it, what its benefits and drawbacks are, and how they fit into the broader context engineering picture.
#47 (23 days ago): PII Redaction and Sensitive Data Scrubbing
When building generative AI systems, one of the biggest risks companies face is the LLM accidentally exposing PII or PHI to an end user who isn't cleared to see it. This week on the podcast, we'll cover how to fix this problem. We'll discuss which prompting techniques you can use and, more importantly, how you can build evals to get comfortable shipping these systems to users.
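Episode #47 mentions building evals for PII leakage. Below is a minimal sketch of one such check, assuming regex detection is acceptable as a first pass. `find_pii`, `pii_leak_rate`, and the sample outputs are hypothetical. The patterns cover only a few US-style formats, nowhere near full PII or PHI coverage (names, addresses, medical record numbers, and so on).

```python
import re

# Hypothetical, deliberately narrow patterns; real PII/PHI detection needs much more.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return every pattern that matched, with the matched substrings."""
    hits: dict[str, list[str]] = {}
    for name, pattern in PII_PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits


def pii_leak_rate(outputs: list[str]) -> float:
    """Fraction of model outputs containing at least one PII match (the goal is 0.0)."""
    if not outputs:
        return 0.0
    leaked = sum(1 for output in outputs if find_pii(output))
    return leaked / len(outputs)


# Hypothetical model outputs collected from an eval run.
outputs = [
    "Your ticket has been resolved.",
    "Please contact jane.doe@example.com for details.",
    "Call the patient back at 555-123-4567.",
]
print(pii_leak_rate(outputs))
```

Tracking a rate across an eval set, rather than a pass/fail on single examples, makes it easier to see whether a prompt change actually reduces leakage.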
#46 (30 days ago): No Vibes Allowed February
In the February edition of our No Vibes Allowed series, we will be coding and shipping real features in our products using all of the concepts we cover on this podcast, including advanced context engineering and backpressure. Join us to see how these concepts apply to real code and real products.
#45 (about 1 month ago): AI Content Pipeline Revisited
We have another meta episode this week! Several months ago, we did an episode about automating the pipeline that generates the artifacts and content for this podcast. That pipeline became stale, so we breathed some life back into it, and we're going to discuss its different parts on the podcast.
This episode covers everything that goes into bringing you an episode, including:
- Details of the entire pipeline and tools we use to bring you each episode
- How to get AI to have the right tone in freeform generation and not sound like AI
- Browser agents
- Finding clippable content from the transcript
- Image generation
- How far should automation go?
#44 (about 1 month ago): Agentic Backpressure Deep Dive
In our next installment of advanced coding agent workflows, we'll explore some alternatives to research for improving results from coding agents. Code and web research are great for understanding the current codebase and finding documentation, but neither is fully concrete, and both can still lead to hallucinations or incorrect assumptions.
In this episode, we'll talk about learning tests and proof-driven development: writing small PoC programs and tests that confirm your understanding of external systems *before* you get deep into implementation.
This will extend our previous conversation about agentic backpressure and building deterministic feedback loops to help coding agents work more autonomously.
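A learning test, as described in #44, is a small test written to confirm how an external dependency behaves before implementation code relies on it. The sketch below is our own illustration, not code from the episode.

It pins down a few behaviors of Python's `json` module that matter when parsing LLM output. One of them surprises many people: Python's decoder accepts `NaN`, even though strict JSON does not allow it.

```python
import json
import math


def test_rejects_trailing_comma() -> None:
    try:
        json.loads('{"a": 1,}')
    except json.JSONDecodeError:
        return
    raise AssertionError("expected a trailing comma to be rejected")


def test_rejects_single_quotes() -> None:
    try:
        json.loads("{'a': 1}")
    except json.JSONDecodeError:
        return
    raise AssertionError("expected single-quoted keys to be rejected")


def test_accepts_nan_by_default() -> None:
    # Not valid per the JSON spec, but Python's decoder accepts it unless told otherwise.
    assert math.isnan(json.loads("NaN"))


def test_strict_encoding_refuses_nan() -> None:
    try:
        json.dumps(float("nan"), allow_nan=False)
    except ValueError:
        return
    raise AssertionError("expected allow_nan=False to refuse NaN")


if __name__ == "__main__":
    for test in (
        test_rejects_trailing_comma,
        test_rejects_single_quotes,
        test_accepts_nan_by_default,
        test_strict_encoding_refuses_nan,
    ):
        test()
    print("all learning tests passed")
```

These tests are cheap to run in CI. If a dependency upgrade changes any of these behaviors, the failure points directly at the broken assumption instead of at downstream parsing code. That is the kind of deterministic feedback loop the episode describes.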
#43 (about 2 months ago): Prompting Is Becoming a Product Surface
Prompting used to be an engineering problem. Write the right string, tweak it until the model behaves, ship it behind the scenes.
That breaks the moment real users show up. Customers don't think in prompts — they think in goals. They want to explain what they're trying to accomplish, not debug a magic sentence.
So prompting is moving into the product. Interfaces matter. Structure matters. Guardrails and feedback matter. The real work now isn't prompt cleverness — it's building systems that let people express intent in a way software can actually understand and trust.
#42 (about 2 months ago): No Vibes Allowed
We received great feedback on our previous live coding sessions, so this week we are bringing them back by live streaming while we add more features to BAML. We have discussed a lot of topics over the past several months, and we will dig into how to put many of these concepts into practice as we build out actual features in the product.
#41 (2 months ago): Email is All You Need
Email is about as adversarial as inputs get: malformed HTML, inconsistent templates, human writing, forwarded junk, zero standards. And yet entire business workflows depend on it.
This week we're digging into what it takes to build a real email workflow engine where LLMs aren't demos, but are part of production infrastructure.
We'll cover:
- Handling long-tail edge cases and weird inbox behavior
- Validating and correcting extractions before they break downstream systems
- Maintaining accuracy across thousands of formats and senders
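"Validating and correcting extractions before they break downstream systems" (#41) suggests a validation layer between the LLM and the rest of the pipeline. Below is a minimal sketch, assuming an invoice-style extraction from an email. `REQUIRED_FIELDS`, `Validated`, and `validate_extraction` are hypothetical names.

The sketch does three things:
- reports missing fields;
- repairs values that are safe to normalize (currency symbols, thousands separators, ISO dates);
- returns errors instead of raising, so the caller can retry the extraction or route the email to a human.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal, InvalidOperation

# Hypothetical required fields for an invoice-style email extraction.
REQUIRED_FIELDS = ("sender", "invoice_number", "due_date", "amount")


@dataclass
class Validated:
    data: dict
    errors: list[str] = field(default_factory=list)


def validate_extraction(raw: dict) -> Validated:
    """Check an LLM extraction before it reaches downstream systems, repairing what is safe to repair."""
    data = dict(raw)
    errors: list[str] = []
    for key in REQUIRED_FIELDS:
        if not data.get(key):
            errors.append(f"missing field: {key}")
    if data.get("due_date"):
        try:
            data["due_date"] = date.fromisoformat(str(data["due_date"]).strip())
        except ValueError:
            errors.append(f"unparseable due_date: {data['due_date']!r}")
    if data.get("amount"):
        cleaned = str(data["amount"]).replace("$", "").replace(",", "").strip()
        try:
            data["amount"] = Decimal(cleaned)
        except InvalidOperation:
            errors.append(f"unparseable amount: {data['amount']!r}")
        else:
            if data["amount"] <= 0:
                errors.append("amount must be positive")
    return Validated(data, errors)


ok = validate_extraction(
    {"sender": "billing@acme.test", "invoice_number": "INV-001",
     "due_date": "2024-07-31", "amount": "$1,200.50"}
)
bad = validate_extraction({"sender": "billing@acme.test", "due_date": "31/07/2024", "amount": "abc"})
print(ok.errors, bad.errors)
```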
#40 (2 months ago): Applying 12-Factor Principles to Coding Agent SDKs
We've done a lot of talking in the last few months about prompting coding agents and context engineering with markdown files, but today we'll talk about how to squeeze even more out of agents by using agent loops as smaller elements of a deterministic workflow.
In this session we'll cover:
- using the claude agent sdk to stitch together microagent workflows
- accumulating user rules across context windows
- json state and structured outputs with zod
- session continuation and forking vs. direct compaction
#39 (3 months ago): Understanding Latency in AI Applications
A deep dive into performance engineering for AI applications. We explore all the bottlenecks
in agent systems, from prompt caching and token optimization to semantic streaming and UI design.
Learn how to make your agents feel faster through strategic latency reduction and smart UX choices.
#38 (3 months ago): Founding Boundary: Vaibhav's Journey
End-of-year special, part 2: Vaibhav shares his journey from building card games in 7th grade
to founding Boundary and creating BAML. From Microsoft to Google to 12 pivots as a YC founder,
hear the story behind the programming language for AI pipelines.
#37 (3 months ago): Founding HumanLayer: Dex's Journey
End-of-year special, part 1: Dex shares his journey from physics undergrad with half a CS minor
to founding HumanLayer. From Sprout Social to Replicated to building AI agents for data warehouses,
hear how the path to founding a developer tools company is never a straight line.
#36 (3 months ago): Building a Prompt Optimizer
What happens when models can write really good prompts? We dive deep into prompt optimization,
exploring the GEPA (Genetic-Pareto) algorithm, how it works under the hood, and whether you can
build your own optimizer, with a live demo of a prompt optimizer built with BAML.
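GEPA, as we understand it, does not keep only the prompt with the best average score. It retains a Pareto frontier of candidate prompts that excel on different evaluation examples.

The sketch below shows only a generic Pareto (non-dominance) filter over per-example scores. It is a simplification of that selection step, with no reflective mutation and no candidate sampling. `dominates`, `pareto_front`, and the scores are hypothetical.

```python
def dominates(a: list[float], b: list[float]) -> bool:
    """True if `a` scores at least as well as `b` on every example and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: dict[str, list[float]]) -> list[str]:
    """Keep every candidate prompt that no other candidate dominates."""
    return [
        name
        for name, own in scores.items()
        if not any(dominates(other, own) for other_name, other in scores.items() if other_name != name)
    ]


# Hypothetical per-example scores for four candidate prompts on three eval examples.
scores = {
    "prompt_a": [1.0, 0.0, 1.0],
    "prompt_b": [0.0, 1.0, 1.0],
    "prompt_c": [0.0, 0.0, 1.0],  # dominated by prompt_a and prompt_b
    "prompt_d": [1.0, 0.0, 0.5],  # dominated by prompt_a
}
print(pareto_front(scores))  # ['prompt_a', 'prompt_b']
```

prompt_a and prompt_b have the same average score, but each solves an example the other misses. Keeping both preserves two distinct strategies that later rounds of optimization can combine or refine.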
#35 (4 months ago): Git Worktrees for AI Coding Agents
Since around May 2025, there's been a ton of buzz around AI coding agents and parallelized workflows,
and it's not stopping any time soon. In this episode we'll go deep on the tech that can help
you push the limits of these tools, including:
- Crash course on Git Worktrees
- File and Spec Management, tradeoffs in hardlinks vs symlinks
- tmux as a building block for collaborative agent workflows
#34 (4 months ago): Multimodal Evals
Building evals for multimodal AI: testing vision models, document understanding,
and image analysis with structured evaluation frameworks.
#33 (4 months ago): No Vibes Allowed: Using CodeLayer to Build CodeLayer
In this live coding session with CodeLayer, we'll use Research / Plan / Implement live
to ship 3 new features to CodeLayer.
#32 (4 months ago): Building an Animation Pipeline
We do a lot of work with Excalidraw, and this session shows an AI-first workflow
for turning any sketch into a finished animation.
We'll blend Claude Code with custom TypeScript scripts, wire up interactive slash commands,
and add browser automation to existing OSS tools to export polished WebM assets.
#31 (4 months ago): Dates, Times, and LLMs
How do you make an LLM amazing at dates? Relative dates, absolute dates, time zones, all that madness.
Let's talk dates, times, and all that goodness.
#30 (5 months ago): Event-driven agentic loops
Key takeaway: treat agent interactions as an event log, not mutable state. Modeling user inputs, LLM chunks,
tool calls, interrupts, and UI actions as a single event stream lets you project state for the UI, agent loop,
and persistence without drift. We walk through effect-ts patterns for subscribing to the bus, deriving “current”
state via pure projections, and deciding when to persist or replay events—plus trade-offs for queuing, cancelation,
and tool orchestration in complex agent UX.
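Below is a minimal Python sketch of the event-log idea from #30; the episode itself uses effect-ts. The event types (including `TurnDone`) and both projection functions are hypothetical.

The same append-only log is folded into two views with no shared mutable state between them:
- a UI view, which shows in-flight and interrupted replies;
- an agent-loop view, which sends only completed turns back to the model.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical event types; each fact is appended to the log and never mutated.
@dataclass(frozen=True)
class UserMessage:
    text: str


@dataclass(frozen=True)
class LLMChunk:
    text: str


@dataclass(frozen=True)
class TurnDone:
    pass


@dataclass(frozen=True)
class Interrupt:
    pass


Event = Union[UserMessage, LLMChunk, TurnDone, Interrupt]


def project_ui(log: list[Event]) -> dict:
    """UI projection: finished messages plus the reply that is still streaming."""
    messages: list[str] = []
    draft = ""
    for event in log:
        if isinstance(event, UserMessage):
            messages.append(f"user: {event.text}")
        elif isinstance(event, LLMChunk):
            draft += event.text
        elif isinstance(event, (TurnDone, Interrupt)):
            if draft:
                suffix = " [interrupted]" if isinstance(event, Interrupt) else ""
                messages.append(f"assistant: {draft}{suffix}")
            draft = ""
    return {"messages": messages, "streaming": draft}


def project_agent_context(log: list[Event]) -> list[tuple[str, str]]:
    """Agent-loop projection: only completed turns are sent back to the model."""
    context: list[tuple[str, str]] = []
    draft = ""
    for event in log:
        if isinstance(event, UserMessage):
            context.append(("user", event.text))
        elif isinstance(event, LLMChunk):
            draft += event.text
        elif isinstance(event, TurnDone):
            context.append(("assistant", draft))
            draft = ""
        elif isinstance(event, Interrupt):
            draft = ""  # an interrupted reply is dropped from the model's context


    return context


log: list[Event] = [
    UserMessage("hi"), LLMChunk("Hel"), LLMChunk("lo!"), TurnDone(),
    UserMessage("write a poem"), LLMChunk("Roses"), Interrupt(),
    UserMessage("never mind"), LLMChunk("Ok"),
]
print(project_ui(log)["streaming"])  # Ok
```

Because both views are pure functions of the log, persisting the log is enough to rebuild either one after a restart. The policy for interrupted replies can also change in one projection without touching the other.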
#29 (5 months ago): Ralph Wiggum under the hood: Coding Agent Power Tools
We've talked a lot about how to use context engineering to get more out of coding agents. In this episode,
we dive deep on the Ralph Wiggum technique and why this different approach can reshape your coding workflow.
We explore how Ralph handles greenfield work, refactors, and spec generation—surprise: it's all about
higher-quality context engineering.
#28 (5 months ago): Agentic RAG + Context Engineering
In this conversation, Vaibhav Gupta and Dex explore the intricacies of building an agentic retrieval-augmented generation (RAG) system. They discuss the differences between traditional RAG and agentic RAG, emphasizing the flexibility and decision-making capabilities of the latter. The conversation includes a live demo of a coding agent, insights into its architecture, challenges faced during tool implementation, and the iterative process of refining the system. They also touch on integrating web search and evaluating tool effectiveness. Beyond the demo, they discuss tool implementation, user interface optimization, and model performance; the role of reinforcement learning in training models; the challenges of debugging AI systems; and the value of writing code yourself to deepen understanding and work more efficiently. The dialogue emphasizes balancing different AI approaches and grounding the work in real use cases.
#27 (5 months ago): No Vibes Allowed - Live Coding with AI Agents
Vaibhav Gupta and Dex demonstrate the power of AI-assisted coding by implementing a complex timeout feature for BAML (a programming language for AI applications) in a live coding session. Starting from a GitHub issue that had been open since March, they showcase a systematic workflow: specification refinement, codebase research, implementation planning, and phased execution. Using Claude and specialized coding agents, they navigate a 400,000+ line codebase, implementing timeout configurations for HTTP clients including connection timeouts, request timeouts, idle timeouts, and time-to-first-token for streaming responses. The session highlights key practices like context engineering, frequent plan validation, breaking complex features into testable phases, and the importance of reading AI-generated code. In under 3 hours of live coding, they achieve what would typically take 1-2 days of engineering time, successfully implementing parsing, validation, error handling, and Python integration tests.
#26 (6 months ago): Anthropic Post Mortem
In this conversation, Vaibhav Gupta and Aaron discuss various aspects of AI model performance, focusing on the recent downtime experienced by Anthropic and the implications for AI systems. They explore the sensitivity of models to context windows, the challenges of output corruption, and the complexities of token selection mechanisms. The discussion also highlights the importance of debugging and observability in AI systems, as well as the role of user-friendly workflows and integrations in making AI accessible to non-technical users. The conversation concludes with thoughts on the future of AI development and the need for effective metrics to monitor product performance.
#25 (6 months ago): Dynamic Schemas
In this episode, Dex and Vaibhav explore the concept of dynamic UIs and how to build systems that can adapt to unknown data structures. They discuss the importance of dynamic schema generation, metaprogramming with LLMs, and the potential for creating dynamic React components. The conversation also delves into the execution and rendering of these dynamic schemas, highlighting the challenges and opportunities in this evolving field. They conclude with thoughts on future directions and the importance of building robust workflows around schema management.
#24 (6 months ago): Evals for Classification
In this episode of AI That Works, hosts Vaibhav Gupta and Dex, along with guest Kevin Gregory, explore the intricacies of building AI systems that are ready for production. They discuss the concept of dynamic UIs, the challenges of large-scale classification, and the importance of user experience in AI applications. The conversation delves into the use of LLMs for enhancing classification systems, the evaluation and tuning of these systems, and the subjective nature of what constitutes a 'correct' classification. The episode emphasizes the need for engineers to focus on accuracy and user experience while navigating the complexities of AI engineering. The speakers also discuss model upgrades, user feedback, and the importance of building effective user interfaces, emphasizing iterative development and rapid prototyping for chatbot performance evaluation.
#23 (6 months ago): Bash vs. MCP - token efficient coding agent tooling
In this conversation, Dex and Vaibhav delve into the intricacies of coding agents, focusing on the debate between using MCP (Model Context Protocol) and Bash for tool integration. They explore the importance of understanding context windows, token management, and the efficiency of different tools. The discussion emphasizes the significance of naming conventions, dynamic context engineering, and the engineering effort required to optimize performance. They also share real-world applications and best practices for using MCPs, and engage with the community through a Q&A session.
#22 (7 months ago): Generative UIs and Structured Streaming
We'll explore hard problems in building rich UIs that rely on streaming data from LLMs. Specifically, we'll talk through techniques for rendering STRUCTURED outputs from LLMs, with real-world examples of how to handle partially streamed outputs over incomplete JSON data. We'll explore advanced needs like:
- Fields that should be required for the stream to start
- Rendering React components with partial data
- Handling nullable fields vs. yet-to-be-streamed fields
- Building high-quality user feedback
- Handling errors mid-stream
#21 (7 months ago): Voice Agents and Supervisor Threading
Exploring voice-based AI agents and supervisor threading patterns for managing complex conversational workflows.
#20 (7 months ago): Claude for Non-Code Tasks
In #17 we talked about advanced context engineering workflows for using Claude Code to work in complex codebases. This week, we're going to get a little weird with it and show off a bunch of ways you can use Claude Code as a generic agent to handle non-coding tasks. We'll learn things like:
- Skipping the MCP and having Claude write its own scripts to interact with external systems
- Creating internal knowledge graphs with markdown files
- How to blend agentic retrieval and search with deterministic context packing
#19 (7 months ago): Interruptible Agents
Anyone can build a chatbot, but the user experience is what truly sets it apart. Can you cancel a message? Can you queue commands while it's busy? How finely can you steer the agent? We'll explore these questions and code a solution together.
#18 (8 months ago): Decoding Context Engineering Lessons from Manus
A few weeks ago, the Manus team published an excellent article on context engineering. It covered the KV cache, hot-swapping tools with custom samplers, and a ton of other cool techniques.
In this week's episode, we'll dive deep into the Manus article and put some of its advice into practice, exploring how a deep understanding of models and inference can help you get the most out of today's LLMs.
#17 (8 months ago): Context Engineering for Coding Agents
By popular demand, AI That Works #17 will dive deep into a new kind of context engineering: managing research, specs, and planning to get the most out of coding agents and coding CLIs. You've heard people bragging about spending thousands a month on Claude Code, maxing out Amp limits, and much more. Now Dex and Vaibhav are going to share some tips and tricks for pushing AI coding tools to their absolute limits while still shipping well-tested, bug-free code. This isn't vibe coding; this is something completely different.
#16 (8 months ago): Evaluating Prompts Across Models
AI That Works #16 will be a super-practical deep dive into real-world examples and techniques for evaluating a single prompt against multiple models. While this is a commonly heralded use case for evals (e.g. 'how do we know if the new model is better?' / 'how do we know if the new model breaks anything?'), there aren't many practical examples out there for real-world use cases.
#15 (8 months ago): PDFs, Multimodality, Vision Models
Dive deep into practical PDF processing techniques for AI applications. We'll explore how to extract, parse, and leverage PDF content effectively in your AI workflows, tackling common challenges like layout preservation, table extraction, and multimodal content handling.
#14 (8 months ago): Implementing Decaying-Resolution Memory
Last week in #13, we did a conceptual deep dive on context engineering and memory. This week, we're going to jump right into the weeds and implement a version of Decaying-Resolution Memory that you can pick up and apply to your AI agents today.
For this episode, you'll probably want to check out episode #13 in the session listing to get caught up on DRM and why it's worth building from scratch.

#13 · 9 months ago · Building AI with Memory & Context
How do we build agents that can remember past conversations and learn over time? We'll explore memory and context engineering techniques to create AI systems that maintain state across interactions.

#12 · 9 months ago · Boosting AI Output Quality
This week's session was a bit meta! We explored "Boosting AI Output Quality" by building the very AI pipeline that generated this session's recap email from our Zoom recording. The real breakthrough: separating extraction from polishing for high-quality AI generation.

#11 · 9 months ago · Building an AI Content Pipeline
Content creation involves a lot of manual work: uploading videos, sending emails, and other follow-up tasks that are easy to drop. We'll build an agent that integrates YouTube, email, GitHub, and human-in-the-loop steps to fully automate the AI That Works content pipeline, handling all the repetitive work while maintaining quality.

#10 · 9 months ago · Entity Resolution: Extraction, Deduping, and Enriching
Disambiguating the many ways of naming the same thing (companies, skills, etc.), from entity extraction to resolution to deduping. We'll explore breaking problems into extraction → resolution → enrichment stages, scaling with two-stage designs, and building async workflows with human-in-the-loop patterns for production entity resolution systems.

#9 · 10 months ago · Cracking the Prompting Interview
Ready to level up your prompting skills? Join us for a deep dive into advanced prompting techniques that separate good prompt engineers from great ones. We'll cover systematic prompt design, testing tools and inner loops, and tackle real-world prompting challenges.
Perfect prep for becoming a more effective AI engineer.

#8 · 10 months ago · Humans as Tools: Async Agents and Durable Execution
Agents are great, but for the most accuracy-sensitive scenarios, we sometimes want a human in the loop. Today we'll discuss techniques for making this possible. We'll dive deep into concepts from our 4/22 session on 12-factor agents and extend them to handle asynchronous operations where agents need to contact humans for help, feedback, or approvals across a variety of channels.

#7 · 10 months ago · 12-factor agents: selecting from thousands of MCP tools
MCP is only as great as your ability to pick the right tools. We'll show how to leverage MCP servers and reliably pick the right ones when only a few of them actually expose relevant tools.

#6 · 10 months ago · Policy to Prompt: Evaluating w/ the Enron Emails Dataset
One of the most common problems in AI engineering is looking at a set of policies or rules and evaluating evidence to determine whether the rules were followed. In this session we'll explore turning policies into prompts and pipelines to evaluate which emails in the massive Enron email dataset violated SEC and Sarbanes-Oxley regulations.

#5 · 11 months ago · Designing Evals
Minimalist and high-performance testing/evals for LLM applications. Stay tuned for our season 2 kickoff topic on testing and evaluation strategies.

#4 · 11 months ago · Twelve Factor Agents
Learn how to build production-ready AI agents using the twelve-factor methodology. We'll cover the core concepts and build a real agent from scratch.

#3 · 11 months ago · Code Generation with Small Models
Large models can do a lot, but so can small models.
We'll discuss techniques for leveraging extremely small models to generate diffs and make changes across complete codebases.

#2 · 12 months ago · Reasoning Models vs Reasoning Prompts
Models can reason, but you can also reason within a prompt. Which technique wins out, when, and why? We'll find out by adding reasoning to an existing movie chat agent.

#1 · 12 months ago · Large Scale Classification
LLMs are great at classification across 5, 10, maybe even 50 categories. But how do we deal with situations where we have over 1,000? What if the list of categories is constantly changing?

Never Miss an Episode
Join our weekly sessions and learn how to build AI that actually works in production.
📅 Subscribe on Event Calendar
💬 Subscribe on Discord
🚀 Subscribe on GitHub
📺 Subscribe on YouTube
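Episode #22 contrasts nullable fields with fields that simply have not streamed in yet, and mentions fields that must be present before rendering starts. A minimal Python sketch of that distinction, assuming a streaming client that hands you successively more complete dicts for one object. The sentinel `PENDING`, the helper functions, and the sample snapshots are all hypothetical; this is not BAML's streaming API and not the episode's demo code.

```python
from typing import Any


class _Pending:
    """Sentinel type: the key has not appeared in the stream yet."""

    def __repr__(self) -> str:
        return "PENDING"


# Hypothetical sentinel, distinct from None: None means the model explicitly emitted null.
PENDING = _Pending()


def field_state(partial: dict[str, Any], key: str) -> Any:
    """Return PENDING if `key` is absent from the partial object, else its value (possibly None)."""
    return partial.get(key, PENDING)


def ready_to_render(partial: dict[str, Any], required: tuple[str, ...]) -> bool:
    """Hold rendering until every required field has arrived with a non-null value."""
    for key in required:
        value = field_state(partial, key)
        if value is PENDING or value is None:
            return False
    return True


def render_field(partial: dict[str, Any], key: str) -> str:
    """Show a loading placeholder for pending fields and an explicit empty state for null ones."""
    value = field_state(partial, key)
    if value is PENDING:
        return f"{key}: loading..."
    if value is None:
        return f"{key}: (none)"
    return f"{key}: {value}"


if __name__ == "__main__":
    # Hypothetical snapshots of one object as it streams in, field by field.
    snapshots = [
        {},
        {"title": "Q3 report"},
        {"title": "Q3 report", "owner": None},
        {"title": "Q3 report", "owner": None, "summary": "Revenue up"},
    ]
    for snap in snapshots:
        if not ready_to_render(snap, required=("title",)):
            print("waiting for required fields")
            continue
        print([render_field(snap, k) for k in ("title", "owner", "summary")])
```

Run as a script, the first snapshot is held back; later snapshots show `owner` as `(none)` once it arrives as an explicit null, while `summary` stays `loading...` until it streams in. Using a sentinel instead of `None` for "not yet streamed" is the design choice that keeps those two states apart. One caveat of this sketch: a required field that is legitimately null would block rendering forever, so only mark fields required when null is not a valid final value.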
---
Who are we?
We hate the current DX of building agents. So we're building a whole new programming language! Yes, we're that crazy. (We literally use Notion to present slides.)
* Vaibhav Gupta, CEO & Co-founder
* Aaron Villalpando, CTO & Co-founder
* Sam Lijin, Engineer
* Antonio Sarosi, Engineer
* Greg Hale, Engineer
* Chris Watts, Engineer
* Anish, Intern (S24)
* Rahul, Intern (S25)
* Egor, Intern (S25)

Join the Community
Ready to build type-safe AI applications? Join thousands of developers who are already using BAML in production: star the project on GitHub or join the Discord.