refact.ai: AI tool

Refact AI

Website: https://refact.ai/

Visit website

refact.ai

AI for business and productivity


Pricing plans

Detailed pricing plans are not available yet for this tool.

Detailed overview

Your Open-Source, Autonomous AI Agent

Refact.ai codes like you, thinks like you, and adapts to your workflow instantly. Integrate it with your tools, fine-tune it to your codebase, and choose the best LLMs for your tasks. Deploy on-premise and stay in full control of your data. Personalized. Open-source. Self-hosted. Start for free in your favorite IDE, or contact us for enterprise solutions.

Code agent trusted by thousands of developers

Get your AI Programming Partner that works for you

Autonomous AI agent: delegate your coding tasks end-to-end

Just describe what you need, and Refact.ai Agent plans, executes, and deploys. It works like another developer in your IDE, integrating with your codebase and stack, while letting you preview and control the process.

- Completes tasks step by step with reasoning.
- Searches and analyzes the repository for accurate execution.
- Connects with GitHub, databases, CI/CD pipelines, and more.
- Simple, intuitive UX.

In-IDE chat: ask, edit, debug, and generate code in natural language

Get accurate, context-aware chat answers and suggestions, tailored to your project. Fast and meeting your specific needs.

Accurate autocompletions: use AI to continue your code in real time

As you write, Refact.ai predicts the next lines, functions, or classes with precision. Powered by the Qwen2.5-Coder model and retrieval-augmented generation (RAG), it analyzes every symbol you type, retrieves project-specific insights, and generates code for your next move.

See how developers use AI Agent in real projects

Vibe coding as it is: your high-level guidance, and AI handles the rest. Watch use cases of Refact.ai Agent autonomously solving developers' tasks while they barely touch the keyboard.

Refact.ai Agent saved me thousands of euros

I just finished a week of vibe coding with Refact.ai, and I'm still processing how much the product development game has changed.
What I particularly love about Refact.ai is that it's fully integrated — it can see your entire code repository, integrates with multiple tools, and proposes changes based on the overall architecture and project knowledge. I was able to build a working product prototype within a week with just prompting and testing. Using Refact.ai saved me thousands of euros that I would have spent hiring a freelancer. It saved me months of work — I got it done in just a week! Denis Savin, LinkedIn 99.9% of an IoT cloud app built by Refact.ai Agent A full-fledged IoT cloud monitoring Django app for our IoT products was 99.9% programmed using only the Refact.ai Agent. I only worked on one thing — the models.py file. It was an amazing experience with the Refact.ai Agent, it truly saved me a lot of time and energy, which I could dedicate to my main job and family. The app includes white/dark mode, UI translation editing, and permission management for each module (like TRANS for translations), as well as automatic data updates through Ajax. Ukro, Discord Community 80 hours of building from scratch — instead done in 30 minutes The new programmer told me that he needed 80 hours to rewrite from scratch, as fixing it would take the same amount of time or more. So I connected Refact to the MySQL database and used the Chrome tool while logged into the WordPress admin. And guess what happened? In about 30 minutes of prompt ping-pong, it identified the issue with the plugin. I told the Agent I didn't want to update the plugin — just fix it! And it did! This freaking Agent keeps amazing me every single time. Ukro, Discord Community 3 weeks of waiting — solved in just 14 minutes! Another big win story here! I also work with other AI communities, and their workload is already full, so they don't have much time for improvements. Still, I just asked the Refact.ai Agent to build me a GUI for a worker client that this community is using. 
Let me tell you — in just 14 minutes, the agent created the most beautiful, fully functional GUI (nothing was missing) for this client. All I did was give it the link to the GitHub repo... (They had been asking for a GUI 3 weeks ago, a few people said 'OK', but nothing was delivered.) SuperMalinge, Discord Community Refact.ai Agent handles around 95% of the work As someone with zero prior web development experience, this Refact agent has been incredibly effective. With its intelligent support, I can now build functional web applications using a 'vibe coding' approach. I'd describe this agent as my personal paid developer — it handles around 95% of the work, including building and debugging my applications. Moreover, it helps me understand coding logic in the process, making it both a development and a learning tool. R3gzPro, Discord Community
With AI Agent, transform how you build software Understands your context Refact.ai analyzes your entire development environment to deliver accurate, context-aware code generation and chat responses—better than any other tested solution. Workspace Codebase Databases Files Documentation Web ... Learns and evolves with you The more you use it, the smarter it gets. Save use cases, refine memory, and train the Agent to adapt to your workflow. Make it your real digital twin. Remains in your control On-premise deployment keeps your code private and fully under your control.
It provides maximum security and complete data ownership.

Powerful AI tools for every development task

- AI Code Generator: Generate code for any programming language. This online tool generates, optimizes, and explains code in various languages.
- AI Code Review: Automate code reviews with AI. This tool analyzes code in multiple languages and provides clear fixes and improvements.
- Image to Code: Convert images or screenshots into HTML/CSS code pages. Just upload your file, add details or requirements, choose a model, and get your code.
- Python Code Generator: Ask to generate, explain, or refactor your Python code. Select the action, describe the task, choose the model, and the online tool does the rest.
- Java Code Generator: Request code generation, optimization, or refactoring for your Java projects. Pick an action, define the task, select a model, and let the online tool handle the rest.

Supports 25+ programming languages, including Java, Python, JavaScript, Rust, PHP, C++, TypeScript, HTML, React, Ruby, SQL, C, YAML, and CSS3.

Build, customize, control

Refact.ai gives you the flexibility and control to customize an AI Agent around your tools, workflows, and coding style.

- Connect tools: Integrate with GitHub, PostgreSQL, Docker, and more. Refact.ai Agent accesses your resources and handles related operations autonomously, mimicking your workflow.
- Choose the best model: Use Claude 4, GPT-4o, or GPT-4o mini with AI Agent or for chat queries.
- Bring your own key: Connect your API key and use any LLM: Gemini, Grok, OpenAI, Deepseek, and others.
- On-premise option: For maximum security, choose our self-hosted AI Agent version and run it on your own infrastructure.
Automate your company's software development

Refact.ai Agent for Enterprise handles engineering tasks autonomously, enabling your team to deliver more, faster, and focus on high-impact work.

- Understands your company's context and standards by analysing your documentation and codebase.
- Learns from each interaction and feedback, becoming smarter over time.
- Organizes experience into the knowledge base for quick collaboration across your team.

Deploy AI Agent on-premise, as SaaS, or on AWS.

- Integrate with your environment: AI connects to your GitHub, Docker, PostgreSQL, and more to mimic developer behavior and complete tasks.
- Fine-tune LLMs: Train AI on your stack to customize its behavior.
- Keep data private: On-premise deployment ensures your code never leaves your servers.
- Get priority support: Dedicated assistance from our engineers at every stage.

Empower your software development with AI Agent

Learn how Refact.ai Agent can turn AI into a true force multiplier for your engineering teams.

AI coding agent for your favourite IDE

VS Code, JetBrains, Visual Studio (beta), Neovim (beta), Sublime Text (beta), PyCharm, WebStorm, GoLand, IntelliJ, CLion.

Try Refact.ai now

Free to start. Fully powered autonomous AI Agent for software development. Upgrade to Pro to lift the limits.

Free: for personal and hobby projects
- Access to Autonomous AI Agent (limited usage per day)
- In-IDE chat with 32k context
- Unlimited code completions powered by Qwen2.5-Coder
- Code-aware vector database (RAG)

Pro: from $10/month, for professional use. Includes everything in Free, plus:
- 40 requests/day to Autonomous AI Agent
- Unlimited in-IDE chat with 64k context
- Access to additional code completion models

Enterprise
- Deploy AI Agent on-premise, as SaaS, or on AWS.
- LLM fine-tuning on your company's codebase
- Optimized for multiple GPUs
- Code privacy
- Priority support and onboarding

Autonomous AI Agent for Programming
© 2025 Small Magellanic Cloud Ai Ltd.

---

Pricing

Free: $0/month
- All the Autonomous AI Agent capabilities
- 2,000 coins to use AI Agent & Chat
- In-IDE chat aware of your codebase context
- Claude 4, GPT 4.1, 4o, Gemini 2.5 pro, and more
- Unlimited fast auto-completion
- Codebase-aware vector database (RAG)
- Self-hosting option available
- Discord support

Pro: $10/month (monthly limits: 1x to 5x). Everything in Free, plus:
- 10,000 coins renewed every month
- Need more coins? Buy from a $5 minimum at $1 = 1,000 coins.
- Thinking abilities

Enterprise: Private Server, AWS Marketplace installation, or on-premise installation. As in the Pro plan, plus:
- LLM fine-tuning: train AI models on your organization's codebase and data
- Optimized for multiple GPUs with load sharing
- Access control for detailed statistics
- On-prem or private cloud deployment
- Complete code privacy with zero telemetry leaving
- Priority support
---

About Us

Picture this: You wake up, grab your coffee, and sit down at your computer. But instead of wrestling with complex code all day, you're collaborating with AI to solve problems. This isn't a distant dream - it's the world we're building right now at Refact.ai.

Founded by a former OpenAI team member, Refact.ai is driven by a vision of building the future of programming. We believe that AI's potential shouldn't be monopolized by large corporations. Instead, we aim to harness the collective power of the internet, public data, crowdsourcing, and passionate individuals like you to develop open-source tools that empower millions of programmers worldwide.

Meet the team

- Oleg Klimov: Looney Tech
- Oleg Kiyashko: Friendly Adviser
- Vlad Guber: Financial Magic
- Sergey Vakhreev: ML Engineer
- Dimitry Ageev: ML Engineer
- Kirill Starkov: ML Engineer
- Maksym Nevinchanyy: DevOps genius
- Marc McIntosh: Software Engineer
- Ilya Yarmalkevich: Business Developer
- Katia Bystrakova: PR/Marketing
- Katrin Maikova: Marketing & Growth
---

Refact.ai is now the #1 open-source AI Agent on SWE-bench

May 15, 2025 · by Sergey Vakhreev · 5 min read · product

Our SWE-bench pipeline is open-source now: check it on GitHub.

Refact.ai Agent achieved 70.4% on SWE-bench Verified, autonomously solving 352 out of 500 tasks. This makes Refact.ai a leading open-source AI programming Agent on SWE-bench and places it among the top ranks on the leaderboard.

SWE-bench Verified is a refined version of the original SWE-bench, featuring 500 real-world GitHub issues, selected manually. It provides a more accurate and consistent way to evaluate how well AI agents can handle practical software engineering tasks.

Key elements that made this possible:

- Extensive guardrails that step in when the model gets stuck or goes off track
- debug_script() sub-agent that uses pdb to fix bugs and can modify/create new scripts
- strategic_planning() tool powered by o3 to rethink and refine fixes when needed

The full pipeline we used for SWE-bench Verified is open-source. You can implement the same components and run the benchmark just like we did, to reproduce the Refact.ai Agent approach and score end-to-end. Read on to see how the Agent is built for SWE-bench, and how the same ideas power real-world workflows in Refact.ai.

Model setup

- Orchestration model: Claude-3.7
- Debug sub-agent, debug_script(): Claude-3.7 + o4-mini
- Planning tool, strategic_planning(): o3
- pass@1: Each task is attempted no more than once.
- Temperature: 0 for every Claude model.

For each SWE-bench Verified problem, Refact.ai Agent made one multi-step run aiming to produce a single, correct final solution. Our main goal was to achieve a maximum score in a single attempt.
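The headline figures are easy to check by hand. With pass@1 (one attempt per task), the score is simply solved tasks divided by total tasks. A minimal sketch in Python follows; the helper name `resolve_rate` is ours for illustration and is not part of the Refact.ai or SWE-bench tooling.

```python
def resolve_rate(solved: int, total: int) -> float:
    """Return the share of solved tasks as a percentage, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= solved <= total:
        raise ValueError("solved must be between 0 and total")
    return round(100 * solved / total, 1)


# Figures quoted in the post: 352 of 500 SWE-bench Verified tasks solved.
solved, total = 352, 500
print(f"solved: {resolve_rate(solved, total)}%")              # solved: 70.4%
print(f"not solved: {resolve_rate(total - solved, total)}%")  # not solved: 29.6%
```

Because each task is attempted exactly once, no averaging over retries is involved; the same two numbers reappear in the final evaluation table.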
Simpler, more effective Agent prompt

We revised the Agent prompt from our SWE-bench Lite run, where we top-ranked with a 59.7% score. Back then, it was more complex, and looking at how AI Agent behaved, we realized that simpler is better. The new version is shorter and easier to follow. Since Refact.ai is open-source, you can explore it:

You are a fully autonomous agent for coding tasks.
Your task is to identify and solve the problem from the given PR by directly changing files in the given project.
You must follow the strategy, step by step in the given order without skipping.

**Step 1: Explore the Problem**
- Use `cat()` to open files. Use `search_symbol_definition()`, `search_symbol_usages()` if you know names of symbols.
- Use `search_pattern()` for search by pattern, `search_semantic()` for a semantic search.

**Step 2: Reproduce the Problem using `debug_script()`**
- Find and run all project's existing tests to ensure the fix won't introduce new problems elsewhere.
- Write a script that reproduces the issue. Cover as many corner cases as possible.
- Set up necessary environment (e.g., create required folders or additional files) to run the script.
- Run the script using `shell("python ...")` to verify that the error occurs and the script is correct.
- After verifying that the script is correct and reproduces the issue, call `debug_script()` to debug it.

**Step 3: Make a Plan using `strategic_planning()` and fix the Problem**
- Open all new files mentioned in `debug_script()` report.
- Call `strategic_planning()` once to think through and brainstorm the solution.
- Update projects files directly without creating patches and diffs.

**Step 4: Check and Improve Your Work by running tests**
- Execute the script that reproduces the original issue.
- Run project's existing tests again to ensure the fix doesn't introduce new problems elsewhere.

**BEST PRACTICES**
- You must follow the strategy (explore -> reproduce -> solve -> check), step by step in the given order.
- Before each step explicitly announce your next actions. Make sure they are still align with the strategy.
- Include your thoughts wrapped in before any action.
- %CD_INSTRUCTIONS%

Introducing a debugging sub-agent

When developers run into bugs, they investigate the code to figure out what went wrong. For our SWE-bench Verified run, that role was mimicked by debug_script().

debug_script() is a sub-agent inside Refact.ai that uses pdb to debug, modify, and generate scripts. It helps AI Agent gather key issue details:

- Which files are affected
- What actually caused the failure
- And how it might be fixed.

Under the hood, debug_script() is powered by Claude-3.7, with o4-mini for summarizing debug info. We forced Refact.ai Agent to call this tool at least once, and up to three times, during each task. In practice, this debugging sub-agent was really helpful for digging into the problem source.

Check out an example of its report:

1 - Problem Overview

Issue summary: An empty string in the Django `TEMPLATES` 'DIRS' setting can be converted into a root-directory path (`/`), causing the `template_changed()` autoreload hook to always fire (since `/` is in every file's `.parents`).

Expected behaviour: Empty or blank directory entries in `TEMPLATES['DIRS']` should be silently ignored (not converted into `Path("/")`) so that the autoreloader only triggers when actual template directories change.
2 - Project's Files & Symbols Observed

File | Key symbols | Purpose / responsibility
--- | --- | ---
/testbed/reproduce_autoreload_issue.py | empty_string, empty_path, template_changed, to_path, Path, test_file, get_template_directories | Test script to reproduce and inspect how empty strings in template dirs are handled
…/django/utils/_os.py | to_path(value) | Utility that turns arbitrary values into `pathlib.Path`
…/python3.X/pathlib.py | Path, Path.__new__, _from_parts, _parse_args, parse_parts, absolute(), resolve(), parents | Standard-library path-handling logic, shows how `Path("")` → `"."` → cwd
…/django/template/autoreload.py | template_changed(sender, file_path), get_template_directories() | Hooks into file-change events to decide when to reload templates

3 - Debugging Timeline

1. Invoked the reproduction script under `pdb`.
2. Stepped to `empty_path = to_path("")`. Confirmed `to_path` simply does `Path("")`.
3. Stepped through `pathlib.Path` internals:
   - `Path("")` is parsed as an empty part and becomes `Path(".")` (the current directory).
   - `empty_path.absolute()` is therefore the absolute path of the working directory (e.g. `/testbed`).
4. Checked membership:
   - `/testbed` was **not** in `/tmp/test_file.py`.parents → no reload.
5. Checked with true root Dir (`Path("/")`):
   - `/` **is** in every file's `.parents` → reload always fires.
   - Confirmed that the reported issue only occurs when the working directory equals `/`.
6. Inspected Django's `template_changed()` hook:

```python
for template_dir in get_template_directories():
    if template_dir in file_path.parents:
        reset_loaders()
        return True
```

7. Examined get_template_directories(): it takes each entry of TEMPLATES['DIRS'] (including empty strings), runs it through to_path(), and collects Path objects without filtering out empties.
8. Simulated a typical settings entry DIRS = ['', '/some/path']:
   - Path('') → cwd
   - If the Django server is launched from /, cwd=/ → template_dirs includes / → triggers reload on all file changes.

Root cause: an empty string ends up as the process's cwd; if that cwd is /, it is treated as a Template directory and thus matches every changed file.

4 - Lessons & Recommendations

Pitfalls / anti-patterns
• Converting unchecked user input (blank strings) directly to Path without validation.
• Relying on current working directory semantics leads to environment-dependent bugs.

Codebase insights
• get_template_directories() does not sanitize or skip empty/None entries.
• No existing test covers the case of blank or empty template dir entries.

Suggested fix / next steps
• In get_template_directories(), filter out empty or purely whitespace strings before conversion:

    raw_dirs = settings.TEMPLATES[…]['DIRS']
    cleaned = [d for d in raw_dirs if isinstance(d, str) and d.strip()]
    template_dirs = [to_path(d) for d in cleaned]

• Add a unit test for TEMPLATES['DIRS'] = ["", "/valid"] to ensure blank entries are ignored.
• Document that blank entries are dropped and encourage users to validate environment variables before splitting.

Guardrails to keep AI Agent on track

The more we tried to constrain the model, the more it resisted. Since the goal was to solve each task in one go, we needed ways to make Agent more reliable.

We added automatic guardrails that kick in when the model gets stuck or makes mistakes. Essentially, these are helper messages, inserted into the chat mid-run as if from a simulated "user", to nudge Agent back on track. It's all automated: the script runs static checks on the main model's (Claude-3.7) messages, and if it detects signs that something is going off track, it sends a message into the chat to help guide the model back in the right direction. These small actions make a big difference in stability.
Extra prompts after sub-agent calls:

After debug_script():
💿 Open all visited files using `cat(file1,file2,file3,…)`!

After strategic_planning():
💿 Now implement the solution above. Reminders:
- Do not create documents, README.md, or other files which are non-related to fixing the problem.
- Convert generated changes into the `update_textdoc()` or `create_textdoc()` tool calls. Do not create patches (in diff format) or monkey-patches!
- Change the project directly to fix the issue but do not modify existing tests.
- Find and run all project's existing tests to ensure the fix won't introduce new problems elsewhere.
- Create new test files only using `create_textdoc()`.

Guardrails for Agent flow:
💿 Use `debug_script()` instead of `shell()`; dig deeper than previous attempts and set breakpoints inside the project.
💿 Do not call `debug_script()` more than three times.
💿 Call `strategic_planning()` before modifying the project.
💿 If you struggle to find the correct solution, consider using `debug_script()` or `strategic_planning()`.
💿 You cannot call `{a_tool_name}` while on the previous step—follow the strategy.

Strategic planning

The strategic_planning() tool comes in at Step 3 of the Agent prompt. It helps the model improve solution quality by reflecting on what went wrong, and what could be done better, based on the debug_script() report. It uses reasoning, powered by o3, and updates project files directly, without generating patches and diffs. For this tool, we enforce one call per task.

Since the observation layer (search + pdb debug) was already quite efficient, strategic planning sometimes lagged. We tried the o4-mini and o3 models and found no obvious differences on a small subset of tasks. That said, both models were prone to overcomplicating tasks or failed to identify the real root cause. Claude 3.7 might be a good candidate as a planning model in the future, given how well it did in other parts of the workflow.
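The guardrails and call limits described above can be pictured as a small rule checker: it looks at each tool call the model proposes and, when a rule is broken, returns a helper message to inject into the chat as a simulated user turn. The sketch below is an illustration only, not Refact.ai's implementation. The names `GuardrailState` and `check_tool_call`, the wording of the over-limit message for `strategic_planning()`, and the assumption that `update_textdoc()` and `create_textdoc()` are the only project-modifying tools are all hypothetical. The limits (`debug_script()` at most three times, `strategic_planning()` once per task and before any modification) and the other two message texts come from the post.

```python
from dataclasses import dataclass, field
from typing import Optional

# Limits quoted in the post; everything else in this sketch is hypothetical.
MAX_CALLS = {"debug_script": 3, "strategic_planning": 1}
# Assumption: these are the tools that modify the project.
MODIFYING_TOOLS = {"update_textdoc", "create_textdoc"}


@dataclass
class GuardrailState:
    """Counts the tool calls accepted so far in one task run."""
    calls: dict = field(default_factory=dict)

    def count(self, tool: str) -> int:
        return self.calls.get(tool, 0)

    def record(self, tool: str) -> None:
        self.calls[tool] = self.count(tool) + 1


def check_tool_call(state: GuardrailState, tool: str) -> Optional[str]:
    """Return a helper message if the call breaks a rule; otherwise record it and return None."""
    limit = MAX_CALLS.get(tool)
    if limit is not None and state.count(tool) >= limit:
        if tool == "debug_script":
            return "💿 Do not call `debug_script()` more than three times."
        # Hypothetical wording, not taken from the post.
        return f"Call limit reached for `{tool}()` in this task."
    if tool in MODIFYING_TOOLS and state.count("strategic_planning") == 0:
        return "💿 Call `strategic_planning()` before modifying the project."
    state.record(tool)
    return None


# Example run: the first edit is rejected because no plan exists yet.
state = GuardrailState()
for proposed in ["update_textdoc", "debug_script", "strategic_planning", "update_textdoc"]:
    message = check_tool_call(state, proposed)
    print(proposed, "->", message or "allowed")
```

Rejected calls are deliberately not counted, so a blocked `debug_script()` attempt does not use up the budget. A real system would also need the content-based checks on model messages that the post mentions, which this sketch leaves out.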
Improvements over the SWE-bench Lite strategy

A 59.7% score on SWE-bench Lite was a solid start. We shared the full technical breakdown in our earlier blog post, but even with a SOTA result, this run exposed a few weak spots. Before tackling SWE-bench Verified, we prioritized fixing them.

Tools-related updates:

- Fixed a few tool-related issues, making the tools more tolerant of the model's uncertainty when calling them.
- Renamed tools, because the model often skipped tools whose names were unclear. New names:
  - definition() -> search_symbol_definition()
  - references() -> search_symbol_usages()
  - regex_search() -> search_pattern()
  - search() -> search_semantic()
  - deep_analysis() -> strategic_planning()
- Fixed the AST mechanisms inside refact-lsp that prevented decorated symbols from being parsed.
- Resolved an issue where Agent didn't wait for ast/vecdb to finish indexing the project.
- We now mark line numbers, which adds extra stability with retrieval tools like cat, search, etc.

Context-related updates:

- Reduced the strength of chat compression. Claude 3.7 often tried to cat files already in context; instead of blocking it (which caused loops), we now allow the model to receive them again.
- Encouraged the model to open whole files instead of making many tiny cat calls to read a file line by line. When the model opens large files, it noticeably degrades as the context size grows quickly. We continue to adjust this balance.

During the SWE-bench Verified run, all these improvements were implemented.

What we tried that did not work

Not every experiment made it to production. Here's what we tested, and what we implemented instead:

Didn't work | What works instead
--- | ---
A separate critique tool that allowed the model to assess its own changes. | Turns out, the model does better when it just runs tests and decides the next steps based on results.
A complex strategic_planning() tool flow with four steps: root-cause analysis → initial solution → critique → refined solution. It overcomplicated simple tasks and lowered success rates. | Now, strategic_planning() only generates a solution, and this works better.
Using a pdb() tool without a dedicated sub-agent. The Claude model preferred shell() over pdb(), so debugging rarely happened. | Introducing the debug_script() sub-agent made it reliable.
Running without sub-agents. As context grew, Claude 3.7 quickly became less accurate and stopped following instructions. | Letting sub-agents do their job.

From benchmark to real product

What makes Refact.ai stand out isn't just the percentage of solved benchmark tasks; it's how our AI Agent gets there. Our goal isn't to win all leaderboards just for the sake of it, but to build an approach that actually works for real-world programming. That's why SWE-bench Verified is also a way to test and improve the actual engineering flow of our product.

Many of the updates we made for the run (see: Tools-related updates, Context-related updates) are already shipping in Refact.ai. The guard mechanisms are another example: in the product, we already have these helper messages that AI Agent automatically sends itself after calling certain tools. For example, after debug_script() it gets the tool output, and also a static instruction to open all the related files mentioned. So, these guard mechanisms are already part of specific flows. And we're planning more, including chat-wide checks to spot off-track behavior earlier and react to it.

We're also updating the AI Agent prompt used in Refact.ai for VS Code and JetBrains to improve product efficiency for our users. Notably, strategic_planning() isn't (and won't be) called by default in the plugin: it's heavy on coins spent and not always necessary, since the main model is often enough to solve the task. That said, if you think your task needs deeper reasoning, you can still call it manually in chat with @. Just keep in mind it's coin-expensive.
Refact.ai Agent solved SWE-bench Verified fully autonomously, but in real-world use, of course, developers often want more control. That's why Refact.ai offers flexible interaction with manual overrides: you can delegate tasks to AI Agent, while it lets you preview and guide the process. That reflects our philosophy: autonomous AI Agent for programming you can trust, and control when you need to.

Final score

Out of 500 tasks in SWE-bench Verified:

- 🥇 Solved: 352 (70.4% resolve rate)
- Not solved: 148 (29.6%)

Evaluation results

Total instances | Solved | Not solved | Solved (%) | Not solved (%)
--- | --- | --- | --- | ---
500 | 352 | 148 | 70.4% | 29.6%

Get Refact.ai Agent for your IDE

Refact.ai is an autonomous AI Agent that automates programming tasks, helping developers and IT teams move faster. With Refact.ai in your IDE, you get:

- Real automation that boosts productivity by 10x
- Seamless integration with your codebase, workflow, and dev tools
- A digital twin that handles your busywork and lets you focus on big things.

Available to everyone: install Refact.ai for VS Code or JetBrains today and feel the real impact in your everyday programming.

Same category tools

HubSpot Breeze Assistant

Cabina AI