coding · self-hosted · Status: active

Refact.ai

Open-source AI coding assistant with self-hosted models, IDE plugins, and custom fine-tuning support

Refact.ai is an open-source AI coding assistant from SmallCloud that runs on your own GPU hardware or their cloud. It supports custom fine-tuning on your codebase, deploys to VS Code and JetBrains, and is the most practical self-hosted option for teams with strict data residency requirements.

Cloud-based AI coding tools send your code to someone else's servers. For a startup building a consumer app, that's an accepted trade-off for access to frontier model quality. For a bank, a defense contractor, a healthcare company, or any organization under GDPR, HIPAA, or similar frameworks with teeth, that's often not a trade-off that compliance teams will sign off on.

Refact.ai is SmallCloud's answer to that constraint. It's an open-source AI coding assistant that runs on your own hardware, supports custom fine-tuning on your codebase, and connects to VS Code and JetBrains without sending code outside your network. It launched in September 2023, has been actively maintained since, and in early 2026 is probably the most production-ready self-hosted option for teams that need real data residency guarantees alongside a coding assistant that actually works.

Quick verdict

If your team is cloud-blocked by compliance, or if you want the model to learn your internal coding patterns through fine-tuning, Refact.ai is the most practical option in this category. The self-hosted setup requires a GPU and about an hour of configuration. The completion quality on self-hosted open-weight models won't match Claude Sonnet or GPT-4o, but it's meaningfully better than nothing, and fine-tuning on your codebase closes some of the quality gap for common patterns. If cloud tools are an option for you, Codeium or GitHub Copilot will give you a better experience for less setup effort. Refact.ai's value proposition is specifically about the control it gives you over where inference happens.

What Refact.ai is

SmallCloud, a San Francisco-based company founded in 2022, built Refact.ai as an open-source alternative to cloud-dependent coding assistants. The GitHub repository is at github.com/smallcloudai/refact and is actively developed. The product installs as a plugin in VS Code or JetBrains and connects to either the Refact.ai cloud or a self-hosted Refact server.

The feature set covers the standard coding assistant ground: inline completions, a chat panel for questions and code explanations, function-level documentation generation, and code review feedback. What distinguishes Refact.ai from Codeium, GitHub Copilot, or Tabnine is the self-hosting capability and the fine-tuning pipeline. These are not marketing features; they require real infrastructure to use, and they're only valuable to a specific type of team.

The company has resisted the full agentic pivot that Claude Code, Cursor, and others have made. Refact.ai's agent features exist but are less developed. This is a trade-off in favor of quality and reliability on the core completion and chat experience rather than a rush to match agentic capabilities across every dimension.

Self-hosted deployment

This is Refact.ai's primary differentiator and the reason to choose it over cloud alternatives.

The Refact server is a Docker container. You run it on a machine with an NVIDIA GPU, point it at a model (from Hugging Face or a local path), and configure the IDE plugins to talk to your server's address rather than the Refact.ai cloud. Once configured, all inference runs locally. Code never leaves your network.

The minimum useful GPU is something with 8GB of VRAM (an RTX 3070 or equivalent) running a quantized 7B model. Better results come from 24GB+ VRAM running larger models at higher precision. The quality gap between a quantized 7B model and GPT-4o is real; it's most visible on complex reasoning tasks and unusual coding patterns. For common, well-represented patterns (standard Python, TypeScript React components, Go HTTP handlers), the gap is smaller because smaller models have been trained heavily on those patterns.
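As a back-of-the-envelope check on those VRAM figures, weight memory is roughly parameter count times bits per weight, divided by eight. The sketch below assumes that rule of thumb and ignores KV cache, activations, and runtime overhead, so treat its output as a floor rather than a sizing guide. `estimate_weights_gb` is a hypothetical helper name, not part of Refact.

```shell
#!/bin/sh
# Rough floor on VRAM needed for model weights alone.
# Assumption: weights dominate; KV cache and runtime overhead are NOT included.
# Usage: estimate_weights_gb <params_in_billions> <bits_per_weight>
estimate_weights_gb() {
  LC_ALL=C awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 7 4     # 7B model, 4-bit quantized
estimate_weights_gb 7 16    # 7B model, 16-bit precision
estimate_weights_gb 33 4    # 33B model, 4-bit quantized
```

Under that assumption a 4-bit 7B model needs about 3.5 GB for weights, which leaves headroom on an 8GB card, while the same model at 16-bit precision (about 14 GB) does not fit.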

The setup process is documented in the Refact.ai repository. For a developer comfortable with Docker and GPU drivers, first-time setup takes 30 to 60 minutes. The main friction points are GPU driver configuration and model download times for larger models. The setup is not plug-and-play for non-technical stakeholders, which matters if the person deploying it isn't the same person who needs to use it.

For enterprise deployments, SmallCloud offers support contracts that include setup assistance and SLA guarantees. Contact them for pricing; enterprise plans are custom.

Fine-tuning on your codebase

This is the feature that moves Refact.ai from "private Copilot with worse models" to something qualitatively different.

Fine-tuning takes a base model and continues training it on your codebase. The result is a model that knows your internal APIs, your naming conventions, your library choices, and your architectural patterns. When you start typing a function that follows your internal style, the fine-tuned model is more likely to complete it correctly than a base model that's never seen your code.

The practical workflow in Refact.ai: you select a portion of your codebase as training data (typically your most representative, highest-quality code), trigger a fine-tuning run through the Refact UI, and deploy the resulting model on your server. Fine-tuning on a 7B model takes a few hours on a GPU; the UI shows progress. You can A/B test fine-tuned versus base models and roll back if quality regresses.

The quality improvement from fine-tuning is most pronounced for teams with a lot of proprietary internal libraries or unusual conventions. If your codebase uses a homegrown ORM, internal authentication middleware, or a bespoke test framework, a base model will make assumptions based on popular open-source alternatives. A fine-tuned model knows your actual APIs.

This is not a feature for individual developers. The compute cost of fine-tuning, the need for a meaningful training corpus, and the operational work of deploying and maintaining fine-tuned models make this a team- or organization-level investment.

IDE integration

Refact.ai ships plugins for VS Code and JetBrains. Both are available in their respective marketplaces and connect to whichever endpoint you configure: the Refact.ai cloud or your self-hosted server.

The VS Code plugin provides completions inline as you type, a chat panel that knows your current file's context, and commands for explaining code, generating documentation, and reviewing functions. The JetBrains plugin covers the same features across IntelliJ IDEA, PyCharm, GoLand, WebStorm, and other JetBrains IDEs.

The JetBrains parity is worth calling out specifically. Many enterprise engineering teams are on JetBrains IDEs, and not every cloud coding assistant prioritizes JetBrains as a first-class target. Codeium does cover JetBrains. GitHub Copilot does too. But Cursor is a standalone editor built on VS Code, and several newer tools ship only as VS Code extensions. For JetBrains-heavy teams, Refact.ai's coverage is an advantage.

Model support

On the cloud tier, Refact.ai uses its own hosted models plus optional Claude and GPT-4o backends. On self-hosted deployments, you configure the model. The supported open-weight models include:

DeepSeek Coder (6.7B and 33B variants)
StarCoder2 (3B, 7B, 15B)
Llama-based coding models
Any model compatible with the Hugging Face text-generation-inference format

The model landscape for code is evolving fast. DeepSeek Coder V2 and its successors have closed a lot of the gap with proprietary models on code benchmarks. By early 2026, a well-configured self-hosted Refact deployment running a strong open-weight coding model produces completions that are good enough for productive daily use: not indistinguishable from Claude Sonnet, but meaningfully useful rather than merely amusing.

Pricing

The open-source self-hosted version is free. You pay for the GPU hardware (or your cloud provider's GPU instances) and any electricity costs. There's no per-seat license for the self-hosted version.

The Refact.ai cloud has a free tier for individual developers, covering enough usage to evaluate the product seriously. The free tier uses Refact's hosted models. If you want Claude or GPT-4o as the backend on the cloud tier, that requires a paid plan or BYO API key configuration.

Enterprise plans with SLA, support, and advanced features are custom-priced. Contact SmallCloud directly. The pricing is consistent with enterprise software: it depends on team size, support requirements, and whether you want on-site deployment support.

For the typical path: an individual developer can use the cloud free tier to evaluate Refact.ai at no cost. A team evaluating self-hosted deployment will need GPU hardware (existing or cloud-rented) and a few hours of setup time. An enterprise committing to Refact.ai with fine-tuning and support will be in conversations with SmallCloud's sales team.

Refact.ai vs Tabby

Tabby is the closest direct competitor: open-source, self-hosted, focused on code completions. Tabby is simpler to set up and has a strong community around it. Refact.ai has more features: chat, fine-tuning, broader IDE support.

The choice between them usually comes down to what you need. If you want self-hosted completions with minimal setup complexity, Tabby is the easier starting point. If you need fine-tuning, JetBrains support on par with VS Code, or a chat panel alongside completions, Refact.ai is the more complete option.

Both are actively maintained. Both have communities smaller than GitHub Copilot's but meaningful for open-source projects.

Refact.ai vs Codeium

Codeium (since rebranded as Windsurf) has a strong free cloud tier and broad IDE support. It's a better experience for most individual developers than Refact.ai on self-hosted open-weight models. The quality gap is real.

The comparison is only relevant if cloud tools are off the table. If you can use cloud tools, Codeium's free tier gives you better completions with less setup work. If you can't use cloud tools, Refact.ai is the more capable self-hosted alternative.

Refact.ai vs Continue

Continue is an open-source VS Code and JetBrains extension that connects to any LLM provider. It's model-agnostic and focused on the IDE integration layer rather than server infrastructure. You configure Continue with your model provider of choice (including local models via Ollama or LM Studio).

Continue and Refact.ai serve related but different needs. Continue is the integration layer; you manage models separately. Refact.ai is the full stack: server, models, fine-tuning pipeline, and IDE plugins all from one place. For teams that want a complete self-hosted package, Refact.ai is more turnkey. For teams that want maximum flexibility in assembling their own stack, Continue is more composable.

Who should use Refact.ai

The primary audience is engineering teams with data residency requirements. If your legal or security team has said "code cannot leave our network," Refact.ai is the most complete self-hosted coding assistant solution available in early 2026. It covers completions, chat, and fine-tuning in a package that requires no external API calls once deployed.

Teams with significant proprietary internal code that differs substantially from public open-source patterns are a strong secondary fit. Fine-tuning a model on your internal codebase produces completions that generic base models can't match for your specific patterns. If you have a large internal library that everything depends on, a fine-tuned model that knows that library's API will save your developers real time.

Open-source projects self-hosting their own tooling for philosophical consistency will find Refact.ai's permissively licensed open-source core appropriate.

The developers who should look elsewhere: individuals and small teams without compliance requirements, anyone who wants the best possible completion quality without the infrastructure investment, and teams looking for a mature agentic tool with multi-step autonomous task execution. On those dimensions, cloud alternatives are ahead.

Getting started

Self-hosted path:

docker pull smallcloudai/refact_self_hosting
docker run -d \
  --gpus all \
  -p 8001:8001 \
  -v refact-volume:/perm_storage \
  smallcloudai/refact_self_hosting

After the container starts, open http://localhost:8001 in your browser. Download a model through the UI (DeepSeek Coder 6.7B is a reasonable starting point for a single consumer GPU), then install the VS Code or JetBrains plugin and point it at http://localhost:8001 instead of the Refact.ai cloud endpoint.
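Before configuring the plugins, it can help to confirm that the container is running and that the web UI answers. The following is a minimal sketch using only standard `docker` and `curl` invocations. It assumes the image name and port from the command above, and `probe_refact` is a hypothetical helper, not a Refact command. It checks only that something responds over HTTP at that address, not that a model is loaded.

```shell
#!/bin/sh
# Assumed values, taken from the docker run command above; adjust to your setup.
REFACT_IMAGE="smallcloudai/refact_self_hosting"
REFACT_URL="${REFACT_URL:-http://localhost:8001}"

# Print "reachable" if any HTTP response comes back from the URL,
# otherwise "unreachable" (connection refused, timeout, curl missing).
probe_refact() {
  if curl -sS -o /dev/null --max-time 5 "$1" 2>/dev/null; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}

# List running containers started from the image (skipped if docker is absent).
if command -v docker >/dev/null 2>&1; then
  docker ps --filter "ancestor=$REFACT_IMAGE" --format '{{.ID}} {{.Status}}' \
    || echo "docker daemon not reachable"
fi

echo "Refact server at $REFACT_URL: $(probe_refact "$REFACT_URL")"
```

If the probe reports "unreachable" while `docker ps` shows the container, `docker logs <container-id>` is the usual next place to look, since GPU driver problems tend to surface there.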

Cloud path: go to refact.ai, create an account, and install the plugin. The cloud free tier lets you evaluate the product without GPU hardware.

For fine-tuning: once the server is running, go to the fine-tuning section in the UI, select your training data directory (your most representative code), configure the training parameters (the defaults are sensible for a first run), and start the job. A 7B model fine-tuning run takes 2-6 hours on a modern single GPU depending on dataset size.
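Choosing "your most representative code" usually means filtering out vendored dependencies and build output before pointing the fine-tuning UI at a directory. One way to stage such a directory is sketched below with standard `find` and `cp`. The excluded directory names, the file extensions, and the `stage_corpus` helper are illustrative assumptions, not Refact conventions; any filtering Refact applies on its own is not covered here.

```shell
#!/bin/sh
# Copy source files from a repo into a staging directory, preserving relative
# paths and skipping vendored or generated directories.
# Prints the number of files staged.
# Usage: stage_corpus <repo_dir> <staging_dir>   (staging_dir outside repo_dir)
stage_corpus() {
  src="$1"; dest="$2"
  mkdir -p "$dest"
  dest_abs="$(cd "$dest" && pwd)"
  (
    cd "$src" || exit 1
    # Prune excluded directories, then print matching source files.
    find . \
      \( -name .git -o -name node_modules -o -name vendor -o -name dist -o -name build \) -prune \
      -o -type f \( -name '*.py' -o -name '*.ts' -o -name '*.go' \) -print |
    while IFS= read -r f; do
      mkdir -p "$dest_abs/$(dirname "$f")"
      cp "$f" "$dest_abs/$f"
    done
  )
  find "$dest_abs" -type f | wc -l | tr -d ' '
}

# Example (hypothetical paths):
#   stage_corpus ./my-service /tmp/refact-train
```

Staging a copy rather than training on the working tree also gives you a fixed snapshot to compare against if a later fine-tuning run regresses.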

The bottom line

Refact.ai is the right self-hosted coding assistant for teams that actually need self-hosted. The compliance use case is real and the product is production-ready for it. The fine-tuning capability is a genuine value-add for teams with significant proprietary internal code. The open-source licensing means no vendor lock-in and full auditability.

The caveats are equally real: self-hosted open-weight models are not as capable as frontier cloud models, setup requires GPU expertise, and the agent features don't match Claude Code or Cursor. If your constraints allow cloud tools, GitHub Copilot, Codeium, or Continue with a frontier model backend will give you a better daily experience.

But for the specific team that needs code to stay on-premise: Refact.ai does that reliably, maintains its open-source commitments, and has been iterating on quality since September 2023. That track record matters when you're betting compliance-critical infrastructure on a tool.

Key features

Self-hosted deployment on your own GPU infrastructure
Fine-tuning on your own codebase for domain-adapted completions
VS Code and JetBrains IDE plugins
Code completion, chat, and function-level explanations
Privacy mode with all inference on-premise
Support for open-weight models (Llama, StarCoder, DeepSeek Coder)
Telemetry and usage dashboards for team deployments

Pros and cons

Pros

+ Fully self-hostable with complete code privacy
+ Fine-tuning on your codebase adapts completions to your patterns
+ Open-source and actively maintained
+ VS Code and JetBrains support covers most teams
+ Free tier on cloud is useful for individual evaluation
+ Supports open-weight models for full stack control

Cons

− Self-hosted setup requires GPU hardware investment
− Completion quality on self-hosted open-weight models trails frontier cloud models
− Smaller community than GitHub Copilot or Codeium
− Fine-tuning pipeline has a learning curve
− Agent features less developed than Claude Code or Cursor

Who is Refact.ai for?

Enterprises with data residency requirements that rule out cloud AI tools
Teams wanting to fine-tune a coding assistant on internal code patterns
Security-conscious organizations that need all inference on-premise
Open-source projects that want to self-host their AI coding tooling

Alternatives to Refact.ai

If Refact.ai isn't quite the right fit, the closest alternatives are Tabby, Codeium, Continue, and GitHub Copilot. See our full Refact.ai alternatives page for side-by-side comparisons.

Frequently Asked Questions

What is Refact.ai?

Refact.ai is an open-source AI coding assistant built by SmallCloud. It provides code completions, chat, and code explanations through VS Code and JetBrains plugins. It can run entirely on your own GPU hardware (self-hosted) or through their cloud service. It supports custom fine-tuning on your codebase.

Is Refact.ai open source?

Yes. The core Refact.ai server and plugins are open source on GitHub at smallcloudai/refact. You can inspect the code, self-host it, and contribute to the project. The license allows commercial self-hosting.

How does self-hosting Refact.ai work?

You run the Refact server on a machine with a GPU (NVIDIA GPU with 8GB+ VRAM is the minimum useful configuration). The server handles model inference and exposes an API that the IDE plugins connect to. Models can be pulled from Hugging Face or provided as local files. Setup is documented in the GitHub repository and takes roughly an hour for someone comfortable with Docker and GPU configuration.

What models does Refact.ai support?

Refact.ai supports a range of open-weight models including DeepSeek Coder, StarCoder2, and Llama-based coding models. On the cloud tier, it also supports Claude and GPT-4o as backends. For self-hosted deployments, any compatible open-weight model on Hugging Face can be configured.

How does Refact.ai compare to Tabby?

Both are open-source self-hosted coding assistants. Tabby focuses on code completions and is simpler to set up. Refact.ai has a broader feature set: chat, fine-tuning pipelines, and more IDE integrations. If you need only completions and want the simplest possible self-hosted setup, Tabby is easier. If you need fine-tuning and chat alongside completions, Refact.ai is the more capable option.
