- Practicaly AI
🧠How to Run Claude Code Fully Offline on Your MacBook
(No Anthropic API Required)
Who This Is For: Developers who already use Claude Code and want to run it without an API bill, without sending code to Anthropic's servers, or without an internet connection at all. If you're on a long flight, working under an NDA that forbids cloud inference, or just tired of watching your Max plan burn through tokens on routine refactors, this guide shows you exactly how to point Claude Code at a local model running on your MacBook.
What You'll Learn:
- How to run Claude Code against a local open-source model using Ollama's new Anthropic Messages API compatibility (shipped in Ollama v0.14, January 2026)
- Which coding model to pick based on how much unified memory your MacBook has, from 16GB up to 64GB+
- The exact environment variables that make Claude Code talk to `localhost` instead of `api.anthropic.com`
- How to verify your setup is actually offline (not silently falling back to the cloud)
- The real performance and accuracy tradeoffs, including why Claude Code's edit accuracy drops from 98% to 70-80% with non-Claude models, and what to do about it
TL;DR
- The problem: Claude Code is excellent, but it ships locked to Anthropic's API. Heavy users spend $100-$200/month, sensitive codebases can't legally leave the machine, and it's useless on a plane with no Wi-Fi.
- The solution: As of January 2026, Ollama v0.14+ natively speaks the Anthropic Messages API. Claude Code can't tell the difference between Anthropic's cloud and a model running on your MacBook's GPU.
- The setup: Install Ollama, pull a coding model (`qwen3-coder`, `gpt-oss:20b`, or `glm-4.7-flash`), set three environment variables, and launch Claude Code. No proxy, no LiteLLM, no Docker.
- The workflow in one sentence: `ANTHROPIC_BASE_URL=http://localhost:11434` plus `ANTHROPIC_AUTH_TOKEN=ollama` plus `claude --model qwen3-coder` equals a fully local coding agent.
- The hardware reality: 16GB MacBooks work with smaller models. 32GB is the practical floor for serious work. 64GB+ lets you run Qwen3-Coder-Next, the closest open-weight model to Sonnet-class quality.
- The honest caveat: Local models are slower (sometimes a minute for a simple reply on M1 Max), less reliable at tool calls, and noticeably worse at applying edits cleanly. This is a real workflow, not a full Claude replacement; use it for routine work and keep your cloud plan for the hard problems.
1. Why This Became Possible in 2026
The old way of running Claude Code locally was messy. You had to install LiteLLM as a translation proxy, write a YAML config mapping Anthropic model names to OpenAI-format endpoints, and hope streaming didn't break on the next Claude Code update. It worked, but it felt like a science project.
That changed when Ollama shipped v0.14.0 in January 2026 with native Anthropic Messages API compatibility. The /v1/messages endpoint is now built directly into Ollama — same request format, same tool-calling semantics, same streaming events as Anthropic's real API. Claude Code sends its requests to localhost:11434 and Ollama handles them transparently, routing to whatever local model you've loaded.
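You can check that compatibility directly by sending a request in the Anthropic Messages format to the local endpoint. This is a sketch under the article's assumptions (Ollama v0.14+ on its default port, `qwen3-coder` already pulled); the exact headers Ollama requires may differ slightly from Anthropic's hosted API.

```shell
# Send an Anthropic-style Messages request to the local Ollama endpoint.
# The model name and token value are the article's examples, not universal defaults.
curl http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -d '{
    "model": "qwen3-coder",
    "max_tokens": 128,
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
```

If the endpoint is live, the response should come back in Anthropic's response shape (a message object with a `content` array) rather than Ollama's native chat format.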
No proxy. No translation layer. Three environment variables and you're done.
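Concretely, the three variables plus the launch command look like this. It's a minimal sketch assuming the default Ollama port and one of the article's example models:

```shell
# Point Claude Code at the local Ollama server instead of api.anthropic.com.
export ANTHROPIC_BASE_URL=http://localhost:11434
# Placeholder token; the article's setup uses the value "ollama".
export ANTHROPIC_AUTH_TOKEN=ollama
# Launch Claude Code with the locally pulled model.
claude --model qwen3-coder
```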
2. What You Need Before You Start
A MacBook with Apple Silicon (M1 or newer) and at least 16GB of unified memory. Apple Silicon has a specific advantage here: the CPU and GPU share the same memory pool, so the entire RAM budget is available for the model. A 64GB MacBook can load models that would require a multi-GPU PC setup.
You'll also need Node.js 18+ (for Claude Code itself), about 20-40GB of free disk space for the model weights, and roughly 15 minutes of setup time. A decent internet connection is required once — to download Ollama, Claude Code, and your chosen model — after which everything runs offline.
One version note: Ollama's Anthropic API compatibility is still being patched for edge cases in streaming and tool calling. Make sure you're on Ollama 0.14.3-rc1 or newer. Earlier v0.14 builds have known issues with streaming tool calls that will make Claude Code hang or fail silently.
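The one-time downloads described above can be sketched as follows. This assumes Homebrew and npm are available and uses `qwen3-coder` as the example model; verify package and model names against the current Ollama and Claude Code docs:

```shell
# Install the Ollama runtime (also available as a direct download from ollama.com).
brew install ollama
# Install Claude Code globally via npm (requires Node.js 18+).
npm install -g @anthropic-ai/claude-code
# Confirm the Ollama version is 0.14.3-rc1 or newer.
ollama -v
# Pull the model weights while you still have internet access.
ollama pull qwen3-coder
```

After the model pull completes, nothing in the workflow needs the network again.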
3. The Step-by-Step Workflow