Fixing the “Model May Not Exist” Error When Using Ollama with Claude Code
If you’ve tried running a local model through Ollama with Claude Code and been greeted by this message:
There’s an issue with the selected model (qwen3-coder:30b). It may not exist or you may not have access to it. Run /model to pick a different model.
…even though the model is clearly installed and runs fine with ollama run — you’re not alone. This comes up constantly in the Claude Code and Ollama GitHub repos, and the error message is misleading. The model exists. Ollama can see it. The real problem is somewhere else entirely.
Here’s what’s actually going on and how to fix it.
The Setup
Claude Code is Anthropic’s agentic coding tool. It reads, edits, and executes code in your working directory through natural language. While it’s designed to work with Anthropic’s own models, it can also connect to any backend that implements the Anthropic Messages API — including Ollama’s local server.
The basic setup looks simple enough. You export a few environment variables, point Claude Code at localhost, and specify your model:
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
claude --model qwen3-coder:30b
For a lot of people, this is where things break. The model is pulled. Ollama is running. Yet Claude Code refuses to connect.
Why It Fails: Three Hidden Problems
The error looks like a single issue, but it’s actually the result of up to three separate problems stacking on top of each other.
Problem 1: Claude Code Uses Three Models, Not One
This is the root cause that trips up almost everyone. Claude Code doesn’t just use the one model you specify with --model. Internally, it maps tasks to three model tiers: Haiku (for lightweight tasks like summarization), Sonnet (for standard coding work), and Opus (for complex reasoning). When you launch Claude Code pointed at Ollama, it sends your chosen model name for the primary task, but for background tasks it falls back to the Anthropic default model IDs, such as claude-haiku-4-5-20251001 for the Haiku tier.
Ollama has no idea what claude-haiku-4-5-20251001 is. It returns a 404. Claude Code interprets that as the model not existing and shows you the vague error message.
If you check Claude Code’s debug logs, this becomes obvious:
"message": "model 'claude-haiku-4-5-20251001' not found"
The model you specified is fine. It’s the models you didn’t specify that are failing.
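You can reproduce the failure directly with curl. This assumes Ollama is running on its default port; the trailing `|| true` just keeps the command harmless if it isn’t:

```shell
# The request body Claude Code effectively sends for a background task,
# using one of its internal Anthropic default model IDs.
payload='{"model":"claude-haiku-4-5-20251001","max_tokens":10,"messages":[{"role":"user","content":"hi"}]}'

# Against a running Ollama, this comes back as a "model not found" error,
# which Claude Code then misreports as the model not existing.
curl -s http://localhost:11434/v1/messages \
  -H "Content-Type: application/json" \
  -d "$payload" || true   # harmless no-op if Ollama isn't running
```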
Problem 2: Ollama’s Default Context Window Is Too Small
Ollama ships with a default context window of 4,096 tokens. Claude Code’s agentic workflows (reading files, planning edits, calling tools, maintaining conversation history) routinely require tens of thousands of tokens of context. At 4K, the model either fails outright, loops endlessly, or produces hallucinated output that Claude Code can’t parse and therefore treats as a model error.
You won’t necessarily see a clear “context too small” error. It often shows up as the same generic “model may not exist” failure because the API response comes back malformed or truncated.
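To get a sense of why 4K is nowhere near enough, a back-of-envelope estimate helps. The 4-characters-per-token figure below is a rough heuristic, not a real tokenizer:

```shell
# Rough token estimate for a single medium-sized source file,
# using the common ~4-characters-per-token heuristic.
file_bytes=20000                            # a single ~20 KB file
echo "approx tokens: $((file_bytes / 4))"   # ~5,000 tokens from one file alone
```

One medium file already overruns a 4,096-token window before any system prompt, conversation history, or tool output is counted.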
Problem 3: A Stale Anthropic API Key Overrides Your Configuration
If you’ve previously used Claude Code with an Anthropic subscription, there’s a good chance ANTHROPIC_API_KEY is still set in your shell profile (.bashrc, .zshrc, or similar). Even when you set ANTHROPIC_BASE_URL to point at localhost, a non-empty ANTHROPIC_API_KEY can cause Claude Code to try authenticating against Anthropic’s servers or behave unpredictably during routing. The key must be explicitly set to an empty string, not just unset.
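The unset-versus-empty distinction matters because, per the behavior described above, it is a non-empty value that interferes, not the variable’s mere existence. A plain-shell sketch of the difference (the key value is a made-up placeholder):

```shell
# Simulate a stale key inherited from a shell profile
# (the value here is a fake placeholder, not a real key).
export ANTHROPIC_API_KEY="sk-ant-stale-example"
export ANTHROPIC_BASE_URL=http://localhost:11434

# Pointing at localhost does not clear the old key:
[ -n "$ANTHROPIC_API_KEY" ] && echo "stale key still present"

# Explicitly emptying it is what takes it out of play:
export ANTHROPIC_API_KEY=""
[ -z "$ANTHROPIC_API_KEY" ] && echo "key is now explicitly empty"
```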
The Fix: A Complete Working Configuration
You need to address all three problems at once.
Step 1: Create a Custom Model with a 64K Context Window
Don’t use the base model tag directly. Create a new model variant with a larger context window using a Modelfile:
cat > Modelfile <<'EOF'
FROM qwen3-coder:30b
PARAMETER num_ctx 65536
EOF
ollama create qwen3-coder-64k -f Modelfile
Verify the configuration was applied:
ollama show qwen3-coder-64k
Look for num_ctx under the Parameters section. It should read 65536. If it shows 4096 or the parameter is missing, the Modelfile wasn’t applied correctly — recreate it.
Step 2: Set All Three Default Model Variables
Tell Claude Code to use your local model for every tier — Haiku, Sonnet, and Opus:
export ANTHROPIC_DEFAULT_HAIKU_MODEL="qwen3-coder-64k"
export ANTHROPIC_DEFAULT_SONNET_MODEL="qwen3-coder-64k"
export ANTHROPIC_DEFAULT_OPUS_MODEL="qwen3-coder-64k"
This prevents Claude Code from ever trying to call a model that doesn’t exist on your Ollama instance. You can use different models for each tier if you have them available (for example, a smaller model for Haiku to save memory), but using the same model for all three is the simplest path to a working setup.
Step 3: Explicitly Empty the API Key
export ANTHROPIC_API_KEY=""
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_BASE_URL=http://localhost:11434
The empty string for ANTHROPIC_API_KEY is critical. It must be explicitly empty, not just absent from your environment.
Step 4: Disable Telemetry
export DISABLE_TELEMETRY=1
Claude Code periodically sends telemetry and checks for MCP server configurations at Anthropic’s API. When running against a local Ollama backend, these requests either hang or return 404 errors, adding noise and sometimes causing startup delays.
Step 5: Launch
claude --model qwen3-coder-64k
Or as a single command with everything inline:
ANTHROPIC_AUTH_TOKEN=ollama \
ANTHROPIC_API_KEY="" \
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_DEFAULT_HAIKU_MODEL="qwen3-coder-64k" \
ANTHROPIC_DEFAULT_SONNET_MODEL="qwen3-coder-64k" \
ANTHROPIC_DEFAULT_OPUS_MODEL="qwen3-coder-64k" \
DISABLE_TELEMETRY=1 \
claude --model qwen3-coder-64k
Why This Works
Each piece removes one source of ambiguity. The three default model variables stop Claude Code from requesting models that don’t exist on your Ollama server. The 64K context window gives the agentic workflow enough room to load files, maintain multi-turn conversations, and generate edits without producing malformed responses. The empty API key prevents leftover Anthropic credentials from interfering with routing. And disabling telemetry stops Claude Code from making network calls to Anthropic’s servers that will never succeed from a local setup.
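Before launching, a quick sanity check can confirm everything is wired up. This sketch repeats the exports from Steps 2–4 so it stands alone (swap in your own model name):

```shell
# The full environment from Steps 2-4, repeated for a self-contained check.
export ANTHROPIC_AUTH_TOKEN=ollama
export ANTHROPIC_API_KEY=""
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_DEFAULT_HAIKU_MODEL="qwen3-coder-64k"
export ANTHROPIC_DEFAULT_SONNET_MODEL="qwen3-coder-64k"
export ANTHROPIC_DEFAULT_OPUS_MODEL="qwen3-coder-64k"
export DISABLE_TELEMETRY=1

# Every tier must resolve to a model your Ollama instance actually has
# (uses bash's ${!var} indirect expansion to print each variable):
for var in ANTHROPIC_DEFAULT_HAIKU_MODEL ANTHROPIC_DEFAULT_SONNET_MODEL ANTHROPIC_DEFAULT_OPUS_MODEL; do
  printf '%s=%s\n' "$var" "${!var}"
done

# The key must be present but empty, not merely unset:
[ -z "$ANTHROPIC_API_KEY" ] && echo "ANTHROPIC_API_KEY explicitly empty: OK"
```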
Quick Troubleshooting
If the fix above still doesn’t work, run through these checks:
Test Ollama’s API directly:
curl http://localhost:11434/v1/messages \
-H "Content-Type: application/json" \
-d '{"model":"qwen3-coder-64k","max_tokens":50,"messages":[{"role":"user","content":"hi"}]}'
If this returns an error, the issue is with Ollama, not Claude Code. Update Ollama to the latest version — the Anthropic-compatible API endpoint (/v1/messages) was added recently and older 0.15.x releases had bugs.
Check for environment variable conflicts:
env | grep ANTHROPIC
Make sure no stale values are being inherited from your shell profile.
Verify the model name exactly matches:
ollama list
The name you pass to --model and the three default model variables must exactly match what appears in ollama list, including the tag.
Recommended Shell Alias
To avoid typing this every time, add an alias to your ~/.bashrc or ~/.zshrc:
alias claude-local='ANTHROPIC_AUTH_TOKEN=ollama \
ANTHROPIC_API_KEY="" \
ANTHROPIC_BASE_URL=http://localhost:11434 \
ANTHROPIC_DEFAULT_HAIKU_MODEL="qwen3-coder-64k" \
ANTHROPIC_DEFAULT_SONNET_MODEL="qwen3-coder-64k" \
ANTHROPIC_DEFAULT_OPUS_MODEL="qwen3-coder-64k" \
DISABLE_TELEMETRY=1 \
claude --model qwen3-coder-64k'
Then run claude-local to launch.
The Short Version
The error message points you toward the model, which is a red herring. The real issue is that Claude Code talks to three models behind the scenes, Ollama’s default context window is too small for agentic work, and stale environment variables can quietly interfere with routing. Once you know that, the fix is straightforward: tell Claude Code exactly which models to use for every tier, give those models enough context, and clean up your environment.