Sharing Local LLM Models Between Ollama and llama.cpp

Sharing local LLM models between Ollama and llama.cpp seems like a niche concern until you’ve burned through tens of GB of disk space on duplicate copies of the same model. Both tools run the same GGUF format, but by default they store and name the files very differently. With a little configuration, they can share a single file.

The problem: data duplication
Strategy 1: The “llama.cpp first” approach (recommended)
Strategy 2: The “Ollama first” approach
Comparison of methods
Advanced tips
- Symlinking for sanity
- Relocating Ollama’s model store
Caveats to watch for

The problem: data duplication

When you run ollama pull, models land in a hidden directory as blobs with SHA-256 filenames. When you use llama.cpp, you point to a human-readable .gguf file. Without any coordination, you end up with two copies of the same 5GB–50GB model.
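Because Ollama names each blob after the SHA-256 digest of its contents, you can check whether a GGUF you already downloaded is also sitting in Ollama’s store. The sketch below assumes that naming scheme (sha256-<hex>) and uses a hypothetical helper name. It only catches byte-identical files, so the same model quantized by a different uploader will not match, and hashing a multi-gigabyte file takes a while.

```bash
# Hypothetical helper: print the Ollama blob that is a byte-identical copy of a file.
# Assumes Ollama's blob naming: <blobs>/sha256-<hex SHA-256 of the contents>.
find_ollama_duplicate() {
  local file="$1" blobs="$2" hash
  if command -v sha256sum >/dev/null 2>&1; then
    hash=$(sha256sum "$file" | awk '{print $1}')        # GNU coreutils (Linux)
  else
    hash=$(shasum -a 256 "$file" | awk '{print $1}')    # macOS
  fi
  [ -n "$hash" ] || return 2                            # hashing failed (missing file?)
  if [ -e "$blobs/sha256-$hash" ]; then
    printf '%s\n' "$blobs/sha256-$hash"
  else
    return 1                                            # no identical blob found
  fi
}
```

For example, find_ollama_duplicate ~/models/Qwen3-8B-Q4_K_M.gguf ~/.ollama/models/blobs prints the matching blob path, or returns a non-zero status if Ollama holds no identical copy.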

Strategy 1: The “llama.cpp first” approach (recommended)

Download a GGUF file once and point both tools at it.

1. Download your model

Store your GGUF in a central directory like ~/models.

# Using the Hugging Face CLI to download Qwen3 8B from Qwen's official GGUF repo
huggingface-cli download Qwen/Qwen3-8B-GGUF --include "Qwen3-8B-Q4_K_M.gguf" --local-dir ~/models

2. Point llama.cpp at the model

Use the -m flag:

./llama-cli -m ~/models/Qwen3-8B-Q4_K_M.gguf -p "Hello, world!"

3. Register the file with Ollama

Create a Modelfile that points Ollama at the existing GGUF instead of downloading the model again.

Create a file named Qwen3.Modelfile:

FROM /Users/yourname/models/Qwen3-8B-Q4_K_M.gguf
# Optional: system prompt or parameters

ollama create MyQwen3 -f Qwen3.Modelfile
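If you register several files, a small helper can write the Modelfile for you and keep the FROM line absolute, like the hand-written one above. This is a minimal sketch; write_modelfile is a hypothetical name, not part of Ollama.

```bash
# Hypothetical helper: write a one-line Modelfile whose FROM is the GGUF's absolute path.
write_modelfile() {
  local gguf="$1" out="$2" dir
  [ -f "$gguf" ] || { echo "no such file: $gguf" >&2; return 1; }
  dir=$(cd "$(dirname "$gguf")" && pwd) || return 1   # resolve relative paths
  printf 'FROM %s/%s\n' "$dir" "$(basename "$gguf")" > "$out"
}
```

Then register it as before: write_modelfile ~/models/Qwen3-8B-Q4_K_M.gguf Qwen3.Modelfile && ollama create MyQwen3 -f Qwen3.Modelfile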

Important

What actually happens: Ollama imports the GGUF into its blobs directory. On the same filesystem, it creates a hard link, so no extra disk space is consumed and the model keeps working even if you delete the original. On a different filesystem (an external or network drive), it falls back to a full copy, which doubles usage. Check disk space before and after ollama create if you’re not sure which case applies.
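Comparing disk usage works, but inodes give a more direct answer: a hard link shares its inode with the original, so if ollama create linked the file as described above, one blob will be the same file as your GGUF. This sketch relies on bash’s -ef test (same device and inode). find_hardlinked_blob is a hypothetical helper, and because -ef follows symlinks, a symlinked blob would also match.

```bash
# Hypothetical helper: print the blob that is the same file (same inode) as the given GGUF.
find_hardlinked_blob() {
  local gguf="$1" blobs="$2" blob
  for blob in "$blobs"/sha256-*; do
    [ -e "$blob" ] || continue            # skip the literal pattern if nothing matched
    if [ "$blob" -ef "$gguf" ]; then      # same device and inode (follows symlinks)
      printf '%s\n' "$blob"
      return 0
    fi
  done
  return 1                                # no shared inode: Ollama likely made a full copy
}
```

Running find_hardlinked_blob ~/models/Qwen3-8B-Q4_K_M.gguf ~/.ollama/models/blobs prints the linked blob; no output and a non-zero status suggests a full copy.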

Strategy 2: The “Ollama first” approach

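Already have models pulled through Ollama? The model weights in the blobs directory are standard GGUF files; they just have hash-based names. (The same directory also holds small JSON and text blobs for the model config, templates, parameters, and licenses, which are not GGUF.)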

1. Locate the blobs

Ollama stores models here by default:

macOS: ~/.ollama/models/blobs
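Linux: /usr/share/ollama/.ollama/models/blobs (official install script, running as a systemd service), or ~/.ollama/models/blobs if you run ollama serve as your own user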
Windows: C:\Users\<username>\.ollama\models\blobs
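Scripts that work with the blobs need to know which of these locations applies. A rough heuristic for macOS and Linux is sketched below, assuming the default paths listed above; ollama_blobs_dir is a hypothetical name. It checks OLLAMA_MODELS first, since that variable relocates the store (see Advanced tips).

```bash
# Hypothetical helper: guess where this machine's Ollama blobs live (macOS/Linux only).
# The server's environment is what counts: OLLAMA_MODELS set only in your shell
# does not move a service's store.
ollama_blobs_dir() {
  if [ -n "${OLLAMA_MODELS:-}" ]; then
    printf '%s/blobs\n' "$OLLAMA_MODELS"                      # relocated store
  elif [ -d /usr/share/ollama/.ollama/models/blobs ]; then
    printf '%s\n' /usr/share/ollama/.ollama/models/blobs      # Linux install-script service
  else
    printf '%s/.ollama/models/blobs\n' "$HOME"                # macOS, or `ollama serve` as you
  fi
}
```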

2. Find the right blob

ollama show --modelfile <model_name>

Look for the FROM line. It holds the full path to the weights blob, something like FROM /Users/yourname/.ollama/models/blobs/sha256-87048bcd...
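To grab that path without copying it by hand, filter the ollama show output. This sketch assumes the FROM line carries the full blob path (recent Ollama versions print one) and takes only the first FROM, since models with extra components such as a vision projector can list more than one. modelfile_weights_path is a hypothetical helper.

```bash
# Hypothetical helper: read `ollama show --modelfile` output on stdin and print the
# path after the first FROM. Commented lines like "# FROM qwen3:latest" start with
# '#', so they don't match. Exits non-zero if no FROM line is found.
modelfile_weights_path() {
  awk '/^FROM /{ sub(/^FROM /, ""); print; found = 1; exit } END { exit (!found) }'
}
```

Usage: ollama show --modelfile qwen3 | modelfile_weights_path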

3. Run it directly with llama.cpp

llama.cpp identifies GGUF files by magic bytes, not the extension, so you can point it at the blob as-is:

./llama-cli -m ~/.ollama/models/blobs/sha256-87048bcd55216712ef14c11c2c303728463207b165bf18440b9b84b07ec00f87
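That magic-byte check is easy to reproduce in the shell, which helps tell weight blobs apart from the small JSON and text blobs in the same directory. Every GGUF file begins with the four ASCII bytes GGUF. A minimal sketch, assuming the standard head, od, and du tools; is_gguf and list_gguf_blobs are hypothetical helpers.

```bash
# True if the file starts with the 4-byte GGUF magic ("GGUF" = 47 47 55 46 in hex).
# od turns the bytes into hex text, so binary data never lands in a shell variable.
is_gguf() {
  [ "$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')" = "47475546" ]
}

# Print size and path of every blob in an Ollama blobs directory that is a GGUF file.
list_gguf_blobs() {
  local blob
  for blob in "$1"/sha256-*; do
    [ -f "$blob" ] || continue
    if is_gguf "$blob"; then
      du -h "$blob"
    fi
  done
}
```

Running list_gguf_blobs ~/.ollama/models/blobs lists only the GGUF blobs; the largest entries are normally model weights.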

Warning

Don’t run ollama rm <model_name> while you still need the blob in llama.cpp. Ollama deletes blobs that no remaining model references, so the path you gave llama.cpp will stop working.

Comparison of methods

Feature	Strategy 1: central folder	Strategy 2: Ollama blobs
Ease of use	High (human-readable names)	Low (searching for hashes)
Disk space	Minimal (one file, if ollama create can hard-link)	Minimal (one file)
Persistence	Stable	Risks being deleted by `ollama rm`
Flexibility	Works with all GGUF tools	Works with GGUF tools, but Ollama controls the file’s lifecycle

Advanced tips

Symlinking for sanity

If you’re stuck with the Ollama-first approach, symbolic links make the hash filenames tolerable:

# macOS/Linux
ln -s ~/.ollama/models/blobs/sha256-87048... ~/models/llama3-8b.gguf

Any tool that accepts a GGUF path (llama.cpp, LM Studio, Jan) can use ~/models/llama3-8b.gguf without duplicating the file.
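A single ln -s covers one model. To create readable links for every model you have pulled, you can read Ollama’s manifests instead of running ollama show repeatedly. This sketch assumes Ollama’s on-disk layout: JSON manifests under manifests/<registry>/<namespace>/<model>/<tag>, a weights layer with mediaType application/vnd.ollama.image.model, and blobs named sha256-<hex> while manifests spell the digest sha256:<hex>. link_ollama_models is a hypothetical helper.

```bash
# Hypothetical helper: expose every pulled Ollama model as <dest>/<model>-<tag>.gguf.
link_ollama_models() {
  local root dest="$2" manifest digest name
  root=$(cd "$1" && pwd) || return 1   # absolute path, so the symlinks resolve from anywhere
  mkdir -p "$dest" || return 1
  find "$root/manifests" -type f | while IFS= read -r manifest; do
    # Put each JSON object on its own line, keep the weights layer, extract its digest.
    digest=$(tr '{}' '\n\n' < "$manifest" \
      | grep -F 'application/vnd.ollama.image.model' \
      | grep -o 'sha256:[0-9a-f]\{64\}' | head -n 1)
    [ -n "$digest" ] || continue
    # .../<model>/<tag> -> <model>-<tag>.gguf (the same name from two registries collides)
    name="$(basename "$(dirname "$manifest")")-$(basename "$manifest").gguf"
    ln -sf "$root/blobs/sha256-${digest#sha256:}" "$dest/$name"
    printf '%s -> sha256-%s\n' "$dest/$name" "${digest#sha256:}"
  done
}
```

Usage: link_ollama_models ~/.ollama/models ~/models. Re-run it after new pulls; links to models you later remove with ollama rm will dangle.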

Relocating Ollama’s model store

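Setting the OLLAMA_MODELS environment variable for the Ollama server changes where it stores models (the blobs and manifests directories). This is handy if your home directory is on a small SSD and models belong on a bigger drive. The variable has to be visible to the server process itself, so how you set it depends on how Ollama is launched. On Linux, the new directory must also be writable by the ollama user the service runs as.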

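# macOS/Linux, if you start `ollama serve` yourself: add to ~/.zshrc or ~/.bashrc
export OLLAMA_MODELS=~/models/ollama

# macOS menu-bar app (doesn't read shell rc files): set it for launchd, then restart the app
launchctl setenv OLLAMA_MODELS "$HOME/models/ollama"

# Linux systemd service: sudo systemctl edit ollama.service, add Environment="OLLAMA_MODELS=/path/to/models" under [Service], restart

# Windows: persist for your user account, then restart Ollama
setx OLLAMA_MODELS "D:\models\ollama"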

After restarting Ollama, new pulls go to the new location. Models at the old path need to be re-pulled or migrated manually by moving the blobs and manifests subdirectories.
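The manual migration amounts to moving two directories. A minimal sketch, assuming Ollama is stopped first and that the old root contains both blobs and manifests; migrate_ollama_store is a hypothetical helper. As a design choice it refuses to merge into a destination that already has a store, so it can’t overwrite models pulled after the switch.

```bash
# Hypothetical helper: move an existing Ollama store (blobs + manifests) to a new root.
# Stop the Ollama server first so nothing is written mid-move.
migrate_ollama_store() {
  local old="$1" new="$2" sub
  for sub in blobs manifests; do
    [ -d "$old/$sub" ] || { echo "missing $old/$sub" >&2; return 1; }
    # Refuse to merge into an existing store rather than risk overwriting files.
    [ ! -e "$new/$sub" ] || { echo "already exists: $new/$sub" >&2; return 1; }
  done
  mkdir -p "$new" || return 1
  for sub in blobs manifests; do
    # mv renames on the same filesystem; across drives it copies, then deletes the source.
    mv "$old/$sub" "$new/$sub" || return 1
  done
}
```

For example, migrate_ollama_store ~/.ollama/models ~/models/ollama. For the Linux service store, run it with sudo and then give the ollama user ownership of the new directory (chown -R ollama:ollama). A cross-drive move needs enough free space on the destination for the whole store.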

Caveats to watch for

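Version mismatch. Keep your llama.cpp build current. Ollama bundles its own inference code, so a model that runs fine in Ollama can still be too new for an older llama.cpp binary: an unfamiliar model architecture, tensor type, or GGUF revision will stop it from loading.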
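One quick check is the GGUF header itself: after the 4-byte magic comes a little-endian uint32 container version. The sketch below prints it; gguf_version is a hypothetical helper. It does not parse metadata such as general.architecture, which is usually where a too-new model trips an old build, so treat it as a first check only.

```bash
# Hypothetical helper: print the GGUF container version (uint32 at byte offset 4).
gguf_version() {
  local magic
  magic=$(head -c 4 "$1" 2>/dev/null | od -An -tx1 | tr -d ' \n')
  if [ "$magic" != "47475546" ]; then            # "GGUF" in hex
    echo "not a GGUF file: $1" >&2
    return 1
  fi
  # od decodes in host byte order, which matches little-endian GGUF files on
  # x86-64 and ARM64 hosts; a big-endian GGUF would be misread.
  od -An -tu4 -j 4 -N 4 "$1" | tr -d ' \n'
  echo
}
```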

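Quantization. Ollama’s library models are mostly K-quants (Q4_K_M is a common default tag), and some GGUFs use IQ-quants. A llama.cpp build older than a given quantization type usually refuses to load the file outright. If a model loads but produces garbled output, confirm your build supports its quantization type (ollama show <model_name> lists it) and rule out a template mismatch, covered in the next caveat.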

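Missing prompt templates. A raw blob gives llama.cpp only what’s inside the GGUF. The TEMPLATE, SYSTEM, and PARAMETER settings from the Ollama Modelfile live in separate blobs that llama.cpp never reads. Most GGUFs embed a chat template that llama.cpp applies automatically; if yours doesn’t, or replies look malformed, pick a built-in one with --chat-template (for example chatml or llama3) or load a Jinja file with --jinja --chat-template-file. Ollama’s TEMPLATE is written as a Go template, so it can’t be passed to llama.cpp as-is. For a system prompt, recent llama-cli builds take -sys (--system-prompt); older builds used -p in conversation mode.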

Strategy 1 is almost always the right call: download once, use everywhere. The Ollama-first approach works, but you’re one ollama rm away from losing a model you still need in llama.cpp.

“Buy less, choose well, make it last.” - Vivienne Westwood

Rushi's

Ctrl+AI+Ship