Running LLMs locally has become a normal part of how developers work. Two tools dominate this space: llama.cpp and Ollama. They look like competitors, but the relationship is more direct — Ollama is built on top of llama.cpp. This post covers the technical differences, where each performs better, and when to use one versus the other. Table of […]

Privacy is becoming a luxury in the AI world. If you’re tired of sending your data to the cloud every time you ask a question, running a model locally is the answer. Today, we’re looking at Qwen 3.5 9B—a powerhouse model from Alibaba—and how to get it running on your own machine using Ollama. Whether you’re a […]
