Privacy is becoming a luxury in the AI world. If you’re tired of sending your data to the cloud every time you ask a question, running a model locally is the answer. Today, we’re looking at Qwen 3.5 9B—a powerhouse model from Alibaba—and how to get it running on your own machine using Ollama.

Whether you’re a developer looking for an API or someone who just wants a private chatbot, this guide should help.

Why Qwen 3.5 9B?

Qwen 3.5 9B is a “sweet spot” model. At 9 billion parameters, it’s small enough to run on most modern consumer hardware but smart enough to handle complex coding, reasoning, and creative writing tasks.

Key Highlights:

  • Performance: Punches way above its weight class, often rivaling much larger models.
  • Multilingual: Excellent support for various languages beyond just English.
  • Efficiency: Runs smoothly on laptops with 16GB of RAM or a decent GPU.

Step 1: Install Ollama

Ollama is the easiest way to run local AI. It handles all the messy technical bits (like GPU drivers and model weights) so you don’t have to.

For Non-Tech Users:

  1. Go to ollama.com.
  2. Download the installer for your OS (Windows, Mac, or Linux).
  3. Run the installer and follow the prompts.

For Tech Users:

If you’re on a Mac or Linux, you can use the terminal:

Mac (Homebrew):

brew install ollama

Note: the Homebrew formula installs just the command-line tool, so you may need to start the server yourself with ollama serve (or brew services start ollama). The desktop app from ollama.com handles this for you automatically.

Linux:

curl -fsSL https://ollama.com/install.sh | sh

Step 2: Run Qwen 3.5 9B

Once Ollama is installed, getting the model is a single command. Open your terminal (or Command Prompt on Windows) and type:

ollama run qwen3.5:9b

What happens next?

  • Ollama will download the model (about 6.6GB).
  • Once finished, you’ll see a prompt: >>>.
  • Start chatting! You can ask it to write code, summarize text, or just say hello.

Step 3: Using Your New Local AI

For the Casual User

You can just keep the terminal open and chat whenever you want. To exit, type /bye (or press Ctrl+D). To start again later, just run the same ollama run qwen3.5:9b command—the model is already cached, so it won’t need to download again.

For the Developer (The “Tech” Stuff)

Ollama automatically runs a local API server at http://localhost:11434. You can integrate Qwen into your own apps using simple curl commands or the Ollama Python/JS libraries.

Example API Call (run this from a Unix-style shell such as macOS Terminal, Linux, Git Bash, or WSL—Windows Command Prompt handles the single quotes differently):

curl http://localhost:11434/api/chat -d '{
  "model": "qwen3.5:9b",
  "messages": [
    { "role": "user", "content": "Explain quantum physics like I am five." }
  ],
  "stream": false
}'
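If you’d rather make the same call from Python without extra dependencies, here’s a minimal sketch using only the standard library. It assumes the Ollama server is running locally on the default port; the ask helper name is just for illustration:

```python
import json
import urllib.request

# Same request body as the curl example above.
payload = {
    "model": "qwen3.5:9b",
    "messages": [
        {"role": "user", "content": "Explain quantum physics like I am five."}
    ],
    "stream": False,
}
body = json.dumps(payload).encode("utf-8")

def ask(url="http://localhost:11434/api/chat"):
    """Send the chat request and return the assistant's reply text."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # With "stream": false, the reply arrives as one JSON object.
        return json.loads(resp.read())["message"]["content"]

if __name__ == "__main__":
    print(ask())
```

Once you outgrow raw HTTP, the official Ollama Python library wraps this same endpoint with a friendlier interface.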

System Requirements

To have a smooth experience with the 9B model, here is what you’ll need:

  • RAM: 16GB is recommended. 8GB might work but will be slow.
  • GPU: If you have an NVIDIA card (8GB+ VRAM) or an Apple Silicon Mac (M1/M2/M3), it will be lightning fast.
  • Disk Space: About 10-15GB of free space for the model and overhead.
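If you’re wondering where numbers like these come from, a model’s memory footprint is roughly its parameter count times the bytes stored per parameter, plus some working overhead for things like the context cache. Here’s a back-of-the-envelope sketch—the 1.2 overhead factor is my own rough assumption, not an official Ollama figure:

```python
def approx_memory_gb(params_billion, bits_per_param, overhead=1.2):
    """Rough RAM/VRAM estimate: parameters x bytes-per-parameter, padded for overhead."""
    bytes_total = params_billion * 1e9 * (bits_per_param / 8)
    return bytes_total * overhead / 1e9

# A 9B-parameter model at common quantization levels:
for bits in (4, 6, 8):
    print(f"{bits}-bit: ~{approx_memory_gb(9, bits):.1f} GB")
```

Estimates in this ballpark are consistent with the download size and the 16GB RAM recommendation above: lower-bit quantizations trade a little quality for a footprint that fits comfortably on consumer hardware.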

Final Thoughts

Running Qwen 3.5 9B locally isn’t just for “techies” anymore. With Ollama, it’s as simple as downloading an app and running a command. You get 100% privacy, no subscription fees, and a world-class AI right on your desk.

Give it a spin and see how it compares to the big cloud models!

Further Reading

If you’re interested in running other models like DeepSeek, check out this detailed guide: Running DeepSeek Locally on your desktop/laptop with Ollama

“Say Aye to AI ;)”-Rushi