Running DeepSeek Distilled Models Locally Using Ollama
DeepSeek R1 is a seriously impressive language model, and the distilled versions offer a sweet spot between performance and resource requirements. They’re smaller, faster, and perfect for local deployment. And when you pair them with Ollama, a fantastic tool for running LLMs locally, you’ve got a winning combination. Ollama simplifies the entire process, from model management to serving. So, let’s get our hands dirty!
DeepSeek R1, Distilled Models, and Ollama: A Quick Intro
DeepSeek R1 is a state-of-the-art language model developed by DeepSeek AI. Like other LLMs, it can generate text, write different kinds of creative content, and answer your questions in an informative way. A distilled model, as the name suggests, is a smaller, more computationally efficient model trained by transferring knowledge from a larger, more complex “teacher” model; in this case, the teacher is the full DeepSeek R1. Distilled models are trained to mimic the behavior of the full model but with a significantly reduced footprint, which makes them perfect for running on personal computers. Ollama simplifies the entire process of managing and running these LLMs locally. It handles the complexities of model loading, inference, and resource management, so you can focus on experimenting and building cool applications.
Step 1: Installing Ollama on Linux
First things first, we need to get Ollama up and running. Since I’m a Linux enthusiast, I’ll focus on that; for other operating systems, head over to the official Ollama website and grab the appropriate installer.
On Linux, you can install Ollama with the official install script by running:
curl -fsSL https://ollama.com/install.sh | sh
Once installed, verify the installation with ollama --version. Easy peasy!
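By the way, on Linux the official install script typically also registers Ollama as a systemd service, so it may already be running in the background. If you want to check (assuming a systemd-based distro), a quick look does the trick:
systemctl status ollama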
Step 2: Starting the Ollama Server
With Ollama installed, we need to start the Ollama server. This server is responsible for managing the models and serving inference requests. Open your terminal and run:
ollama serve
This command starts the Ollama server in the foreground, so keep this terminal window open (or run it in a separate terminal session). It’s essential for Ollama to function.
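To confirm the server is actually up, you can poke its local HTTP endpoint, which listens on port 11434 by default. A minimal check (assuming the default port; it should answer with “Ollama is running”):
curl http://localhost:11434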
Step 3: Getting DeepSeek-R1 1.5B
Now for the exciting part! We need to download the DeepSeek-R1 1.5B distilled model. The easiest way is to pull it straight from the Ollama library:
ollama pull deepseek-r1:1.5b
Ollama handles the download and prepares the model for use. This might take a while depending on your internet speed.
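Once the pull finishes, it’s worth confirming the model actually landed on disk. Ollama’s list command shows every model you’ve downloaded along with its tag and size:
ollama list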
But the beauty of Ollama is that you’re not limited to just one size! DeepSeek R1 comes in various sizes, including 7B, 8B, 14B, and even larger versions like 32B (and more). The key here is your system’s resources. Running the larger models requires significantly more VRAM and potentially a powerful GPU.
- 1.5B: A great starting point. Good for machines with limited resources.
- 7B/8B: Offers a noticeable improvement in performance over the 1.5B model but requires more VRAM. 8GB+ VRAM is recommended.
- 14B: Even more powerful, but you’ll likely need 16GB+ of VRAM, possibly more.
- 32B and beyond: These are the big boys. You’ll need a machine with substantial resources.
So, depending on what your system can handle, you can choose the appropriate size. For example, to download the 14B version, you would use:
ollama pull deepseek-r1:14b
Remember, the larger the model, the more resources it will require. Start with a smaller model and work your way up if your system can handle it. Experiment to find the sweet spot between performance and resource usage.
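And if you pull a larger model and discover it’s too heavy for your machine, you can reclaim the disk space just as easily. A quick sketch, using the 14B tag from the example above:
ollama rm deepseek-r1:14b
ollama list   # confirm it's gone and see what's left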
Step 4: Running the Model
With the model downloaded, you can finally run it! Open a new terminal window (keeping the ollama serve terminal running) and use the following command:
ollama run deepseek-r1:1.5b
This will start the DeepSeek-R1 1.5B model. You should see a prompt where you can start interacting with the model.
Type your prompts and see the magic happen! One particularly cool feature I’ve noticed is that you can glimpse into DeepSeek’s thought process: keep an eye out for <think> and </think> tags in the output. These can provide fascinating insights into how the model is reasoning and generating its responses.
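If you’d rather script your prompts than type them into the interactive session, Ollama also exposes a local REST API on port 11434. Here’s a rough sketch against the generate endpoint (the prompt is just a placeholder, and the model tag should match whichever size you pulled):
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "Explain knowledge distillation in two sentences.",
  "stream": false
}'
The reply comes back as JSON, with the generated text (think tags included) in the response field.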
I’ve been experimenting with it for creative writing, code generation, and even brainstorming ideas. It’s incredibly impressive what these distilled models can do!
Step 5: Exiting the Ollama Prompt
When you’re done chatting with the model, you can exit the Ollama prompt by pressing Ctrl + D or by entering /bye at the prompt. This will gracefully close the session with the model. You can then close the terminal window where you ran the ollama run command. Remember, the ollama serve command needs to continue running in its own terminal for Ollama to function. If you want to completely stop Ollama, you’ll need to stop the ollama serve process as well (usually by pressing Ctrl + C in its terminal).
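One caveat: on Linux installs done via the official script, Ollama may be running as a systemd service rather than from your terminal, in which case Ctrl + C won’t stop it. A hedged sketch for that case:
sudo systemctl stop ollama      # stop the background service
sudo systemctl disable ollama   # optional: don't start it again at boot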
I’m incredibly impressed with Ollama and the ease with which it lets you run DeepSeek distilled models locally. If you’re an AI enthusiast or a developer looking to experiment with LLMs, I highly recommend giving it a try. The combination of DeepSeek’s power and Ollama’s simplicity is truly remarkable. Happy experimenting!