
How to Host DeepSeek Locally on a Docker Home Server


In this tutorial, I will show you how to install the DeepSeek AI chat (or other LLM models) on your home server in just a few minutes and start using it locally, completely free and without limitations. This guide assumes that you already have a home server with Docker and Portainer installed. If you don’t have a home server yet, I recommend checking out my beginner’s guide on setting up a home server with a Raspberry Pi (link), where I cover the installation of Docker and Portainer in detail. And yes, you can even host DeepSeek on a Raspberry Pi!

Requirements:

  • Docker and Portainer installed
  • At least 1.2GB of free disk space available (to download the smallest distilled DeepSeek model)
  • At least 1.5GB of free RAM available (to run the smallest distilled DeepSeek model)

Adding the Docker Stack in Portainer

Create a new stack in Portainer and simply paste the following code:

services:
  # Open WebUI - the web front end you will use to chat with the models
  webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: webui
    ports:
      - 7000:8080/tcp          # web interface published on host port 7000
    volumes:
      - open-webui:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      - ollama
    restart: unless-stopped

  # Ollama - the backend that downloads and runs the LLMs
  ollama:
    image: ollama/ollama
    container_name: ollama
    expose:
      - 11434/tcp
    ports:
      - 11434:11434/tcp        # Ollama API published on host port 11434
    healthcheck:
      test: ollama --version || exit 1
    volumes:
      - ollama:/root/.ollama   # downloaded models are stored in this volume
    restart: unless-stopped

volumes:
  ollama:
  open-webui:

Click “Deploy the stack” and wait for the images to download and the services to start. This setup will install two containers:

  • Ollama – A backend service that runs and manages large language models (LLMs) locally.
  • WebUI – A user-friendly web interface that allows you to interact with Ollama easily.

Once both containers are installed and running, you can access the web interface by opening:

http://host_name_or_ip:7000 in your browser.
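Before opening the web interface, you can optionally check that the Ollama backend is reachable by querying its HTTP API from any machine on your network. This is just a sanity check, not a required step; replace host_name_or_ip with your server’s address:

# Should return the Ollama version as JSON, e.g. {"version":"..."}
curl http://host_name_or_ip:11434/api/version

# Lists the models currently installed (empty right after the first start)
curl http://host_name_or_ip:11434/api/tags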

Using WebUI

On the WebUI page, you will first be prompted to create an administrator account. You can enter any credentials you like—these will not be transmitted anywhere and will be stored locally on your server.

Downloading a DeepSeek Model

  1. Go to the Workspace tab. Click on the drop-down arrow next to “Select a model”.
  2. Type in the search:
    • “deepseek-r1” for the 7b model or
    • “deepseek-r1:1.5b” for the 1.5b model.
  3. Click “Pull ‘deepseek-r1’ from Ollama.com” to download the selected model.

That’s it! Once the model is downloaded, you can start using your local DeepSeek AI chat!
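If you prefer the command line, you can also pull a model through the Ollama container itself instead of the WebUI. This is simply an alternative route; the container name ollama matches the stack above, and you can swap in any model tag:

# Download the smallest distilled DeepSeek model via the Ollama CLI
docker exec -it ollama ollama pull deepseek-r1:1.5b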

You can choose from different DeepSeek model versions depending on your available hardware. Here are some examples:

  • deepseek-r1:1.5b – The smallest distilled model, requiring 1.2GB of disk space and 1.5GB of free RAM to run.
  • deepseek-r1:7b – A larger distilled model that needs 4.7GB of disk space and 5.5GB of RAM.
  • deepseek-r1:671b – The most advanced full model currently available, requiring 405GB of disk space and about the same amount of RAM.

Check the full list of DeepSeek models here: DeepSeek models on Ollama.

You can download multiple models and switch between them just like in the ChatGPT web interface.

Additionally, feel free to explore other non-DeepSeek models available in the Ollama library.
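To see which models are already installed on your server, or to free up disk space by removing one, you can ask Ollama directly from the host. These are optional housekeeping commands, assuming the container name from the stack above:

# Show all models currently stored in the ollama volume
docker exec -it ollama ollama list

# Remove a model you no longer need
docker exec -it ollama ollama rm deepseek-r1:7b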

Performance Comparison

I tested the first two DeepSeek models on my two home server setups: a Raspberry Pi 4 (8GB DDR4) and an Intel N100 mini PC (32GB DDR5).

I’m not an expert in benchmarking LLMs, so for simplicity, I compared the performance of the 1.5b and 7b models on both servers by asking the AI the same question:

“Summarize Einstein’s theory of relativity in simple terms.”

I then measured the time it took for each model to generate a response. Here are the results I got:

Model         | Raspberry Pi 4 (8GB DDR4) | Intel N100 Mini PC (32GB DDR5)
DeepSeek 1.5b | 6m 43s                    | 1m 30s
DeepSeek 7b   | 26m 20s                   | 3m 54s

As expected, the Raspberry Pi 4 struggles with performance, while the Intel N100 Mini PC, combined with fast DDR5 memory, delivers significantly faster results. You can compare response quality and additional details in the screenshots below.
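If you want to get a rough number like this on your own hardware, one simple approach is to time a single non-streaming request against the Ollama API. This is only a sketch, not a rigorous benchmark: the prompt is shortened here to keep the shell quoting simple, and the first run also includes model loading time, so adjust the host, model tag, and prompt to your setup:

# Time one complete (non-streaming) answer from the 1.5b model
time curl -s http://host_name_or_ip:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "Summarize the theory of relativity in simple terms.",
  "stream": false
}'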

What’s Next?

  • Using a Self-Hosted LLM on Mobile
    For convenient access to your self-hosted LLM on a mobile device, you can use a dedicated app. I personally found this one to be quite useful. Simply add your server address in the app like this:
    http://host_name_or_ip:11434
  • Securing Remote Access
If you want to access your LLM server from outside your home network, do not expose the port directly without first setting up authentication. By default, Ollama does not require authentication, which is a security risk, so make sure you configure proper access control before opening the server to the internet (see the compose sketch after this list).
  • Leveraging GPU Acceleration
    If your home server has an NVIDIA GPU, consider exploring how to enable GPU acceleration for improved processing speed and performance.
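One simple way to reduce the Ollama API’s exposure is to bind its published port to the host’s loopback interface and let the WebUI reach it over the internal Docker network instead. The snippet below is only a sketch of that idea: it shows just the changed parts of the stack, assumes you do not need direct LAN access to port 11434 (for example, for the mobile app mentioned above), and is not a replacement for proper authentication on anything you expose to the internet:

  webui:
    environment:
      # Point Open WebUI at the ollama service over the internal Docker network
      - OLLAMA_BASE_URL=http://ollama:11434

  ollama:
    ports:
      # Publish the API on localhost only instead of all interfaces
      - 127.0.0.1:11434:11434/tcp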

Stay Updated

If you liked this guide, follow me on social media (links are in the website footer) to show your support and stay updated on similar articles in the future.

Enjoyed This Content? Support Me!

If you enjoyed this article and would like to see more like it, please consider supporting me by buying a coffee—it would mean a lot and keep me motivated!

Buy me a coffee

(Unauthorized copying of this page content is prohibited. Please use the Share buttons below or provide a direct link to this page instead. Thank you!)

2 Comments
Lad
2 months ago

Works great! To make it work with a GPU, I just added this to the ollama service definition:

    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

In case of trouble, you can check this page: https://docs.docker.com/compose/how-tos/gpu-support/

Esvee
1 month ago

Thanks for this tutorial. I found this project on GitHub that does the same, but with Intel GPU support. I have an N100 too, and after deploying this I can indeed see the GPU being utilised by ollama with intel_gpu_top. I need to up the allocated VRAM in the BIOS to get it faster (currently I’m remote). Thought people might be interested. https://github.com/mattcurf/ollama-intel-gpu?tab=readme-ov-file
