In this tutorial, I will show you how to install the DeepSeek AI chat (or other LLMs) on your home server in just a few minutes and start using it locally, completely free and without usage limits. This guide assumes that you already have a home server with Docker and Portainer installed. If you don’t have a home server yet, I recommend checking out my beginner’s guide on setting up a home server with a Raspberry Pi (link). In that article, I cover the installation of Docker and Portainer in detail. And yes, you can host DeepSeek on a Raspberry Pi as well!
Requirements:
- Docker and Portainer installed
- At least 1.2GB of free disk space available (to download the smallest distilled DeepSeek model)
- At least 1.5GB of free RAM available (to run the smallest distilled DeepSeek model)
Adding the Docker Stack in Portainer
Create a new stack in Portainer and simply paste the following code:
services:
  webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: webui
    ports:
      - 7000:8080/tcp   # Open WebUI will be reachable on host port 7000
    volumes:
      - open-webui:/app/backend/data
    extra_hosts:
      - "host.docker.internal:host-gateway"
    depends_on:
      - ollama
    restart: unless-stopped

  ollama:
    image: ollama/ollama
    container_name: ollama
    expose:
      - 11434/tcp
    ports:
      - 11434:11434/tcp   # Ollama API on host port 11434 (used later for mobile access)
    healthcheck:
      test: ollama --version || exit 1
    volumes:
      - ollama:/root/.ollama   # downloaded models are stored in this volume
    restart: unless-stopped

volumes:
  ollama:
  open-webui:
Click “Deploy the Stack” and wait for the images to be pulled and the containers to start. This stack installs two containers:
- Ollama – A backend service that runs and manages large language models (LLMs) locally.
- WebUI – A user-friendly web interface that allows you to interact with Ollama easily.
Once both containers are installed and running, you can access the web interface by opening http://host_name_or_ip:7000 in your browser.
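If the page does not load right away, give the containers a minute to finish starting. The commands below are one way to check that both services are up from the server’s shell (the container names match the compose file above):

```bash
# Check that both containers are running
docker ps --filter name=ollama --filter name=webui

# Ollama's API should answer with its version once it is ready
curl http://localhost:11434/api/version

# Follow the Open WebUI logs if the page still does not load
docker logs -f webui
```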
Using WebUI
On the WebUI page, you will first be prompted to create an administrator account. You can enter any credentials you like—these will not be transmitted anywhere and will be stored locally on your server.
Downloading a DeepSeek Model
- Go to the Workspace tab. Click on the drop-down arrow next to “Select a model”.
- Type into the search field:
  - “deepseek-r1” for the 7b model, or
  - “deepseek-r1:1.5b” for the 1.5b model.
- Click “Pull ‘deepseek-r1’ from Ollama.com” to download the selected model.
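If you prefer the command line, you can also pull a model directly through the Ollama container instead of using the WebUI search; for example, to fetch the smallest distilled model:

```bash
# Pull the smallest distilled DeepSeek model from the Ollama registry
docker exec -it ollama ollama pull deepseek-r1:1.5b
```

Models pulled this way will then show up in the WebUI model selector (refresh the page if needed).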


That’s it! Once the model is downloaded, you can start using your local DeepSeek AI chat!
You can choose from different DeepSeek model versions depending on your available hardware. Here are some examples:
- deepseek-r1:1.5b – The smallest distilled model, requiring 1.2GB of disk space and 1.5GB of free RAM to run.
- deepseek-r1:7b – A larger distilled model that needs 4.7GB of disk space and 5.5GB of RAM.
- deepseek-r1:671b – The most advanced full model currently available, requiring 405GB of disk space and about the same amount of RAM.
Check the full list of DeepSeek models here: DeepSeek models on Ollama.
You can download multiple models and switch between them just like in the ChatGPT web interface.
Additionally, feel free to explore other non-DeepSeek models available in the Ollama library.
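To keep track of what is already installed, and to free up disk space when you no longer need a model, you can manage models from the Ollama container as well. A couple of commands that may come in handy (the model name below is just an example):

```bash
# List the models currently stored in the ollama volume
docker exec -it ollama ollama list

# Remove a model you no longer need
docker exec -it ollama ollama rm deepseek-r1:7b
```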
Performance Comparison
I tested the first two DeepSeek models on my two home server setups:
- Raspberry Pi 4 (8GB DDR4)
- Intel N100 Mini PC (32GB DDR5) (detailed review of this mini PC)
I’m not an expert in benchmarking LLMs, so for simplicity, I compared the performance of the 1.5b and 7b models on both of my home servers by asking the AI the same question:
“Summarize Einstein’s theory of relativity in simple terms.”
I then measured the time it took for each model to generate a response. Here are the results I got:
| Model | Raspberry Pi 4 (8GB DDR4) | Intel N100 Mini PC (32GB DDR5) |
| --- | --- | --- |
| DeepSeek 1.5b | 6m 43s | 1m 30s |
| DeepSeek 7b | 26m 20s | 3m 54s |
As expected, the Raspberry Pi 4 struggles with performance, while the Intel N100 Mini PC, combined with fast DDR5 memory, delivers significantly faster results. You can compare response quality and additional details in the screenshots below.
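If you want a rough, reproducible number for your own hardware instead of timing responses by hand, Ollama can print generation statistics itself. A minimal sketch, run inside the ollama container:

```bash
# --verbose prints timing statistics (load time, prompt eval, eval rate in tokens/s)
docker exec -it ollama ollama run deepseek-r1:1.5b --verbose \
  "Summarize Einstein's theory of relativity in simple terms."
```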




What’s Next?
- Using a Self-Hosted LLM on Mobile
For convenient access to your self-hosted LLM on a mobile device, you can use a dedicated app. I personally found this one to be quite useful. Simply add your server address in the app like this: http://host_name_or_ip:11434 (the API example after this list shows how to check that this address is reachable).
- Securing Remote Access
If you want to access your LLM server from outside your home network, do not expose the port directly without first setting up authentication. By default, Ollama does not require authentication, which is a security risk. Ensure you configure proper access control before opening the server to the internet (one simple hardening option is sketched after this list).
- Leveraging GPU Acceleration
If your home server has an NVIDIA GPU, consider enabling GPU acceleration for improved processing speed and performance (a compose sketch is shown after this list).
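The mobile apps mentioned above talk to Ollama’s HTTP API, which this stack publishes on port 11434. Before configuring an app, you can check from another device on your home network that the API is reachable; a quick test, assuming you pulled deepseek-r1:1.5b:

```bash
# Ask the Ollama API for a one-off, non-streaming completion
curl http://host_name_or_ip:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'
```

If this returns a JSON response with the model’s answer, the mobile app should work with the same address.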
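Regarding the remote-access warning above: if you do not need direct access to the Ollama API from other devices (the mobile-app option relies on it), one way to reduce the attack surface is to stop publishing port 11434 and point Open WebUI at Ollama over the stack’s internal network instead. A minimal sketch of how the stack above could be adjusted; OLLAMA_BASE_URL is Open WebUI’s setting for the Ollama address:

```yaml
services:
  webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # Talk to the ollama container over the stack's internal network
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - 7000:8080/tcp
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
    restart: unless-stopped

  ollama:
    image: ollama/ollama
    # No "ports:" entry: Ollama is only reachable from inside this stack
    volumes:
      - ollama:/root/.ollama
    restart: unless-stopped

volumes:
  ollama:
  open-webui:
```

Note that this removes the direct API access the mobile-app option uses; for access from outside your home network, a reverse proxy with authentication or a VPN in front of the WebUI is the safer route.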
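If your server has an NVIDIA GPU and the NVIDIA Container Toolkit installed on the host, Compose can pass the GPU through to the ollama service via a device reservation (see https://docs.docker.com/compose/how-tos/gpu-support/ for details). A sketch of the lines to add to the ollama service definition in the stack above:

```yaml
  ollama:
    image: ollama/ollama
    # ... existing settings from the stack above ...
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all          # or a specific number of GPUs
              capabilities: [gpu]
```

After redeploying the stack, Ollama should pick up the GPU automatically; its startup logs (docker logs ollama) indicate whether a GPU was detected.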
Stay Updated
If you liked this guide, follow me on social media (links are in the website footer) to show your support and stay updated on similar articles in the future.