Setup Ollama on a specific server.

Let's run an Ollama container on a specific server.

In this post, I’ll focus on the second scenario: running Ollama as a container on a specific server.

Select a base image on Docker Hub

The first step in building a Docker image is selecting the base image. Since our goal is to run Ollama, GPU support is required, so let’s first check the CUDA version of the specific server.
My PC has CUDA version 12.6, so let’s assume that the specific server also has CUDA 12.6 installed.
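
You can check which CUDA version the driver supports with nvidia-smi (run this on the specific server; the exact values will differ from machine to machine):

# The "CUDA Version" field in the nvidia-smi header is the highest CUDA
# version supported by the installed driver.
nvidia-smi | grep "CUDA Version"
# e.g. | NVIDIA-SMI 560.35.02    Driver Version: 560.94    CUDA Version: 12.6 |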

After searching on Docker Hub, the base image I selected is nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04. Here are the reasons for this choice:

  • nvidia/cuda:12.6.3-cudnn: My PC uses CUDA 12.6.
  • runtime:
    The runtime image includes only the libraries needed to run CUDA programs → smaller image size.
    Since we’re only running Ollama, the runtime variant is more suitable!
    The devel image additionally includes development tools such as compilers → use it when you need to compile code or train models.
  • ubuntu22.04: A stable Ubuntu LTS release.
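
Before writing the Dockerfile, it’s worth a quick sanity check that this base image can actually see the GPU through Docker (this assumes the NVIDIA Container Toolkit is already installed on the host):

docker run --rm --gpus all nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04 nvidia-smi

If this prints the same GPU table you see on the host, containers can reach the GPU and we can move on.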

Make the Dockerfile

FROM nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04

# Set non-interactive mode for apt-get
ENV DEBIAN_FRONTEND=noninteractive

# Install required dependencies
RUN apt-get update && apt-get install -y \
    curl \
    wget \
    rsync \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh

# Set up runtime for NVIDIA support
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Note: Bind Ollama to 0.0.0.0 so it is reachable from outside the container (by default it binds to 127.0.0.1 only).
ENV OLLAMA_HOST=0.0.0.0:11434

# Copy entrypoint script
COPY entrypoint_imake-ollama.sh /entrypoint_imake-ollama.sh
RUN chmod +x /entrypoint_imake-ollama.sh

ENTRYPOINT ["/entrypoint_imake-ollama.sh"]

Make the entrypoint file

In a Dockerfile, only one ENTRYPOINT or CMD instruction takes effect when the container starts (Dockerfile Commands). If you want to run multiple commands, you need to create a shell script and execute it.
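
As a minimal illustration (a hypothetical snippet, not part of this image): if a Dockerfile contains several CMD instructions, only the last one is executed when the container starts.

# Only the final CMD takes effect at container start
CMD ["echo", "first"]     # overridden, never runs
CMD ["echo", "second"]    # this is the one that runs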
In our case, we need to run two commands:

  • ollama serve
  • download the LLM model(s)
#!/bin/bash

ollama serve &
# Wait for the server to start
sleep 10

# List of models to download
echo "Checking defined models..."
# MODELS=("gemma3:latest" "gemma3:12b")
MODELS=("gemma3:1b")

# Loop through each model and download if not already present
for model in "${MODELS[@]}"; do
    if ! ollama list | grep -q "$model"; then
        echo "Downloading $model model..."
        ollama pull "$model"
    else
        echo "$model model is already downloaded."
    fi
done

# Keep the Ollama server running
wait -n
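
If you’d rather change the model list without rebuilding the image, one option is to read it from an environment variable instead of hard-coding it. This is only a sketch: OLLAMA_PULL_MODELS is a name made up for this example, not a built-in Ollama setting.

# In entrypoint_imake-ollama.sh, replace the hard-coded MODELS=... line with:
# (OLLAMA_PULL_MODELS is a hypothetical space-separated list, e.g. "gemma3:1b gemma3:12b")
IFS=' ' read -r -a MODELS <<< "${OLLAMA_PULL_MODELS:-gemma3:1b}"

You would then pass the list at run time, e.g. docker run -e OLLAMA_PULL_MODELS="gemma3:1b gemma3:12b" ..., and the rest of the script stays the same.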

Build the Ollama Image

  • docker build -t ollama-gpu .
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai/imake-ollama$ docker build -t ollama-gpu .
[+] Building 56.9s (10/10) FINISHED                                                                                                                                       docker:default
 => [internal] load build definition from dockerfile                                                                                                                                0.0s
 => => transferring dockerfile: 2.42kB                                                                                                                                              0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04                                                                                             0.7s
 => [internal] load .dockerignore                                                                                                                                                   0.0s
 => => transferring context: 2B                                                                                                                                                     0.0s
 => CACHED [1/5] FROM docker.io/nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04@sha256:5f0d2d827f6436b3cb7468fd8acbdc8c1d41261614e579ae49afe6141da51133                                0.0s
 => [internal] load build context                                                                                                                                                   0.0s
 => => transferring context: 48B                                                                                                                                                    0.0s
 => [2/5] RUN apt-get update && apt-get install -y     curl     wget     rsync     ca-certificates     && rm -rf /var/lib/apt/lists/*                                              14.1s
 => [3/5] RUN curl -fsSL https://ollama.com/install.sh | sh                                                                                                                        38.1s
 => [4/5] COPY entrypoint_imake-ollama.sh /entrypoint_imake-ollama.sh                                                                                                               0.0s
 => [5/5] RUN chmod +x /entrypoint_imake-ollama.sh                                                                                                                                  0.2s
 => exporting to image                                                                                                                                                              3.5s
 => => exporting layers                                                                                                                                                             3.5s
 => => writing image sha256:7b76567481ca3f3351ed2451ccf0c6d71d03c687c0330eebf8f5bd5d1e63e1f0                                                                                        0.0s
 => => naming to docker.io/library/ollama-gpu                                                                                                                                       0.0s
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai/imake-ollama$ docker images
REPOSITORY            TAG                            IMAGE ID       CREATED          SIZE
ollama-gpu            latest                         7b76567481ca   41 seconds ago   6.64GB
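
The resulting image is about 6.6 GB. If you’re curious where that size comes from, docker history lists the size of each layer; you’ll likely see that the CUDA/cuDNN base layers and the Ollama install step account for most of it:

docker history ollama-gpu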

Run the Ollama Container

  • docker run -d --gpus all -v ollama_models:/root/.ollama/ -p 11434:11434 --name ollama-gpu ollama-gpu
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai/imake-ollama$ docker run -d --gpus all -v ollama_models:/root/.ollama/ -p 11434:11434 --name ollama-gpu ollama-gpu
dd4b40438dd07ca86426dbc7c6643a561cbe2aea27751c3fcbbf2a525ebf316c

(base) jaoneol@DESKTOP-B7GM3C5:~/dcai/test-ollama$ docker ps -s
CONTAINER ID   IMAGE                                      COMMAND                  CREATED          STATUS                  PORTS                                                                                      NAMES               SIZE
89697f4cc367   ollama-gpu                                 "/entrypoint_imake-o…"   25 minutes ago   Up 25 minutes           0.0.0.0:11434->11434/tcp, :::11434->11434/tcp                                              ollama-gpu          20.2kB (virtual 6.64GB)
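
The entrypoint script starts pulling gemma3:1b in the background right after the server comes up, so the model can take a moment to appear. You can follow its progress in the container logs:

docker logs -f ollama-gpu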

After running the container, let’s check the following:

  • Whether the GPU is being used
    docker exec -it ollama-gpu nvidia-smi
  • Whether gemma3:1b has been successfully pulled
    docker exec -it ollama-gpu ollama list
  • The Docker volume contents
    docker volume inspect ollama_models
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai/imake-ollama$ docker exec -it ollama-gpu nvidia-smi
Fri Mar 21 22:46:31 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.02              Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   39C    P8              9W /  220W |    1928MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
  
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A         7      C   /ollama                                     N/A      |
|    0   N/A  N/A         8      C   /milvus                                     N/A      |
|    0   N/A  N/A        31      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        73      C   /ollama                                     N/A      |
+-----------------------------------------------------------------------------------------+

(base) jaoneol@DESKTOP-B7GM3C5:~/dcai/imake-ollama$ docker exec -it ollama-gpu ollama list
NAME         ID              SIZE      MODIFIED
gemma3:1b    2d27a774bc62    815 MB    5 minutes ago

(base) jaoneol@DESKTOP-B7GM3C5:~/dcai/imake-ollama$ docker volume inspect ollama_models
[
    {
        "CreatedAt": "2025-03-22T07:33:30+09:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/ollama_models/_data",
        "Name": "ollama_models",
        "Options": null,
        "Scope": "local"
    }
]

(base) jaoneol@DESKTOP-B7GM3C5:~/dcai/imake-ollama$

Test the Ollama Container

Ollama has a REST API for running and managing models.
You can easily test it out.

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:1b",
  "prompt":"What is the capital of America?"
}'
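
By default, /api/generate streams the response back as a series of JSON objects. If you prefer a single JSON response, add "stream": false to the request body:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:1b",
  "prompt": "What is the capital of America?",
  "stream": false
}'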

We’ve confirmed that the pre-specified model is pulled and served when the container runs on a server. However, if that server is isolated from the internet (air-gapped), it won’t be able to fetch the model with ollama pull.
In the next post, we’ll show how to load the pre-specified model from a MinIO instance set up on that isolated (air-gapped) server.

This post is licensed under CC BY 4.0 by the author.