Setup Ollama on a specific isolated (air-gapped) server.
Let's run an Ollama container on a specific isolated (air-gapped) server.
In this post, I'll focus on the third of the scenarios below.
- (Case 1) Let’s pull the Ollama image and run the container on a specific server.
- (Case 2) Automatically load a pre-specified LLM model when running the Ollama container on a specific server.
- (Case 3) Automatically load a pre-specified LLM model when running the Ollama container on a specific isolated (air-gapped) server.
Select base image on Docker Hub
The first step when building a Docker image is selecting the base image. Since our goal is to run Ollama, GPU support is required. So, let’s first check the CUDA version of the specific server.
My PC currently has CUDA 12.6, so let's assume that the specific server also has CUDA 12.6 installed.
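A quick way to check this is with `nvidia-smi`, which reports the highest CUDA version supported by the installed driver:
nvidia-smi | grep "CUDA Version"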
After searching on Docker Hub, the base image I selected is `nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04`. Here are the reasons for this choice:
- `nvidia/cuda:12.6.3-cudnn`: My PC uses CUDA 12.6.
- `runtime`: includes only the libraries needed to run CUDA programs → smaller image size. Since we're only running Ollama, the `runtime` variant is more suitable! (`devel` includes development tools such as compilers → use it when model training is required.)
- `ubuntu22.04`: a stable Ubuntu LTS release.
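If you want to verify the tag before writing the Dockerfile, you can pull the base image on any internet-connected machine (it's the same image the build step below will use):
docker pull nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04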
Make the Dockerfile
FROM nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04
# Set non-interactive mode for apt-get
ENV DEBIAN_FRONTEND=noninteractive
# Install required dependencies
RUN apt-get update && apt-get install -y \
curl \
wget \
rsync \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh
# Set up runtime for NVIDIA support
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility
# Note: change the Ollama host. Without this, Ollama cannot be reached from outside the container.
ENV OLLAMA_HOST=0.0.0.0:11434
# Copy entrypoint script
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
Make the entrypoint file
In a Dockerfile, only one `ENTRYPOINT` (or `CMD`) instruction is executed when the container starts (Dockerfile Commands). If you want to run multiple commands, you need to wrap them in a shell script and execute that script.
In our case, we need to run two things:
- `ollama serve`
- download the LLM model(s)
#!/bin/bash
ollama serve &
# Wait for the server to start
sleep 10
# List of models to download
echo "Check defined model..."
# MODELS=("gemma3:latest" "gemma3:12b")
MODELS=("gemma3:1b")
# MinIO server info
# Need to find a way to handle private buckets...!!!!
MINIO_URL="milvus-minio:9000" # use the MinIO container name on the shared Docker network, not 127.0.0.1/localhost
BUCKET="ds2man"
MODEL_PATH="ollama_models"
DEST_DIR="/root/.ollama/models"
# Temporary storage directory
TEMP_DIR="temp_models"
mkdir -p "$TEMP_DIR"
# Download, extract, and move models
for model in "${MODELS[@]}"; do
    if ollama list | grep -q "$model"; then
        echo "Model $model is already available. Skipping download."
        continue
    fi

    # Convert model name (e.g., gemma3:latest -> gemma3-latest)
    model_file=$(echo "$model" | tr ':' '-')
    tar_file="$model_file.tar.gz"

    # Download URL
    url="$MINIO_URL/$BUCKET/$MODEL_PATH/$tar_file"
    echo "Downloading $tar_file from $url"
    wget -q "$url" -O "$TEMP_DIR/$tar_file"
    if [[ $? -ne 0 ]]; then
        echo "Failed to download $tar_file"
        continue
    fi

    echo "Extracting $tar_file"
    mkdir -p "$TEMP_DIR/models"
    tar -xzvf "$TEMP_DIR/$tar_file" -C "$TEMP_DIR/models" --strip-components=1

    echo "Moving extracted files to $DEST_DIR"
    rsync -a "$TEMP_DIR/models/" "$DEST_DIR"

    # Clean up
    rm -rf "$TEMP_DIR/models"
    rm "$TEMP_DIR/$tar_file"
done
# Remove temporary directory
rmdir "$TEMP_DIR"
echo "All models processed successfully."
# Keep the Ollama server running
wait -n
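For reference, the entrypoint assumes each model has already been packaged as a `<model>-<tag>.tar.gz` archive and uploaded to the `ds2man/ollama_models` path in MinIO. Below is a minimal sketch of how such an archive could be prepared on an internet-connected machine; the `mc` alias `myminio` is an assumption, so adjust it to your own MinIO client configuration.
# On a machine with internet access and Ollama installed
ollama pull gemma3:1b
# Package the models directory (blobs + manifests); note this archives every model pulled so far
tar -czvf gemma3-1b.tar.gz -C ~/.ollama models
# Upload with the MinIO client (the "myminio" alias is assumed to be configured already)
mc cp gemma3-1b.tar.gz myminio/ds2man/ollama_models/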
Build the Ollama Image
docker build -t ollama-gpu .
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker build -t ollama-gpu .
[+] Building 56.9s (10/10) FINISHED docker:default
=> [internal] load build definition from dockerfile 0.0s
=> => transferring dockerfile: 2.42kB 0.0s
=> [internal] load metadata for docker.io/nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04 0.7s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> CACHED [1/5] FROM docker.io/nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04@sha256:5f0d2d827f6436b3cb7468fd8acbdc8c1d41261614e579ae49afe6141da51133 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 48B 0.0s
=> [2/5] RUN apt-get update && apt-get install -y curl wget rsync ca-certificates && rm -rf /var/lib/apt/lists/* 14.1s
=> [3/5] RUN curl -fsSL https://ollama.com/install.sh | sh 38.1s
=> [4/5] COPY entrypoint_imake-ollama.sh /entrypoint_imake-ollama.sh 0.0s
=> [5/5] RUN chmod +x /entrypoint_imake-ollama.sh 0.2s
=> exporting to image 3.5s
=> => exporting layers 3.5s
=> => writing image sha256:7b76567481ca3f3351ed2451ccf0c6d71d03c687c0330eebf8f5bd5d1e63e1f0 0.0s
=> => naming to docker.io/library/ollama-gpu 0.0s
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ollama-gpu latest 7b76567481ca 41 seconds ago 6.64GB
Run the Ollama Container
There’s one important thing to note. I’m running the MinIO container on my local PC. In order for the Ollama container to communicate with the MinIO container, they need to share the same Docker network. Since I already configured a Docker network when setting up MinIO, I’ll reuse that network. (Install MinIO)
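Before starting the container, it's worth confirming that the shared network actually exists (`milvus` is the network name from my MinIO setup; yours may differ):
docker network ls | grep milvus
# If it doesn't exist yet, create it first:
# docker network create milvus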
docker run -d --gpus all --network milvus -v ollama_models:/root/.ollama/ -p 11434:11434 --name ollama-gpu ollama-gpu
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker run -d --gpus all --network milvus -v ollama_models:/root/.ollama/ -p 11434:11434 --name ollama-gpu ollama-gpu
dd4b40438dd07ca86426dbc7c6643a561cbe2aea27751c3fcbbf2a525ebf316c
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker ps -s
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES SIZE
89697f4cc367 ollama-gpu "/entrypoint_imake-o…" 25 minutes ago Up 25 minutes 0.0.0.0:11434->11434/tcp, :::11434->11434/tcp ollama-gpu 20.2kB (virtual 6.64GB)
After running the container, let’s check the following:
- Whether the GPU is being used
docker exec -it ollama-gpu nvidia-smi
- Whether `gemma3:1b` has been successfully pulled
docker exec -it ollama-gpu ollama list
- The Docker volume contents
docker volume inspect ollama_models
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker exec -it ollama-gpu nvidia-smi
Fri Mar 21 22:46:31 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.02 Driver Version: 560.94 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4070 ... On | 00000000:01:00.0 On | N/A |
| 0% 39C P8 9W / 220W | 1928MiB / 12282MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 7 C /ollama N/A |
| 0 N/A N/A 8 C /milvus N/A |
| 0 N/A N/A 31 G /Xwayland N/A |
| 0 N/A N/A 73 C /ollama N/A |
+-----------------------------------------------------------------------------------------+
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker exec -it ollama-gpu ollama list
NAME ID SIZE MODIFIED
gemma3:1b 2d27a774bc62 815 MB 5 minutes ago
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker volume inspect ollama_models
[
    {
        "CreatedAt": "2025-03-22T07:33:30+09:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/ollama_models/_data",
        "Name": "ollama_models",
        "Options": null,
        "Scope": "local"
    }
]
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$
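To double-check that the extracted model files actually landed in the volume, you can also list the models directory inside the container; it should contain Ollama's blobs and manifests folders.
docker exec -it ollama-gpu ls /root/.ollama/models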
Test the Ollama Container
Ollama has a REST API for running and managing models.
You can easily test it out.
curl http://localhost:11434/api/generate -d '{
"model": "gemma3:1b",
"prompt":"What is the capital of America?"
}'
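By default, `/api/generate` streams the response token by token. If you'd rather receive a single JSON response, you can disable streaming:
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:1b",
  "prompt": "What is the capital of America?",
  "stream": false
}'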
We successfully ran the Ollama container on a server operating in an isolated (air-gapped) network and loaded a pre-specified model from a MinIO instance within that network.
I hope this setup proves useful for others working on similar projects.