
Let's run an Ollama container on a specific isolated (air-gapped) server.

In this post, I'll focus on that scenario: setting up Ollama on a server with no internet access and loading models from a MinIO server inside the same network.

Select a base image on Docker Hub

The first step in building a Docker image is selecting the base image. Since our goal is to run Ollama, GPU support is required, so let's first check the CUDA version on the target server.
My PC reports CUDA 12.6, so let's assume the target server also has CUDA 12.6 installed.
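
On a machine with the NVIDIA driver installed, the CUDA version supported by the driver appears in the header of nvidia-smi, so a quick check looks like this:

# Prints the header line that contains something like "CUDA Version: 12.6"
nvidia-smi | grep "CUDA Version"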

After searching on Docker Hub, the base image I selected is nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04. Here are the reasons for this choice:

  • nvidia/cuda:12.6.3-cudnn: matches the CUDA 12.6 driver on my PC.
  • runtime: includes only the libraries needed to run CUDA programs, which keeps the image smaller. Since we're only running Ollama, the runtime variant is the better fit. (devel additionally includes development tools such as compilers; use it when model training is required.)
  • ubuntu22.04: a stable Ubuntu LTS release.
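
If you want to confirm the tag and cache the base image locally before building, you can simply pull it first (an optional step, not part of the original workflow):

# Pull the selected base image from Docker Hub
docker pull nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04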

Make the Dockerfile

FROM nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04

# Set non-interactive mode for apt-get
ENV DEBIAN_FRONTEND=noninteractive

# Install required dependencies
RUN apt-get update && apt-get install -y \
    curl \
    wget \
    rsync \
    ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh

# Set up runtime for NVIDIA support
ENV NVIDIA_VISIBLE_DEVICES=all
ENV NVIDIA_DRIVER_CAPABILITIES=compute,utility

# Note: change the Ollama host. Without this, Ollama cannot be reached from outside the container.
ENV OLLAMA_HOST=0.0.0.0:11434

# Copy entrypoint script
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]

Make the entrypoint file

In a Dockerfile, only a single ENTRYPOINT or CMD is executed when the container starts (Dockerfile Commands). If you want to run multiple commands, you need to wrap them in a shell script and execute that.
In our case, we need to run two commands:

  • ollama serve
  • download the LLM model(s)
#!/bin/bash

ollama serve &

# Wait for the server to start
sleep 10

# List of models to download
echo "Check defined model..."
# MODELS=("gemma3:latest" "gemma3:12b")
MODELS=("gemma3:1b")

# MinIO server info
# Need to find a way to handle private buckets...!!!!
MINIO_URL="milvus-minio:9000"  # 127.0.0.1:9000, localhost:9000 vs milvus-minio:9000
BUCKET="ds2man"
MODEL_PATH="ollama_models"
DEST_DIR="/root/.ollama/models"

# Temporary storage directory
TEMP_DIR="temp_models"
mkdir -p "$TEMP_DIR"

# Download, extract, and move models
for model in "${MODELS[@]}"; do
    if ollama list | grep -q "$model"; then
        echo "Model $model is already available. Skipping download."
        continue
    fi

    # Convert model name (e.g., gemma3:latest -> gemma3-latest)
    model_file=$(echo "$model" | tr ':' '-')
    tar_file="$model_file.tar.gz"

    # Download URL
    url="$MINIO_URL/$BUCKET/$MODEL_PATH/$tar_file"

    echo "Downloading $tar_file from $url"
    wget -q "$url" -O "$TEMP_DIR/$tar_file"

    if [[ $? -ne 0 ]]; then
        echo "Failed to download $tar_file"
        continue
    fi

    echo "Extracting $tar_file"
    mkdir -p "$TEMP_DIR/models"
    tar -xzvf "$TEMP_DIR/$tar_file" -C "$TEMP_DIR/models" --strip-components=1

    echo "Moving extracted files to $DEST_DIR"
    rsync -a "$TEMP_DIR/models/" "$DEST_DIR"

    # Clean up
    rm -rf "$TEMP_DIR/models"
    rm "$TEMP_DIR/$tar_file"
done

# Remove temporary directory
rmdir "$TEMP_DIR"  

echo "All models processed successfully." 

# Keep the Ollama server running
wait -n
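
The script assumes that a tarball such as gemma3-1b.tar.gz already exists under ds2man/ollama_models in MinIO. Here is a minimal sketch of how such a tarball could be prepared on an internet-connected machine and uploaded with the MinIO client (mc); the myminio alias is an assumption and has to be configured beforehand:

# On a machine with internet access: pull the model once
ollama pull gemma3:1b

# Package ~/.ollama/models (blobs/ and manifests/) under a single top-level
# "models/" directory, which the entrypoint strips with --strip-components=1
tar -czvf gemma3-1b.tar.gz -C ~/.ollama models

# Upload to MinIO; "myminio" is a hypothetical alias, e.g. created with
# mc alias set myminio http://<minio-host>:9000 <access-key> <secret-key>
mc cp gemma3-1b.tar.gz myminio/ds2man/ollama_models/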

Build the Ollama Image

  • docker build -t ollama-gpu .
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker build -t ollama-gpu .
[+] Building 56.9s (10/10) FINISHED                                                                                                                                       docker:default
 => [internal] load build definition from dockerfile                                                                                                                                0.0s
 => => transferring dockerfile: 2.42kB                                                                                                                                              0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04                                                                                             0.7s
 => [internal] load .dockerignore                                                                                                                                                   0.0s
 => => transferring context: 2B                                                                                                                                                     0.0s
 => CACHED [1/5] FROM docker.io/nvidia/cuda:12.6.3-cudnn-runtime-ubuntu22.04@sha256:5f0d2d827f6436b3cb7468fd8acbdc8c1d41261614e579ae49afe6141da51133                                0.0s
 => [internal] load build context                                                                                                                                                   0.0s
 => => transferring context: 48B                                                                                                                                                    0.0s
 => [2/5] RUN apt-get update && apt-get install -y     curl     wget     rsync     ca-certificates     && rm -rf /var/lib/apt/lists/*                                              14.1s
 => [3/5] RUN curl -fsSL https://ollama.com/install.sh | sh                                                                                                                        38.1s
 => [4/5] COPY entrypoint_imake-ollama.sh /entrypoint_imake-ollama.sh                                                                                                               0.0s
 => [5/5] RUN chmod +x /entrypoint_imake-ollama.sh                                                                                                                                  0.2s
 => exporting to image                                                                                                                                                              3.5s
 => => exporting layers                                                                                                                                                             3.5s
 => => writing image sha256:7b76567481ca3f3351ed2451ccf0c6d71d03c687c0330eebf8f5bd5d1e63e1f0                                                                                        0.0s
 => => naming to docker.io/library/ollama-gpu                                                                                                                                       0.0s

(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker images
REPOSITORY            TAG                            IMAGE ID       CREATED          SIZE
ollama-gpu            latest                         7b76567481ca   41 seconds ago   6.64GB
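
Because the target server is air-gapped, the built image also has to be moved there without pulling from a registry. One common way to do that, sketched here as an optional step not shown in the logs above, is docker save / docker load:

# On the build machine: export the image to a tarball
docker save -o ollama-gpu.tar ollama-gpu:latest

# Copy ollama-gpu.tar to the isolated server (USB drive, internal file share, etc.),
# then load it there
docker load -i ollama-gpu.tar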

Run the Ollama Container

There’s one important thing to note. I’m running the MinIO container on my local PC. In order for the Ollama container to communicate with the MinIO container, they need to share the same Docker network. Since I already configured a Docker network when setting up MinIO, I’ll reuse that network. (Install MinIO)
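
If you are not sure whether the milvus network already exists, you can check for it first (and create it only if it is missing); a quick sketch:

# List existing networks and look for "milvus"
docker network ls | grep milvus

# Create it only if it does not exist yet
docker network create milvus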

  • docker run -d --gpus all --network milvus -v ollama_models:/root/.ollama/ -p 11434:11434 --name ollama-gpu ollama-gpu
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker run -d --gpus all --network milvus -v ollama_models:/root/.ollama/ -p 11434:11434 --name ollama-gpu ollama-gpu
dd4b40438dd07ca86426dbc7c6643a561cbe2aea27751c3fcbbf2a525ebf316c

(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker ps -s

CONTAINER ID   IMAGE                                      COMMAND                  CREATED          STATUS                  PORTS                                                                                      NAMES               SIZE
89697f4cc367   ollama-gpu                                 "/entrypoint_imake-o…"   25 minutes ago   Up 25 minutes           0.0.0.0:11434->11434/tcp, :::11434->11434/tcp                                              ollama-gpu          20.2kB (virtual 6.64GB)

After running the container, let’s check the following:

  • Whether the GPU is being used
    docker exec -it ollama-gpu nvidia-smi
  • Whether gemma3:1b has been successfully pulled
    docker exec -it ollama-gpu ollama list
  • The Docker volume contents
    docker volume inspect ollama_models
(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker exec -it ollama-gpu nvidia-smi
Fri Mar 21 22:46:31 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.02              Driver Version: 560.94         CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 ...    On  |   00000000:01:00.0  On |                  N/A |
|  0%   39C    P8              9W /  220W |    1928MiB /  12282MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A         7      C   /ollama                                     N/A      |
|    0   N/A  N/A         8      C   /milvus                                     N/A      |
|    0   N/A  N/A        31      G   /Xwayland                                   N/A      |
|    0   N/A  N/A        73      C   /ollama                                     N/A      |
+-----------------------------------------------------------------------------------------+

(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker exec -it ollama-gpu ollama list
NAME         ID              SIZE      MODIFIED
gemma3:1b    2d27a774bc62    815 MB    5 minutes ago

(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$ docker volume inspect ollama_models
[
    {
        "CreatedAt": "2025-03-22T07:33:30+09:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/ollama_models/_data",
        "Name": "ollama_models",
        "Options": null,
        "Scope": "local"
    }
]

(base) jaoneol@DESKTOP-B7GM3C5:~/dcai$

Test the Ollama Container

Ollama has a REST API for running and managing models.
You can easily test it out.

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:1b",
  "prompt":"What is the capital of America?"
}'
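
By default, /api/generate streams the response back as a sequence of JSON objects. If you prefer a single JSON response, you can add "stream": false to the request body, a small variation on the call above:

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:1b",
  "prompt": "What is the capital of America?",
  "stream": false
}'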

We successfully ran the Ollama container on a server operating in a closed network and loaded a pre-specified model from a MinIO instance within that network.
I hope this approach proves useful for others working on similar projects.

This post is licensed under CC BY 4.0 by the author.