Comparison Between Models
Let's compare the models registered in Ollama.
I plan to build all of my projects with ChatOllama(), so it is useful to keep a record of the VRAM usage of the models registered in Ollama. In this post, I keep a running log of the usage I observe with different models; a minimal sketch of how I call them follows below. The post will be updated continuously, so feel free to refer back to it.
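For context, this is roughly how each model gets invoked (a minimal sketch; it assumes `langchain-ollama` is installed, a local Ollama server is running, and the model tag has already been pulled — the tag shown is just an example):

```python
# Minimal sketch: swapping the `model` tag is all it takes to compare models.
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="llama3.1:latest",  # example tag; any model from the table below works
    temperature=0,
)

response = llm.invoke("Briefly explain what quantization does to an LLM.")
print(response.content)
```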
Comparison Between Models
Attribute | Model | Params | Size (GB) | VRAM Usage (GB) | Agent Tools | Vision | Thinking | Architecture | Quantization |
---|---|---|---|---|---|---|---|---|---|
General | Llama-3.2-3B-Instruct | - | - | 9.6 | - | - | - | - | - |
General | gemma-2-2b-it | - | - | 9.8 | - | - | - | - | - |
General | polyglot-ko-1.3b | - | - | 6.1 | - | - | - | - | - |
Ollama | llama3.1:latest | 8b | 4.9 | 6.0 | ● | - | - | llama | Q4_K_M |
Ollama | llama3.2:latest | 3b | 2.0 | 3.2 | ● | - | - | llama | Q4_K_M |
Ollama | llama3.2-vision:latest | 11b | 6.0 | 10.0 | - | ● | - | llama | Q4_K_M |
Ollama | gemma2:latest | 9b | 5.4 | 8.4 | - | - | - | gemma2 | Q4_0 |
Ollama | gemma3:latest | 4b | 3.3 | 4.5 | - | ● | - | gemma3 | Q4_K_M |
Ollama | PetrosStav/gemma3-tools:12b | 12b | 8.1 | 7.0 | - | ● | - | gemma3 | Q4_K_M |
Ollama | gemma3:12b | 12b | 8.1 | 7.0 | - | ● | - | gemma3 | Q4_K_M |
Ollama | deepseek-r1:14b | 14b | 9.0 | 8.9 | ● | - | ● | qwen2 | Q4_K_M |
Ollama | deepseek-r1:7b | 7b | 4.7 | 5.2 | ● | - | ● | qwen2 | Q4_K_M |
Ollama | qwen3:latest | 8b | 5.2 | 6.4 | ● | - | ● | qwen3 | Q4_K_M |
Ollama | qwen3:14b | 14b | 9.3 | 10 | ● | - | ● | qwen3 | Q4_K_M |
You can check key information about a model using the `ollama show` command. The quantization details are especially important.
As the use of Large Language Models (LLMs) rapidly increases, model compression and optimization have become critical. In edge environments with limited resources, or in services that require real-time responses, it is often not feasible to run large-scale models as-is. In such cases, quantization can significantly reduce memory usage and improve inference speed while preserving as much model accuracy as possible. I'll update this post with further insights after trying it out myself.
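As a rough sanity check, the sizes in the table line up with a simple bits-per-parameter estimate. Note the ~5 bits per weight used for Q4_K_M below is an approximation on my part; the format actually mixes quantization levels across tensors:

```python
# Rough estimate of model size from parameter count and quantization level.
# Assumption: Q4_K_M averages roughly 4.5-5 bits per weight (mixed-precision format).
def approx_size_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

print(approx_size_gb(12.2, 16))   # FP16 baseline: ~24.4 GB
print(approx_size_gb(12.2, 5.0))  # Q4_K_M estimate: ~7.6 GB (table shows 8.1 GB)
```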
```
PS C:\Users\ycjang> ollama show gemma3:12b
  Model
    architecture        gemma3
    parameters          12.2B
    context length      8192
    embedding length    3840
    quantization        Q4_K_M

  Parameters
    stop           "<end_of_turn>"
    temperature    0.1

  License
    Gemma Terms of Use
    Last modified: February 21, 2024

PS C:\Users\ycjang>
```
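The same metadata can also be read programmatically. Here is a sketch using the official `ollama` Python package (assuming `pip install ollama` and a running server; the field names follow the `/api/show` response and may differ in other client versions):

```python
# Sketch: read model metadata programmatically instead of via the CLI.
import ollama

info = ollama.show("gemma3:12b")
print(info.details.parameter_size)      # e.g. "12.2B"
print(info.details.quantization_level)  # e.g. "Q4_K_M"
```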