LangChain using ChatOllama.

Let's write code to create a LangChain using ChatOllama.

In the previous post, we implemented a LangChain using Hugging Face transformers. Building a LangChain with Ollama, however, lets you implement most features in a way that closely mirrors using ChatOpenAI. Ollama can also run quantized models, which reduces memory usage and improves speed (related term: GGUF); more details on this will follow in another post. In this session, we'll create a LangChain using Ollama.

Preparation: Install Ollama

With Ollama, you can run open-source large language models such as Llama 3 locally. To prepare:

  1. Download and install Ollama on a supported platform (Mac/Linux/Windows).
    Ollama
    Ollama Discord
    Ollama model GitHub
    Ollama model library
  2. Verify successful installation
    ollama
    ollama list
  3. Download models provided by Ollama (using the two examples below)
    ollama pull/run gemma2
    ollama pull/run llama3.2-vision
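
Before wiring anything into LangChain, you can confirm from Python that the local Ollama server is up and see which models it has pulled. A minimal sketch, assuming Ollama is listening on its default endpoint (http://localhost:11434) and that the requests package is installed:

import requests

# Ollama exposes a local REST API; GET /api/tags lists the models pulled so far.
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])  # e.g. gemma2:latest, llama3.2-vision:latest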

1. Steps to Build a LangChain Using Ollama

  1. Load the Ollama model.
    • For reference, ChatOllama() and OllamaLLM() behave almost identically in this workflow; the main visible difference is the type of object that invoke() returns (see the short sketch after this list).
  2. Generate a prompt
  3. Create a LangChain
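
As a quick illustration of that difference, the sketch below (assuming the gemma2 model has already been pulled) calls both wrappers directly: ChatOllama.invoke() returns an AIMessage, while OllamaLLM.invoke() returns a plain string. Inside a chain that ends with StrOutputParser(), the distinction mostly disappears.

from langchain_ollama import ChatOllama, OllamaLLM

chat_llm = ChatOllama(model="gemma2")
text_llm = OllamaLLM(model="gemma2")

chat_out = chat_llm.invoke("Say hello in one word.")
text_out = text_llm.invoke("Say hello in one word.")

print(type(chat_out).__name__)  # AIMessage (access the text via chat_out.content)
print(type(text_out).__name__)  # str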
from langchain_ollama import ChatOllama, OllamaLLM
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.prompts import PromptTemplate

# Step 1: Load the Ollama model.
llm = ChatOllama(model="gemma2")
# llm = OllamaLLM(model="gemma2")

# Step 2: Generate a prompt
template = """
#System:
You are a friendly AI assistant. Your name is DS2Man. Please answer questions briefly.
#Question:
{question}
#Answer:
"""
prompt = PromptTemplate.from_template(template)

# Step 3: Create a LangChain
# StrOutputParser() is one of the output parser classes, which convert the model's output into a usable form;
# here it extracts the plain text string. I will explain output parsers in detail in the OutputParser post.
chain = prompt | llm | StrOutputParser()
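
ChatPromptTemplate is imported above but not used. For a chat model such as ChatOllama, an equivalent chain can be built from role-based messages instead of a single template string; a minimal sketch, reusing the llm loaded above:

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly AI assistant. Your name is DS2Man. Please answer questions briefly."),
    ("human", "{question}"),
])
chat_chain = chat_prompt | llm | StrOutputParser()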

2. invoke

  1. invoke processes the input in a single call and returns the full response at once.
  2. When a user provides input, the model generates the entire result and then returns it in one go.
  3. The response is provided only after the model has finished generating all of the text.
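
The snippet below reuses invoke_response() from the earlier post (LangChain using My Own Custom class, ChatBrainAI.). If you don't have that helper handy, a minimal stand-in, assuming it simply prints the parsed string, could be:

def invoke_response(response):
    # Hypothetical stand-in for the helper defined in the ChatBrainAI post.
    # The chain ends with StrOutputParser(), so response is already a plain string.
    print(response)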
question = "What is the capital of the United States?"
response = chain.invoke({"question": question})
print("Invoke Result:")
# Reference : Post(LangChain using My Own Custom class, ChatBrainAI.)
invoke_response(response)
Invoke Result:
 The capital of the United States is Washington, D.C.

3. stream

  1. stream returns partial results in real time as the model generates the text.
  2. You can observe the process of the model generating the response text in real-time.
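
As with invoke_response(), the stream_response() helper used below comes from the ChatBrainAI post. A minimal stand-in, assuming it reports the chunk type once and then prints the chunks as they arrive:

from langchain_core.messages import AIMessageChunk

def stream_response(response):
    # Hypothetical stand-in for the helper defined in the ChatBrainAI post.
    first = True
    for chunk in response:
        if first:
            # With StrOutputParser() at the end of the chain, chunks arrive as str;
            # without it, they would be AIMessageChunk objects.
            print(f"The type of chunk is {type(chunk).__name__}...")
            first = False
        text = chunk.content if isinstance(chunk, AIMessageChunk) else chunk
        print(text, end="", flush=True)
    print()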
from langchain_core.messages import AIMessageChunk

question = "What is the capital of the United States?"
response = chain.stream({"question": question})
print("Streamed Result:")
# Reference : Post(LangChain using My Own Custom class, ChatBrainAI.)
stream_response(response)
Streamed Result:
The type of chunk is str...
The capital of the United States is Washington, D.C.