LangChain using ChatOllama.

Let's write code to create a LangChain using ChatOllama.

In the previous post, we implemented a LangChain using Hugging Face transformers. Building a LangChain with Ollama, however, lets you implement most features in a way that closely mirrors using ChatOpenAI. Ollama can also run quantized models, which reduces memory usage and improves speed (related term: GGUF); more details on this will follow in another post. In this session, we'll create a LangChain using Ollama.

Preparation: Install Ollama

With Ollama, you can run open-source large language models such as Llama 3 locally. To prepare:

  1. Download and install Ollama on a supported platform (Mac/Linux/Windows).
    Ollama
    Ollama Discord
    Ollama model GitHub
    Ollama model library
  2. Verify successful installation
    ollama
    ollama list
  3. Download models provided by Ollama (using the two examples below)
    ollama pull/run gemma2
    ollama pull/run llama3.2-vision
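
Before wiring anything into LangChain, you can confirm from Python that the local Ollama server is up and see which models it has pulled. A minimal sketch, assuming Ollama is listening on its default endpoint (http://localhost:11434) and that the requests package is installed:

import requests

# Ollama exposes a local REST API; GET /api/tags lists the models pulled so far.
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for model in resp.json().get("models", []):
    print(model["name"])  # e.g. gemma2:latest, llama3.2-vision:latest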

1. Steps to Build a LangChain Using Ollama

  1. Load the Ollama model.
    • For reference, ChatOllama() and OllamaLLM() behave almost identically in this workflow; the main visible difference is the type of object that invoke() returns (see the short sketch after this list).
  2. Generate a prompt
  3. Create a LangChain
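
As a quick illustration of that difference, the sketch below (assuming the gemma2 model has already been pulled) calls both wrappers directly: ChatOllama.invoke() returns an AIMessage, while OllamaLLM.invoke() returns a plain string. Inside a chain that ends with StrOutputParser(), the distinction mostly disappears.

from langchain_ollama import ChatOllama, OllamaLLM

chat_llm = ChatOllama(model="gemma2")
text_llm = OllamaLLM(model="gemma2")

chat_out = chat_llm.invoke("Say hello in one word.")
text_out = text_llm.invoke("Say hello in one word.")

print(type(chat_out).__name__)  # AIMessage (access the text via chat_out.content)
print(type(text_out).__name__)  # str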
from langchain_ollama import ChatOllama, OllamaLLM
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.prompts import PromptTemplate

# Step 1: Load the Ollama model.
llm = ChatOllama(model="gemma2")
# llm = OllamaLLM(model="gemma2")

# Step 2: Generate a prompt
template = """
#System:
You are a friendly AI assistant. Your name is DS2Man. Please answer questions briefly.
#Question:
{question}
#Answer:
"""
prompt = PromptTemplate.from_template(template)

# Step 3: Create a LangChain
# StrOutputParser() is one of the output parser classes, which convert the model's output into a usable form;
# here it extracts the plain text string. I will explain output parsers in detail in the OutputParser post.
chain = prompt | llm | StrOutputParser()
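
ChatPromptTemplate is imported above but not used. For a chat model such as ChatOllama, an equivalent chain can be built from role-based messages instead of a single template string; a minimal sketch, reusing the llm loaded above:

chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a friendly AI assistant. Your name is DS2Man. Please answer questions briefly."),
    ("human", "{question}"),
])
chat_chain = chat_prompt | llm | StrOutputParser()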

2. invoke

  1. invoke processes the input in a single call and returns the full response at once.
  2. When a user provides input, the model generates the entire result and then returns it in one go.
  3. The response is provided only after the model has finished generating all of the text.
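
The snippet below reuses invoke_response() from the earlier post (LangChain using My Own Custom class, ChatBrainAI.). If you don't have that helper handy, a minimal stand-in, assuming it simply prints the parsed string, could be:

def invoke_response(response):
    # Hypothetical stand-in for the helper defined in the ChatBrainAI post.
    # The chain ends with StrOutputParser(), so response is already a plain string.
    print(response)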
question = "What is the capital of the United States?"
response = chain.invoke({"question": question})
print("Invoke Result:")
# Reference : Post(LangChain using My Own Custom class, ChatBrainAI.)
invoke_response(response)
Invoke Result:
 The capital of the United States is Washington, D.C.

3. stream

  1. stream returns partial results in real time as the model generates the text.
  2. You can observe the process of the model generating the response text in real-time.
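
As with invoke_response(), the stream_response() helper used below comes from the ChatBrainAI post. A minimal stand-in, assuming it reports the chunk type once and then prints the chunks as they arrive:

from langchain_core.messages import AIMessageChunk

def stream_response(response):
    # Hypothetical stand-in for the helper defined in the ChatBrainAI post.
    first = True
    for chunk in response:
        if first:
            # With StrOutputParser() at the end of the chain, chunks arrive as str;
            # without it, they would be AIMessageChunk objects.
            print(f"The type of chunk is {type(chunk).__name__}...")
            first = False
        text = chunk.content if isinstance(chunk, AIMessageChunk) else chunk
        print(text, end="", flush=True)
    print()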
from langchain_core.messages import AIMessageChunk

question = "What is the capital of the United States?"
response = chain.stream({"question": question})
print("Streamed Result:")
# Reference : Post(LangChain using My Own Custom class, ChatBrainAI.)
stream_response(response)
Streamed Result:
The type of chunk is str...
The capital of the United States is Washington, D.C.