In this post, we will explore various methods for querying Milvus. For more details, please refer to the official Milvus documentation. I will focus on the representative search methods that I have used at least once myself, and I will update this post as I learn more.
Search
Basic ANN Search
Using an index file that records the sorted order of the vector embeddings, Approximate Nearest Neighbor (ANN) search locates a subset of embeddings based on the query vector carried in a search request, compares the query vector only with that subset, and returns the most similar results. Because it avoids comparing the query with every vector in the collection, ANN search is efficient. The most representative form is single-vector search.
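```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# MILVUS_URL, MILVUS_DB_NAME, MILVUS_COL_NAME, MODEL_FOLDER and EMBED_MODEL
# are configuration constants defined elsewhere (not shown in this post).

# Connect to Milvus and select the database.
milvusClient = MilvusClient(uri=MILVUS_URL)
milvusClient.using_database(MILVUS_DB_NAME)

# Embed the query.
question = "US Open tennis tournament."
embedding_model = SentenceTransformer(f"./{MODEL_FOLDER}/{EMBED_MODEL}")
vector_embedding = embedding_model.encode([question])  # one embedding per query string

# Retrieve all field names of the collection so every field is returned.
collection_info = milvusClient.describe_collection(MILVUS_COL_NAME)
fields = [field["name"] for field in collection_info["fields"]]

PROCESSMAP_COSINE_REF = 0.6  # only used by the commented-out range search below

# Basic ANN search
vector_res = milvusClient.search(
    collection_name=MILVUS_COL_NAME,
    data=vector_embedding,  # 1. Basic ANN search
    output_fields=fields,
    limit=4,  # top 4 hits per query vector
    # search_params={"metric_type": "COSINE", "params": {"radius": PROCESSMAP_COSINE_REF}},  # note: metric_type and radius
)
# One list of hits per query vector; with COSINE, a larger "distance" means a closer match.
for hits in vector_res:
    for hit in hits:
        print(f"Distance: {hit['distance']}, NewsPaper: {hit['entity']['NewsPaper']}, News: {hit['entity']['News']}")
```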
```text
Distance: 0.6732997894287109, NewsPaper: Yonhap News, News: Jannik Sinner (World No. 1, Italy) clinched the men's singles title at the US Open tennis tournament.
Distance: 0.6732997894287109, NewsPaper: MBC, News: Jannik Sinner (World No. 1, Italy) clinched the men's singles title at the US Open tennis tournament.
Distance: 0.5168114900588989, NewsPaper: Chosun Ilbo, News: Jannik Sinner dominated men's tennis in 2024, securing the ATP Finals title.
Distance: 0.4742506444454193, NewsPaper: Dong-A Ilbo, News: Novak Djokovic and Carlos Alcaraz faced off in the men's singles final at the 2024 Paris Olympics.
```
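To make the idea of "locating a subset" concrete, here is a toy, pure-Python sketch of an inverted-file (IVF) style ANN search. It is not how Milvus implements its indexes: the 2-D vectors, the hand-picked centroids (a real IVF index learns them, typically with k-means), and the names `build_buckets` and `ann_search` are all hypothetical. The score is cosine similarity, so, as in the output above, a larger value means a closer match.

```python
import math


def cosine(a, b):
    """Cosine similarity: larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def build_buckets(vectors, centroids):
    """Assign each vector id to its most similar centroid (inverted-file layout)."""
    buckets = {c: [] for c in range(len(centroids))}
    for vid, vec in vectors.items():
        best = max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))
        buckets[best].append(vid)
    return buckets


def ann_search(query, vectors, centroids, buckets, limit, nprobe):
    """Scan only the `nprobe` buckets whose centroids are most similar to the query."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: cosine(query, centroids[c]), reverse=True)[:nprobe]
    candidates = [vid for c in probe for vid in buckets[c]]
    hits = [{"id": vid, "distance": cosine(query, vectors[vid])} for vid in candidates]
    hits.sort(key=lambda h: h["distance"], reverse=True)
    return hits[:limit]


# Hypothetical 2-D "embeddings"; real sentence embeddings have hundreds of dimensions.
vectors = {1: [1.0, 0.1], 2: [0.9, 0.2], 3: [0.1, 1.0], 4: [0.2, 0.9],
           5: [-1.0, 0.1], 6: [0.6, 0.8]}
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hand-picked, not learned
buckets = build_buckets(vectors, centroids)

query = [0.8, 0.6]
approx = ann_search(query, vectors, centroids, buckets, limit=2, nprobe=1)
exact = ann_search(query, vectors, centroids, buckets, limit=2, nprobe=len(centroids))
print([h["id"] for h in approx], [h["id"] for h in exact])
```

With `nprobe=1` only one bucket is scanned and the result is `[2, 1]`; vector 6 is actually the best match but sits just across the bucket boundary, so probing every bucket is needed to get the exact answer `[6, 2]`. Probing more buckets trades speed for recall; Milvus's IVF-family indexes expose a search parameter with the same name, `nprobe`.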
Filtered Search
An ANN search finds the vector embeddings most similar to a given query embedding, but the most similar results are not always the relevant ones. You can include filtering conditions in a search request so that Milvus restricts the search from the whole collection to only the entities that match those conditions. There are two methods: standard filtering and iterative filtering. With standard filtering, Milvus first selects the entities that match the filter and then runs the ANN search among them; with iterative filtering, it runs the vector search first and applies the filter to the results batch by batch until enough matches are found, which can help when the filter expression is expensive to evaluate over the whole collection.
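The difference between the two methods is mostly the order of operations. The pure-Python sketch below contrasts them on a hypothetical in-memory list of hits. It ranks exactly instead of using an ANN index, and the function names, `batch_size`, and the rows are illustrative assumptions rather than Milvus internals.

```python
def standard_filter_search(rows, predicate, limit):
    """Standard filtering: keep only matching rows, then rank them."""
    matching = [r for r in rows if predicate(r)]
    return sorted(matching, key=lambda r: r["distance"], reverse=True)[:limit]


def iterative_filter_search(rows, predicate, limit, batch_size):
    """Iterative filtering: rank first, then filter the ranking batch by batch
    until `limit` matches are collected or the candidates run out."""
    ranked = sorted(rows, key=lambda r: r["distance"], reverse=True)
    results = []
    for start in range(0, len(ranked), batch_size):
        batch = ranked[start:start + batch_size]
        results.extend(r for r in batch if predicate(r))
        if len(results) >= limit:
            break
    return results[:limit]


# Hypothetical hits; "distance" is a cosine score, so larger means more similar.
rows = [
    {"id": 1, "distance": 0.67, "Topic": "Cup Victory"},
    {"id": 2, "distance": 0.52, "Topic": "League Victory"},
    {"id": 3, "distance": 0.47, "Topic": "Match Preview"},
    {"id": 4, "distance": 0.30, "Topic": "Final Victory"},
    {"id": 5, "distance": 0.10, "Topic": "Transfer News"},
]


def has_victory(row):
    return "Victory" in row["Topic"]


standard = standard_filter_search(rows, has_victory, limit=3)
iterative = iterative_filter_search(rows, has_victory, limit=3, batch_size=2)
print([r["id"] for r in standard], [r["id"] for r in iterative])
```

Both return `[1, 2, 4]` here, but the iterative version stopped after evaluating the filter on four of the five rows; that early stop is where it saves work when the filter is costly.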
```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# MILVUS_URL, MILVUS_DB_NAME, MILVUS_COL_NAME, MODEL_FOLDER and EMBED_MODEL
# are configuration constants defined elsewhere (not shown in this post).

# Connect to Milvus and select the database.
milvusClient = MilvusClient(uri=MILVUS_URL)
milvusClient.using_database(MILVUS_DB_NAME)

# Embed the query.
question = "US Open tennis tournament."
embedding_model = SentenceTransformer(f"./{MODEL_FOLDER}/{EMBED_MODEL}")
vector_embedding = embedding_model.encode([question])

# Retrieve all field names of the collection.
collection_info = milvusClient.describe_collection(MILVUS_COL_NAME)
fields = [field["name"] for field in collection_info["fields"]]

# Filtered search: only entities whose Topic contains "Victory" are candidates.
query_filter = "Topic like '%Victory%'"
# query_filter = "Topic like '%Victory%' and NewsPaper == 'MBC'"
vector_res = milvusClient.search(
    collection_name=MILVUS_COL_NAME,
    data=vector_embedding,  # 1. Basic ANN search
    output_fields=fields,
    limit=4,
    filter=query_filter,  # 2. Filtered search (standard filtering)
    search_params={"hints": "iterative_filter"},  # 2. Filtered search (iterative filtering); drop for standard filtering
)
for hits in vector_res:
    for hit in hits:
        print(f"Distance: {hit['distance']}, Topic: {hit['entity']['Topic']}, NewsPaper: {hit['entity']['NewsPaper']}, News: {hit['entity']['News']}")
```
```text
Distance: 0.6732997894287109, Topic: US Open Victory, NewsPaper: Yonhap News, News: Jannik Sinner (World No. 1, Italy) clinched the men's singles title at the US Open tennis tournament.
Distance: 0.6732997894287109, Topic: 2024 US Open Victory, NewsPaper: MBC, News: Jannik Sinner (World No. 1, Italy) clinched the men's singles title at the US Open tennis tournament.
Distance: 0.5168114900588989, Topic: ATP Finals Victory, NewsPaper: Chosun Ilbo, News: Jannik Sinner dominated men's tennis in 2024, securing the ATP Finals title.
```
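The filter `Topic like '%Victory%'` uses `%` as a wildcard for any run of characters, so it matches topics containing "Victory" anywhere; that is why all three hits above qualify and the fourth result from the basic search drops out. The helper below is a hypothetical, pure-Python approximation of `%`-only pattern matching for checking patterns offline. It is not Milvus's expression parser, it ignores any other wildcard, and it matches case-sensitively.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern that uses only the `%` wildcard into a regex."""
    literal_parts = pattern.split("%")
    return re.compile("^" + ".*".join(re.escape(p) for p in literal_parts) + "$", re.DOTALL)


def like(value, pattern):
    """Return True if `value` matches the `%`-only LIKE `pattern` (case-sensitive)."""
    return like_to_regex(pattern).match(value) is not None


# The first three topics come from the output above; the last one is hypothetical.
topics = ["US Open Victory", "2024 US Open Victory", "ATP Finals Victory", "Olympic Final"]
print([t for t in topics if like(t, "%Victory%")])  # contains "Victory"
print([t for t in topics if like(t, "US Open%")])   # starts with "US Open"
```

Literal characters are escaped, so a `.` in the pattern only matches a real dot; only `%` acts as a wildcard.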
Range Search
A range search improves the relevancy of search results by restricting the distance or score of the returned entities to a specific range. The range is set through `radius` (and, optionally, `range_filter`) in `search_params`; with the COSINE metric, where a larger score means a closer match, a `radius` of 0.6 keeps only the hits scoring above 0.6.
```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

# MILVUS_URL, MILVUS_DB_NAME, MILVUS_COL_NAME, MODEL_FOLDER and EMBED_MODEL
# are configuration constants defined elsewhere (not shown in this post).

# Connect to Milvus and select the database.
milvusClient = MilvusClient(uri=MILVUS_URL)
milvusClient.using_database(MILVUS_DB_NAME)

# Embed the query.
question = "US Open tennis tournament."
embedding_model = SentenceTransformer(f"./{MODEL_FOLDER}/{EMBED_MODEL}")
vector_embedding = embedding_model.encode([question])

# Retrieve all field names of the collection.
collection_info = milvusClient.describe_collection(MILVUS_COL_NAME)
fields = [field["name"] for field in collection_info["fields"]]

# Range search (combined with the filter from the previous section)
query_filter = "Topic like '%Victory%'"
# query_filter = "Topic like '%Victory%' and NewsPaper == 'MBC'"
PROCESSMAP_COSINE_REF = 0.6  # minimum cosine score to keep
vector_res = milvusClient.search(
    collection_name=MILVUS_COL_NAME,
    data=vector_embedding,  # 1. Basic ANN search
    output_fields=fields,
    limit=4,
    filter=query_filter,  # 2. Filtered search
    search_params={"metric_type": "COSINE", "params": {"radius": PROCESSMAP_COSINE_REF}},  # 3. Range search: score > radius
)
for hits in vector_res:
    for hit in hits:
        print(f"Distance: {hit['distance']}, Topic: {hit['entity']['Topic']}, NewsPaper: {hit['entity']['NewsPaper']}, News: {hit['entity']['News']}")
```
```text
Distance: 0.6732997894287109, Topic: US Open Victory, NewsPaper: Yonhap News, News: Jannik Sinner (World No. 1, Italy) clinched the men's singles title at the US Open tennis tournament.
Distance: 0.6732997894287109, Topic: 2024 US Open Victory, NewsPaper: MBC, News: Jannik Sinner (World No. 1, Italy) clinched the men's singles title at the US Open tennis tournament.
```
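As I read the Milvus range-search documentation, the bounds depend on the metric: for COSINE and IP (larger is more similar) a hit is kept when `radius < distance <= range_filter`, and for L2 (smaller is more similar) when `range_filter <= distance < radius`, with `range_filter` optional. The function below is a pure-Python sketch of that rule (the name `in_range` is mine), applied to the three distances returned by the filtered search; it reproduces the two hits above.

```python
def in_range(distance, metric_type, radius, range_filter=None):
    """Sketch of the range-search condition for a single hit's distance."""
    if metric_type in ("COSINE", "IP"):
        # Larger is more similar: radius < distance <= range_filter.
        return distance > radius and (range_filter is None or distance <= range_filter)
    if metric_type == "L2":
        # Smaller is more similar: range_filter <= distance < radius.
        return distance < radius and (range_filter is None or distance >= range_filter)
    raise ValueError(f"metric type not covered by this sketch: {metric_type}")


# Distances from the filtered-search output above.
distances = [0.6732997894287109, 0.6732997894287109, 0.5168114900588989]
print([d for d in distances if in_range(d, "COSINE", radius=0.6)])
print([d for d in distances if in_range(d, "COSINE", radius=0.5, range_filter=0.6)])
```

In the real search call, an upper bound would sit next to `radius`, for example `"params": {"radius": 0.5, "range_filter": 0.6}`.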
Delete Entities
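It looks simple: it is tempting to just rename the previous search call to delete. Note, however, that `delete()` removes entities by primary key (`ids`) or by a boolean `filter` expression; a query vector, `limit`, and range-search parameters do not apply. The output below reports 3 deletions, which matches the three entities returned by the filtered search rather than the two returned by the range search, so only the filter decides what is deleted.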
```python
from pymilvus import MilvusClient

# MILVUS_URL, MILVUS_DB_NAME and MILVUS_COL_NAME are configuration
# constants defined elsewhere (not shown in this post).

# Connect to Milvus and select the database.
milvusClient = MilvusClient(uri=MILVUS_URL)
milvusClient.using_database(MILVUS_DB_NAME)

# Delete every entity whose Topic contains "Victory".
query_filter = "Topic like '%Victory%'"
# query_filter = "Topic like '%Victory%' and NewsPaper == 'MBC'"
delete_res = milvusClient.delete(
    collection_name=MILVUS_COL_NAME,
    filter=query_filter,  # delete() takes a boolean filter (or ids), not a query vector
)
print(delete_res)
```
```text
{'delete_count': 3, 'cost': 0}
```
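If the goal is to delete exactly the entities a search returned (for example, only the two range-search hits rather than everything matching the filter), one option is to collect their primary keys and pass them to `delete()` through its `ids` parameter. The helper below is a hypothetical sketch that pulls the ids out of a `search()` result, which is a list holding one list of hit dicts per query vector; the sample result is made up.

```python
def hit_ids(search_res):
    """Collect unique primary keys from a MilvusClient.search() result, preserving order."""
    ids = [hit["id"] for hits in search_res for hit in hits]
    return list(dict.fromkeys(ids))  # drop duplicates when several queries return the same entity


# Hypothetical result for two query vectors; real hits also carry the requested fields.
sample_res = [
    [{"id": 101, "distance": 0.67, "entity": {}}, {"id": 102, "distance": 0.64, "entity": {}}],
    [{"id": 102, "distance": 0.71, "entity": {}}, {"id": 103, "distance": 0.62, "entity": {}}],
]
print(hit_ids(sample_res))

# With a live client this would become (not executed here):
# milvusClient.delete(collection_name=MILVUS_COL_NAME, ids=hit_ids(vector_res))
```

Deleting by ids keeps the delete in lockstep with what the search actually returned, including any `radius` or `limit` that the search applied.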