Multimodal Retrieval Augmented Generation (RAG) with Milvus

1 | © Copyright 2024 Zilliz
1
Multimodal RAG with Milvus
Yi Wang @ Zilliz

2
CONTENTS
01 RAG is the New Search
02 Multimodal Retrieval with Milvus

3
RAG is the New Search

4
Retrieval-Augmented Generation

5
A Typical Search System
Picture Credit: https://web.eecs.umich.edu/~nham/EECS398F19/

6
Indexing
Query Retrieval Prompt&
Generation
Recap of RAG Architecture
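
The stages named on this slide can be sketched end to end in a few lines. This is a toy illustration only, not the pipeline from the talk: word overlap stands in for vector similarity, and the documents and the function names tokenize, retrieve and build_prompt are hypothetical.

```python
# Toy RAG pipeline: index offline, then retrieve and build a prompt online.
# Word overlap is an assumed stand-in for a real embedding model and vector search.

def tokenize(text):
    """Lower-case a text and split it into a set of words."""
    return set(text.lower().split())

# Indexing (offline): process every document once and keep the result.
documents = [
    "Milvus is a vector database",
    "RAG combines retrieval with generation",
    "Brown leaves fall in autumn",
]
index = [(doc, tokenize(doc)) for doc in documents]

def retrieve(query, top_k=1):
    """Retrieval (online): rank documents by the number of words shared with the query."""
    q = tokenize(query)
    ranked = sorted(index, key=lambda item: len(q & item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def build_prompt(query, contexts):
    """Prompt: place the retrieved context in front of the user question."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}"
    )

query = "what is a vector database"
contexts = retrieve(query)
prompt = build_prompt(query, contexts)  # Generation would send this prompt to an LLM.
print(prompt)
```

In a real system the index step would store embeddings in a vector database and the final prompt would be passed to a language model.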

7
Offline Indexing
[Diagram labels: Indexing, Generation]

8
Online Serving
[Diagram labels: Indexing, Generation]

9
How RAG Resembles Search

10
Multimodal Retrieval with Milvus

11
Multi-modal Retrieval
● Combining text and image in the search query
● Retrieving multi-modal content for generation
Query = "feuilles brunes pendant la journée"
(i.e. "brown leaves during daytime")

12

13
Easy to start with, can even run on edge devices!

14
Scale-up on Docker

15
Up to 100 billion vectors with K8s!

16

17
Data Preparation
Download the images.zip file directly from:
https://huggingface.co/datasets/unum-cloud/ann-unsplash-25k/tree/main
import glob, time, pprint
import numpy as np
from PIL import Image
import pandas as pd
# Load image IDs and descriptions.
image_data = pd.read_csv('images.csv')
print(image_data.shape)
display(image_data.head(2))  # display() is provided by Jupyter/IPython
# Lists of image IDs (used as image references) and description texts.
image_urls = list(image_data.photo_id)
image_texts = list(image_data.ai_description)

18
Create a Milvus Collection
from pymilvus import (
    connections, FieldSchema, CollectionSchema, DataType, Collection)

# STEP 1. Connect to Milvus.
connections.connect(
    alias="default",
    host='localhost',  # or '0.0.0.0'
    port='19530'
)

# STEP 2. Create a new collection and build indexes.
EMBEDDING_DIM = 256
MAX_LENGTH = 65535

# Step 2.1 Define the data schema for the new collection.
fields = [
    # Use an auto-generated id as the primary key.
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="text_vector", dtype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),
    FieldSchema(name="image_vector", dtype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),
    FieldSchema(name="chunk", dtype=DataType.VARCHAR, max_length=MAX_LENGTH),
    FieldSchema(name="image_filepath", dtype=DataType.VARCHAR, max_length=MAX_LENGTH),
]
schema = CollectionSchema(fields, "")

# Step 2.2 Create the collection.
col = Collection("Demo_multimodal", schema)

# Step 2.3 Build an index for each of the two vector columns.
image_index = {"metric_type": "COSINE"}
col.create_index("image_vector", image_index)
text_index = {"metric_type": "COSINE"}
col.create_index("text_vector", text_index)
col.load()
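
Both indexes above use the COSINE metric. As a reminder of what that metric measures, here is a minimal pure-Python sketch of cosine similarity. The function name cosine_similarity and the sample vectors are illustrative assumptions; Milvus computes the metric internally.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Vectors pointing in the same direction score close to 1, whatever their length.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors score 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Because the score ignores vector length, it compares only the direction of two embeddings.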

19
Data Vectorization & Insertion
# STEP 3. Data vectorization (i.e. embedding).
# embedding_model is the demo's embedding helper (not defined on this slide).
image_embeddings, text_embeddings = embedding_model(
    batch_images=batch_images,
    batch_texts=batch_texts)

# STEP 4. Insert data into Milvus or Zilliz.
# Prepare data batch.
chunk_dict_list = []
for chunk, img_url, img_embed, text_embed in zip(
        batch_texts,
        batch_urls,
        image_embeddings, text_embeddings):
    # Assemble embedding vectors, original text chunk, and metadata.
    chunk_dict = {
        'chunk': chunk,
        'image_filepath': img_url,
        'text_vector': text_embed,
        'image_vector': img_embed
    }
    chunk_dict_list.append(chunk_dict)

# Actually insert the data batch.
# If the data size is large, try bulk_insert().
col.insert(data=chunk_dict_list)
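
The snippets on this slide refer to batch_texts and batch_urls without showing how the batches are cut. One possible way, sketched here under assumptions, is a small generator that slices the full lists into fixed-size pieces. The helper name batched, the BATCH_SIZE value and the placeholder data are all hypothetical.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Placeholder data standing in for the image_urls and image_texts lists.
image_urls = [f"photo_{i}" for i in range(5)]
image_texts = [f"description {i}" for i in range(5)]

BATCH_SIZE = 2  # assumed value; tune to the embedding model and available memory

# Pair up matching slices of URLs and texts so each batch stays aligned.
batches = list(zip(batched(image_urls, BATCH_SIZE), batched(image_texts, BATCH_SIZE)))
for batch_urls, batch_texts in batches:
    print(len(batch_urls), batch_urls, batch_texts)
```

Each (batch_urls, batch_texts) pair could then go through the embedding and insertion steps in turn.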

20
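Final Step: Search
# STEP 5. hybrid_search() is the API for multimodal search.
# image_req and text_req are the per-field search requests (not shown on this
# slide); top_k and output_fields are set by the caller.
from pymilvus import RRFRanker
results = col.hybrid_search(
    reqs=[image_req, text_req],
    rerank=RRFRanker(),
    limit=top_k,
    output_fields=output_fields)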
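
The search call uses RRFRanker to merge the image and text result lists. The idea behind reciprocal rank fusion (RRF) can be sketched in plain Python: each item scores 1 / (k + rank) in every list it appears in, and the scores are summed. This is an illustration of the general technique, not Milvus's internal code. The function name rrf_fuse, the smoothing constant k=60 and the 1-based ranks are assumptions.

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Fuse several ranked lists of ids with reciprocal rank fusion.

    Each id receives 1 / (k + rank) from every list that contains it,
    where rank starts at 1 for the best hit. Higher totals rank first.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical result ids from the image-vector and text-vector searches.
image_hits = ["a", "b", "c"]
text_hits = ["b", "d", "a"]

fused = rrf_fuse([image_hits, text_hits])
print(fused)
```

Items that rank well in both lists rise to the top, and no score normalization across the two vector fields is needed.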

21
[Multimodal] search with text-only query
Query = "feuilles brunes pendant la journée"
(i.e. "brown leaves during daytime")

22
[Multimodal] search with image-only query
[Query is an image]

23
[Multimodal] search with text + image query
Query = text + image
1. "silhouette d'une personne assise sur une roche au couche du soleil"
(i.e. "silhouette of person sitting on rock formation during golden hour")
2. Image below
Result

24
QA

26
curl --request POST \
  --url "${MILVUS_HOST}:${MILVUS_PORT}/v2/vectordb/entities/advanced_search" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "accept: application/json" \
  --header "content-type: application/json" \
  -d '{
    "collectionName": "book",
    "search": {
      "search_by": {
        "field": "book_intro_vector",
        "data": [1, 2, ...]
      },
      "search_by": {
        "field": "book_cover_vector",
        "data": [2, 3, ...]
      }
    },
    "rerank": {
      "strategy": "rrf"
    },
    "limit": 10
  }'
Callouts: the "search" block holds the Retrieve Params; the "rerank" block holds the Re-rank Params.
