Some questions about hybrid search, WeightedRanker and COSINE score #34415
The website docs: https://milvus.io/docs/reranking.md#Weighted-Scoring-WeightedRanker

Question 1
According to the website docs, Milvus use
milvus/internal/proxy/reScorer.go Line 90 in 648d566

Question 2
I searched 4 columns using hybrid search and WeightedRanker, but the final score does not match my expectations.

Case:
# index params
index.add_index(field_name="vec1", index_type="FLAT", metric_type="COSINE")
index.add_index(field_name="vec2", index_type="FLAT", metric_type="COSINE")
index.add_index(field_name="vec3", index_type="FLAT", metric_type="COSINE")
index.add_index(field_name="vec4", index_type="FLAT", metric_type="COSINE")
# weight
rerank = WeightedRanker(0.3, 0.15, 0.25, 0.3)
# search param
for field in ['vec1','vec2','vec3','vec4']:
    search_param = {
        "data": [query_vec],
        "anns_field": field,
        "param": {
            "metric_type": "COSINE",
            "radius": 0.99,
            "range_filter": 1.0
        },
        "limit": 50
    }

## Score calculated by myself
The cosine sim between query_vec and vec1 is 0.8377
The cosine sim between query_vec and vec2 is 0.8451
The cosine sim between query_vec and vec3 is 0.9690
The cosine sim between query_vec and vec4 is 0.9304
My expected score is 0.3 * (0.8377+1) / 2 + 0.15 * (0.8451+1) / 2 + 0.25 * (0.9690+1) / 2 + 0.3 * (0.9304+1) / 2 = 0.9497
## Score from Milvus
0.2895

Question 3
In the case from Question 2, I set radius and range_filter to filter the results. But why is data with scores below 0.99 not filtered out?

Looking for your reply.
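
The arithmetic in Question 2 can be re-checked with a short script. This is only a sketch: the weights and cosine values are copied from the post, the normalization (1 + distance) * 0.5 is the COSINE mapping discussed in this thread, and the names `weights`, `cosines`, and `normalize_cosine` are made up for illustration.

```python
# Weights passed to WeightedRanker and the hand-computed cosine similarities,
# both copied from the post above.
weights = {"vec1": 0.3, "vec2": 0.15, "vec3": 0.25, "vec4": 0.3}
cosines = {"vec1": 0.8377, "vec2": 0.8451, "vec3": 0.9690, "vec4": 0.9304}


def normalize_cosine(distance: float) -> float:
    """Map a COSINE score from [-1, 1] to [0, 1]."""
    return (1 + distance) * 0.5


# Contribution of each field to the weighted score.
contributions = {f: weights[f] * normalize_cosine(cosines[f]) for f in weights}
expected = sum(contributions.values())

for field, value in contributions.items():
    print(f"{field}: {value:.4f}")
print(f"expected total: {expected:.4f}")
```

The total comes out at about 0.9497, matching the hand calculation. The vec4 term alone is about 0.2896, which is close to the 0.2895 that Milvus returned. One possible reading (an assumption, not something confirmed in this thread) is that the entity only appeared in the vec4 sub-search results, so only that term was summed.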
Replies: 3 comments 6 replies
Q1
Q2
Run this script several times; the "Sum of weight*active_func(distance)" is always equal to the result of the hybrid search.
Q3
Q1
The doc says: "For instance, the distance for IP ranges from [-∞,+∞], while the distance for L2 ranges from [0,+∞]. Milvus employs the arctan function, transforming values to the [0,1] range to provide a standardized basis for different metric types."
Arctan is for IP and L2, whose ranges are unbounded (∞).
The COSINE range is [-1, 1], which is bounded, so (1 + distance) * 0.5 makes sense.
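
The difference between the two kinds of normalization can be sketched in a few lines. The COSINE mapping (1 + distance) * 0.5 is the one quoted in this thread; the exact arctan forms for IP and L2 below are assumptions chosen so the output lands in [0, 1], and the function names are hypothetical, not Milvus APIs.

```python
import math


def normalize_cosine(distance: float) -> float:
    # COSINE is already bounded to [-1, 1], so a linear shift is enough.
    return (1 + distance) * 0.5


def normalize_ip(distance: float) -> float:
    # Assumed form: arctan squashes (-inf, +inf) into (-pi/2, pi/2),
    # then the result is shifted and scaled into (0, 1).
    return 0.5 + math.atan(distance) / math.pi


def normalize_l2(distance: float) -> float:
    # Assumed form: L2 lies in [0, +inf) and smaller is better, so the
    # mapping is flipped: distance 0 gives 1, large distances approach 0.
    return 1.0 - 2 * math.atan(distance) / math.pi


for d in (-1.0, 0.0, 1.0):
    print(d, normalize_cosine(d), normalize_ip(d))
```

A bounded metric only needs a linear rescale, while an unbounded one needs a squashing function such as arctan to fit into the same [0, 1] range.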