[Bug]: Scalar indexes cannot search out data #34548

Open
1 task done
syang1997 opened this issue Jul 9, 2024 · 26 comments
Labels: kind/bug, triage/accepted

Comments

@syang1997

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.3.15
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2): java sdk 2.3.4, attu
- OS(Ubuntu or CentOS): CentOS
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

A hybrid search (vector search plus scalar filtering) returns no data, but a separate query with the scalar condition does return data.
img_3
img_4
The scalar query on its own returns data:
img_5

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

milvus-log (1).tar.gz

Anything else?

No response

@syang1997 syang1997 added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 9, 2024
@syang1997
Author

I just backed up this collection to another cluster, and the problem could still be reproduced there. After I deleted the collection from the backup cluster and then backed it up and restored it again, the problem could no longer be reproduced.

@yanliang567
Contributor

/assign @zhagnlu
please help to take a look
/unassign

@sre-ci-robot sre-ci-robot assigned zhagnlu and unassigned yanliang567 Jul 10, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 10, 2024
@yanliang567 yanliang567 added this to the 2.4.6 milestone Jul 10, 2024
@syang1997
Author

I don't think this is caused by scalar filtering preventing HNSW from traversing its layers: for another set of data, the same search with scalar filtering does return data on some of the repeated requests, and no data operations were performed during this period.

@syang1997
Author

企业微信截图_0d95f1b9-6c93-432b-a642-65ab25f5e055
企业微信截图_62c52532-acb0-4f40-8fca-acf446495788

@syang1997
Author

This collection is configured with replicas. Could the problem be caused by index differences between replicas?

@zhagnlu
Collaborator

zhagnlu commented Jul 10, 2024

> 企业微信截图_0d95f1b9-6c93-432b-a642-65ab25f5e055 企业微信截图_62c52532-acb0-4f40-8fca-acf446495788

What is the difference between the upper and lower searches?

@syang1997
Author

syang1997 commented Jul 10, 2024

> What is the difference between the upper and lower searches?

@zhagnlu There is no difference. Multiple identical requests return completely different results.

@zhagnlu
Collaborator

zhagnlu commented Jul 10, 2024

> > What is the difference between the upper and lower searches?
>
> @zhagnlu There is no difference. Multiple identical requests return completely different results.

If you use only a query rather than a hybrid search, do multiple requests also return completely different results?

@syang1997
Author

syang1997 commented Jul 10, 2024

> > > What is the difference between the upper and lower searches?
>
> > @zhagnlu There is no difference. Multiple identical requests return completely different results.
>
> If you use only a query rather than a hybrid search, do multiple requests also return completely different results?

A plain vector search and a plain query both return normally.

@syang1997
Author

@zhagnlu Another symptom: for some search conditions, nothing is returned at all once scalar filtering is added, even though the plain vector search returns results and the scalar filter by itself matches data. The hybrid search returns an empty result.

@yanliang567
Contributor

Okay, so the issue here is that when querying with an expr filter on scalar fields, the results are incorrect or inconsistent (not expected), but when querying without filtering, the results are always consistent (expected). Am I right?
/assign @cydrain @liliu-z
could you please also help to take a look

@syang1997
Author

syang1997 commented Jul 11, 2024

@yanliang567 Yes, sometimes the returned results are inconsistent, and sometimes they are incorrect. This appears only with the HNSW index plus scalar filtering.
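
To tell "inconsistent" apart from "incorrect" while reproducing this, one option is to record the IDs returned by each identical request and group them. The following is only a minimal Python sketch under that assumption: the `summarize_runs` helper and the sample IDs are hypothetical and are not taken from the attached logs or from any Milvus SDK.

```python
from collections import Counter


def summarize_runs(runs):
    """Group repeated search results by the set of IDs they returned.

    `runs` holds one list of IDs per identical request.
    Returns (is_consistent, Counter mapping each distinct ID set to its count).
    """
    distinct = Counter(frozenset(ids) for ids in runs)
    return len(distinct) == 1, distinct


# Hypothetical IDs collected from three identical filtered searches.
runs = [[101, 102, 103], [], [103, 102, 101]]
consistent, distinct = summarize_runs(runs)
print(consistent)     # False: one request came back empty
print(len(distinct))  # 2 distinct result sets
```

Comparing each distinct ID set against the IDs returned by the scalar-only query would then show whether a given run is merely different or actually wrong.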

@syang1997
Author

Regenerated debug log
milvus-log (2).tar.gz

@syang1997
Author

> Regenerated debug log milvus-log (2).tar.gz

@yanliang567 @cydrain @liliu-z Can you help us check together?

@alwayslove2013
Contributor

@syang Could you please tell us the filter_rate and the index building parameters?
The current open source Milvus may return fewer than top-k search results when the filter_rate is high (70-90%) and M is low.
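
For reference, the filter_rate can be estimated offline from the scalar column. The sketch below assumes that filter_rate means the fraction of entities the scalar condition excludes, which is how the 70-90% figure reads here; the `filter_rate` function and the sample values are hypothetical.

```python
def filter_rate(values, predicate):
    """Fraction of entities that the scalar predicate excludes (assumed definition)."""
    if not values:
        raise ValueError("values must not be empty")
    excluded = sum(1 for v in values if not predicate(v))
    return excluded / len(values)


# Hypothetical scalar column: 2 of 10 entities match the condition.
tags = ["a", "a", "b", "b", "b", "b", "b", "b", "b", "b"]
print(filter_rate(tags, lambda v: v == "a"))  # 0.8
```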

@syang1997
Author

syang1997 commented Jul 15, 2024

> @syang Could you please tell us the filter_rate and the index building parameters? The current open source Milvus may return fewer than top-k search results when the filter_rate is high (70-90%) and M is low.

image
Most searches are normal, and now the M value is not small

@syang1997
Author

@alwayslove2013 This collection has fewer than 20,000 entities, and I think the M and efConstruction values are large enough. I am aware of the data island problem that occurs when scalar filtering and HNSW are combined, and I have previously investigated it and adjusted the index construction parameters.

@cydrain
Contributor

cydrain commented Jul 16, 2024

Hi @syang1997 ,

Can you share your script to reproduce this issue ?

@syang1997
Author

> Hi @syang1997 ,
>
> Can you share your script to reproduce this issue ?

I'm coding a demo to replicate this issue

@cydrain
Contributor

cydrain commented Jul 16, 2024

Hi @syang1997 ,

One more question, I see you're using Milvus v2.3.15, have you tried Milvus v2.4.x ?

@xiaofan-luan
Contributor

@syang1997
I think we need data to reproduce this issue.
@cydrain please set up a meeting with syang to see if we can get some data to reproduce it.

@syang1997
Author

> @syang1997 I think we need data to reproduce this issue. @cydrain please set up a meeting with syang to see if we can get some data to reproduce it.

We have already talked with the community once, and the preliminary conclusion is that the cause is still the previously known data island problem.

@syang1997
Author

> @syang1997 I think we need data to reproduce this issue. @cydrain please set up a meeting with syang to see if we can get some data to reproduce it.

@xiaofan-luan The symptom is that nothing is returned at all, rather than fewer than topK results, so we suspect that all of the first-layer nodes of HNSW are filtered out.

@xiaofan-luan
Contributor

After discussion, it seems the cause might be that the filter excludes 70-80% of the data, which breaks HNSW graph connectivity.
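
The connectivity effect can be illustrated with a toy graph. This is only a sketch under stated assumptions, not Milvus's or Knowhere's HNSW implementation: it assumes a single-layer ring graph and a search that can only step onto nodes that pass the filter. The `ring_graph` and `reachable_matches` helpers and the node IDs are hypothetical.

```python
from collections import deque


def ring_graph(n, m):
    """Toy graph: node i links to its m nearest neighbours on each side of a ring."""
    return {
        i: [(i + s * d) % n for d in range(1, m + 1) for s in (-1, 1)]
        for i in range(n)
    }


def reachable_matches(neighbors, entry, allowed):
    """Matching IDs reachable from `entry` when the search may only step onto
    nodes that pass the filter (an assumption of this toy model)."""
    seen = {entry}
    queue = deque([entry])
    found = set()
    while queue:
        node = queue.popleft()
        if node in allowed:
            found.add(node)
        for nxt in neighbors[node]:
            if nxt in allowed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


allowed = {3, 4, 8}  # 3 of 10 nodes pass the filter, so 70% are excluded
print(sorted(reachable_matches(ring_graph(10, 1), 0, allowed)))  # []
print(sorted(reachable_matches(ring_graph(10, 3), 0, allowed)))  # [3, 4, 8]
```

With one link per side, every neighbour of the entry node is filtered out and the search returns nothing even though three nodes match, which mirrors the empty results reported above. With three links per side, the same filter leaves all matching nodes reachable.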

@liliu-z
Member

liliu-z commented Jul 25, 2024

This fix will be released with 2.4.7

@liliu-z
Member

liliu-z commented Jul 25, 2024

/assign @yanliang567

@liliu-z liliu-z removed their assignment Jul 25, 2024
7 participants