
[Bug]: Not able to reduce segment size in V2.4.5 #34550

Open
1 task done
sumanthhuddar opened this issue Jul 9, 2024 · 8 comments
Assignees
Labels
help wanted Extra attention is needed

Comments

@sumanthhuddar

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.4.5
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): kafka
- SDK version (e.g. pymilvus v2.0.0rc2): java
- OS (Ubuntu or CentOS): ubuntu
- CPU/Memory: 32 CPU / 120 GB
- GPU: 
- Others:

Current Behavior

I loaded a collection with maxSegmentSize set to 1024 MB; after loading ~4 M rows, I had ~26 segments. I changed maxSegmentSize to 4 GB and restarted the datacoord node, and the number of segments dropped from 26 to 8. Now, when I try to increase the number of segments by setting maxSegmentSize to 2 GB, it has no effect. I tried restarting datacoord and the data nodes, loading/unloading the collection, and compacting, and also tuned sealProportion with the values 0.12, 0.24, and 0.75, but there is no change in the segments or segment size.
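The segment counts above are consistent with simple size arithmetic. This is a back-of-the-envelope sketch (not Milvus code; the ~26 GB total is an assumption inferred from the reported numbers, and real counts also depend on sealProportion, growing segments, and compaction timing):

```python
import math

def min_sealed_segments(total_data_mb: float, max_segment_size_mb: float) -> int:
    """Rough lower bound on sealed-segment count: each sealed segment holds
    at most max_segment_size_mb of data, so at least ceil(total / max)
    segments are needed."""
    return max(1, math.ceil(total_data_mb / max_segment_size_mb))

# Assuming ~26 GB of loaded data:
print(min_sealed_segments(26 * 1024, 1024))      # 1 GB segments -> at least 26
print(min_sealed_segments(26 * 1024, 4 * 1024))  # 4 GB segments -> at least 7
```

This matches the 26 segments observed at 1 GB and is close to the 8 observed at 4 GB.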

Expected Behavior

Decreasing maxSegmentSize should increase the number of segments.

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

@sumanthhuddar sumanthhuddar added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 9, 2024
@yanliang567
Contributor

Increasing the value of maxSegmentSize is a one-way street for existing segments: Milvus only has a compaction policy that merges small segments into larger ones; it has no policy to split large segments into smaller ones. So, for now, this is expected. @sumanthhuddar

/assign @sumanthhuddar
/unassign

@yanliang567 yanliang567 added help wanted Extra attention is needed and removed kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jul 10, 2024
@xiaofan-luan
Contributor

I thought this is actually doable.
With the new clustering compaction, segments can be split down to the size limit.

@wayblink could you please take a look at it

@wayblink
Collaborator

@sumanthhuddar As @yanliang567 mentioned, this is currently expected. We will think about adding a policy to split large segments into smaller ones. Clustering compaction can compact segments down to a smaller size; however, it requires altering the collection to set a clustering key field first, which is not supported yet.

I wonder what your purpose is in trying different maxSegmentSize values? We can find the best solution for you.

@sumanthhuddar
Author

@wayblink I am doing some perf testing. I started with maxSegmentSize = 1024 MB, and after loading data we had 26 sealed segments; with these we didn't see good performance, so we reduced the number of segments to 8 and got good latency. For testing purposes, we further reduced the segment count to 2. I thought we could increase/decrease maxSegmentSize to right-size the segments. It would be nice to have an option to compact bigger segments into smaller ones in a future release.

@xiaofan-luan
Contributor

Again, we don't recommend doing so. PLEASE keep the segment size below 8 GB, unless you never delete, update, or insert.

@xiaofan-luan
Contributor

Building a huge index is not really useful in a production environment. You have many other ways to optimize performance, and scaling out (or using Zilliz Cloud) is usually the easiest one.

We have had so much pain with larger indexes that my suggestion is simply: "Don't do that!"

@sumanthhuddar
Author

@xiaofan-luan thanks for the suggestions.

@xiaofan-luan
Contributor

A segment size of 2 GB or 4 GB is reasonable.
We recommend a 2 GB segment size for users with 16 GB of memory or more, and a 4 GB segment size for 32 GB or more.
The maximum segment size we recommend is 8 GB.

The reason is that building an index for a segment of 8 GB or more is very expensive (it takes hours on an 8-core/32 GB index node, and might even OOM). Frequent updates and inserts will bring trouble as well.
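For reference, the sizes discussed in this thread are controlled by the datacoord section of milvus.yaml. A minimal fragment, assuming the 4 GB recommendation for a 32 GB node (key names as I understand them from the Milvus 2.4 configuration reference; verify against your deployment's milvus.yaml):

```yaml
dataCoord:
  segment:
    maxSize: 4096        # maximum sealed-segment size, in MB (4 GB)
    sealProportion: 0.12 # seal a growing segment once it reaches this fraction of maxSize
```

As discussed above, lowering maxSize after data is loaded will not split existing sealed segments; it only affects segments created afterwards.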
