enhance: Avoid assign too much segment/channels to new querynode (#34096) #34461

Merged 1 commit on Jul 10, 2024

weiliu1031

issue: #34095
pr: #34096

When a new query node comes online, the segment_checker, channel_checker, and balance_checker all attempt to allocate segments or channels to it at the same time. If this happens while a load task is still executing and the distribution of the new query node has not yet been updated, the query coordinator may mistakenly treat the new query node as empty. It then keeps assigning segments or channels to it, so the new query node can end up with more segments or channels than expected.

This PR takes the workload of tasks still executing on the target query node into account, so that an excessive number of segments is not assigned to it.
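
The idea described above can be sketched in Go roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `pendingTask`, `effectiveLoad`, and `pickNode` are hypothetical names, and the sketch assumes each in-flight task changes a node's segment count by a fixed delta.

```go
package main

import "fmt"

// pendingTask is a hypothetical stand-in for a task the scheduler is still
// executing. Delta is +1 for a load and -1 for a release on NodeID.
type pendingTask struct {
	NodeID int64
	Delta  int
}

// effectiveLoad returns, per node, the reported segment count plus the net
// effect of tasks that are still in flight for that node.
func effectiveLoad(reported map[int64]int, tasks []pendingTask) map[int64]int {
	load := make(map[int64]int, len(reported))
	for id, n := range reported {
		load[id] = n
	}
	for _, t := range tasks {
		load[t.NodeID] += t.Delta
	}
	return load
}

// pickNode returns the node with the smallest load. Ties go to the lower
// node ID so the result does not depend on map iteration order.
func pickNode(load map[int64]int) int64 {
	best := int64(-1)
	for id, n := range load {
		if best == -1 || n < load[best] || (n == load[best] && id < best) {
			best = id
		}
	}
	return best
}

func main() {
	// Node 3 has just joined: its reported distribution is still empty,
	// but four load tasks targeting it are already executing.
	reported := map[int64]int{1: 3, 2: 3, 3: 0}
	inflight := []pendingTask{
		{NodeID: 3, Delta: 1}, {NodeID: 3, Delta: 1},
		{NodeID: 3, Delta: 1}, {NodeID: 3, Delta: 1},
	}

	fmt.Println("naive pick:", pickNode(reported))                            // node 3 looks empty
	fmt.Println("task-aware pick:", pickNode(effectiveLoad(reported, inflight))) // node 3 is already busy
}
```

With the reported distribution alone, node 3 looks empty and keeps being chosen. Once the executing tasks are counted, its effective load is 4 and the next segment goes to node 1 instead.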


@sre-ci-robot sre-ci-robot added the size/L Denotes a PR that changes 100-499 lines. label Jul 5, 2024
@mergify mergify bot added dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement labels Jul 5, 2024
mergify bot commented Jul 5, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

mergify bot commented Jul 6, 2024

@weiliu1031 E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

…vus-io#34096)

issue: milvus-io#34095

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
mergify bot commented Jul 8, 2024

@weiliu1031 ut workflow job failed, comment rerun ut can trigger the job again.

@weiliu1031
rerun ut

@mergify mergify bot added the ci-passed label Jul 8, 2024
codecov bot commented Jul 8, 2024

Codecov Report

Attention: Patch coverage is 93.10345% with 4 lines in your changes missing coverage. Please review.

Project coverage is 83.10%. Comparing base (9de5b15) to head (fec9c45).
Report is 6 commits behind head on 2.3.

Additional details and impacted files

@@            Coverage Diff             @@
##              2.3   #34461      +/-   ##
==========================================
+ Coverage   83.07%   83.10%   +0.02%     
==========================================
  Files         851      851              
  Lines      107090   107142      +52     
==========================================
+ Hits        88969    89036      +67     
+ Misses      14807    14791      -16     
- Partials     3314     3315       +1     

| Files | Coverage Δ |
|---|---|
| internal/querycoordv2/balance/balance.go | 92.30% <100.00%> (ø) |
| ...al/querycoordv2/balance/rowcount_based_balancer.go | 99.09% <100.00%> (+0.01%) ⬆️ |
| ...ernal/querycoordv2/balance/score_based_balancer.go | 97.12% <100.00%> (+0.03%) ⬆️ |
| internal/querycoordv2/task/scheduler.go | 84.46% <91.30%> (+4.79%) ⬆️ |

... and 37 files with indirect coverage changes

@yanliang567 yanliang567 added this to the 2.3.19 milestone Jul 9, 2024
@XuanYang-cn XuanYang-cn left a comment

/lgtm

@congqixia
/approve

@sre-ci-robot
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: congqixia, weiliu1031, XuanYang-cn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sre-ci-robot sre-ci-robot merged commit d3d1920 into milvus-io:2.3 Jul 10, 2024
12 checks passed
Labels
approved ci-passed dco-passed DCO check passed. kind/enhancement Issues or changes related to enhancement lgtm size/L Denotes a PR that changes 100-499 lines.
5 participants