
blob sync takes 18 minutes, replicate from guangzhou to shanghai #20704

Open
z7658329 opened this issue Jul 5, 2024 · 7 comments

Comments

@z7658329

z7658329 commented Jul 5, 2024

I have deployed two Harbor instances (version 2.7.3), one in Guangzhou and one in Shanghai, and I use the replication feature to sync images.

Looking at the sync task log (replicating from Guangzhou to Shanghai), a blob of about 3 GB takes 18 minutes to replicate, and the transfer hit three "connection reset" errors. Is there any way to avoid the connection reset errors?

[screenshot: replication task log]
@z7658329 z7658329 changed the title It takes 18 minutes to replicate from guangzhou to shanghai Jul 5, 2024
@chlins
Member

chlins commented Jul 5, 2024

If you encounter frequent replication failures due to large blob sizes, consider using replication by chunk, which breaks the blob into smaller chunks for transmission to improve the success rate. FYI: https://goharbor.io/docs/2.7.0/administration/configuring-replication/create-replication-rules/

@z7658329
Author

z7658329 commented Jul 9, 2024

> If you encounter frequent replication failures due to large blob sizes, consider using replication by chunk, which breaks the blob into smaller chunks for transmission to improve the success rate. FYI: https://goharbor.io/docs/2.7.0/administration/configuring-replication/create-replication-rules/

It seems slower when switching to replication by chunk; my config is chunkSize: 500M (default value 10M).

@chlins so what's the best-practice chunkSize value? 100M?

@z7658329
Author

z7658329 commented Jul 9, 2024

I have read the source code, and it seems each chunk of a blob is pulled, processed, and pushed in sequence, one after another. Why not use goroutines? Is there a reason for this? Thanks @chlins

@chlins
Member

chlins commented Jul 9, 2024

> If you encounter frequent replication failures due to large blob sizes, consider using replication by chunk, which breaks the blob into smaller chunks for transmission to improve the success rate. FYI: https://goharbor.io/docs/2.7.0/administration/configuring-replication/create-replication-rules/
>
> It seems slower when switching to replication by chunk; my config is chunkSize: 500M (default value 10M).
>
> @chlins so what's the best-practice chunkSize value? 100M?

There is no uniform best practice for this value; it depends entirely on the environment and needs to be tuned to the network conditions between the two sites.
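To illustrate why there is no single best value: in a sequential chunked upload, each chunk costs roughly one extra round trip, so total time depends on both bandwidth and latency. A back-of-the-envelope sketch in Go (the 5 MiB/s bandwidth and 50 ms RTT figures are assumptions for illustration, not measurements from this issue):

```go
package main

import "fmt"

func main() {
	const (
		blobMB      = 3 * 1024.0 // a 3 GiB blob, in MiB
		bandwidthMB = 5.0        // assumed sustained throughput, MiB/s
		rttSec      = 0.05       // assumed round-trip time between sites
	)
	// Sequential chunked upload adds roughly one round trip per chunk:
	// smaller chunks raise the per-request success rate but add latency.
	for _, chunkMB := range []float64{10, 100, 500} {
		nChunks := blobMB / chunkMB
		total := blobMB/bandwidthMB + nChunks*rttSec
		fmt.Printf("chunk %4.0f MiB: ~%.0f chunks, ~%.0f s total\n",
			chunkMB, nChunks, total)
	}
}
```

With these assumed numbers the raw transfer (~614 s) dominates and the chunk size barely matters; on a lossy link, retries of failed chunks change the picture, which is why the value has to be tuned per environment.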

@chlins
Member

chlins commented Jul 9, 2024

> I have read the source code, and it seems each chunk of a blob is pulled, processed, and pushed in sequence, one after another. Why not use goroutines? Is there a reason for this? Thanks @chlins

This is defined in the distribution spec (https://github.com/opencontainers/distribution-spec/blob/main/spec.md): all chunks must be uploaded in order, because the upload URL for the next chunk comes from the Location header of the previous chunk's response, so goroutines cannot push chunks in parallel.
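The ordering constraint can be seen in a toy sketch: a minimal in-memory "registry" (an `httptest` server invented for illustration, not Harbor's actual code) hands back the next upload Location in each PATCH response, so the client has no choice but to serialize chunks:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
)

func main() {
	// Toy registry: each PATCH appends a chunk and returns the next upload
	// Location, which encodes the current offset. Out-of-order chunks fail.
	var stored []byte
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodPost: // open an upload session
			w.Header().Set("Location", "/upload?offset=0")
			w.WriteHeader(http.StatusAccepted)
		case http.MethodPatch: // upload one chunk
			offset, _ := strconv.Atoi(r.URL.Query().Get("offset"))
			if offset != len(stored) { // chunk arrived out of order
				w.WriteHeader(http.StatusRequestedRangeNotSatisfiable)
				return
			}
			body, _ := io.ReadAll(r.Body)
			stored = append(stored, body...)
			w.Header().Set("Location", fmt.Sprintf("/upload?offset=%d", len(stored)))
			w.WriteHeader(http.StatusAccepted)
		}
	}))
	defer srv.Close()

	blob := []byte("0123456789abcdef")
	chunkSize := 5

	// POST opens the session; the registry returns the first Location.
	resp, _ := http.Post(srv.URL+"/upload", "application/octet-stream", nil)
	location := resp.Header.Get("Location")
	resp.Body.Close()

	// Each PATCH must wait for the previous response: the next chunk's URL
	// comes from the Location header, so chunks cannot be pushed in parallel.
	for start := 0; start < len(blob); start += chunkSize {
		end := start + chunkSize
		if end > len(blob) {
			end = len(blob)
		}
		req, _ := http.NewRequest(http.MethodPatch, srv.URL+location,
			bytes.NewReader(blob[start:end]))
		req.Header.Set("Content-Range", fmt.Sprintf("%d-%d", start, end-1))
		resp, _ := http.DefaultClient.Do(req)
		location = resp.Header.Get("Location") // dependency on previous response
		resp.Body.Close()
		fmt.Printf("uploaded bytes %d-%d, next location %s\n", start, end-1, location)
	}
	fmt.Println("stored:", string(stored))
}
```

The handler and URL scheme are made up for the demo; the point is the data dependency between consecutive requests, which is exactly what the spec's Location-based protocol imposes.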

@z7658329
Author

z7658329 commented Jul 9, 2024

> I have read the source code, and it seems each chunk of a blob is pulled, processed, and pushed in sequence, one after another. Why not use goroutines? Is there a reason for this? Thanks @chlins
>
> This is defined in the distribution spec (https://github.com/opencontainers/distribution-spec/blob/main/spec.md): all chunks must be uploaded in order, because the upload URL for the next chunk comes from the Location header of the previous chunk's response, so goroutines cannot push chunks in parallel.

In my scenario, most blobs are under 1 GB, so I have added a new feature to Harbor: replication switches to chunked upload only when the blob size is too large (default 2 GB), and the threshold can be overridden by setting the env REPLICATION_CHUNK_BLOB_SIZE in jobservice. It works well in our production environment. Thanks very much @chlins
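A minimal sketch of the size-threshold logic described above (the function names are hypothetical; only the env var name and the 2 GB default come from the comment):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// chunkThreshold returns the blob size above which replication should switch
// to chunked upload: 2 GiB by default, overridable (in bytes) via the
// REPLICATION_CHUNK_BLOB_SIZE env var, mirroring the behaviour the commenter
// describes. The function names here are illustrative, not Harbor's.
func chunkThreshold() int64 {
	const defaultThreshold = 2 << 30 // 2 GiB
	if v := os.Getenv("REPLICATION_CHUNK_BLOB_SIZE"); v != "" {
		if n, err := strconv.ParseInt(v, 10, 64); err == nil && n > 0 {
			return n
		}
	}
	return defaultThreshold
}

// useChunkedUpload decides per blob: small blobs go in one request, only
// oversized ones pay the per-chunk round-trip cost.
func useChunkedUpload(blobSize int64) bool {
	return blobSize > chunkThreshold()
}

func main() {
	fmt.Println(useChunkedUpload(500 << 20)) // 500 MiB blob → false
	fmt.Println(useChunkedUpload(3 << 30))   // 3 GiB blob → true
}
```

This keeps the fast single-request path for the common case (blobs under 1 GB in the commenter's environment) while retaining chunking's resilience for the rare large blob.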

@z7658329
Author

z7658329 commented Jul 10, 2024

[screenshot: docker push of a large image succeeding]

On the same node, I tested pushing a large image with "docker push": the push succeeds (a single POST request) and no connection reset occurs. So what's the difference between "docker push" and Harbor's replication implementation, given that both follow the registry specification?

@chlins
