
blob sync takes 18 minutes, replicate from guangzhou to shanghai #20704

Open
z7658329 opened this issue Jul 5, 2024 · 7 comments

Comments

@z7658329

z7658329 commented Jul 5, 2024

I have deployed two Harbor instances (version 2.7.3), one in Guangzhou and one in Shanghai, and I use the replication feature to sync images.

Looking at the sync task log (replicating from Guangzhou to Shanghai), a blob of about 3 GB takes 18 minutes to replicate, and the transfer hit three "connection reset" errors. Is there any way to avoid the connection reset errors?

[screenshot: replication task log]
@z7658329 z7658329 changed the title It takes 18 minutes to replicate from guangzhou to shanghai Jul 5, 2024
@chlins
Member

chlins commented Jul 5, 2024

If you encounter frequent replication failures due to large blob sizes, consider using replication by chunk, which breaks the blob into smaller chunks for transmission to improve the success rate. FYI: https://goharbor.io/docs/2.7.0/administration/configuring-replication/create-replication-rules/

@z7658329
Author

z7658329 commented Jul 9, 2024

> If you encounter frequent replication failures due to large blob sizes, consider using replication by chunk, which breaks the blob into smaller chunks for transmission to improve the success rate. FYI: https://goharbor.io/docs/2.7.0/administration/configuring-replication/create-replication-rules/

It seems slower when switching to replication by chunk; my config is chunkSize: 500M (default value 10M).

@chlins so what's the best-practice chunkSize value? 100M?

@z7658329
Author

z7658329 commented Jul 9, 2024

I have read the source code, and it seems each chunk of a blob is pulled, processed, and pushed in sequence, one after another. Why not use goroutines? Is there a reason for this? Thanks @chlins

@chlins
Member

chlins commented Jul 9, 2024

> If you encounter frequent replication failures due to large blob sizes, consider using replication by chunk, which breaks the blob into smaller chunks for transmission to improve the success rate. FYI: https://goharbor.io/docs/2.7.0/administration/configuring-replication/create-replication-rules/
>
> It seems slower when switching to replication by chunk; my config is chunkSize: 500M (default value 10M).
>
> @chlins so what's the best-practice chunkSize value? 100M?

There is no uniform best practice for this value; it depends entirely on the environment and needs to be tuned to the network conditions between the two sites.
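To illustrate why there is no single best value: in a sequential chunked upload, each chunk costs roughly one extra round trip, so total time depends on both bandwidth and latency. A back-of-the-envelope sketch in Go (the 5 MiB/s bandwidth and 50 ms RTT figures are assumptions for illustration, not measurements from this issue):

```go
package main

import "fmt"

func main() {
	const (
		blobMB      = 3 * 1024.0 // a 3 GiB blob, in MiB
		bandwidthMB = 5.0        // assumed sustained throughput, MiB/s
		rttSec      = 0.05       // assumed round-trip time between sites
	)
	// Sequential chunked upload adds roughly one round trip per chunk:
	// smaller chunks raise the per-request success rate but add latency.
	for _, chunkMB := range []float64{10, 100, 500} {
		nChunks := blobMB / chunkMB
		total := blobMB/bandwidthMB + nChunks*rttSec
		fmt.Printf("chunk %4.0f MiB: ~%.0f chunks, ~%.0f s total\n",
			chunkMB, nChunks, total)
	}
}
```

With these assumed numbers the raw transfer (~614 s) dominates and the chunk size barely matters; on a lossy link, retries of failed chunks change the picture, which is why the value has to be tuned per environment.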

@chlins
Member

chlins commented Jul 9, 2024

> I have read the source code, and it seems each chunk of a blob is pulled, processed, and pushed in sequence, one after another. Why not use goroutines? Is there a reason for this? Thanks @chlins

This is defined in the distribution spec (https://github.com/opencontainers/distribution-spec/blob/main/spec.md): all chunks must be uploaded in order, because the upload URL for the next chunk comes from the Location header of the previous chunk's response, so goroutines cannot push chunks in parallel.
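The ordering constraint can be seen in a toy sketch: a minimal in-memory "registry" (an `httptest` server invented for illustration, not Harbor's actual code) hands back the next upload Location in each PATCH response, so the client has no choice but to serialize chunks:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
)

func main() {
	// Toy registry: each PATCH appends a chunk and returns the next upload
	// Location, which encodes the current offset. Out-of-order chunks fail.
	var stored []byte
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		switch r.Method {
		case http.MethodPost: // open an upload session
			w.Header().Set("Location", "/upload?offset=0")
			w.WriteHeader(http.StatusAccepted)
		case http.MethodPatch: // upload one chunk
			offset, _ := strconv.Atoi(r.URL.Query().Get("offset"))
			if offset != len(stored) { // chunk arrived out of order
				w.WriteHeader(http.StatusRequestedRangeNotSatisfiable)
				return
			}
			body, _ := io.ReadAll(r.Body)
			stored = append(stored, body...)
			w.Header().Set("Location", fmt.Sprintf("/upload?offset=%d", len(stored)))
			w.WriteHeader(http.StatusAccepted)
		}
	}))
	defer srv.Close()

	blob := []byte("0123456789abcdef")
	chunkSize := 5

	// POST opens the session; the registry returns the first Location.
	resp, _ := http.Post(srv.URL+"/upload", "application/octet-stream", nil)
	location := resp.Header.Get("Location")
	resp.Body.Close()

	// Each PATCH must wait for the previous response: the next chunk's URL
	// comes from the Location header, so chunks cannot be pushed in parallel.
	for start := 0; start < len(blob); start += chunkSize {
		end := start + chunkSize
		if end > len(blob) {
			end = len(blob)
		}
		req, _ := http.NewRequest(http.MethodPatch, srv.URL+location,
			bytes.NewReader(blob[start:end]))
		req.Header.Set("Content-Range", fmt.Sprintf("%d-%d", start, end-1))
		resp, _ := http.DefaultClient.Do(req)
		location = resp.Header.Get("Location") // dependency on previous response
		resp.Body.Close()
		fmt.Printf("uploaded bytes %d-%d, next location %s\n", start, end-1, location)
	}
	fmt.Println("stored:", string(stored))
}
```

The handler and URL scheme are made up for the demo; the point is the data dependency between consecutive requests, which is exactly what the spec's Location-based protocol imposes.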

@z7658329
Author

z7658329 commented Jul 9, 2024

> I have read the source code, and it seems each chunk of a blob is pulled, processed, and pushed in sequence, one after another. Why not use goroutines? Is there a reason for this? Thanks @chlins
>
> This is defined in the distribution spec (https://github.com/opencontainers/distribution-spec/blob/main/spec.md): all chunks must be uploaded in order, because the upload URL for the next chunk comes from the Location header of the previous chunk's response, so goroutines cannot push chunks in parallel.

In my scenario, most blobs are under 1 GB, so I have added a new feature to Harbor: replication switches to chunked upload only when the blob size is too large (default 2 GB), and the threshold can be overridden by setting the env REPLICATION_CHUNK_BLOB_SIZE in jobservice. It works well in our production environment. Thanks very much @chlins
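A minimal sketch of the size-threshold logic described above (the function names are hypothetical; only the env var name and the 2 GB default come from the comment):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// chunkThreshold returns the blob size above which replication should switch
// to chunked upload: 2 GiB by default, overridable (in bytes) via the
// REPLICATION_CHUNK_BLOB_SIZE env var, mirroring the behaviour the commenter
// describes. The function names here are illustrative, not Harbor's.
func chunkThreshold() int64 {
	const defaultThreshold = 2 << 30 // 2 GiB
	if v := os.Getenv("REPLICATION_CHUNK_BLOB_SIZE"); v != "" {
		if n, err := strconv.ParseInt(v, 10, 64); err == nil && n > 0 {
			return n
		}
	}
	return defaultThreshold
}

// useChunkedUpload decides per blob: small blobs go in one request, only
// oversized ones pay the per-chunk round-trip cost.
func useChunkedUpload(blobSize int64) bool {
	return blobSize > chunkThreshold()
}

func main() {
	fmt.Println(useChunkedUpload(500 << 20)) // 500 MiB blob → false
	fmt.Println(useChunkedUpload(3 << 30))   // 3 GiB blob → true
}
```

This keeps the fast single-request path for the common case (blobs under 1 GB in the commenter's environment) while retaining chunking's resilience for the rare large blob.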

@z7658329
Author

z7658329 commented Jul 10, 2024

[screenshot: docker push of a large image succeeding]

On the same node, I tested pushing a large image with "docker push": the push succeeds (a single POST request) and no connection reset occurs. So what's the difference between "docker push" and Harbor's replication implementation, given that both follow the registry specification?

@chlins
