Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parallel composite upload support #1067

Merged
merged 5 commits into from
Oct 27, 2023
Merged

parallel composite upload support #1067

merged 5 commits into from
Oct 27, 2023

Conversation

singhravidutt
Copy link
Contributor

@singhravidutt singhravidutt commented Oct 26, 2023

This PR adds the support for integrating with new write APIs introduced in java-storage to leverage ParallelCompositeUpload (PCU) feature.

@singhravidutt singhravidutt marked this pull request as ready for review October 26, 2023 05:12
@singhravidutt
Copy link
Contributor Author

/gcbrun

@codecov
Copy link

codecov bot commented Oct 26, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison is base (48891bf) 81.71% compared to head (b21262b) 81.68%.

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #1067      +/-   ##
============================================
- Coverage     81.71%   81.68%   -0.03%     
- Complexity     1669     1675       +6     
============================================
  Files            96       96              
  Lines          7275     7318      +43     
  Branches        890      890              
============================================
+ Hits           5945     5978      +33     
- Misses          969      975       +6     
- Partials        361      365       +4     
Flag Coverage Δ
integrationtest 67.50% <82.35%> (+0.08%) ⬆️
unittest 73.23% <49.01%> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...op/fs/gcs/GoogleHadoopFileSystemConfiguration.java 99.09% <100.00%> (+0.05%) ⬆️
...le/cloud/hadoop/util/AsyncWriteChannelOptions.java 86.36% <100.00%> (+3.50%) ⬆️
...oud/hadoop/gcsio/GoogleCloudStorageClientImpl.java 82.20% <65.38%> (-6.34%) ⬇️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

* alignment with configuration of java-storage client
* https://cloud.google.com/java/docs/reference/google-cloud-storage/latest/com.google.cloud.storage.ParallelCompositeUploadBlobWriteSessionConfig.BufferAllocationStrategy#com_google_cloud_storage_ParallelCompositeUploadBlobWriteSessionConfig_BufferAllocationStrategy_fixedPool_int_int_
*/
public static final HadoopConfigurationProperty<Integer> GCS_PCU_BUFFER_COUNT =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update Configuration.md as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we have benchmarking numbers out then will be adding it to configurations file.

@@ -588,6 +632,10 @@ private static AsyncWriteChannelOptions getWriteChannelOptions(Configuration con
.setUploadType(GCS_CLIENT_UPLOAD_TYPE.get(config, config::getEnum))
.setTemporaryPaths(
ImmutableSet.copyOf(GCS_WRITE_TEMPORARY_FILES_PATH.getStringCollection(config)))
.setPCUBufferCapacity(GCS_PCU_BUFFER_COUNT.get(config, config::getInt))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add UTs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

.setWriteChannelOptions(
pcuDefaultOptions.toBuilder()
.setPartFilePrefix(partFilePrefix)
.setPartFileCleanupType(PartFileCleanupType.NEVER)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ON_SUCCESS is not tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add test for it.

writeChannel.write(ByteBuffer.wrap(bytesToWrite));
// part files are getting uploaded in async thread
// wait for it to complete before listing files
Thread.sleep(5000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@arunkumarchacko
Copy link
Contributor

/gcbrun

return PartCleanupStrategy.never();
case ON_SUCCESS:
return PartCleanupStrategy.onlyOnSuccess();
default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://google.github.io/styleguide/javaguide.html#s4.8.4-switch

It is recommended to explicitly list the values, rather than having default. This is more future proof.

@@ -251,12 +257,43 @@ private static BlobWriteSessionConfig getSessionConfig(AsyncWriteChannelOptions
writeOptions.getTemporaryPaths().stream()
.map(x -> Paths.get(x))
.collect(ImmutableSet.toImmutableSet()));
case PARALLEL_COMPOSITE_UPLOAD:
ExecutorService pcuThreadPool =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit weird. If that is the case, why can't it create it?

@@ -251,12 +257,43 @@ private static BlobWriteSessionConfig getSessionConfig(AsyncWriteChannelOptions
writeOptions.getTemporaryPaths().stream()
.map(x -> Paths.get(x))
.collect(ImmutableSet.toImmutableSet()));
case PARALLEL_COMPOSITE_UPLOAD:
ExecutorService pcuThreadPool =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either case, add a comment. It is not obvious.

BufferAllocationStrategy.fixedPool(
writeOptions.getPCUBufferCount(), writeOptions.getPCUBufferCapacity()))
.withPartCleanupStrategy(getPartCleanupStrategy(writeOptions.getPartFileCleanupType()))
.withExecutorSupplier(ExecutorSupplier.useExecutor(pcuThreadPool))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not go with the default option?

}

private static final int PARALLEL_COMPOSITE_UPLOAD_BUFFER_COUNT = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment with TODO.

.build())
.build();

gcs = getGCSImpl(storageOptions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old test does not close does not mean that we keep on repeating the mistake, I guess.

private static GoogleCloudStorage helperGcs;

private GoogleCloudStorage gcs;

private int partFileCount = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

private static GoogleCloudStorage helperGcs;

private GoogleCloudStorage gcs;

private int partFileCount = 2;
private int bufferCapacity = partFileCount * ONE_MiB;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated.

@singhravidutt singhravidutt merged commit e081924 into master Oct 27, 2023
3 of 4 checks passed
singhravidutt added a commit that referenced this pull request Jul 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants