
Support creating client-specific configs for different buckets #25

matthewmturner opened this issue Jan 27, 2022 · 6 comments

@matthewmturner (Collaborator) commented Jan 27, 2022

...it's very common to set up different access control for different buckets, so we will need to support creating different clients with specific configs for different buckets in the future. For example, in our production environment, we have Spark jobs that access different buckets hosted in different AWS accounts.

Originally posted by @houqp in #20 (comment)

With context provided by @houqp:

IAM policies attached to IAM users (via access/secret key) are easier to get started with. For a more secure and production-ready setup, you would want to use IAM roles instead of IAM users so there are no long-lived secrets. The place where things get complicated is cross-account S3 write access. In order to do this, you need to assume an IAM role in the S3 bucket owner's account to perform the write; otherwise the bucket owner account won't truly own the newly written objects. The result is that the bucket owner won't be able to further share those objects with other accounts. In short, in some cases the object store needs to assume and switch to different IAM roles depending on which bucket it is writing to. For cross-account S3 reads we don't have this problem, so you can usually get by with a single IAM role.
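As a rough illustration of that role switch, here is a minimal sketch using the aws-sdk-rust STS client. The role ARN and session name are placeholder assumptions (they would come from per-bucket configuration), and none of this is existing crate API:

```rust
use aws_sdk_sts::Client as StsClient;

// Sketch only: assume the bucket owner's IAM role before writing to
// their bucket, so the owner account truly owns the new objects.
// `role_arn` is hypothetical per-bucket configuration.
async fn assume_bucket_owner_role(sts: &StsClient, role_arn: &str) {
    let resp = sts
        .assume_role()
        .role_arn(role_arn)
        .role_session_name("datafusion-objectstore-s3")
        .send()
        .await
        .expect("AssumeRole call failed");

    // These temporary credentials would be used to build an S3 client
    // scoped to the bucket owner's account.
    let creds = resp.credentials().expect("no credentials in response");
    println!("temporary access key id: {:?}", creds.access_key_id());
}
```

Building the `StsClient` itself would follow the usual `aws_config::load_from_env()` flow; the temporary credentials returned here would then seed a bucket-specific S3 client.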

And potential designs also provided by @houqp:

  1. Maintain a set of protocol-specific clients, one per bucket, internally within the S3 object store implementation

  2. Extend the ObjectStore abstraction in DataFusion to support a hierarchy-based object store lookup, i.e. first look up an object-store-specific URI key generator by scheme, then compute a unique object store key for the given URI for the actual object store lookup

I am leaning towards option 1 because it doesn't force this complexity onto all object stores. For example, a local file object store will never need to dispatch to different clients based on file path. @yjshen, curious what your thoughts are on this.
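A rough sketch of what option 1 could look like, with all names hypothetical rather than the crate's current API: the S3 object store keeps a map from bucket name to a preconfigured client and falls back to a default client for buckets without a specific config.

```rust
use std::collections::HashMap;

use aws_sdk_s3::Client as S3Client;

// Hypothetical shape for option 1: per-bucket clients held inside the
// S3 object store, with a default client for everything else.
struct S3FileSystem {
    default_client: S3Client,
    bucket_clients: HashMap<String, S3Client>,
}

impl S3FileSystem {
    /// Return the client configured for `bucket`, or the default client
    /// when no bucket-specific config was registered.
    fn client_for(&self, bucket: &str) -> &S3Client {
        self.bucket_clients
            .get(bucket)
            .unwrap_or(&self.default_client)
    }
}
```

This keeps the dispatch entirely inside the S3 implementation, which is why it avoids pushing per-path client lookup into the generic ObjectStore abstraction.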

@matthewmturner (Collaborator, Author)

@seddonm1 FYI, I created this to continue the conversation on this topic.

Do you think that this should be a requirement before publishing the crate?

@seddonm1 (Collaborator)

@matthewmturner I think this is an edge case, but that's up to @houqp to answer.

Most users will never use this functionality, so I think we can easily publish a 0.1 pending the DataFusion release, and this can be added after.

@matthewmturner (Collaborator, Author)

@seddonm1 I saw you raised awslabs/aws-sdk-rust#425.
Would you like something like what was proposed there to be added as a type of credentials provider?

@houqp (Member) commented Jan 28, 2022

@seddonm1 Definitely not a blocker for the crates.io release :) Just a feature we can work on later.

@seddonm1 (Collaborator)

@matthewmturner That request was about being able to access public buckets, which is independent of this request.

@matthewmturner (Collaborator, Author)

@seddonm1 Yes, understood that it's separate from this; I just wasn't sure if you wanted to add a new issue for that functionality.
