
Support creating client-specific configs for different buckets #25

matthewmturner opened this issue Jan 27, 2022 · 6 comments

@matthewmturner (Collaborator) commented Jan 27, 2022

...it's very common to set up different access control for different buckets, so we will need to support creating different clients with specific configs for different buckets in the future. For example, in our production environment, we have Spark jobs that access different buckets hosted in different AWS accounts.

Originally posted by @houqp in #20 (comment)

With context provided by @houqp:

IAM policies attached to IAM users (via access/secret key) are easier to get started with. For a more secure and production-ready setup, you would want to use IAM roles instead of IAM users so there are no long-lived secrets. The place where things get complicated is cross-account S3 write access. In order to do this, you need to assume an IAM role in the S3 bucket owner's account to perform the write; otherwise the bucket owner account won't truly own the newly written objects. The result is that the bucket owner won't be able to further share those objects with other accounts. In short, in some cases the object store needs to assume and switch to different IAM roles depending on which bucket it is writing to. For cross-account S3 reads we don't have this problem, so you can usually get by with a single IAM role.
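As a rough illustration of that role switch, here is a minimal sketch using the aws-sdk-rust STS client. The role ARN and session name are placeholder assumptions (they would come from per-bucket configuration), and none of this is existing crate API:

```rust
use aws_sdk_sts::Client as StsClient;

// Sketch only: assume the bucket owner's IAM role before writing to
// their bucket, so the owner account truly owns the new objects.
// `role_arn` is hypothetical per-bucket configuration.
async fn assume_bucket_owner_role(sts: &StsClient, role_arn: &str) {
    let resp = sts
        .assume_role()
        .role_arn(role_arn)
        .role_session_name("datafusion-objectstore-s3")
        .send()
        .await
        .expect("AssumeRole call failed");

    // These temporary credentials would be used to build an S3 client
    // scoped to the bucket owner's account.
    let creds = resp.credentials().expect("no credentials in response");
    println!("temporary access key id: {:?}", creds.access_key_id());
}
```

Building the `StsClient` itself would follow the usual `aws_config::load_from_env()` flow; the temporary credentials returned here would then seed a bucket-specific S3 client.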

And potential designs also provided by @houqp:

  1. Maintain a set of protocol-specific clients, one per bucket, internally within the S3 object store implementation

  2. Extend the ObjectStore abstraction in DataFusion to support a hierarchy-based object store lookup, i.e. first look up an object-store-specific URI key generator by scheme, then compute a unique object store key for the given URI for the actual object store lookup

I am leaning towards option 1 because it doesn't force this complexity onto all object stores. For example, a local file object store will never need to dispatch to different clients based on file path. @yjshen, curious what your thoughts are on this.
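A rough sketch of what option 1 could look like, with all names hypothetical rather than the crate's current API: the S3 object store keeps a map from bucket name to a preconfigured client and falls back to a default client for buckets without a specific config.

```rust
use std::collections::HashMap;

use aws_sdk_s3::Client as S3Client;

// Hypothetical shape for option 1: per-bucket clients held inside the
// S3 object store, with a default client for everything else.
struct S3FileSystem {
    default_client: S3Client,
    bucket_clients: HashMap<String, S3Client>,
}

impl S3FileSystem {
    /// Return the client configured for `bucket`, or the default client
    /// when no bucket-specific config was registered.
    fn client_for(&self, bucket: &str) -> &S3Client {
        self.bucket_clients
            .get(bucket)
            .unwrap_or(&self.default_client)
    }
}
```

This keeps the dispatch entirely inside the S3 implementation, which is why it avoids pushing per-path client lookup into the generic ObjectStore abstraction.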

@matthewmturner (Collaborator, Author)

@seddonm1 FYI, I created this to continue the conversation on this topic.

Do you think that this should be a requirement before publishing the crate?

@seddonm1 (Collaborator)

@matthewmturner I think this is an edge case, but that's up to @houqp to answer.

Most users will never use this functionality, so I think we can easily publish a 0.1 pending the DataFusion release, and this can be added after.

@matthewmturner (Collaborator, Author)

@seddonm1 I saw you raised awslabs/aws-sdk-rust#425.
Would you like something like what was proposed there to be added as a type of credentials provider?

@houqp (Member) commented Jan 28, 2022

@seddonm1 Definitely not a blocker for the crates.io release :) Just a feature we can work on later.

@seddonm1 (Collaborator)

@matthewmturner That request was about being able to access public buckets, which is independent of this request.

@matthewmturner (Collaborator, Author)

@seddonm1 Yes, understood that it's separate from this; I just wasn't sure if you wanted to add a new issue for that functionality.
