Tom Baeyens’ Post

Co-founder and CTO at Soda

A data contract is a promise, not an agreement.

When reading about data contracts, I regularly come across the definition that a data contract is an agreement between producer and consumer. While that is technically correct, it also implies that there is some sort of negotiation and that every consumer can negotiate their own contract. The latter does not fit my notion of a contract.

For me, the analogy with software interfaces like REST APIs or GraphQL is much more appropriate. The producer builds and operates a data pipeline that produces a dataset. For consumers to be able to use it, the producer provides a contract to document the dataset, provide stability guarantees, implement monitoring for data quality, and so on. In a contract, the producer can publish their promises so that consumers can start using the data in a self-serve style as much as possible. The goal is that consumers should not have to negotiate with the producer team to start using the data. Only when there are data requirements that are not yet met does a requirements conversation need to take place.

Ok, let me know if I'm nitpicking here, but I do think it helps to see a contract as the producer's responsibility to publish their up-to-date promise. Building a bespoke contract every time a consumer uses a dataset is, imo, not scalable.
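To make the software-interface analogy concrete, here is a minimal sketch of a contract as something the producer publishes once and any consumer can check against, without a negotiation step. Everything in it is hypothetical and for illustration only: the `DataContract` and `ColumnPromise` classes, the `orders` dataset, and its columns are assumptions, not Soda's format or any published standard.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class ColumnPromise:
    """One promise the producer makes about a column (hypothetical shape)."""
    name: str
    dtype: type
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    """A producer-published promise about a dataset, the same for every consumer."""
    dataset: str
    version: str
    columns: Tuple[ColumnPromise, ...]

    def violations(self, rows: List[Dict[str, Any]]) -> List[str]:
        """Return a readable message for every promise the rows break."""
        problems: List[str] = []
        for i, row in enumerate(rows):
            for col in self.columns:
                if col.name not in row:
                    problems.append(f"row {i}: missing column {col.name!r}")
                    continue
                value = row[col.name]
                if value is None:
                    if not col.nullable:
                        problems.append(f"row {i}: {col.name!r} is null")
                elif not isinstance(value, col.dtype):
                    problems.append(
                        f"row {i}: {col.name!r} is not {col.dtype.__name__}"
                    )
        return problems


# The producer publishes one contract; the dataset and columns are made up.
orders_contract = DataContract(
    dataset="orders",
    version="1.0.0",
    columns=(
        ColumnPromise("order_id", int),
        ColumnPromise("amount", float),
        ColumnPromise("coupon", str, nullable=True),
    ),
)

if __name__ == "__main__":
    sample = [{"order_id": 1, "amount": 9.5, "coupon": None}]
    print(orders_contract.violations(sample))
```

The same check can run on the producer side as monitoring and on the consumer side as a self-serve sanity check, which is the point of publishing a single promise instead of one contract per consumer.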

Jelle De Vleminck

Lead Platform Engineer | Lecturer at Erasmushogeschool Brussel

2mo

I've always found the term "data contract" confusing when people compare it with real-life contracts or an agreement between producer and consumer. People who do this usually refer to the process of how a data contract is concluded, and to the idea that the consumer, the one who knows the use case and the potential business value of the data, has a say in how the data should be shared by the producer. In my opinion, there is still a level missing and you have to make a distinction between a data interface contract, which is exactly what you describe as a data contract, and the agreement itself, where a consumer indicates for what purpose they will use the data and (temporary) access is given to that well-described data interface. Being able to do the latter in a user-friendly, self-service, automated way, where the business can give the approvals itself, is for me the key to allowing the business to take ownership of the data. The registration of data interfaces / data products should be done by engineers, but the business can be put in the loop as approvers afterwards.

David Moreau

Senior Software Engineer Technical Lead at Netapp

2mo

Considering how often consumers are pulling data from products in active development by going behind APIs that enforce business logic, there definitely should be a negotiation before the owner of that data agrees to any restrictions. As important as collecting that data may be, it is often less important than the feature that data was intended to support, and those features often provide direct value to customers. Otherwise, you will get producers never promising anything in order to keep things flexible. Preemptively slapping a data contract on all data can slow development to a crawl. Also, how will the producer even know who is consuming their data self-serve? How can they inform the consumers about necessary schema changes or downtime if there is no initial engagement where the producer agrees to keep them in the loop? I am fine saying it isn’t technically a contract because the consumer doesn’t provide anything. It is a very one-way relationship.

Stijn (Stan) Christiaens

Co-founder & Chief Data Citizen at Collibra

2mo

A contract is a promise, not an agreement - brb gonna try this with my bank

Jochen Christ

Data Contract Management

2mo

Strongly agree. You might need to track data usage agreements or access requests as additional resources, next to your data contract.

Ugo Ciracì

UAO! Co-Founder & CTPO | Agiler | Data Architect | Data Mesh Practitioner | Data Strategist | Business Unit Manager of Utility and Telco at Agile Lab

2mo

A data contract is (or should be) subject to the "market". A data contract that doesn't satisfy any consumer is basically useless. A data contract satisfying all users is pretty challenging. Anyhow, a data contract should aim for standardization rather than personalization. So, generally speaking, I agree.

Keith Dewar

DataOps | Networks | Transformation | Strategy | Product

2mo

Agree, it's a producer publishing their promise on quality/performance and not product-by-product customised negotiations. But it's also producers making their achievement of, or deviation from, their promise visible and evidenced. Otherwise the contract is just an aspirational statement. And we know what the road to hell is paved with!

Sireesha Pulipati

Data Engineering & Analytics | Cloud Architecture | Ex-Google | Mentor | Author of Data Storytelling with Google Looker Studio | Stanford GSB

2mo

It sounds to me like "data products". Products deliver promises and guarantee, or strive to guarantee, what's been implicitly or explicitly "agreed upon".

Jean-Georges Perrin

Technologist & Innovator | CIO AbeaData | Author | Co-founder AIDA User Group | Lifetime IBM Champion | Chair Bitol | LinkedIn Top Voice

2mo

Let’s keep in mind ODCS and the Bitol project :)
