Eric Olmsted, Ph.D.’s Post

Value Based Care Analytics

1mo

I mentioned in a previous post that successful VBC organizations must strengthen their data pipelines in order to take advantage of analytics that can drive clinical results. To that end, below are my 4 pillars of healthcare data pipeline design. Symptoms of a data pipeline with issues include delays in reporting, inconsistent values across your organizational domains, and a lack of trust from clinical users.

1) Raw Record Primacy - Healthcare data is continuously managed, massaged, and warehoused. When incorporating any healthcare data into your analytic structure, favor data that is as close to the original source as possible. Trust fields from the billing data over fields from the warehouse (e.g. UB Type of Bill is of more value than an Inpatient Flag or ED Visit ID; MRN will track patient data through an EHR better than a payer member ID).

2) Transparency - Good analysts must understand the data that is being passed to them at the end of the data pipeline. To enable trust and understanding, it is critical to be transparent and accurate about how the data was processed at each step of the journey. Two techniques I have had success with are 'direct documentation' and data lineage. 'Direct documentation' is my term for using the same tables for both data processing and data documentation. There is no need to maintain separate documentation from your code stack, as everything can be converted to a table (or file) driven structure. This prevents the inevitable disconnect between your documentation of the code and the actual operation of the code. It further allows for a data lineage engine that can walk specific fields from raw through analytic datamart to accurately explain how the analytic data was created.

3) Conceptual Design - Many healthcare datamarts suffer from concept agglomeration, whereby multiple fields that represent the same underlying concept accumulate over time. This frequently happens during data ingestion, as engineers may be unaware of a field that already exists and mistakenly create another to serve the same need, but it can happen at any point during data processing. Be ruthless in your conceptual design and create ontologies and hierarchies that organize the data into higher level concepts, such that humans can drill down to the specific field they need when doing data mapping. A place for everything and everything in its place will prevent significant downstream confusion.

4) Fail as Fast as Possible - The key to this is to understand what your data processing algorithm 'knows' about the data at each step in the data pipeline. It is impossible to check PMPMs on raw data, so don't design your QC process to only fail at the end. Check control totals and field gaps at the start. Once your data mapping is complete, you can add field-specific validation. Your data processing algorithm should be learning about the data at each step of the process, and the QC should be designed to fail at each step when possible.
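The fail-fast pillar can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the field names (`claim_id`, `mrn`, `paid_amt`, `from_date`, `thru_date`), the manifest values, and the function names are all assumptions made for the example. It shows control totals and field gaps being checked on raw rows, and a field-specific rule being checked only after mapping.

```python
from datetime import date


class QCError(Exception):
    """Raised as soon as a stage detects a problem, so later stages never run."""


def check_control_totals(rows, expected_count, expected_paid, tolerance=0.01):
    """Raw stage: compare row count and dollar total with the sender's manifest."""
    if len(rows) != expected_count:
        raise QCError(f"row count {len(rows)} != expected {expected_count}")
    paid = sum(float(r["paid_amt"]) for r in rows)
    if abs(paid - expected_paid) > tolerance:
        raise QCError(f"paid total {paid:.2f} != expected {expected_paid:.2f}")


def check_field_gaps(rows, required_fields, max_missing_rate=0.0):
    """Raw stage: fail if a required field is blank more often than allowed."""
    for field in required_fields:
        missing = sum(1 for r in rows if not (r.get(field) or "").strip())
        rate = missing / len(rows) if rows else 1.0
        if rate > max_missing_rate:
            raise QCError(f"{field} missing in {rate:.0%} of rows")


def check_service_dates(rows):
    """Post-mapping stage: a field-specific rule that needs mapped date fields."""
    for r in rows:
        if date.fromisoformat(r["from_date"]) > date.fromisoformat(r["thru_date"]):
            raise QCError(f"claim {r['claim_id']}: from_date after thru_date")


def run_pipeline(rows, expected_count, expected_paid):
    """Run the checks in stage order; the first failure stops the run."""
    check_control_totals(rows, expected_count, expected_paid)  # raw stage
    check_field_gaps(rows, ["claim_id", "mrn"])                # raw stage
    check_service_dates(rows)                                  # after mapping
    return "passed"


# Hypothetical raw extract: the second claim arrives without an MRN.
raw = [
    {"claim_id": "C1", "mrn": "1001", "paid_amt": "120.50",
     "from_date": "2024-01-03", "thru_date": "2024-01-05"},
    {"claim_id": "C2", "mrn": "", "paid_amt": "79.50",
     "from_date": "2024-01-07", "thru_date": "2024-01-07"},
]

try:
    print(run_pipeline(raw, expected_count=2, expected_paid=200.00))
except QCError as err:
    print(f"QC failed early: {err}")
```

In this sketch the run stops at the field-gap check, before any date validation is attempted, which is the behavior the pillar asks for: each stage only tests what the pipeline can know about the data at that point.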


John Lee

1mo

Eric Olmsted, Ph.D. this is so spot on. If you pull the threads, much of what you describe depends on good #clinicalinformatics work, especially your theme of getting as close to the data source as possible. Good #clinicalinformatics also facilitates far more rapid iteration as you fail fast. In addition to curating the data as close to the data source as possible, informaticists can take the insights from the output of your data science tools and directly alter the transactional workflows...which then creates more data. If you do it right, you create a virtuous cycle where the data feeds the workflows and the workflows feed the data: a true learning healthcare system.


Matthew Schroder

Software Engineer at Optum

1mo

Ah, this puts a lot of what we’ve done into perspective for me. Thank you 💯


Alexandra Schweitzer

Executive Fellow at the Harvard Business School Social Enterprise Initiative

1mo

Gaby Alcala-Levy True for RA data too?


Yubin Park, PhD

Chief Builder at mimilabs | LinkedIn Top Voice | Advisor to Astrana Health | Ph.D., Machine Learning and Health Data

1mo

Eric Olmsted, Ph.D. I really love this! Thanks for sharing your valuable insights!


More Relevant Posts

Gregg Malkary
1mo
Strong data pipelines are crucial for successful value-based care. Prioritize raw records close to the source for reliability and maintain clear documentation directly in your data tables. This direct approach helps prevent discrepancies and ensures that data processing is transparent. Organizing your data effectively and setting up early fail-safes in the process can greatly enhance trust and efficiency in healthcare analytics. These practices are not just about managing data—they're about building a foundation for better patient outcomes and streamlined healthcare operations. #HealthcareData #Analytics #ValueBasedCare
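The idea of keeping documentation directly in the tables that drive processing can be illustrated with a small Python sketch. Everything here is hypothetical: the mapping rows, the source column names (`UB_TOB`, `PAT_MRN`), and the simplified inpatient rule are assumptions for the example, not a description of any real system. The same table is used to transform a record and to explain, field by field, where a datamart value came from.

```python
# Hypothetical mapping table: every row both drives processing and documents it.
MAPPINGS = [
    {"stage": "staging", "target": "bill_type", "source": "UB_TOB",
     "rule": "strip", "note": "UB Type of Bill exactly as billed"},
    {"stage": "staging", "target": "mrn", "source": "PAT_MRN",
     "rule": "strip", "note": "Medical record number from the EHR"},
    {"stage": "datamart", "target": "is_inpatient", "source": "bill_type",
     "rule": "inpatient_flag", "note": "Simplified: bill type begins with 11"},
]

# Named transformation rules referenced by the table (illustrative only).
RULES = {
    "strip": lambda value: value.strip(),
    "inpatient_flag": lambda value: value.lstrip("0").startswith("11"),
}


def apply_stage(record, stage):
    """Build the output record for one stage purely from the mapping table."""
    return {
        m["target"]: RULES[m["rule"]](record[m["source"]])
        for m in MAPPINGS
        if m["stage"] == stage
    }


def lineage(field):
    """Walk a field back toward its raw source using the same table."""
    by_target = {m["target"]: m for m in MAPPINGS}
    steps = []
    while field in by_target:
        m = by_target[field]
        steps.append(f"{m['target']} <- {m['rule']}({m['source']}): {m['note']}")
        field = m["source"]
    return steps


raw = {"UB_TOB": " 0111 ", "PAT_MRN": "1001 "}
staged = apply_stage(raw, "staging")
mart = apply_stage(staged, "datamart")
print(mart)
for step in lineage("is_inpatient"):
    print(step)
```

Because the lineage report is read from the table that performs the transformation, the documentation cannot drift away from what the code does, at least in this toy setup.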

Ravindra Nagpurkar

CTO | Digital Transformations | Problem Solver | Automation First Approach | Data Science | AI/ML | Outcome over Optics | Voted Tech Visionary
12mo
📊 Data Pipeline and why your org needs one ❓
The reliance on data in decision making is increasing at an exponential pace. Business leaders have multiple use cases, but the unfortunate reality is that the software systems in most cases were never built with data in mind. As a result, a lot of shoddy patchwork is done to apply a quick fix and somehow live for another day. It's time to look at data in a systematic and scientific way, either to refactor existing systems or to build a layer of data systems to serve the business needs.
🥁 Enter data-pipelines. 🥁
📌 A data pipeline is a software architecture that facilitates the capture, organization, transformation and movement of data so that it is ready for consumption to gain insights via multiple analytics clients and technologies such as AI/ML algorithms, Data Models and Live Analytics Dashboards.
📌 In addition, the pipeline also organizes/catalogues the data as per business needs, thereby making reporting and business analytics of the data more relevant and streamlined.
📌 A customized pipeline is a mix of the right technology (Kafka for example) and protocols.
Why should your organization build a data-pipeline ❓
The motivation to build a data pipeline may be split into two areas: one is business needs, and the second is engineering and technology benefits.
🔷 Business Goals
📌 Single Source of Truth: Not only does a data pipeline provide an undisputed single source of truth, such pipelines, with the use of a data fabric, can support cross-functional collaboration across various disciplines within an organization.
📌 Transparency of Data: With a single source of truth, a data pipeline imparts transparency to the data, thereby building credibility.
📌 Achieve Business Objectives: This empowers users across, say, marketing, business analytics, business intelligence, data science and technology teams to use the same data via a standardized system.
🔷 Engineering and Technology Gains
📌 Simplified Architecture: By delegating movement, transformation and management of the data to the pipeline, the various modules of a system can be built to focus on their independent tasks without the burden of data management.
📌 Standardization and Abstraction: By standardizing the data protocols and abstracting them from the applications and modules, it becomes easier to build and scale the data pipeline independently.
📌 Low Latency: The overall system and cloud infrastructure architecture is simplified, which improves system response times.
💡 In Closing: To derive a solid return on tech investment, it is prudent to build data pipelines to fulfil business needs.
#data #tech #ai #business #datapipelines #kafka #gcpcloud #aws
Peter Wilkerson

Sr Manager, Data Architect at Forvis Mazars US, LLP
1mo Edited
#DataDomain #DataMesh #KnowledgeGraph #DataCatalog

These are some of the topics I have been following lately when it comes to data. At times I feel like I am awash in the different discussions. (Anybody else feel that way?) Nevertheless, I feel these concepts can bring very real benefits to business when realized. I am a pragmatist at heart. I want to understand what it would mean to bring these together in a way that would bring value to the business. Below is my take (currently) at a high level. I've tried to take a pragmatic approach in describing how they might work together. Can you help develop my thinking further? What do you like? Dislike? What would you add?

Data Domain - Different areas of a business have different vocabularies. By vocabularies I mean both the metadata and the metadata values. There may be limited overlap between them, except for the company's core business dimensions/attributes used to manage the services offered at an enterprise level. (I've seen a game called "Data Domain. The Game" to help business understand this better. What a great idea!)

Data Mesh - Data Mesh, as I understand it, is about how business vocabularies of the enterprise and business domains are managed. Dimensions held in common at the enterprise level can be managed centrally. Vocabularies of domains will be managed formally or informally at the domain level, in spite of any efforts to control them centrally. If vocabulary management is done informally, then coordination between domains will be informal. If managed formally (at the domain level), then coordination and collaboration are more likely. I often describe a collaborative approach this way: "Seek compatibility, not conformity."

Knowledge Graph - It is not enough to know the language of business (at whatever level). It is important to understand and document the relationships between concepts. Knowing relationships establishes context. Information placed in context is a powerful tool. To know there are relationships between two or more objects is one thing. The ability to explicitly label relationships has the potential of delivering business value at a whole other level.

Data Catalog - Much can be done to bring value to the business if the above are established. So much more value can be delivered if there is a place to store and access the information in a business-user friendly way. Providing meaningful access without obfuscating the richness of the above is quite a challenge.

I have been following posts and comments of the following people here on LinkedIn. I recommend that you check out their work: Ole Olesen-Bagneux, Sagar Lad, Mike Dillinger, PhD, Malcolm Hawker
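The difference between knowing that two concepts are related and knowing how they are related can be shown with a tiny Python sketch. The concepts and relationship labels below are invented for illustration and are not taken from the post; the structure is just a list of subject, label, object triples.

```python
# Hypothetical business concepts; each relationship carries an explicit label.
TRIPLES = [
    ("Order", "placed_by", "Customer"),
    ("Order", "contains", "Product"),
    ("Invoice", "bills", "Order"),
    ("Customer", "managed_by", "Account Manager"),
]


def related(subject, label=None):
    """Return objects linked from `subject`, optionally filtered by relationship label."""
    return [o for s, p, o in TRIPLES if s == subject and (label is None or p == label)]


def context(concept):
    """Describe every labeled relationship that touches a concept, in either direction."""
    return [f"{s} {p} {o}" for s, p, o in TRIPLES if concept in (s, o)]


print(related("Order"))               # unlabeled view: just "these are connected"
print(related("Order", "placed_by"))  # labeled view: answers a specific question
print(context("Customer"))
```

Without labels, a query can only say that Order is connected to Customer and Product; with labels, it can answer who placed the order, which is the kind of context the post describes.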

Hernan Revale

Senior Business Intelligence Consultant | MSc Business Analytics, Imperial College Business School
1mo
Granularity, the level of detail of our data, plays a key role in shaping the structure and functionality of our data models. Any data analyst or data engineer can relate to working with granularity shifts, that is, needing to change the level of detail of the data between source and destination by, for instance, aggregating it. A higher granularity means more detail, whereas a lower granularity means less detail. Data Vault 2.0 is no exception: granularity takes on a unique significance there, influencing how we capture, store, and deliver insights from our data.

In traditional Kimball dimensional modeling, granularity is defined by the level of detail within fact tables. Here, we encounter three kinds of fact table, typically ordered from the finest grain to the coarsest: transaction fact tables, periodic snapshot fact tables, and accumulating snapshot fact tables. Transaction fact tables record singular measurement events, typically at the highest level of granularity, as they capture individual events or transactions in detail. Periodic snapshot fact tables summarize data over standard periods, such as days or weeks, offering a broader perspective suited to trend analysis. Lastly, accumulating snapshot fact tables hold one row per process instance (an order, for example) and update that row as the process passes each milestone, which makes them suited to monitoring progress through a workflow over time.

Now, how does Data Vault 2.0 approach granularity? In Data Vault, the Raw Data Vault serves as the custodian of raw data, preserving the original granularity from source systems. Unlike traditional models, where aggregation might occur during loading, the Raw Data Vault maintains the finest possible grain, ensuring no loss of detail. While the Raw Vault organizes and integrates our data into hubs, links, and satellites, it keeps the granularity intact, serving as a bedrock for downstream transformations.

However, in the journey from raw data to actionable insights, the Information Delivery layer often demands a different granularity tailored to the needs of the business, that is, our target granularity. Thus arises the need to derive the target granularity from the existing Raw Data Vault model, which happens in the Business Vault and Information Mart layers. The transformation process involves understanding business requirements, extracting meaningful insights, and aligning them with the structural integrity of the Data Vault. The goal is to bridge the gap between raw data and information delivery in order to provide tangible business value.

In essence, DV2.0 maintains the finest possible grain within its Raw Data Vault, preserving the original level of detail from the source systems. This keeps the history of the data untouched, so that anything can be traced back to its source (auditability). The target granularity is then achieved in the downstream layers, such as the Business Vault and Information Mart.
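A granularity shift of the kind described above can be sketched in plain Python. The rows, field names, and the choice of a store-by-day target grain are assumptions made for illustration; the raw list stands in for data held at its original grain, and the derived dictionary stands in for a coarser structure built downstream.

```python
from collections import defaultdict

# Hypothetical raw rows at transaction grain (one row per event), kept untouched.
transactions = [
    {"txn_id": 1, "store": "A", "day": "2024-05-01", "amount": 20.0},
    {"txn_id": 2, "store": "A", "day": "2024-05-01", "amount": 5.0},
    {"txn_id": 3, "store": "B", "day": "2024-05-01", "amount": 7.5},
    {"txn_id": 4, "store": "A", "day": "2024-05-02", "amount": 12.0},
]


def to_daily_grain(rows):
    """Derive a coarser target grain (one row per store per day) from the raw rows."""
    totals = defaultdict(lambda: {"txn_count": 0, "amount": 0.0})
    for r in rows:
        key = (r["store"], r["day"])
        totals[key]["txn_count"] += 1
        totals[key]["amount"] += r["amount"]
    return dict(totals)


daily = to_daily_grain(transactions)
for (store, day), measures in sorted(daily.items()):
    print(store, day, measures)

# The raw rows are unchanged, so any daily figure can be traced back to them.
print(len(transactions), "raw rows ->", len(daily), "daily rows")
```

The aggregation is one-way: the daily rows cannot be split back into transactions, which is why keeping the finest grain upstream and deriving the target grain downstream preserves auditability.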
Arezou Solouki

Experienced Product Leader | B2B SaaS | Driving Product Innovation and Growth
1mo Edited
The concept of "data product" has evolved, along with the principle of "data as a product" introduced by Zhamak Dehghani in the context of data mesh. This principle is sometimes shortened to "data products," which can lead to confusion. Xavier Gumara Rigol, in his article "Data as a product vs data products. What are the differences?", clarifies the differences between data as a product and data products. A must-read!

To understand the definition of "data as a product" in the data mesh world, this quote from Zhamak Dehghani’s original article is key: “Domain data teams must apply product thinking […] to the datasets that they provide; considering their data assets as their products and the rest of the organization’s data scientists, ML and data engineers as their customers.”

In summary, "data as a product" is the outcome of applying product thinking to datasets, ensuring they possess capabilities such as discoverability, security, explorability, understandability, and trustworthiness. “Data product” is a generic concept, and “data as a product” is a subset of all possible data products. “A data product is a product whose primary objective is to use data to facilitate an end goal.” Data products, whether they are entire customer-facing products or partial back-end products, possess different characteristics than other technology products.

Simon O'Regan divides data products into the following categories:
* Raw data: Data collected and made available as it is, with minimal processing.
* Derived data: Data that has been processed to some extent, with additional attributes or transformations applied.
* Algorithms: Services where data is input into an algorithm, which then processes the data and returns insights or information.
* Decision support: Information and insights are provided to assist users in making decisions, with most processing done on the provider’s side.
* Automated decision-making: Systems where all of the intelligence and decision-making within a given domain are automated. The algorithm performs the work and presents the user with the final output.

If we use Simon’s categories, “data as a product” belongs to the raw or derived data type of “data product”.

Must-read articles:
Designing Data Products https://lnkd.in/d_xawwB8
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh https://lnkd.in/dWAYrhqB
Data as a product vs data products. What are the differences? https://lnkd.in/dDjdJmES

#dataproduct #dataasaproduct #productmanagement

Martin Abelson Sahlen

CEO & Co-Founder at Alvin
1w
Why is it worth investing in metadata quality and structure? Don't we already have problems delivering ROI as is? For anyone working with data, this seems like a reasonable objection. However, metadata really is the key to overcoming many of the headwinds data teams are facing. How?

👉 Adoption and Use of Data Products
By viewing tables and reports as data products (slightly simplified), a data team can use aggregated and derived metadata to understand who is using what, and how it is being used. This can then be used to prioritize tasks. For example, a report used by many people takes a long time to load and is expensive; based on its data sources, one can set up an aggregate to avoid joining many tables "on the fly", which will reduce runtime and cost as well as improve the end-user experience, a win-win.

👉 Proactive Approach to Changes/Migrations
By understanding data flow and data usage, one can predict how changes will affect existing processes (e.g., dbt models that depend on another model that will be changed) and downstream systems such as PowerBI. Another important point is that one can also understand which users will be affected, making it easier for the data team to communicate proactively about "downtime" or changes that will require adaptation on the user side. Sometimes changes are necessary, but it is much less stressful to own the communication than to manage a situation where something has gone wrong.

👉 Find Tables That Are Never Used
By finding tables that are never used, one can delete them, saving storage costs and reducing compliance/GDPR exposure regarding the rightful use of data.

👉 Find Jobs That Create Tables That Are Never Used
As a rule, unused tables that are not a product of ad-hoc analysis (which should be done in a schema that automatically deletes data after, e.g., 7 days) will be produced with some frequency. If the tables are not used, it follows logically that the job that produces them is also useless and can therefore be stopped.

👉 Eliminate Unnecessary Processing
There is a lot of talk about real-time, but in most cases it is sufficient to update data at a frequency that makes yesterday's data available. Running once an hour versus once a day (every 24 hours) means 24 times as many jobs, which makes a huge difference in cost. If you are not entirely sure how this looks in your company, you can, for example, look at the access pattern of a table to understand in what time frame it is used and adjust the update rate. Typical easy optimizations here are to skip runs on weekends and holidays, when people are typically not at work anyway. By going from 7 to 5 days a week on which jobs are run, you save about 29% (2 of 7 days). If a high update rate (once per hour or more) is actually necessary, then you can limit the updates to normal working hours. Let's say between 8:00 and 16:00, i.e. a saving of 16 hours per day, or about 67%.

Stay tuned for additional tips and tricks!
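The scheduling arithmetic above is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: it counts job runs per week and assumes every run costs the same, which is an assumption of the example rather than something the post states. Dropping weekend runs means going from 7 scheduled days to 5.

```python
def weekly_runs(runs_per_day, days_per_week):
    """Number of job runs in one week for a given schedule."""
    return runs_per_day * days_per_week


def saving(before, after):
    """Fraction of runs saved when moving from `before` runs to `after` runs."""
    return 1 - after / before


# Hourly versus daily refresh: 24 times as many runs.
print(weekly_runs(24, 7) / weekly_runs(1, 7))

# Skipping weekends: 7 scheduled days down to 5.
print(f"{saving(weekly_runs(1, 7), weekly_runs(1, 5)):.1%}")

# Hourly runs limited to 8:00-16:00: 24 runs per day down to 8.
print(f"{saving(weekly_runs(24, 7), weekly_runs(8, 7)):.1%}")

# Both optimizations together: 168 runs per week down to 40.
print(f"{saving(weekly_runs(24, 7), weekly_runs(8, 5)):.1%}")
```

Under the equal-cost assumption, skipping weekends saves 2 of every 7 runs, working-hours-only saves 16 of every 24, and combining the two removes roughly three quarters of all runs.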
Wannes Rosiers

Data mesh learning MVP - Data platform evangelist - Data strategist
2w
The missing piece to data democratization is more actionable than a catalog. Don't get me wrong: catalogs are very much needed to make data discoverable, which is a first step to making it usable. But catalogs have been around for a few years already, and I don't believe we have really arrived at a mature state where data is available for everyone. So what are we missing? The full article can be found on my Medium: https://hubs.li/Q02FlMTh0

🎁 The rise of BI and self-serve BI has focused on maximizing the number of data consumers. Over time the dependency on data teams has shifted: first consumers depended on them for insights, now for data. Yet those data teams do not always understand all business processes and goals, and at larger organizations they cannot be expected to do so. That's why data products and a federated data workforce are emerging: to maximize the number of data producers, who have expert business knowledge of their domain.

🌐 Product thinking and federation are happening right now. Sometimes a bit pragmatically, for example by first focusing on source-oriented data products, or by using less strict definitions of data products. Note that there is probably not yet a single definition. Yet what is clear already is that data products are a concept that does not fit directly in current catalogs.

🏛️💼📜 Some governance is missing, and it is not limited to having an overview of what exists. Data sits across multiple systems, metadata is available in catalogs, and compute still lives somewhere else. Navigating these tools and governing cross-tool processes is what's missing, and that's where the Data Product Portal aims to become the bridge. I'm keen to get feedback!

#dataproduct #datacatalog #datamesh

The missing piece to data democratization is more actionable than a catalog

medium.com

Morgan Templar

Visionary CEO | LinkedIn Top Voice | Board Member | Speaker | Data | Strategy | Governance | Author of "Get Governed" and "A Culture of Governance"
4mo
The concepts of Data Products, Data as a Product, and Data Assets are foundational in data management and analytics. Each structure highlights a different perspective on how data can be utilized and valued within an organization or the marketplace. A lot of confusion exists about the definitions of each structure. I have my own perspective on what qualifies the use or collection of data as a Data Product, Data as a Product, or a Data Asset. The long version is available on Substack. Here is a quick preview: Data Product: A collection of data curated and managed with a product lifecycle. It leverages predefined data and data patterns to solve a problem or fulfill a specific need for the consumer. Data as a Product: Refers to the ecosystem of data as a holistic product that can be accessed with a variety of methods. It is the buffet of data. DaaP is differentiated by the meticulous application of mature data management practices and well mapped and understood data connections, definitions, and sources of truth. It shifts the focus from the mere collection of data to curating and presenting it in a manner that adds direct value. Data Assets: The concept of data assets emphasizes the strategic value of data, treating it as a key resource that can provide a competitive advantage when managed and utilized effectively. Data assets are considered an integral part of an organization's intellectual property and are protected and managed with care to ensure they maintain their value. Check out the full article on Substack. Read, Like, Follow

The Islands of Productizing Data

morgantemplar.substack.com

Ankit Mehta

Business Evangelist | Data & AI Enthusiast | MD & VP
7mo
Last week, I had an interesting meeting with the #dataexecutives of a leading #automotive company. Our conversation shed light on some critical challenges within the data landscape, and I thought of sharing these insights.

Data Catalogs: A Dual-Purpose Solution
The #chiefdataofficer emphasised the pivotal role #datacatalogs play in their #datagovernance strategy, suggesting catalogs serve a dual purpose:
1️⃣ Automating Metadata Gathering: they streamline the process of gathering metadata, ensuring data is well-documented.
2️⃣ Providing User-Friendly Interfaces: they empower business users by offering user-friendly interfaces for adding contextual information to the data.

The Challenge: Bridging the Gap between Business and Technical Context
However, the real challenge, as per the #cdo, lies in bridging the gap between business and technical context. Despite the robust functionality offered by most data catalog tools, structuring, cleaning, and annotating metadata with business context often still requires tedious and repetitive manual intervention.

Lessons learnt from Implementing Data Catalogs
What struck me most was their experience with implementing data catalogs not just once, but twice:
1️⃣st Implementation: the initial data catalog automated some aspects, but it had limitations, as it struggled to effectively integrate business and technical context.
2️⃣nd Implementation: the 'next-gen data catalog' they implemented the second time around promised improvements but fell short. It couldn't keep up with the ever-increasing volumes of data and the pace of change within the company.

The Quest for a Holistic Solution
Their experience underscored the need for a solution that transcends the current capabilities of data catalogs. They’re now seeking a solution that seamlessly connects technical and business data & metadata, providing real-time insights into all data contexts – be it business, data, or IT.

Our discussion left me pondering the evolving landscape of data management, and the reason why we’ve built Data Piloting: a solution that can truly bridge the gap between #business and #technology, offering a unified real-time view of dynamic enterprise #data + #metadata. Vokse is at the intersection of data governance, observability, and data catalogs, offering a holistic approach to #datamanagement.

#happymonday #ml #ai #data #datareliability #datapiloting #manufacturing #automotive #automotiveindustry Vokse
Jan Meskens

Data Strategy Consultant | Speaking, sketching and writing about the data world | "I believe that usable data will always lead to valuable data."
5mo
A strategy centered on data products paves the way for any forthcoming innovations in data. But how do you initiate such a strategy? I addressed this during JUVO's lunch webinar, outlining a five-step method that is iterative, practical, and simple to initiate:

📦 Initial Data Product Backlog - Collaborate with your business stakeholders to brainstorm potential data products that could support them. Consider a wide range of solutions such as dashboards, AI algorithms, or hybrid methods that could solve real business problems.

🛢 What: Identifying Requirements for Your Products - For each product, determine the necessary inputs (data) and outputs (the tools where the product will be utilized). At this stage, examine the attributes of both inputs and outputs. Questions to consider include whether the data should be real-time, the expected volume of data, whether the data is structured or unstructured, and the known quality of the required data, among others.

💎 Why: Purpose Behind the Products - Discover the core value drivers of the data products and, where possible, align these drivers with your business objectives. Common examples include data products designed to enhance productivity or reduce waste.

🤼 Who: Target Users for These Data Products - Understanding the profile of the potential users of these data products enables you to customize the products to meet their needs more effectively. To stimulate the adoption of a data-driven way of working, it is crucial to find a fit between the delivered data product and its future users.

🗺 How: Strategizing Product Support - With insights from the previous steps, you can begin to outline a framework for your future data platform and the governance processes around it. These elements are crucial for transforming data product requirements (the What, Why, and Who) into a model and data-driven organization that consistently delivers value. In this stage, data expert roles (architects, governance specialists, strategists, etc.) can enter the workshop to bring in their experience to solve the outlined problems.

Through various workshops employing this approach, I've found it to be an effective way to quickly generate meaningful discussions around data products and set a direction for future developments. Should you be curious about how this methodology could be tailored to your specific context, I'm always happy to engage in a conversation about it!

#datastrategy #dataproducts #artificialintelligence
