STRUCTURED DATA STREAMING
ApacheCon 2021
SHIVJI KUMAR JHA
@ShivjiJha
TRACK: STREAMING
in/shivjijha
About Me
• Senior MTS at Nutanix
• Platform Engineer
– DBs, SOA, Infra, Streams
• Love
– Distributed data systems
– Open-source software (OSS)
• OSS Contributions
– Apache Pulsar
– MySQL
Contents
• Why: Schema?
• How: Popular formats
• What: Examples, Learnings
Why: Abstractions
(historical perspective)
A Brief History…
Of Databases
• 1960: Flat Files
• 1960s: Hierarchical Databases
– Need for structure
• 1980: SQL / Relational Databases
– High-level language
– Some more structure!
• 2004: NoSQL
– Scale & Availability above all
– No relational model, less structure
• 2010s: Distributed SQL
– Well, no, we need structure*
Image source: https://commons.wikimedia.org/wiki/File:Human_evolution.svg
A Brief History…
Of Data Streams
• Apache Kafka:
– Built inside LinkedIn
– 2011: Kafka becomes open source
– 2012: Graduated from Apache incubator
• Apache Pulsar
– Built at Yahoo
– 2016: Contributed to Open source
– 2018: Top-level Apache project
Image source: https://commons.wikimedia.org/wiki/File:Human_evolution.svg
History tells us…
Evolution
• SQL -> NoSQL -> Distributed SQL
– Relational databases have a strict schema
• Streaming bytes -> Schema Registry
– Both Kafka and Pulsar support schema registry now!
– It's not ideal to stream bytes
• Use schema wherever possible
– Err on the side of having schema
OBJECTS
PRIMITIVE TYPES
BYTES
BITS
What:
Schema 101
Hello PubSub!
Produce
Data
Consume
Data
Computers only know bits…
Encoding data
Example: Write an employee record
• Sending data to a computer
– Local computer
– Over network
• Can't send as is.
• Encode to bits
– Also called serialization
• Send
https://www.raywenderlich.com/books/swift-apprentice/v6.0/chapters/22-encoding-decoding-types
Computers only know bits…
Decoding data
Example: Read an employee record
• Read data from a computer
– Local computer
– Over network
• Turn bytes into an employee record
• Decode from bits
– Also called deserialization
• Use in program.
https://www.raywenderlich.com/books/swift-apprentice/v6.0/chapters/22-encoding-decoding-types
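A minimal sketch of the encode / decode round trip for the employee record, using only the Java standard library. The `Employee` fields and the `EmployeeCodec` helper are hypothetical names chosen for illustration; they are not part of any stream platform API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical domain object; the fields are assumptions for illustration.
final class Employee {
    final int id;
    final String name;
    final double salary;

    Employee(int id, String name, double salary) {
        this.id = id;
        this.name = name;
        this.salary = salary;
    }
}

final class EmployeeCodec {
    // Encode (serialize): turn the in-memory record into bytes.
    static byte[] encode(Employee employee) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(employee.id);
            out.writeUTF(employee.name);
            out.writeDouble(employee.salary);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decode (deserialize): read the fields back in the order they were written.
    static Employee decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new Employee(in.readInt(), in.readUTF(), in.readDouble());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the reader only works because it knows the field order and types the writer used. That shared knowledge is the schema, even when nobody wrote it down.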
Encoder / decoder placements
Produce
Data
Consume
Data
• Choice 1 : App
• Choice 2 : Producer/Consumer
• Choice 3 : Stream Platform
Abstraction vs Flexibility
Abstraction
• Abstract out encode / decode to the stream platform
• Lighter individual apps
– Single Responsibility Principle
• Easy evolution of schema
– Versioning
• Fewer bugs!
Flexibility
• Keep encode / decode in (each) app
• Flexibility of choice
– Schema formats
• Schema evolution is hard
– Versioning
– Upgrade Path?
• More bugs?
Flexibility? : Choose Wisely…
• Flexibility is a good choice when:
– Non-uniform data
• Custom Fields
• Non-uniform Types
– Frequent schema migrations
How:
Schema Representations
Schema : Choice 1
Use Native serialization of programming language
• Examples:
– Java serialization
– Python’s pickle
– Ruby’s Marshal
• Good
– Easy implementation
• Bad
– Locked into the same programming language for producer and consumer
– Difficult to evolve schema versions
• Upgrade Path?
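A small sketch of why native serialization ties producer and consumer together, using Java's built-in serialization. `NativeEmployee` and `NativeSerializationDemo` are hypothetical names for illustration. The point is that the serialized stream embeds the Java class name, so a consumer needs that same class (and language) to read it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical domain class, used only for illustration.
final class NativeEmployee implements Serializable {
    private static final long serialVersionUID = 1L;
    final int id;
    final String name;

    NativeEmployee(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class NativeSerializationDemo {
    // Serialize with Java's native mechanism.
    static byte[] serialize(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // True if the serialized bytes contain the given class name as text.
    static boolean mentionsClass(byte[] bytes, String className) {
        return new String(bytes, StandardCharsets.ISO_8859_1).contains(className);
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new NativeEmployee(7, "Asha"));
        System.out.println("class name in stream: " + mentionsClass(bytes, "NativeEmployee"));
    }
}
```

Renaming or restructuring the class changes what the stream expects on the reading side, which is one reason the upgrade path is unclear.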
Schema : Choice 2
Use same format as web APIs (REST?)
• Examples:
– JSON
– XML
• Good
– Familiar implementation, share code!
– Text, readable, easy to debug
• Bad
– Key Name in every message, too much data
– Auto detected type, may go wrong…
– New types? Nested types? Ship POJO library?
• Document? Synchronize? Ignore new data?
Schema : Choice 3
Struct Schema : Avro, Thrift, Protocol Buffers
• Good
– Binary formats, less space
– Matured over the years
– Well documented
– Libraries in multiple languages
– Good support in stream ecosystem
– Evolution with versioning
• Bad
– Extra learning curve
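A rough sketch of the "less space" point, comparing a JSON text encoding with a schema-ordered binary encoding of the same record. This is not the actual Avro, Thrift, or Protocol Buffers wire format; `DataOutputStream` is used as a stand-in, and `EncodingSizeDemo` and its field names are assumptions for illustration. The JSON helper also assumes the name needs no escaping.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class EncodingSizeDemo {
    // Text encoding: field names and punctuation travel with every message.
    static byte[] asJson(int id, String name, boolean active) {
        String json = "{\"id\":" + id + ",\"name\":\"" + name + "\",\"active\":" + active + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Schema-based binary encoding: only the values are written, in schema order.
    static byte[] asBinary(int id, String name, boolean active) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(id);
            out.writeUTF(name);
            out.writeBoolean(active);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("json bytes:   " + asJson(7, "Asha", true).length);
        System.out.println("binary bytes: " + asBinary(7, "Asha", true).length);
    }
}
```

The binary form can drop the field names only because both sides agree on the schema ahead of time, which is the trade the struct formats make.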
Case Study:
Apache Pulsar Schema
Apache Pulsar 101
PRODUCER CONSUMER
Apache Pulsar
• Cloud-native,
• Distributed messaging and
• Distributed streaming platform
Highlights
• Modular Design
• Horizontally scalable
• Low latency & high throughput
• Multi-tenancy
• Geo Replication
Pulsar schema : Byte Schema
Domain Object Pulsar Producer : Sample Code
Pulsar schema : String Schema
Producer
Consumer
Pulsar schema : Avro Schema
Pulsar schema : Schema Registry
• Topic to schemas mapping.
• Stores accepted schemas for a topic.
• Manages evolution with versioning.
• Producer adds schema, if compatible.
• Consumer fetches schema, given topic & message.
• Schema => [name, payload, type, properties]
Schema Evolution
• Manual
– Check every schema before upgrade
• Auto-update
– If the new schema passes compatibility tests, the producer uploads a new version of the schema.
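A toy, in-memory sketch of the registry and auto-update ideas above: a topic maps to a list of schema versions, and a new schema is accepted only if it passes a compatibility check against the latest version. This is not Pulsar's implementation. `ToySchemaRegistry`, `FieldDef`, and the simplified backward-compatibility rule (shared fields keep their type, new fields must have a default) are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified field definition: a type name plus whether a default exists.
final class FieldDef {
    final String type;
    final boolean hasDefault;

    FieldDef(String type, boolean hasDefault) {
        this.type = type;
        this.hasDefault = hasDefault;
    }
}

final class ToySchemaRegistry {
    // topic -> accepted schema versions (list index 0 holds version 1)
    private final Map<String, List<Map<String, FieldDef>>> schemasByTopic = new HashMap<>();

    // Assumed rule: a reader on `next` can read data written with `previous`
    // if shared fields keep their type and every added field has a default.
    static boolean isBackwardCompatible(Map<String, FieldDef> previous, Map<String, FieldDef> next) {
        for (Map.Entry<String, FieldDef> entry : next.entrySet()) {
            FieldDef old = previous.get(entry.getKey());
            if (old == null) {
                if (!entry.getValue().hasDefault) {
                    return false;
                }
            } else if (!old.type.equals(entry.getValue().type)) {
                return false;
            }
        }
        return true;
    }

    // Producer side: returns the new version number, or -1 if the schema is rejected.
    int register(String topic, Map<String, FieldDef> schema) {
        List<Map<String, FieldDef>> versions =
                schemasByTopic.computeIfAbsent(topic, t -> new ArrayList<>());
        if (!versions.isEmpty()
                && !isBackwardCompatible(versions.get(versions.size() - 1), schema)) {
            return -1;
        }
        versions.add(new LinkedHashMap<>(schema));
        return versions.size();
    }

    // Consumer side: fetch the schema for a topic and version.
    Map<String, FieldDef> fetch(String topic, int version) {
        return schemasByTopic.get(topic).get(version - 1);
    }
}
```

Usage would look like registering a `users` schema with `id` and `name` (version 1), then a second schema that adds `email` with a default (accepted as version 2). Adding a required `phone` field with no default would be rejected under this assumed rule.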
Schema : Compatibility modes
Schema Registry: Producer
Schema Registry: Consumer
Pulsar schema : Schema Registry
AUTO_PRODUCE
• Validates whether the bytes sent are compatible with the schema on the topic.
• If not, rejects them.
Producer<byte[]> pulsarProducer = client.newProducer(Schema.AUTO_PRODUCE_BYTES())
…
.create();
AUTO_CONSUME
• Validates whether the bytes sent from the topic are compatible with the schemas on the topic.
• If not, rejects them.
Consumer<GenericRecord> pulsarConsumer = client.newConsumer(Schema.AUTO_CONSUME())
…
.subscribe();
Topic Schema Mapping
• Topic : schema = 1:1 ?
• What about relative ordering?
• Opinion:
– Model domain to topic
– A domain may have multiple schemas.
• Example : User, Account, Subscription
– Prefer relative ordering
– Work with parallel evolution
• User v1 -> User v2 -> User v3
• Account v1 -> Account v2 -> Account v3
• Subscription v1 -> Subscription v2 -> Subscription v3
Schema across pipeline
• Pulsar IO
– Source (Examples : Flink, Spark, Elasticsearch)
– Pulsar
– Sink (Examples : Flink, Spark, Elasticsearch)
• Same schema across pipeline
– Unless you decorate..
– Unless different format for optimization
• Type of query
Learnings
Learnings over the years
• Struct schemas model domain objects well.
• Binary representation is space efficient.
• Manage schemas in the apps only if you need that extra flexibility.
• Use schema registry by default.
• Recommend Avro
– JSON Schema is a bit too verbose.
– Protobuf is awesome, but not well adopted among sources / sinks.
– Avro is adopted really well.
• Decide and set compatibility / evolution rules. Worth it!
References
1. Pulsar docs: https://pulsar.apache.org/docs/en/schema-get-started/
2. Schema auto update strategy: https://pulsar.apache.org/docs/en/pulsar-admin/#set-schema-autoupdate-strategy
3. Schema Evolution in Avro, Thrift, Protobuf: https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
4. Topic design per domain: https://www.confluent.io/blog/put-several-event-types-kafka-topic/
5. Schema Compatibility Design: https://docs.confluent.io/platform/current/schema-registry/avro.html#compatibility-types
THANK YOU
QUESTIONS?
@ShivjiJha
shiv4289
in/shivjijha/
ShivjiKumarJha

What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
Stephanie Beckett
 

Recently uploaded (20)

20240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 202420240702 QFM021 Machine Intelligence Reading List June 2024
20240702 QFM021 Machine Intelligence Reading List June 2024
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 
How to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory ModelHow to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory Model
 
Quantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLMQuantum Communications Q&A with Gemini LLM
Quantum Communications Q&A with Gemini LLM
 
@Call @Girls Thiruvananthapuram 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
@Call @Girls Thiruvananthapuram  🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...@Call @Girls Thiruvananthapuram  🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
@Call @Girls Thiruvananthapuram 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cu...
 
What's Next Web Development Trends to Watch.pdf
What's Next Web Development Trends to Watch.pdfWhat's Next Web Development Trends to Watch.pdf
What's Next Web Development Trends to Watch.pdf
 
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design ApproachesKnowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
Knowledge and Prompt Engineering Part 2 Focus on Prompt Design Approaches
 
@Call @Girls Guwahati 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl any...
@Call @Girls Guwahati 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl any...@Call @Girls Guwahati 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl any...
@Call @Girls Guwahati 🚒 XXXXXXXXXX 🚒 Priya Sharma Beautiful And Cute Girl any...
 
Quality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of TimeQuality Patents: Patents That Stand the Test of Time
Quality Patents: Patents That Stand the Test of Time
 
一比一原版(msvu毕业证书)圣文森山大学毕业证如何办理
一比一原版(msvu毕业证书)圣文森山大学毕业证如何办理一比一原版(msvu毕业证书)圣文森山大学毕业证如何办理
一比一原版(msvu毕业证书)圣文森山大学毕业证如何办理
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
 
How Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global ScaleHow Netflix Builds High Performance Applications at Global Scale
How Netflix Builds High Performance Applications at Global Scale
 
Running a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU ImpactsRunning a Go App in Kubernetes: CPU Impacts
Running a Go App in Kubernetes: CPU Impacts
 
What's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptxWhat's New in Copilot for Microsoft365 May 2024.pptx
What's New in Copilot for Microsoft365 May 2024.pptx
 
Implementations of Fused Deposition Modeling in real world
Implementations of Fused Deposition Modeling  in real worldImplementations of Fused Deposition Modeling  in real world
Implementations of Fused Deposition Modeling in real world
 
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALLBLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
BLOCKCHAIN FOR DUMMIES: GUIDEBOOK FOR ALL
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
 
What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024What’s New in Teams Calling, Meetings and Devices May 2024
What’s New in Teams Calling, Meetings and Devices May 2024
 

Apache Con 2021 Structured Data Streaming

  • 1. STRUCTURED DATA STREAMING SHIVJI KUMAR JHA @ShivjiJha TRACK: STREAMING in/shivjijha
  • 2. About Me • Senior MTS at Nutanix • Platform Engineer – DBs, SOA, Infra, Streams • Love – Distributed data systems – Open-source software (OSS) • OSS Contributions – Apache Pulsar – MySQL
  • 5. A Brief History… Of Databases • 1960: Flat Files • 1960s: Hierarchical Databases – Need for structure • 1980: SQL / Relational Databases – High-level language – Some more structure! • 2004: NoSQL – Scale & Availability above all – No relational model, less structure • 2010s: Distributed SQL – Well, no, we need structure* Image source: https://commons.wikimedia.org/wiki/File:Human_evolution.svg
  • 6. A Brief History… Of Data Streams • Apache Kafka: – Built inside LinkedIn – 2011: Kafka becomes open source – 2012: Graduated from Apache incubator • Apache Pulsar – Built at Yahoo – 2016: Contributed to Open source – 2018: Top-level Apache project Image source: https://commons.wikimedia.org/wiki/File:Human_evolution.svg
  • 7. History tells us… Evolution • SQL -> NoSQL -> Distributed SQL – Relational databases have a strict schema • Streaming bytes -> Schema Registry – Both Kafka and Pulsar support a schema registry now! – It's not ideal to stream raw bytes • Use a schema wherever possible – Err on the side of having a schema OBJECTS PRIMITIVE TYPES BYTES BITS
  • 11. Computers only know bits… Encoding data Example: Write an employee record • Sending data to a computer – Local computer – Over network • Can't send as is. • Encode to bits – Also, serialization • Send https://www.raywenderlich.com/books/swift-apprentice/v6.0/chapters/22-encoding-decoding-types
  • 12. Computers only know bits… Decoding data Example: Read an employee record • Read data from a computer – Local computer – Over network • Turn bytes into an employee record • Decode from bits – Also, de-serialization • Use in program. https://www.raywenderlich.com/books/swift-apprentice/v6.0/chapters/22-encoding-decoding-types
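The encode and decode steps on the two slides above can be sketched with nothing but the Java standard library. This is a minimal illustration, not code from the talk: the `Employee` fields and the `EmployeeCodec` helper are hypothetical names chosen for the example, and the hand-rolled field order stands in for what a real schema format would define.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical domain object; the field names are illustrative only.
final class Employee {
    final int id;
    final String name;
    final double salary;

    Employee(int id, String name, double salary) {
        this.id = id;
        this.name = name;
        this.salary = salary;
    }
}

final class EmployeeCodec {
    // Encode (serialize): object -> bytes, fields written in a fixed order.
    static byte[] encode(Employee e) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(e.id);        // 4 bytes
            out.writeUTF(e.name);      // 2-byte length prefix + modified UTF-8 bytes
            out.writeDouble(e.salary); // 8 bytes
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buffer.toByteArray();
    }

    // Decode (deserialize): bytes -> object, fields read back in the same order.
    static Employee decode(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return new Employee(in.readInt(), in.readUTF(), in.readDouble());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Both sides must agree on the field order and types, which is exactly the implicit contract a schema makes explicit.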
  • 13. Encoder / decoder placements Produce Data Consume Data • Choice 1 : App
  • 14. Encoder / decoder placements Produce Data Consume Data • Choice 2 : Producer/Consumer
  • 15. Encoder / decoder placements Produce Data Consume Data • Choice 3 : Stream Platform
  • 16. Abstraction vs Flexibility Abstraction • Abstract out encode / decode to the stream platform • Lighter individual apps – Single Responsibility Principle • Easy evolution of schema – Versioning • Fewer bugs! Flexibility • Keep encode / decode in (each) app • Flexibility of choice – Schema formats • Schema evolution is hard – Versioning – Upgrade Path? • More bugs?
  • 18. Flexibility? : Choose Wisely… • Flexibility is a good choice when: – Non-uniform data • Custom Fields • Non-uniform Types – Frequent schema migrations
  • 20. Schema : Choice 1 Use Native serialization of programming language • Examples: – Java serialization – Python’s pickle – Ruby’s Marshal • Good – Easy implementation • Bad – Locked into the same programming language for producer and consumer – Difficult to evolve schema versions • Upgrade Path?
  • 21. Schema : Choice 2 Use same format as web APIs (REST?) • Examples: – JSON – XML • Good – Familiar implementation, share code! – Text, readable, easy to debug • Bad – Key Name in every message, too much data – Auto detected type, may go wrong… – New types? Nested types? Ship POJO library? • Document? Synchronize? Ignore new data?
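To make the "key name in every message" cost concrete, here is a small sketch comparing a JSON text payload with a schema-driven binary layout for the same two fields. The `PayloadSize` class and its layout are assumptions made for illustration; the binary layout is a hand-rolled stand-in, not the actual Avro, Thrift, or Protocol Buffers wire format, and the JSON is built by naive concatenation without escaping.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class PayloadSize {
    // JSON repeats every key name ("id", "name") in every single message.
    static byte[] asJson(int id, String name) {
        String json = "{\"id\":" + id + ",\"name\":\"" + name + "\"}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Schema-based binary layout: 4-byte int, 2-byte length, then the UTF-8 name.
    // The field names live in the schema, not in the message.
    static byte[] asBinary(int id, String name) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 2 + nameBytes.length);
        buf.putInt(id);
        buf.putShort((short) nameBytes.length);
        buf.put(nameBytes);
        return buf.array();
    }
}
```

For the record `(42, "Ada")` the JSON form takes 22 bytes and the binary form 9; the gap grows with the number and length of field names.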
  • 22. Schema : Choice 3 Struct Schema : Avro, Thrift, Protocol Buffers • Good – Binary formats, less space – Matured over the years – Well documented – Libraries in multiple languages – Good support in stream ecosystem – Evolution with versioning • Bad – Extra learning curve
  • 24. Apache Pulsar 101 PRODUCER CONSUMER • Cloud-native, • Distributed messaging and • Distributed streaming platform Apache Pulsar • Modular Design • Horizontally scalable • Low latency & high throughput • Multi-tenancy • Geo Replication Highlights
  • 25. Pulsar schema : Byte Schema Domain Object Pulsar Producer : Sample Code
  • 26. Pulsar schema : String Schema Producer Consumer
  • 27. Pulsar schema : Avro Schema
  • 28. Pulsar schema : Schema Registry • Topic to schemas mapping. • Stores accepted schemas for a topic. • Manages evolution with versioning. • Producer adds schema, if compatible. • Consumer fetches schema, given topic & message. • Schema => [name, payload, type, properties]
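The registry behaviour listed on this slide (topic-to-schemas mapping, versioning, add-if-compatible, fetch by topic and version) can be sketched as a toy in-memory class. This is not Pulsar's implementation: `ToySchemaRegistry`, its methods, and the idea of passing the compatibility rule as a `BiPredicate` are all assumptions for illustration, and schemas are plain strings here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Toy registry: each topic maps to an ordered list of accepted schema versions.
final class ToySchemaRegistry {
    private final Map<String, List<String>> schemasByTopic = new HashMap<>();
    // Hypothetical compatibility rule: test(latestAcceptedSchema, candidateSchema).
    private final BiPredicate<String, String> compatible;

    ToySchemaRegistry(BiPredicate<String, String> compatible) {
        this.compatible = compatible;
    }

    // Producer side: returns the version of the schema, registering it as a new
    // version when it is unseen and compatible with the latest accepted one.
    int register(String topic, String schema) {
        List<String> versions = schemasByTopic.computeIfAbsent(topic, t -> new ArrayList<>());
        int existing = versions.indexOf(schema);
        if (existing >= 0) {
            return existing;
        }
        if (!versions.isEmpty() && !compatible.test(versions.get(versions.size() - 1), schema)) {
            throw new IllegalArgumentException("Incompatible schema for topic " + topic);
        }
        versions.add(schema);
        return versions.size() - 1;
    }

    // Consumer side: look up the schema a message was written with.
    String fetch(String topic, int version) {
        List<String> versions = schemasByTopic.get(topic);
        if (versions == null || version < 0 || version >= versions.size()) {
            throw new IllegalArgumentException("Unknown schema version " + version + " for topic " + topic);
        }
        return versions.get(version);
    }
}
```

A message then only needs to carry a small version number; the consumer resolves it to the full schema through `fetch`.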
  • 29. Schema Evolution • Manual – Check every schema before upgrade • Auto – updates – If new schema passes compatibility tests, producer uploads new version of schema.
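As a rough idea of what a compatibility test might check before a new schema version is uploaded, here is a simplified backward-compatibility rule: the new schema can read old data if every field it declares either existed before with the same type or carries a default. The `FieldSpec` and `BackwardCompatibility` names are hypothetical, and the rule is a deliberately reduced version of what real formats such as Avro define (no type promotion, aliases, or nested records). It assumes Java 9+ only for the `Map.of` calls in the usage.

```java
import java.util.Map;

// Hypothetical, minimal description of one field in a record schema.
final class FieldSpec {
    final String type;
    final boolean hasDefault;

    FieldSpec(String type, boolean hasDefault) {
        this.type = type;
        this.hasDefault = hasDefault;
    }
}

final class BackwardCompatibility {
    // True when a reader using newSchema can decode data written with oldSchema.
    static boolean canReadOldData(Map<String, FieldSpec> oldSchema,
                                  Map<String, FieldSpec> newSchema) {
        for (Map.Entry<String, FieldSpec> field : newSchema.entrySet()) {
            FieldSpec old = oldSchema.get(field.getKey());
            if (old == null) {
                // Field is new: old data lacks it, so the reader needs a default.
                if (!field.getValue().hasDefault) {
                    return false;
                }
            } else if (!old.type.equals(field.getValue().type)) {
                // Field changed type: old bytes would be misread.
                return false;
            }
        }
        // Fields removed in newSchema are simply ignored by the reader.
        return true;
    }
}
```

For example, with a `User v1` of `{id: int}`, a `User v2` that adds `email: string` with a default passes the check, while one that adds `email` without a default does not:

```java
Map<String, FieldSpec> v1 = Map.of("id", new FieldSpec("int", false));
Map<String, FieldSpec> v2 = Map.of("id", new FieldSpec("int", false),
                                   "email", new FieldSpec("string", true));
boolean ok = BackwardCompatibility.canReadOldData(v1, v2);
```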
  • 34. Pulsar schema : Schema Registry AUTO_PRODUCE • Validates whether the bytes sent are compatible with the schemas on the topic. • If not, rejects. Producer<byte[]> pulsarProducer = client.newProducer(Schema.AUTO_PRODUCE()) … .create(); AUTO_CONSUME • Validates whether the bytes sent from the topic are compatible with the schemas on the topic. • If not, rejects. Consumer<GenericRecord> pulsarConsumer = client.newConsumer(Schema.AUTO_CONSUME()) … .subscribe();
  • 35. Topic Schema Mapping • Topic : schema = 1:1 ? • What about relative ordering? • Opinion: – Model domain to topic – Domain may have multiple schema. • Example : User, accounts, subscription – Prefer relative ordering – Work with parallel evolution • User v1 -> User V2 -> User V3 • Account v1 -> Account V2 -> Account V3 • Subscription v1 -> Subscription V2 -> Subscription V3
  • 36. Schema across pipeline • Pulsar IO – Source (Examples : Flink, Spark, Elasticsearch) – Pulsar – Sink (Examples : Flink, Spark, Elasticsearch) • Same schema across pipeline – Unless you decorate.. – Unless different format for optimization • Type of query
  • 38. Learnings over the years • Struct schemas model domain objects well. • Binary representation is space efficient. • Manage schemas in the apps only if you need that extra flexibility. • Use a schema registry by default. • Recommend Avro – JSON schema – a bit too verbose – Proto is awesome, but not well adopted among sources / sinks. – Avro is adopted really well. • Decide and set compatibility / evolution rules. Worth it!
  • 39. References 1. Pulsar docs: https://pulsar.apache.org/docs/en/schema-get-started/ 2. Schema auto update strategy: https://pulsar.apache.org/docs/en/pulsar-admin/#set-schema-autoupdate-strategy 3. Schema Evolution in Avro, Thrift, Protobuf: https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html 4. Topic design per domain: https://www.confluent.io/blog/put-several-event-types-kafka-topic/ 5. Schema Compatibility Design: https://docs.confluent.io/platform/current/schema-registry/avro.html#compatibility-types