SlideShare a Scribd company logo
Himani Arora & Prabhat Kashyap
Software Consultant
@_himaniarora @pk_official
Who we are?
Himani Arora
@_himaniarora
Software Consultant @ Knoldus Software LLP
Contributed in Apache Kafka, Juypter,
Apache CarbonData, Lightbend Lagom etc
Currently learning Apache Kafka
Prabhat Kashyap
@pk_official
Software Consultant @ Knoldus Software LLP
Contributed in Apache Kafka and Apache
CarbonData and Lightbend Templates
Currently learning Apache Kafka
Agenda
●
What is Stream processing
●
Paradigms of programming
●
Stream Processing with Kafka
●
What are Kafka Streams
●
Inside Kafka Streams
●
Demonstration of stream processing using Kafka Streams
●
Overview of Kafka Connect
●
Demo with Kafka Connect
What is stream processing?
● Real-time processing of data
● Does not treat data as static tables or files
● Data has to be processed fast, so that a firm can react to
changing business conditions in real time. This is required
for trading, fraud detection, system monitoring, and many
other examples.
● A “too late architecture” cannot realize these use cases.
BIG DATA VERSUS FAST DATA
3 PARADIGMS OF PROGRAMMING
● REQUEST/RESPONSE
● BATCH SYSTEMS
● STREAM PROCESSING
REQUEST/RESPONSE
BATCH SYSTEM
STREAM PROCESSING
STREAM PROCESSING with KAFKA
2 APPROACHES:
● DO IT YOURSELF (DIY ! ) STREAM PROCESSING
● STREAM PROCESSING FRAMEWORK
DIY STREAM PROCESSING
Major Challenges:
● FAULT TOLERANCE
● PARTITIONING AND SCALABILITY
● TIME
● STATE
● REPROCESSING
STREAM PROCESSING FRAMEWORK
Many already available stream processing framework are:
SPARK
STORM
SAMZA
FLINK ETC...
KAFKA STREAMS : ANOTHER WAY OF STREAM PROCESSING
Let’s starts with Kafka Stream but wait.. What is KAFKA?
Hello! Apache Kafka
● Apache Kafka is an Open Source project under Apache Licence
2.0
● Apache Kafka was originally developed by LinkedIn.
● On 23 October 2012 Apache Kafka graduated from incubator to
top level projects.
● Components of Apache Kafka
○ Producer
○ Consumer
○ Broker
○ Topic
○ Data
○ Parallelism
Stream processing using Kafka
Enterprises that use Kafka
What is Kafka Streams
● It is Streams API of Apache Kafka, available through a Java library.
● Kafka Streams is built on top of functionality provided by Kafka’s.
● It is , by deliberate design, tightly integrated with Apache Kafka.
● It can be used to build highly scalable, elastic, fault-tolerant, distributed
applications and microservices.
● Kafka Streams API allows you to create real-time applications.
● It is the easiest yet the most powerful technology to process data stored
in Kafka.
Stream processing using Kafka
If we look closer
● A key motivation of the Kafka Streams API is to bring stream processing out of
the Big Data niche into the world of mainstream application development.
● Using the Kafka Streams API you can implement standard Java applications to
solve your stream processing needs.
● Your applications are fully elastic: you can run one or more instances of your
application.
● This lightweight and integrative approach of the Kafka Streams API – “Build
applications, not infrastructure!” .
● Deployment-wise you are free to chose from any technology that can deploy Java
applications
Capabilities of Kafka Stream
● Powerful
○ Makes your applications highly scalable, elastic, distributed, fault-
tolerant.
○ Stateful and stateless processing
○ Event-time processing with windowing, joins, aggregations
● Lightweight
○ Low barrier to entry
○ No processing cluster required
○ No external dependencies other than Apache Kafka
Capabilities of Kafka Stream
● Real-time
○ Millisecond processing latency
○ Record-at-a-time processing (no micro-batching)
○ Seamlessly handles late-arriving and out-of-order data
○ High throughput
● Fully integrated
○ 100% compatible with Apache Kafka 0.10.2 and 0.10.1
○ Easy to integrate into existing applications and microservices
○ Runs everywhere: on-premises, public clouds, private clouds, containers, etc.
○ Integrates with databases through continous change data capture (CDC) performed by
Kafka Connect
Key concepts of Kafka Streams
● Stateful Stream Processing
● KStream
● KTable
● Time
● Aggregations
● Joins
● Windowing
Key concepts of Kafka Streams
● Stateful Stream Processing
– Some stream processing applications don’t require state – they
are stateless.
– In practice, however, most applications require state – they are
stateful.
– The state must be managed in a fault-tolerant manner.
– Application is stateful whenever, for example, it needs to join,
aggregate, or window its input data.
Key concepts of Kafka Streams
● Kstream
– A KStream is an abstraction of a record stream.
– Each data record represents a self-contained datum in the
unbounded data set.
– Using the table analogy, data records in a record stream are
always interpreted as an “INSERT” .
– Let’s imagine the following two data records are being sent to
the stream:
("alice", 1) --> ("alice", 3)
Key concepts of Kafka Streams
● Ktable
– A KStream is an abstraction of a changelog stream.
– Each data record represents an update.
– Using the table analogy, data records in a record stream are
always interpreted as an “UPDATE” .
– Let’s imagine the following two data records are being sent to
the stream:
("alice", 1) --> ("alice", 3)
Key concepts of Kafka Streams
● Time
– A critical aspect in stream processing is the the notion of time.
– Kafka Streams supports the following notions of time:
●
Event Time
●
Processing Time
●
Ingestion Time
– Kafka Streams assigns a timestamp to every data record via
so-called timestamp extractors.
Key concepts of Kafka Streams
● Aggregations
– An aggregation operation takes one input stream or table, and
yields a new table.
– It is done by combining multiple input records into a single
output record.
– In the Kafka Streams DSL, an input stream of an aggregation
operation can be a KStream or a KTable, but the output
stream will always be a KTable.
Key concepts of Kafka Streams
● Joins
– A join operation merges two input streams and/or tables based
on the keys of their data records, and yields a new
stream/table.
Key concepts of Kafka Streams
● Windowing
– Windowing lets you control how to group records that have the same
key for stateful operations such as aggregations or joins into so-
called windows.
– Windows are tracked per record key.
– When working with windows, you can specify a retention period for
the window.
– This retention period controls how long Kafka Streams will wait for
out-of-order or late-arriving data records for a given window.
– If a record arrives after the retention period of a window has passed,
the record is discarded and will not be processed in that window.
Inside Kafka Stream
Processor Topology
Stream Partitions and Tasks
● Each stream partition is a totally ordered sequence of data records and
maps to a Kafka topic partition.
● A data record in the stream maps to a Kafka message from that topic.
● The keys of data records determine the partitioning of data in both Kafka
and Kafka Streams, i.e., how data is routed to specific partitions within
topics.
Threading Model
● Kafka Streams allows the user to configure the number of threads that
the library can use to parallelize processing within an application
instance.
● Each thread can execute one or more stream tasks with their processor
topologies independently.
State
● Kafka Streams provides so-called state stores.
● State can be used by stream processing applications to store and query
data, which is an important capability when implementing stateful
operations.
Backpressure
● Kafka Streams does not use a backpressure mechanism because it
does not need one.
● It uses depth-first processing strategy.
● Each record consumed from Kafka will go through the whole processor
(sub-)topology for processing and for (possibly) being written back to
Kafka before the next record will be processed.
● No records are being buffered in-memory between two connected
stream processors.
● Kafka Streams leverages Kafka’s consumer client behind the scenes.
DEMO
Kafka Streams
HOW TO GET DATA IN AND OUT OF KAFKA?
KAFKA CONNECT
Kafka connect
● So-called Sources import data into Kafka, and Sinks export data from
Kafka.
● An implementation of a Source or Sink is a Connector. And users deploy
connectors to enable data flows on Kafka
● All Kafka Connect sources and sinks map to partitioned streams of
records.
● This is a generalization of Kafka’s concept of topic partitions: a stream
refers to the complete set of records that are split into independent
infinite sequences of records
CONFIGURING CONNECTORS
● Connector configurations are key-value mappings.
● For standalone mode these are defined in a properties file and
passed to the Connect process on the command line.
● In distributed mode, they will be included in the JSON payload
sent over the REST API for the request that creates the connector.
CONFIGURING CONNECTORS
Few settings common that are common to all connectors:
● name - Unique name for the connector. Attempting to register again
with the same name will fail.
● connector.class - The Java class for the connector
● tasks.max - The maximum number of tasks that should be created for
this connector. The connector may create fewer tasks if it cannot
achieve this level of parallelism.
REFERENCES
●
https://www.slideshare.net/ConfluentInc/demystifying-stream-processing-with-apache-kafka-
69228952
●
https://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
●
http://docs.confluent.io/3.2.0/streams/index.html
●
http://docs.confluent.io/3.2.0/connect/index.html
Thank You

More Related Content

What's hot

From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
confluent
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
confluent
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
AIMDek Technologies
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Long Nguyen
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
emreakis
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Clement Demonchy
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
Amir Sedighi
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
Guozhang Wang
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Shiao-An Yuan
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connect
Knoldus Inc.
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistent
confluent
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
Ketan Gote
 
kafka
kafkakafka
An Introduction to Confluent Cloud: Apache Kafka as a Service
An Introduction to Confluent Cloud: Apache Kafka as a ServiceAn Introduction to Confluent Cloud: Apache Kafka as a Service
An Introduction to Confluent Cloud: Apache Kafka as a Service
confluent
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environment
confluent
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
Martin Podval
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
Timothy Spann
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
Apache kafkaApache kafka
Apache kafka
Viswanath J
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
Jun Rao
 

What's hot (20)

From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connect
 
Kafka Streams State Stores Being Persistent
Kafka Streams State Stores Being PersistentKafka Streams State Stores Being Persistent
Kafka Streams State Stores Being Persistent
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 
kafka
kafkakafka
kafka
 
An Introduction to Confluent Cloud: Apache Kafka as a Service
An Introduction to Confluent Cloud: Apache Kafka as a ServiceAn Introduction to Confluent Cloud: Apache Kafka as a Service
An Introduction to Confluent Cloud: Apache Kafka as a Service
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environment
 
Apache Kafka - Martin Podval
Apache Kafka - Martin PodvalApache Kafka - Martin Podval
Apache Kafka - Martin Podval
 
Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)Hello, kafka! (an introduction to apache kafka)
Hello, kafka! (an introduction to apache kafka)
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 

Similar to Stream processing using Kafka

BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
Cedric Vidal
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
Slim Baltagi
 
Connecting kafka message systems with scylla
Connecting kafka message systems with scylla   Connecting kafka message systems with scylla
Connecting kafka message systems with scylla
Maheedhar Gunturu
 
Alpakka - Connecting Kafka and ElasticSearch to Akka Streams
Alpakka - Connecting Kafka and ElasticSearch to Akka StreamsAlpakka - Connecting Kafka and ElasticSearch to Akka Streams
Alpakka - Connecting Kafka and ElasticSearch to Akka Streams
Knoldus Inc.
 
Apache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataApache frameworks for Big and Fast Data
Apache frameworks for Big and Fast Data
Naveen Korakoppa
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
Monal Daxini
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Data Con LA
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
Slim Baltagi
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
Joe Stein
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
datamantra
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
Athens Big Data
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
DataWorks Summit
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
Change data capture
Change data captureChange data capture
Change data capture
Ron Barabash
 
Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and Kafka
Itai Yaffe
 
Apache Big Data Europe 2015: Selected Talks
Apache Big Data Europe 2015: Selected TalksApache Big Data Europe 2015: Selected Talks
Apache Big Data Europe 2015: Selected Talks
Andrii Gakhov
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
Timothy Spann
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
Anton Nazaruk
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafka
confluent
 
Apache Kafka Streams
Apache Kafka StreamsApache Kafka Streams
Apache Kafka Streams
Apache Kafka TLV
 

Similar to Stream processing using Kafka (20)

BBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.comBBL KAPPA Lesfurets.com
BBL KAPPA Lesfurets.com
 
Building Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache KafkaBuilding Streaming Data Applications Using Apache Kafka
Building Streaming Data Applications Using Apache Kafka
 
Connecting kafka message systems with scylla
Connecting kafka message systems with scylla   Connecting kafka message systems with scylla
Connecting kafka message systems with scylla
 
Alpakka - Connecting Kafka and ElasticSearch to Akka Streams
Alpakka - Connecting Kafka and ElasticSearch to Akka StreamsAlpakka - Connecting Kafka and ElasticSearch to Akka Streams
Alpakka - Connecting Kafka and ElasticSearch to Akka Streams
 
Apache frameworks for Big and Fast Data
Apache frameworks for Big and Fast DataApache frameworks for Big and Fast Data
Apache frameworks for Big and Fast Data
 
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
AWS Re-Invent 2017 Netflix Keystone SPaaS - Monal Daxini - Abd320 2017
 
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...Building streaming data applications using Kafka*[Connect + Core + Streams] b...
Building streaming data applications using Kafka*[Connect + Core + Streams] b...
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
 
Structured Streaming with Kafka
Structured Streaming with KafkaStructured Streaming with Kafka
Structured Streaming with Kafka
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Change data capture
Change data captureChange data capture
Change data capture
 
Stream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and KafkaStream, stream, stream: Different streaming methods with Spark and Kafka
Stream, stream, stream: Different streaming methods with Spark and Kafka
 
Apache Big Data Europe 2015: Selected Talks
Apache Big Data Europe 2015: Selected TalksApache Big Data Europe 2015: Selected Talks
Apache Big Data Europe 2015: Selected Talks
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafka
 
Apache Kafka Streams
Apache Kafka StreamsApache Kafka Streams
Apache Kafka Streams
 

More from Knoldus Inc.

Angular Hydration Presentation (FrontEnd)
Angular Hydration Presentation (FrontEnd)Angular Hydration Presentation (FrontEnd)
Angular Hydration Presentation (FrontEnd)
Knoldus Inc.
 
Optimizing Test Execution: Heuristic Algorithm for Self-Healing
Optimizing Test Execution: Heuristic Algorithm for Self-HealingOptimizing Test Execution: Heuristic Algorithm for Self-Healing
Optimizing Test Execution: Heuristic Algorithm for Self-Healing
Knoldus Inc.
 
Self-Healing Test Automation Framework - Healenium
Self-Healing Test Automation Framework - HealeniumSelf-Healing Test Automation Framework - Healenium
Self-Healing Test Automation Framework - Healenium
Knoldus Inc.
 
Kanban Metrics Presentation (Project Management)
Kanban Metrics Presentation (Project Management)Kanban Metrics Presentation (Project Management)
Kanban Metrics Presentation (Project Management)
Knoldus Inc.
 
Java 17 features and implementation.pptx
Java 17 features and implementation.pptxJava 17 features and implementation.pptx
Java 17 features and implementation.pptx
Knoldus Inc.
 
Chaos Mesh Introducing Chaos in Kubernetes
Chaos Mesh Introducing Chaos in KubernetesChaos Mesh Introducing Chaos in Kubernetes
Chaos Mesh Introducing Chaos in Kubernetes
Knoldus Inc.
 
GraalVM - A Step Ahead of JVM Presentation
GraalVM - A Step Ahead of JVM PresentationGraalVM - A Step Ahead of JVM Presentation
GraalVM - A Step Ahead of JVM Presentation
Knoldus Inc.
 
Nomad by HashiCorp Presentation (DevOps)
Nomad by HashiCorp Presentation (DevOps)Nomad by HashiCorp Presentation (DevOps)
Nomad by HashiCorp Presentation (DevOps)
Knoldus Inc.
 
Nomad by HashiCorp Presentation (DevOps)
Nomad by HashiCorp Presentation (DevOps)Nomad by HashiCorp Presentation (DevOps)
Nomad by HashiCorp Presentation (DevOps)
Knoldus Inc.
 
DAPR - Distributed Application Runtime Presentation
DAPR - Distributed Application Runtime PresentationDAPR - Distributed Application Runtime Presentation
DAPR - Distributed Application Runtime Presentation
Knoldus Inc.
 
Introduction to Azure Virtual WAN Presentation
Introduction to Azure Virtual WAN PresentationIntroduction to Azure Virtual WAN Presentation
Introduction to Azure Virtual WAN Presentation
Knoldus Inc.
 
Introduction to Argo Rollouts Presentation
Introduction to Argo Rollouts PresentationIntroduction to Argo Rollouts Presentation
Introduction to Argo Rollouts Presentation
Knoldus Inc.
 
Intro to Azure Container App Presentation
Intro to Azure Container App PresentationIntro to Azure Container App Presentation
Intro to Azure Container App Presentation
Knoldus Inc.
 
Insights Unveiled Test Reporting and Observability Excellence
Insights Unveiled Test Reporting and Observability ExcellenceInsights Unveiled Test Reporting and Observability Excellence
Insights Unveiled Test Reporting and Observability Excellence
Knoldus Inc.
 
Introduction to Splunk Presentation (DevOps)
Introduction to Splunk Presentation (DevOps)Introduction to Splunk Presentation (DevOps)
Introduction to Splunk Presentation (DevOps)
Knoldus Inc.
 
Code Camp - Data Profiling and Quality Analysis Framework
Code Camp - Data Profiling and Quality Analysis FrameworkCode Camp - Data Profiling and Quality Analysis Framework
Code Camp - Data Profiling and Quality Analysis Framework
Knoldus Inc.
 
AWS: Messaging Services in AWS Presentation
AWS: Messaging Services in AWS PresentationAWS: Messaging Services in AWS Presentation
AWS: Messaging Services in AWS Presentation
Knoldus Inc.
 
Amazon Cognito: A Primer on Authentication and Authorization
Amazon Cognito: A Primer on Authentication and AuthorizationAmazon Cognito: A Primer on Authentication and Authorization
Amazon Cognito: A Primer on Authentication and Authorization
Knoldus Inc.
 
ZIO Http A Functional Approach to Scalable and Type-Safe Web Development
ZIO Http A Functional Approach to Scalable and Type-Safe Web DevelopmentZIO Http A Functional Approach to Scalable and Type-Safe Web Development
ZIO Http A Functional Approach to Scalable and Type-Safe Web Development
Knoldus Inc.
 
Managing State & HTTP Requests In Ionic.
Managing State & HTTP Requests In Ionic.Managing State & HTTP Requests In Ionic.
Managing State & HTTP Requests In Ionic.
Knoldus Inc.
 

More from Knoldus Inc. (20)

Angular Hydration Presentation (FrontEnd)
Angular Hydration Presentation (FrontEnd)Angular Hydration Presentation (FrontEnd)
Angular Hydration Presentation (FrontEnd)
 
Optimizing Test Execution: Heuristic Algorithm for Self-Healing
Optimizing Test Execution: Heuristic Algorithm for Self-HealingOptimizing Test Execution: Heuristic Algorithm for Self-Healing
Optimizing Test Execution: Heuristic Algorithm for Self-Healing
 
Self-Healing Test Automation Framework - Healenium
Self-Healing Test Automation Framework - HealeniumSelf-Healing Test Automation Framework - Healenium
Self-Healing Test Automation Framework - Healenium
 
Kanban Metrics Presentation (Project Management)
Kanban Metrics Presentation (Project Management)Kanban Metrics Presentation (Project Management)
Kanban Metrics Presentation (Project Management)
 
Java 17 features and implementation.pptx
Java 17 features and implementation.pptxJava 17 features and implementation.pptx
Java 17 features and implementation.pptx
 
Chaos Mesh Introducing Chaos in Kubernetes
Chaos Mesh Introducing Chaos in KubernetesChaos Mesh Introducing Chaos in Kubernetes
Chaos Mesh Introducing Chaos in Kubernetes
 
GraalVM - A Step Ahead of JVM Presentation
GraalVM - A Step Ahead of JVM PresentationGraalVM - A Step Ahead of JVM Presentation
GraalVM - A Step Ahead of JVM Presentation
 
Nomad by HashiCorp Presentation (DevOps)
Nomad by HashiCorp Presentation (DevOps)Nomad by HashiCorp Presentation (DevOps)
Nomad by HashiCorp Presentation (DevOps)
 
Nomad by HashiCorp Presentation (DevOps)
Nomad by HashiCorp Presentation (DevOps)Nomad by HashiCorp Presentation (DevOps)
Nomad by HashiCorp Presentation (DevOps)
 
DAPR - Distributed Application Runtime Presentation
DAPR - Distributed Application Runtime PresentationDAPR - Distributed Application Runtime Presentation
DAPR - Distributed Application Runtime Presentation
 
Introduction to Azure Virtual WAN Presentation
Introduction to Azure Virtual WAN PresentationIntroduction to Azure Virtual WAN Presentation
Introduction to Azure Virtual WAN Presentation
 
Introduction to Argo Rollouts Presentation
Introduction to Argo Rollouts PresentationIntroduction to Argo Rollouts Presentation
Introduction to Argo Rollouts Presentation
 
Intro to Azure Container App Presentation
Intro to Azure Container App PresentationIntro to Azure Container App Presentation
Intro to Azure Container App Presentation
 
Insights Unveiled Test Reporting and Observability Excellence
Insights Unveiled Test Reporting and Observability ExcellenceInsights Unveiled Test Reporting and Observability Excellence
Insights Unveiled Test Reporting and Observability Excellence
 
Introduction to Splunk Presentation (DevOps)
Introduction to Splunk Presentation (DevOps)Introduction to Splunk Presentation (DevOps)
Introduction to Splunk Presentation (DevOps)
 
Code Camp - Data Profiling and Quality Analysis Framework
Code Camp - Data Profiling and Quality Analysis FrameworkCode Camp - Data Profiling and Quality Analysis Framework
Code Camp - Data Profiling and Quality Analysis Framework
 
AWS: Messaging Services in AWS Presentation
AWS: Messaging Services in AWS PresentationAWS: Messaging Services in AWS Presentation
AWS: Messaging Services in AWS Presentation
 
Amazon Cognito: A Primer on Authentication and Authorization
Amazon Cognito: A Primer on Authentication and AuthorizationAmazon Cognito: A Primer on Authentication and Authorization
Amazon Cognito: A Primer on Authentication and Authorization
 
ZIO Http A Functional Approach to Scalable and Type-Safe Web Development
ZIO Http A Functional Approach to Scalable and Type-Safe Web DevelopmentZIO Http A Functional Approach to Scalable and Type-Safe Web Development
ZIO Http A Functional Approach to Scalable and Type-Safe Web Development
 
Managing State & HTTP Requests In Ionic.
Managing State & HTTP Requests In Ionic.Managing State & HTTP Requests In Ionic.
Managing State & HTTP Requests In Ionic.
 

Recently uploaded

The Politics of Agile Development.pptx
The  Politics of  Agile Development.pptxThe  Politics of  Agile Development.pptx
The Politics of Agile Development.pptx
NMahendiran
 
Unlocking the Future of Artificial Intelligence
Unlocking the Future of Artificial IntelligenceUnlocking the Future of Artificial Intelligence
Unlocking the Future of Artificial Intelligence
dorinIonescu
 
UW Cert degree offer diploma
UW Cert degree offer diploma UW Cert degree offer diploma
UW Cert degree offer diploma
dakyuhe
 
The two flavors of Python 3.13 - PyHEP 2024
The two flavors of Python 3.13 - PyHEP 2024The two flavors of Python 3.13 - PyHEP 2024
The two flavors of Python 3.13 - PyHEP 2024
Henry Schreiner
 
New York University degree Cert offer diploma Transcripta
New York University degree Cert offer diploma Transcripta New York University degree Cert offer diploma Transcripta
New York University degree Cert offer diploma Transcripta
pyxgy
 
BDRSuite - #1 Cost effective Data Backup and Recovery Solution
BDRSuite - #1 Cost effective Data Backup and Recovery SolutionBDRSuite - #1 Cost effective Data Backup and Recovery Solution
BDRSuite - #1 Cost effective Data Backup and Recovery Solution
praveene26
 
06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching
quanhoangd129
 
Mastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GISMastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GIS
Safe Software
 
B.Sc. Computer Science Department PPT 2024
B.Sc. Computer Science Department PPT 2024B.Sc. Computer Science Department PPT 2024
B.Sc. Computer Science Department PPT 2024
vmsdeptcom
 
Understanding Automated Testing Tools for Web Applications.pdf
Understanding Automated Testing Tools for Web Applications.pdfUnderstanding Automated Testing Tools for Web Applications.pdf
Understanding Automated Testing Tools for Web Applications.pdf
kalichargn70th171
 
daily-improvements-with-sqdc-process.pdf
daily-improvements-with-sqdc-process.pdfdaily-improvements-with-sqdc-process.pdf
daily-improvements-with-sqdc-process.pdf
sayma33
 
Alluxio Webinar | What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywh...
Alluxio Webinar | What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywh...Alluxio Webinar | What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywh...
Alluxio Webinar | What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywh...
Alluxio, Inc.
 
Fantastic Design Patterns and Where to use them No Notes.pdf
Fantastic Design Patterns and Where to use them No Notes.pdfFantastic Design Patterns and Where to use them No Notes.pdf
Fantastic Design Patterns and Where to use them No Notes.pdf
6m9p7qnjj8
 
Learning Rust with Advent of Code 2023 - Princeton
Learning Rust with Advent of Code 2023 - PrincetonLearning Rust with Advent of Code 2023 - Princeton
Learning Rust with Advent of Code 2023 - Princeton
Henry Schreiner
 
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)Test Polarity: Detecting Positive and Negative Tests (FSE 2024)
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)
Andre Hora
 
Predicting Test Results without Execution (FSE 2024)
Predicting Test Results without Execution (FSE 2024)Predicting Test Results without Execution (FSE 2024)
Predicting Test Results without Execution (FSE 2024)
Andre Hora
 
08. Ruby Enumerable - Ruby Core Teaching
08. Ruby Enumerable - Ruby Core Teaching08. Ruby Enumerable - Ruby Core Teaching
08. Ruby Enumerable - Ruby Core Teaching
quanhoangd129
 
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery SolutionsBitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
Alina Tait
 
What is Micro Frontends and Why Use it.pdf
What is Micro Frontends and Why Use it.pdfWhat is Micro Frontends and Why Use it.pdf
What is Micro Frontends and Why Use it.pdf
lead93317
 
Three available editions of Windows Servers crucial to your organization’s op...
Three available editions of Windows Servers crucial to your organization’s op...Three available editions of Windows Servers crucial to your organization’s op...
Three available editions of Windows Servers crucial to your organization’s op...
Q-Advise
 

Recently uploaded (20)

The Politics of Agile Development.pptx
The  Politics of  Agile Development.pptxThe  Politics of  Agile Development.pptx
The Politics of Agile Development.pptx
 
Unlocking the Future of Artificial Intelligence
Unlocking the Future of Artificial IntelligenceUnlocking the Future of Artificial Intelligence
Unlocking the Future of Artificial Intelligence
 
UW Cert degree offer diploma
UW Cert degree offer diploma UW Cert degree offer diploma
UW Cert degree offer diploma
 
The two flavors of Python 3.13 - PyHEP 2024
The two flavors of Python 3.13 - PyHEP 2024The two flavors of Python 3.13 - PyHEP 2024
The two flavors of Python 3.13 - PyHEP 2024
 
New York University degree Cert offer diploma Transcripta
New York University degree Cert offer diploma Transcripta New York University degree Cert offer diploma Transcripta
New York University degree Cert offer diploma Transcripta
 
BDRSuite - #1 Cost effective Data Backup and Recovery Solution
BDRSuite - #1 Cost effective Data Backup and Recovery SolutionBDRSuite - #1 Cost effective Data Backup and Recovery Solution
BDRSuite - #1 Cost effective Data Backup and Recovery Solution
 
06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching06. Ruby Array & Hash - Ruby Core Teaching
06. Ruby Array & Hash - Ruby Core Teaching
 
Mastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GISMastering MicroStation DGN: How to Integrate CAD and GIS
Mastering MicroStation DGN: How to Integrate CAD and GIS
 
B.Sc. Computer Science Department PPT 2024
B.Sc. Computer Science Department PPT 2024B.Sc. Computer Science Department PPT 2024
B.Sc. Computer Science Department PPT 2024
 
Understanding Automated Testing Tools for Web Applications.pdf
Understanding Automated Testing Tools for Web Applications.pdfUnderstanding Automated Testing Tools for Web Applications.pdf
Understanding Automated Testing Tools for Web Applications.pdf
 
daily-improvements-with-sqdc-process.pdf
daily-improvements-with-sqdc-process.pdfdaily-improvements-with-sqdc-process.pdf
daily-improvements-with-sqdc-process.pdf
 
Alluxio Webinar | What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywh...
Alluxio Webinar | What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywh...Alluxio Webinar | What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywh...
Alluxio Webinar | What’s new in Alluxio Enterprise AI 3.2: Leverage GPU Anywh...
 
Fantastic Design Patterns and Where to use them No Notes.pdf
Fantastic Design Patterns and Where to use them No Notes.pdfFantastic Design Patterns and Where to use them No Notes.pdf
Fantastic Design Patterns and Where to use them No Notes.pdf
 
Learning Rust with Advent of Code 2023 - Princeton
Learning Rust with Advent of Code 2023 - PrincetonLearning Rust with Advent of Code 2023 - Princeton
Learning Rust with Advent of Code 2023 - Princeton
 
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)Test Polarity: Detecting Positive and Negative Tests (FSE 2024)
Test Polarity: Detecting Positive and Negative Tests (FSE 2024)
 
Predicting Test Results without Execution (FSE 2024)
Predicting Test Results without Execution (FSE 2024)Predicting Test Results without Execution (FSE 2024)
Predicting Test Results without Execution (FSE 2024)
 
08. Ruby Enumerable - Ruby Core Teaching
08. Ruby Enumerable - Ruby Core Teaching08. Ruby Enumerable - Ruby Core Teaching
08. Ruby Enumerable - Ruby Core Teaching
 
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery SolutionsBitLocker Data Recovery | BLR Tools Data Recovery Solutions
BitLocker Data Recovery | BLR Tools Data Recovery Solutions
 
What is Micro Frontends and Why Use it.pdf
What is Micro Frontends and Why Use it.pdfWhat is Micro Frontends and Why Use it.pdf
What is Micro Frontends and Why Use it.pdf
 
Three available editions of Windows Servers crucial to your organization’s op...
Three available editions of Windows Servers crucial to your organization’s op...Three available editions of Windows Servers crucial to your organization’s op...
Three available editions of Windows Servers crucial to your organization’s op...
 

Stream processing using Kafka

  • 1. Himani Arora & Prabhat Kashyap Software Consultant @_himaniarora @pk_official
  • 2. Who we are? Himani Arora @_himaniarora Software Consultant @ Knoldus Software LLP Contributed in Apache Kafka, Juypter, Apache CarbonData, Lightbend Lagom etc Currently learning Apache Kafka Prabhat Kashyap @pk_official Software Consultant @ Knoldus Software LLP Contributed in Apache Kafka and Apache CarbonData and Lightbend Templates Currently learning Apache Kafka
  • 3. Agenda ● What is Stream processing ● Paradigms of programming ● Stream Processing with Kafka ● What are Kafka Streams ● Inside Kafka Streams ● Demonstration of stream processing using Kafka Streams ● Overview of Kafka Connect ● Demo with Kafka Connect
  • 4. What is stream processing? ● Real-time processing of data ● Does not treat data as static tables or files ● Data has to be processed fast, so that a firm can react to changing business conditions in real time. This is required for trading, fraud detection, system monitoring, and many other examples. ● A “too late architecture” cannot realize these use cases.
  • 5. BIG DATA VERSUS FAST DATA
  • 6. 3 PARADIGMS OF PROGRAMMING ● REQUEST/RESPONSE ● BATCH SYSTEMS ● STREAM PROCESSING
  • 10. STREAM PROCESSING with KAFKA 2 APPROACHES: ● DO IT YOURSELF (DIY ! ) STREAM PROCESSING ● STREAM PROCESSING FRAMEWORK
  • 11. DIY STREAM PROCESSING Major Challenges: ● FAULT TOLERANCE ● PARTITIONING AND SCALABILITY ● TIME ● STATE ● REPROCESSING
  • 12. STREAM PROCESSING FRAMEWORK Many already available stream processing framework are: SPARK STORM SAMZA FLINK ETC...
  • 13. KAFKA STREAMS : ANOTHER WAY OF STREAM PROCESSING
  • 14. Let’s starts with Kafka Stream but wait.. What is KAFKA?
  • 15. Hello! Apache Kafka ● Apache Kafka is an Open Source project under Apache Licence 2.0 ● Apache Kafka was originally developed by LinkedIn. ● On 23 October 2012 Apache Kafka graduated from incubator to top level projects. ● Components of Apache Kafka ○ Producer ○ Consumer ○ Broker ○ Topic ○ Data ○ Parallelism
  • 18. What is Kafka Streams ● It is Streams API of Apache Kafka, available through a Java library. ● Kafka Streams is built on top of functionality provided by Kafka’s. ● It is , by deliberate design, tightly integrated with Apache Kafka. ● It can be used to build highly scalable, elastic, fault-tolerant, distributed applications and microservices. ● Kafka Streams API allows you to create real-time applications. ● It is the easiest yet the most powerful technology to process data stored in Kafka.
  • 20. If we look closer ● A key motivation of the Kafka Streams API is to bring stream processing out of the Big Data niche into the world of mainstream application development. ● Using the Kafka Streams API you can implement standard Java applications to solve your stream processing needs. ● Your applications are fully elastic: you can run one or more instances of your application. ● This lightweight and integrative approach of the Kafka Streams API – “Build applications, not infrastructure!” . ● Deployment-wise you are free to chose from any technology that can deploy Java applications
  • 21. Capabilities of Kafka Stream ● Powerful ○ Makes your applications highly scalable, elastic, distributed, fault- tolerant. ○ Stateful and stateless processing ○ Event-time processing with windowing, joins, aggregations ● Lightweight ○ Low barrier to entry ○ No processing cluster required ○ No external dependencies other than Apache Kafka
  • 22. Capabilities of Kafka Stream ● Real-time ○ Millisecond processing latency ○ Record-at-a-time processing (no micro-batching) ○ Seamlessly handles late-arriving and out-of-order data ○ High throughput ● Fully integrated ○ 100% compatible with Apache Kafka 0.10.2 and 0.10.1 ○ Easy to integrate into existing applications and microservices ○ Runs everywhere: on-premises, public clouds, private clouds, containers, etc. ○ Integrates with databases through continous change data capture (CDC) performed by Kafka Connect
  • 23. Key concepts of Kafka Streams ● Stateful Stream Processing ● KStream ● KTable ● Time ● Aggregations ● Joins ● Windowing
  • 24. Key concepts of Kafka Streams ● Stateful Stream Processing – Some stream processing applications don’t require state – they are stateless. – In practice, however, most applications require state – they are stateful. – The state must be managed in a fault-tolerant manner. – Application is stateful whenever, for example, it needs to join, aggregate, or window its input data.
  • 25. Key concepts of Kafka Streams ● Kstream – A KStream is an abstraction of a record stream. – Each data record represents a self-contained datum in the unbounded data set. – Using the table analogy, data records in a record stream are always interpreted as an “INSERT” . – Let’s imagine the following two data records are being sent to the stream: ("alice", 1) --> ("alice", 3)
  • 26. Key concepts of Kafka Streams ● Ktable – A KStream is an abstraction of a changelog stream. – Each data record represents an update. – Using the table analogy, data records in a record stream are always interpreted as an “UPDATE” . – Let’s imagine the following two data records are being sent to the stream: ("alice", 1) --> ("alice", 3)
  • 27. Key concepts of Kafka Streams ● Time – A critical aspect in stream processing is the the notion of time. – Kafka Streams supports the following notions of time: ● Event Time ● Processing Time ● Ingestion Time – Kafka Streams assigns a timestamp to every data record via so-called timestamp extractors.
  • 28. Key concepts of Kafka Streams ● Aggregations – An aggregation operation takes one input stream or table, and yields a new table. – It is done by combining multiple input records into a single output record. – In the Kafka Streams DSL, an input stream of an aggregation operation can be a KStream or a KTable, but the output stream will always be a KTable.
  • 29. Key concepts of Kafka Streams ● Joins – A join operation merges two input streams and/or tables based on the keys of their data records, and yields a new stream/table.
  • 30. Key concepts of Kafka Streams ● Windowing – Windowing lets you control how to group records that have the same key for stateful operations such as aggregations or joins into so- called windows. – Windows are tracked per record key. – When working with windows, you can specify a retention period for the window. – This retention period controls how long Kafka Streams will wait for out-of-order or late-arriving data records for a given window. – If a record arrives after the retention period of a window has passed, the record is discarded and will not be processed in that window.
  • 33. Stream Partitions and Tasks ● Each stream partition is a totally ordered sequence of data records and maps to a Kafka topic partition. ● A data record in the stream maps to a Kafka message from that topic. ● The keys of data records determine the partitioning of data in both Kafka and Kafka Streams, i.e., how data is routed to specific partitions within topics.
  • 34. Threading Model ● Kafka Streams allows the user to configure the number of threads that the library can use to parallelize processing within an application instance. ● Each thread can execute one or more stream tasks with their processor topologies independently.
  • 35. State ● Kafka Streams provides so-called state stores. ● State can be used by stream processing applications to store and query data, which is an important capability when implementing stateful operations.
  • 36. Backpressure ● Kafka Streams does not use a backpressure mechanism because it does not need one. ● It uses depth-first processing strategy. ● Each record consumed from Kafka will go through the whole processor (sub-)topology for processing and for (possibly) being written back to Kafka before the next record will be processed. ● No records are being buffered in-memory between two connected stream processors. ● Kafka Streams leverages Kafka’s consumer client behind the scenes.
  • 38. HOW TO GET DATA IN AND OUT OF KAFKA?
  • 40. Kafka connect ● So-called Sources import data into Kafka, and Sinks export data from Kafka. ● An implementation of a Source or Sink is a Connector. And users deploy connectors to enable data flows on Kafka ● All Kafka Connect sources and sinks map to partitioned streams of records. ● This is a generalization of Kafka’s concept of topic partitions: a stream refers to the complete set of records that are split into independent infinite sequences of records
  • 41. CONFIGURING CONNECTORS ● Connector configurations are key-value mappings. ● For standalone mode these are defined in a properties file and passed to the Connect process on the command line. ● In distributed mode, they will be included in the JSON payload sent over the REST API for the request that creates the connector.
  • 42. CONFIGURING CONNECTORS Few settings common that are common to all connectors: ● name - Unique name for the connector. Attempting to register again with the same name will fail. ● connector.class - The Java class for the connector ● tasks.max - The maximum number of tasks that should be created for this connector. The connector may create fewer tasks if it cannot achieve this level of parallelism.

Editor's Notes

  1. continuously, concurrently, and in a record-by-record fashion. But as a continuous infinite stream of data integrated from both live and historical sources.
  2. A big data architecture contains several parts. Often, masses of structured and semi-structured historical data are stored in Hadoop (Volume + Variety). On the other side, stream processing is used for fast data requirements (Velocity + Variety). Both complement each other very well. This meetup focuses on real-time and stream processing.
  3. IMAGE SOURCE https://image.slidesharecdn.com/demystifyingstreamprocessingwithapachekafka-161118053223/95/demystifying-stream-processing-with-apache-kafka-4-638.jpg?cb=1479447621 Synchronous and tightly coupled Scaling is possible by adding more instances to this service Latency sensitive and due to tight coupling its sensitive to failures.
  4. you send all your inputs in and wait for your system to crunch all that data before it send all the output back.
  5. in between request/response and batch systems. here you send some inputs in and you get some outputs back. this definition of SOME is left to the program. the o/p is available at variable times too. the BIG shift is that, stream processing knows that the data is unbounded and it shall never be complete. BENEFIT: It gives complete control to the program over the tradeoffs involved. (latency, correctness and cost )
  6. DIY → you take your kafka libraries and you decide to decide to do everything yourself. If you have decided to do this then you should be aware of these hard problems.
  7. producers publish data to Kafka brokers, and consumers read published data from Kafka brokers. Producers and consumers are totally decoupled, and both run outside the Kafka brokers in the perimeter of a Kafka cluster. A Kafka cluster consists of one or more brokers.
  8. Kafka topics are divided into a number of partitions. Partitions allow you to parallelize a topic by splitting the data in a particular topic across multiple brokers — each partition can be placed on a separate machine to allow for multiple consumers to read from a topic in parallel. Consumers can also be parallelized so that multiple consumers can read from multiple partitions in a topic allowing for very high message processing throughput.
  9. Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other data systems. It makes it simple to quickly define connectors that move large data sets into and out of Kafka. Kafka Connect’s scope is narrow: it focuses only on copying streaming data to and from Kafka and does not handle other tasks, such as stream processing,
  10. Standalone: bin/connect-standalone worker.properties connector1.properties [connector2.properties connector3.properties ...] Standalone mode is the simplest mode, where a single process is responsible for executing all connectors and tasks. Since it is a single process, it requires minimal configuration. Distributed mode provides scalability and automatic fault tolerance for Kafka Connect. In distributed mode, you start many worker processes using the same group.id and they automatically coordinate to schedule execution of connectors and tasks across all available workers. curl -X POST -H "Content-Type: application/json" --data '{"name": "local-console-source", "config": {"connector.class":"org.apache.kafka.connect.file.FileStreamSourceConnector", "tasks.max":"1", "topic":"connect-test" }}' http://localhost:8083/connectors # Or, to use a file containing the JSON-formatted configuration # curl -X POST -H "Content-Type: application/json" --data @config.json http://localhost:8083/connectors
  11. Sink connectors also have one additional option to control their input, topics - A list of topics to use as input for this connector