William Hill is one of the UK’s largest and most well-established gaming companies, with a global presence across nine countries and over 16,000 employees. In recent years the gaming industry, and in particular sports betting, has been revolutionised by technology. Customers now demand a wide range of events and markets to bet on, both pre-game and in-play, 24/7. This has driven a business need to process more data, provide more updates, and offer more markets and prices in real time.
At William Hill, we have invested in a completely new trading platform using Apache Kafka. We process vast quantities of data from a variety of feeds; this data is fed through a variety of odds compilation models before being piped out to UI apps used by our trading teams, which provide events, markets and pricing data to various endpoints across the whole of William Hill. We deal with thousands of sporting events, each with sometimes hundreds of betting markets, and each market receiving hundreds of updates. This scales up to vast numbers of messages flowing through our system, which we have to process, transform and route in real time. Using Apache Kafka, we have built a high-throughput, low-latency pipeline based on cloud-hosted microservices. When we started, we were on a steep learning curve with Kafka, microservices and associated technologies. This led to fast learnings and fast failings.
In this session, we will tell the story of what we built, what went well, what didn’t go so well, and what we learnt. This is the story of how a team of developers learnt (and are still learning) how to use Kafka. We hope you will take away lessons on how to build a data processing pipeline with Apache Kafka.
Building CI/CD Pipelines with Jenkins and Kubernetes - Janakiram MSV
Learn how to configure CI/CD pipelines with Jenkins and Kubernetes. We will show you how to automate deployments from source code to production clusters.
Common issues with Apache Kafka® Producer - Confluent
Badai Aqrandista, Senior Technical Support Engineer, Confluent
This session will be about a common issue in the Kafka producer: producer batch expiry. We will discuss the Kafka producer internals, the common causes of batch expiry, such as a slow network or small batches, and how to overcome them. We will also share some examples along the way!
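As a hedged illustration of the knobs involved (not taken from the session itself; values are illustrative, and delivery.timeout.ms assumes a 2.1+ Java client per KIP-91), batch expiry is typically tuned through the producer's batching and timeout settings:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class BatchTuning {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);           // larger batches help amortise a slow network
            props.put(ProducerConfig.LINGER_MS_CONFIG, 20);               // wait briefly so batches can fill
            props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000); // total budget before a batch expires
        }
    }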
https://www.meetup.com/apache-kafka-sydney/events/279651982/
Watch this talk here: https://www.confluent.io/online-talks/from-zero-to-hero-with-kafka-connect-on-demand
Integrating Apache Kafka® with other systems in a reliable and scalable way is often a key part of a streaming platform. Fortunately, Apache Kafka includes the Connect API that enables streaming integration both in and out of Kafka. As with any technology, understanding its architecture and deployment patterns is key to successful use, as is knowing where to look when things aren't working.
This talk will discuss the key design concepts within Apache Kafka Connect and the pros and cons of standalone vs distributed deployment modes. We'll do a live demo of building pipelines with Apache Kafka Connect for streaming data in from databases, and out to targets including Elasticsearch. With some gremlins along the way, we'll go hands-on in methodically diagnosing and resolving common issues encountered with Apache Kafka Connect. The talk will finish off by discussing more advanced topics including Single Message Transforms, and deployment of Apache Kafka Connect in containers.
Tuning Apache Kafka Connectors for Flink - Flink Forward
Flink Forward San Francisco 2022.
In normal situations, the default Kafka consumer and producer configuration options work well. But we all know life is not all roses and rainbows, and in this session we’ll explore a few knobs that can save the day in atypical scenarios. First, we'll take a detailed look at the parameters available when reading from Kafka. We’ll inspect the parameters that help us quickly spot an application lock or crash, the ones that can significantly improve performance, and the ones to handle with gloves since they could cause more harm than good. We’ll also explore the partitioning options and discuss when diverging from the default strategy is needed. Next, we’ll discuss the Kafka sink. After browsing the available options we'll dive deep into how to approach use cases like sinking enormous records, managing spikes, and handling small but frequent updates. If you want to understand how to make your application survive when the sky is dark, this session is for you!
by Olena Babenko
Building a fully managed stream processing platform on Flink at scale for Lin... - Flink Forward
Apache Flink is a distributed stream processing framework that allows users to process and analyze data in real time. At LinkedIn, we developed a fully managed stream processing platform on Flink running on K8s to power hundreds of stream processing pipelines in production. This platform is the backbone for other infra systems like Search, Espresso (internal document store) and feature management. We provide a rich authoring and testing environment which allows users to create, test, and deploy their streaming jobs in a self-serve fashion within minutes. Users can focus on their business logic, leaving the Flink platform to take care of management aspects such as split deployment, resource provisioning, auto-scaling, job monitoring, alerting, failure recovery and much more. In this talk, we will introduce the overall platform architecture, highlight the unique value propositions that it brings to stream processing at LinkedIn and share the experiences and lessons we have learned.
Watch this talk here: https://www.confluent.io/online-talks/apache-kafka-architecture-and-fundamentals-explained-on-demand
This session explains Apache Kafka’s internal design and architecture. Companies like LinkedIn are now sending more than 1 trillion messages per day to Apache Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
This talk provides a comprehensive overview of Kafka architecture and internal functions, including:
-Topics, partitions and segments
-The commit log and streams
-Brokers and broker replication
-Producer basics
-Consumers, consumer groups and offsets
This session is part 2 of 4 in our Fundamentals for Apache Kafka series.
1) Apache Kafka is a distributed streaming platform that can be used for publish-subscribe messaging and storing and processing streams of data. However, there are many potential anti-patterns to be aware of when using Kafka.
2) Some common anti-patterns include not properly configuring data durability, ignoring error handling and exceptions, failing to use Kafka's built-in retries and idempotence features, and not embracing Kafka's at-least-once processing semantics (a configuration sketch follows after this summary).
3) It is also important to properly configure Kafka for production use by tuning OS settings, reading documentation on best practices, implementing monitoring, and addressing topics and partitioning design.
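As a hedged sketch of the durability and idempotence settings that summary alludes to (standard Java producer configs; values are illustrative):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class DurableProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.ACKS_CONFIG, "all");                 // wait for all in-sync replicas
            props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");  // retries cannot duplicate records
            props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);  // ride out transient broker errors
        }
    }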
The document discusses Apache Flink, an open source stream processing framework. It provides high throughput and low latency processing of both streaming and batch data. Flink allows for explicit handling of event time, stateful stream processing with exactly-once semantics, and high performance. It also supports features like windowing, sessionization, and complex event processing that are useful for building streaming applications.
Apache Kafka is becoming the message bus used to transfer huge volumes of data from various sources into Hadoop.
It's also enabling many real-time system frameworks and use cases.
Managing and building clients around Apache Kafka can be challenging. In this talk, we will go through best practices for deploying Apache Kafka in production: how to secure a Kafka cluster, how to pick topic partitions, upgrading to newer versions, and migrating to the new Kafka producer and consumer APIs. We will also talk about best practices for running producers and consumers.
In the Kafka 0.9 release, we’ve added SSL wire encryption, SASL/Kerberos for user authentication, and pluggable authorization. Kafka now allows authentication of users and access control over who can read and write to a Kafka topic. Apache Ranger also uses the pluggable authorization mechanism to centralize security for Kafka and other Hadoop ecosystem projects.
We will showcase an open-sourced Kafka REST API and an Admin UI that help users create topics, reassign partitions, issue Kafka ACLs and monitor consumer offsets.
Building a scalable microservice architecture with Envoy, Kubernetes and Istio - Samir Behara
Talk from O'Reilly Software Architecture Conference San Jose 2019
Microservices and containers have taken the software industry by storm. Transitioning from a monolith to microservices enables you to deploy your application more frequently, independently, and reliably. However, microservice architecture has its own challenges, and it has to deal with the same problems encountered while designing distributed systems.
Enter service mesh technology to the rescue. A service mesh reduces the complexity associated with microservices and provides functionality like load balancing, service discovery, traffic management, circuit breaking, telemetry, fault injection, and more. Istio is one of the best implementations of a service mesh at this point, while Kubernetes provides a platform for running microservices and automating deployment of containerized applications.
Join Samir Behara to go beyond the buzz and understand microservices and service mesh technologies.
This presentation includes information on Kubernetes Architecture, Container Orchestration, Internal Routing, External Routing, Configuration Management, Credentials Management, Persistent Volumes, Rolling Out Updates, Autoscaling, Package Management, and a Hello World example using Helm.
Spring Boot + Kafka: the New Enterprise Platform - VMware Tanzu
This document discusses how Spring Boot and Kafka can form the basis of a new enterprise application platform focused on continuous delivery, event-driven architectures, and streaming data. It provides examples of companies that have successfully adopted this approach, such as Netflix transitioning to Spring Boot and a banking brand building a new core banking system using Spring Streams and Kafka. The document advocates an "event-first" and microservices-oriented mindset enabled by a streaming data platform and suggests that Spring Boot, Kafka, and related technologies provide a turnkey solution for implementing this new application development approach at large enterprises.
Flink Forward San Francisco 2022.
This talk will take you on Apache Flink's long journey into the cloud-native era, starting all the way back when Hadoop and YARN were the standard way of deploying and operating data applications.
We're going to deep dive into the cloud-native set of principles and how they map to the Apache Flink internals and recent improvements. We'll cover fast checkpointing, fault tolerance, resource elasticity, minimal infrastructure dependencies, industry-standard tooling, ease of deployment and declarative APIs.
After this talk you'll get a broader understanding of the operational requirements for a modern streaming application and where the current limits are.
by David Moravek
KSQL Deep Dive - The Open Source Streaming Engine for Apache Kafka - Kai Wähner
This document provides an agenda for a deep dive on KSQL, the streaming SQL engine for Apache Kafka. It begins with an overview of the Apache Kafka ecosystem and how Kafka Streams serves as the foundation for KSQL. It then discusses the motivations for using KSQL and covers KSQL concepts like streams, tables, and windowing. The agenda also includes two live demos - an introduction to KSQL and a clickstream analysis example. It will discuss building user-defined functions with KSQL and machine learning. Finally, it covers getting started with KSQL.
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl... - Athens Big Data
Title: MLOps Workshop: The Full ML Lifecycle - How to Use ML in Production
Speakers: Spyros Cavadias (https://www.linkedin.com/in/spyros-cavadias/), Konstantinos Pittas (https://www.linkedin.com/in/konstantinos-pittas-83310270/), Thanos Gkinakos (https://www.linkedin.com/in/thanos-gkinakos-03582a128/)
Date: Saturday, December 17, 2022
Event: https://www.meetup.com/athens-big-data/events/289927468/
ksqlDB: A Stream-Relational Database System - Confluent
Speaker: Matthias J. Sax, Software Engineer, Confluent
ksqlDB is a distributed event streaming database system that allows users to express SQL queries over relational tables and event streams. The project was released by Confluent in 2017 and is hosted on Github and developed with an open-source spirit. ksqlDB is built on top of Apache Kafka®, a distributed event streaming platform. In this talk, we discuss ksqlDB’s architecture, which is influenced by Apache Kafka and its stream processing library, Kafka Streams. We explain how ksqlDB executes continuous queries while achieving fault tolerance and high availability. Furthermore, we explore ksqlDB’s streaming SQL dialect and the different types of supported queries.
Matthias J. Sax is a software engineer at Confluent working on ksqlDB. He mainly contributes to Kafka Streams, Apache Kafka's stream processing library, which serves as ksqlDB's execution engine. Furthermore, he helps evolve ksqlDB's "streaming SQL" language. In the past, Matthias also contributed to Apache Flink and Apache Storm and he is an Apache committer and PMC member. Matthias holds a Ph.D. from Humboldt University of Berlin, where he studied distributed data stream processing systems.
https://db.cs.cmu.edu/events/quarantine-db-talk-2020-confluent-ksqldb-a-stream-relational-database-system/
Extending Flink SQL for stream processing use cases - Flink Forward
1. For streaming data, Flink SQL uses STREAMs for append-only queries and CHANGELOGs for upsert queries instead of tables.
2. Stateless queries on streaming data, such as projections and filters, result in new STREAMs or CHANGELOGs.
3. Stateful queries, such as aggregations, produce STREAMs or CHANGELOGs depending on whether they are windowed or not. Join queries between streaming sources also result in STREAM outputs.
Introducing the Apache Flink Kubernetes Operator - Flink Forward
Flink Forward San Francisco 2022.
The Apache Flink Kubernetes Operator provides a consistent approach to manage Flink applications automatically, without any human interaction, by extending the Kubernetes API. Given the increasing adoption of Kubernetes based Flink deployments the community has been working on a Kubernetes native solution as part of Flink that can benefit from the rich experience of community members and ultimately make Flink easier to adopt. In this talk we give a technical introduction to the Flink Kubernetes Operator and demonstrate the core features and use-cases through in-depth examples.
by Thomas Weise
Should you use traditional REST APIs to bind services together? Or is it better to use a richer, more loosely-coupled protocol? This talk will dig into how we piece services together in event driven systems, how we use a distributed log (event hub) to create a central, persistent history of events and what benefits we achieve from doing so. Apache Kafka is a perfect match for building such an asynchronous, loosely-coupled event-driven backbone. Events trigger processing logic, which can be implemented in a traditional as well as in a stream processing fashion. The talk will show the difference between request-driven and event-driven communication and when to use which. It highlights how modern stream processing systems can be used to hold state both internally and in a database, and how this state can be used to further increase the independence of other services, which is the primary goal of a microservices architecture.
In the last few years, Apache Kafka has been used extensively in enterprises for real-time data collecting, delivering, and processing. In this presentation, Jun Rao, Co-founder, Confluent, gives a deep dive on some of the key internals that help make Kafka popular.
- Companies like LinkedIn are now sending more than 1 trillion messages per day to Kafka. Learn about the underlying design in Kafka that leads to such high throughput.
- Many companies (e.g., financial institutions) are now storing mission critical data in Kafka. Learn how Kafka supports high availability and durability through its built-in replication mechanism.
- One common use case of Kafka is for propagating updatable database records. Learn how a unique feature called compaction in Apache Kafka is designed to solve this kind of problem more naturally.
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013 - Christopher Curtin
Chris Curtin gave a presentation on Apache Kafka at the Atlanta Java Users Group. He discussed his background in technology and current role at Silverpop. He then provided an overview of Apache Kafka, describing its core functionality as a distributed publish-subscribe messaging system. Finally, he demonstrated how producers and consumers interact with Kafka and highlighted some use cases and performance figures from LinkedIn's deployment of Kafka.
Fundamentals and Architecture of Apache Kafka - Angelo Cesaro
This presentation explains Apache Kafka's architecture and internal design, giving an overview of Kafka's internal functions, including:
brokers, replication, partitions, producers, consumers, the commit log, and a comparison with traditional message queues.
Kafka's basic terminologies, its architecture, its protocol and how it works.
Kafka at scale: its caveats, guarantees, and the use cases it serves.
How we use it @ZaprMediaLabs.
Unleashing Real-time Power with Kafka - Knoldus Inc.
Unlock the potential of real-time data streaming with Kafka in this session. Learn the fundamentals, architecture, and seamless integration with Scala, empowering you to elevate your data processing capabilities. Perfect for developers at all levels, this hands-on experience will equip you to harness the power of real-time data streams effectively.
This document discusses building event-driven, fault-tolerant microservices. It begins by discussing lessons learned from SOA architecture and defining microservices. It emphasizes that microservices need to be reactive and message-driven to achieve loose coupling and fault tolerance. The document then outlines challenges in implementing microservices at scale before proposing a design using Spring Boot, Kafka, Docker, and Elastic Stack. It provides an in-depth look at these components and how they address scalability, isolation, fault tolerance and monitoring of microservices.
This document discusses LinkedIn's use of Kafka, Hadoop, Storm, and Couchbase in their big data pipeline. It provides an overview of each technology and how LinkedIn uses them together. Specifically, it describes how LinkedIn uses Kafka to stream data to Hadoop for analytics and report generation. It also discusses how LinkedIn uses Hadoop to pre-build and warm Couchbase buckets for improved performance. The presentation includes a use case of streaming member profile and activity data through Kafka to both Hadoop and Couchbase clusters.
At Hootsuite, we've been transitioning from a single monolithic PHP application to a set of scalable Scala-based microservices. To avoid excessive coupling between services, we've implemented an event system using Apache Kafka that allows events to be reliably produced + consumed asynchronously from services as well as data stores.
In this presentation, I talk about:
- Why we chose Kafka
- How we set up our Kafka clusters to be scalable, highly available, and multi-data-center aware.
- How we produce + consume events
- How we ensure that events can be understood by all parts of our system (some of which are implemented in other programming languages like PHP and Python) and how we handle evolving event payload data.
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream... - Erik Onnen
The document discusses Urban Airship's use of Apache Kafka for processing continuous data streams. It describes how Urban Airship uses Kafka for analytics, operational data, and presence data. Producers write device data to Kafka topics, and consumers create indexes from the data in databases like HBase and write to operational data warehouses. The document also covers Kafka concepts, best use cases, limitations, and examples of data structures for storing device metadata in Kafka streams.
This document provides an introduction to Apache Kafka, an open-source distributed event streaming platform. It discusses Kafka's history as a project originally developed by LinkedIn, its use cases like messaging, activity tracking and stream processing. It describes key Kafka concepts like topics, partitions, offsets, replicas, brokers and producers/consumers. It also gives examples of how companies like Netflix, Uber and LinkedIn use Kafka in their applications and provides a comparison to Apache Spark.
The document discusses lessons learned from building a real-time data processing platform using Spark and microservices. Key aspects include:
- A microservices-inspired architecture was used with Spark Streaming jobs processing data in parallel and communicating via Kafka.
- This modular approach allowed for independent development and deployment of new features without disrupting existing jobs.
- While Spark provided batch and streaming capabilities, managing resources across jobs and achieving low latency proved challenging.
- Alternative technologies like Kafka Streams and Confluent's Schema Registry were identified to improve resilience, schemas, and processing latency.
- Overall the platform demonstrated strengths in modularity, A/B testing, and empowering data scientists, but faced challenges around
Building Effective Near-Real-Time Analytics with Spark Streaming and Kudu - Jeremy Beard
This document discusses building near-real-time analytics pipelines using Apache Spark Streaming and Apache Kudu on the Cloudera platform. It defines near-real-time analytics, describes the relevant components of the Cloudera stack (Kafka, Spark, Kudu, Impala), and how they can work together. The document then outlines the typical stages involved in implementing a Spark Streaming to Kudu pipeline, including sourcing from a queue, translating data, deriving storage records, planning mutations, and storing the data. It provides performance considerations and introduces Envelope, a Spark Streaming application on Cloudera Labs that implements these stages through configurable pipelines.
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example - Confluent
This document introduces Kafka Streams and provides an example of using it to process streaming data from Apache Kafka. It summarizes some key limitations of using Apache Spark for streaming use cases with Kafka before demonstrating how to build a simple text processing pipeline with Kafka Streams. The document also discusses parallelism, state stores, aggregations, joins and deployment considerations when using Kafka Streams. It provides an example of how Kafka Streams was used to aggregate metrics from multiple instances of an application into a single stream.
Scylla Summit 2016: Compose on Containing the Database - ScyllaDB
This document discusses how Compose applies containerization best practices to provide database services. It outlines the "Twelve Factors of Stateful Apps" that guide Compose's architecture. These include running databases and data in separate containers, using environment variables for configuration, scaling containers vertically before adding nodes, and collecting logs and metrics within the deployment. By applying these factors, Compose can reliably deploy a range of database technologies like MongoDB, PostgreSQL, and now ScyllaDB across its platform.
Using Apache Cassandra and Apache Kafka to Scale Next Gen Applications - Data Con LA
Adoption of open source software (OSS) at the enterprise level has flourished, as more businesses discover the considerable advantages that open source solutions hold over their proprietary counterparts, and as the enterprise mentality around open source continues to shift. We will discuss how to identify good application candidates for Apache Cassandra and Kafka as well as best practices and common pitfalls.
This presentation will also cover:
The origins of Apache Cassandra and Kafka and how these technologies have come to shape how next-gen applications are built.
Production use cases of Cassandra and Kafka: Real-time payments and buying a house (Lendi and Worldpay)
Core concepts that make the magic: the technical attributes that make your project a good fit for these technologies, and the architectural patterns that make the best use of their capabilities.
Speaker: Adam Zegelin, SVP Engineering and Co-Founder, Instaclustr
As Instaclustr's founding software engineer, Adam provides the foundation knowledge of Instaclustr's capability and engineering environment. Adam is also focused on providing Instaclustr's contribution to the broader open source community on which our products and the services rely, including Apache Cassandra, Apache Spark, and other core technologies such as CoreOS and Docker. Prior to founding Instaclustr, Adam worked on large-scale big data projects with Australian Government agencies.
This document discusses new age distributed messaging using Apache Kafka. It begins with an introduction to Kafka concepts like topics, partitions, producers and consumers. It then explains how Kafka uses commit log architecture and an append-only log structure to provide high throughput performance. The document also covers how Zookeeper is used to coordinate Kafka brokers and keep metadata. It evaluates Kafka's performance based on LinkedIn benchmarks, finding that its lack of acknowledgements, batching and storage format allow for very fast publishing and consumption of messages. In conclusion, the document suggests Kafka could be introduced in some parts of Responsys' architecture to handle big data workloads.
Apache Kafka is a fast, scalable, and distributed messaging system. It is designed for high throughput systems and can serve as a replacement for traditional message brokers. Kafka uses a publish-subscribe messaging model where messages are published to topics that multiple consumers can subscribe to. It provides benefits such as reliability, scalability, durability, and high performance.
Similar to Building High-Throughput, Low-Latency Pipelines in Kafka
Unlocking value with event-driven architecture - Confluent
Harness the power of real-time data streaming and event-driven microservices for the future of Sky with Confluent and Kafka®.
In this tech talk we will explore the potential of Confluent and Apache Kafka® to revolutionise enterprise architecture and unlock new business opportunities. We will dig into the key concepts, guiding you through building scalable, resilient, real-time data streaming applications.
You will discover how to build event-driven microservices with Confluent, taking advantage of a modern, reactive architecture.
The talk will also present real-world use cases of Confluent and Kafka®, showing how these technologies can optimise business processes and generate concrete value.
Data Streaming for next-generation real-time AI - Confluent
Building reliable, secure and governed AI applications requires an equally solid real-time data foundation, all the more so when managing huge flows of data in constant motion.
How do you get there? Rely on a true data streaming platform that lets you scale and rapidly build real-time AI applications on top of trustworthy data.
Find out more! Don't miss our upcoming webinar, in which we will:
• Explore the GenAI paradigm and how this new technology is reshaping the business landscape, answering the need to provide real-time context and solutions that meet your company's requirements.
• Examine the uncertainties of the evolving AI landscape and the crucial importance of data streaming and data processing.
• See in detail the continuously evolving architecture and the key role of Kafka and Confluent in AI applications.
• Analyse the advantages of a data streaming platform like Confluent in bridging legacy estates and GenAI, easing the development and use of predictive and generative AI.
Unleashing the Future: Building a Scalable and Up-to-Date GenAI Chatbot with ... - Confluent
As businesses strive to remain at the cutting edge of innovation, the demand for scalable and up-to-date conversational AI solutions has become paramount. Generative AI (GenAI) chatbots that seamlessly integrate into our daily lives and adapt to the ever-evolving nuances of human interaction are crucial. Real-time data plays a pivotal role in ensuring the responsiveness and relevance of these chatbots, empowering them to stay abreast of the latest trends, user preferences, and contextual information.
Break data silos with real-time connectivity using Confluent Cloud Connectors - Confluent
Connectors integrate Apache Kafka® with external data systems, enabling you to move away from a brittle spaghetti architecture to one that is more streamlined, secure, and future-proof. However, if your team still spends multiple dev cycles building and managing connectors using just open source Kafka Connect, it’s time to consider a faster and cost-effective alternative.
Building API data products on top of your real-time data infrastructure - Confluent
This talk and live demonstration will examine how Confluent and Gravitee.io integrate to unlock value from streaming data through API products.
You will learn how data owners and API providers can document and secure data products on top of Confluent brokers, including schema validation, topic routing and message filtering.
You will also see how data and API consumers can discover and subscribe to products in a developer portal, as well as how they can integrate with Confluent topics through protocols like REST, Websockets, Server-sent Events and Webhooks.
Whether you want to monetize your real-time data, enable new integrations with partners, or provide self-service access to topics through various protocols, this webinar is for you!
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente... - Confluent
In our exclusive webinar, you'll learn why event-driven architecture is the key to unlocking cost efficiency, operational effectiveness, and profitability. Gain insights on how this approach differs from API-driven methods and why it's essential for your organization's success.
Santander Stream Processing with Apache Flink - Confluent
Flink is becoming the de facto standard for stream processing due to its scalability, performance, fault tolerance, and language flexibility. It supports stream processing, batch processing, and analytics through one unified system. Developers choose Flink for its robust feature set and ability to handle stream processing workloads at large scales efficiently.
Unlocking the Power of IoT: A comprehensive approach to real-time insights - Confluent
In today's data-driven world, the Internet of Things (IoT) is revolutionizing industries and unlocking new possibilities. Join Data Reply, Confluent, and Imply as we unveil a comprehensive solution for IoT that harnesses the power of real-time insights.
Hybrid workshop: Stream Processing with Flink - Confluent
Stream processing is a prerequisite of the data streaming stack, powering real-time applications and pipelines.
It enables greater data portability, optimised resource utilisation and a better customer experience by processing data streams in real time.
In our hands-on hybrid workshop, you will learn how to easily filter, join and enrich real-time data within Confluent Cloud using our serverless Flink service.
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark... - Confluent
Our talk will explore the transformative impact of integrating Confluent, HiveMQ, and SparkPlug in Industry 4.0, emphasizing the creation of a Unified Namespace.
In addition to the creation of a Unified Namespace, our webinar will also delve into Stream Governance and Scaling, highlighting how these aspects are crucial for managing complex data flows and ensuring robust, scalable IIoT-Platforms.
You will learn how to ensure data accuracy and reliability, expand your data processing capabilities, and optimize your data management processes.
Don't miss out on this opportunity to learn from industry experts and take your business to the next level.
Event-driven architecture (EDA) will be the heart of MAPFRE's ecosystem. To stay competitive, today's companies increasingly depend on real-time data analysis, which gives them faster insights and response times. Real-time data business is about situational awareness: detecting and responding to what is happening in the world right now.
Events and Microservices - Santander TechTalk - Confluent
In this session we will examine how the worlds of events and microservices complement and improve each other, exploring how event-based patterns let us decompose monoliths in a scalable, resilient and decoupled way.
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud - Confluent
This document discusses networking options and best practices for Confluent Cloud. It provides an overview of public endpoints, private link, and peering options. It then discusses best practices for private networking architectures on Azure using hub-and-spoke and private link designs. Finally, it addresses networking considerations and challenges for Kafka Connect managed connectors, as well as planned enhancements for DNS peering and outbound private link support.
The purpose of the session is to dive into Apache Kafka, data streaming, and Kafka in the cloud:
- Dive into Apache Kafka
- Data Streaming
- Kafka in the cloud
Build real-time streaming data pipelines to AWS with Confluent - Confluent
Traditional data pipelines often face scalability issues and challenges related to cost, their monolithic design, and reliance on batch data processing. They also typically operate under the premise that all data needs to be stored in a single centralized data source before it's put to practical use. Confluent Cloud on Amazon Web Services (AWS) provides a fully managed cloud-native platform that helps you simplify the way you build real-time data flows using streaming data pipelines and Apache Kafka.
Q&A with Confluent Professional Services: Confluent Service Mesh - Confluent
No matter whether you are migrating your Kafka cluster to Confluent Cloud, running a cloud-hybrid environment or are in a different situation where data protection and encryption of sensitive information is required, Confluent Service Mesh allows you to transparently encrypt your data without the need to make code changes to your existing applications.
Citi Tech Talk: Event Driven Kafka Microservices - Confluent
Microservices have become a dominant architectural paradigm for building systems in the enterprise, but they are not without their tradeoffs. Learn how to build event-driven microservices with Apache Kafka
Confluent & GSI Webinars series - Session 3 - Confluent
An in depth look at how Confluent is being used in the financial services industry. Gain an understanding of how organisations are utilising data in motion to solve common problems and gain benefits from their real time data capabilities.
It will look more deeply into some specific use cases and show how Confluent technology is used to manage costs and mitigate risks.
This session is aimed at Solutions Architects, Sales Engineers and Pre-Sales, and also the more technically minded, business-aligned people. Whilst this is not a deeply technical session, a level of knowledge around Kafka would be helpful.
"Building Future-Ready Apps with .NET 8 and Azure Serverless Ecosystem", Stan...Fwdays
.NET 8 brought a lot of improvements for developers and maturity to the Azure serverless container ecosystem. So, this talk will cover these changes and explain how you can apply them to your projects. Another reason for this talk is the re-invention of Serverless from a DevOps perspective as a Platform Engineering trend with Backstage and the recent Radius project from Microsoft. So now is the perfect time to look at developer productivity tooling and serverless apps from Microsoft's perspective.
The Challenge of Interpretability in Generative AI Models - Sara Kroft
Navigating the intricacies of generative AI models reveals a pressing challenge: interpretability. Our blog delves into the complexities of understanding how these advanced models make decisions, shedding light on the mechanisms behind their outputs. Explore the latest research, practical implications, and ethical considerations, as we unravel the opaque processes that drive generative AI. Join us in this insightful journey to demystify the black box of artificial intelligence.
Dive into the complexities of generative AI with our blog on interpretability. Find out why making AI models understandable is key to trust and ethical use and discover current efforts to tackle this big challenge.
DefCamp_2016_Chemerkin_Yury-publish.pdf - Presentation by Yury Chemerkin at DefCamp 2016 discussing mobile app vulnerabilities, data protection issues, and analysis of security levels across different types of mobile applications.
The History of Embeddings & Multimodal Embeddings - Zilliz
Frank Liu will walk through the history of embeddings and how we got to the cool embedding models used today. He'll end with a demo on how multimodal RAG is used.
Garbage In, Garbage Out: Why poor data curation is killing your AI models (an... - Zilliz
Enterprises have traditionally prioritized data quantity, assuming more is better for AI performance. However, a new reality is setting in: high-quality data, not just volume, is the key. This shift exposes a critical gap – many organizations struggle to understand their existing data and lack effective curation strategies and tools. This talk dives into these data challenges and explores the methods of automating data curation.
Redefining Cybersecurity with AI Capabilities - Priyanka Aash
In this comprehensive overview of Cisco's latest innovations in cybersecurity, the focus is squarely on resilience and adaptation in the face of evolving threats. The discussion covers the imperative of tackling Mal information, the increasing sophistication of insider attacks, and the expanding attack surfaces in a hybrid work environment. Emphasizing a shift towards integrated platforms over fragmented tools, Cisco introduces its Security Cloud, designed to provide end-to-end visibility and robust protection across user interactions, cloud environments, and breaches. AI emerges as a pivotal tool, from enhancing user experiences to predicting and defending against cyber threats. The blog underscores Cisco's commitment to simplifying security stacks while ensuring efficacy and economic feasibility, making a compelling case for their platform approach in safeguarding digital landscapes.
Cracking AI Black Box - Strategies for Customer-centric Enterprise Excellence - Quentin Reul
The democratization of Generative AI is ushering in a new era of innovation for enterprises. Discover how you can harness this powerful technology to deliver unparalleled customer value and securing a formidable competitive advantage in today's competitive market. In this session, you will learn how to:
- Identify high-impact customer needs with precision
- Harness the power of large language models to address specific customer needs effectively
- Implement AI responsibly to build trust and foster strong customer relationships
Whether you're at the early stages of your AI journey or looking to optimize existing initiatives, this session will provide you with actionable insights and strategies needed to leverage AI as a powerful catalyst for customer-driven enterprise success.
How UiPath Discovery Suite supports identification of Agentic Process Automat... - DianaGray10
📚 Understand the basics of the new persona-based, LLM-powered Agentic Process Automation and discover how existing UiPath Discovery Suite products like Communication Mining, Process Mining, and Task Mining can be leveraged to identify APA candidates.
Topics Covered:
💡 Idea Behind APA: Explore the innovative concept of Agentic Process Automation and its significance in modern workflows.
🔄 How APA is Different from RPA: Learn the key differences between Agentic Process Automation and Robotic Process Automation.
🚀 Discover the Advantages of APA: Uncover the unique benefits of implementing APA in your organization.
🔍 Identifying APA Candidates with UiPath Discovery Products: See how UiPath's Communication Mining, Process Mining, and Task Mining tools can help pinpoint potential APA candidates.
🔮 Discussion on Expected Future Impacts: Engage in a discussion on the potential future impacts of APA on various industries and business processes.
Enhance your knowledge on the forefront of automation technology and stay ahead with Agentic Process Automation. 🧠💼✨
Speakers:
Arun Kumar Asokan, Delivery Director (US) @ qBotica and UiPath MVP
Naveen Chatlapalli, Solution Architect @ Ashling Partners and UiPath MVP
Keynote: AI & Future Of Offensive Security - Priyanka Aash
In the presentation, the focus is on the transformative impact of artificial intelligence (AI) in cybersecurity, particularly in the context of malware generation and adversarial attacks. AI promises to revolutionize the field by enabling scalable solutions to historically challenging problems such as continuous threat simulation, autonomous attack path generation, and the creation of sophisticated attack payloads. The discussions underscore how AI-powered tools like AI-based penetration testing can outpace traditional methods, enhancing security posture by efficiently identifying and mitigating vulnerabilities across complex attack surfaces. The use of AI in red teaming further amplifies these capabilities, allowing organizations to validate security controls effectively against diverse adversarial scenarios. These advancements not only streamline testing processes but also bolster defense strategies, ensuring readiness against evolving cyber threats.
Keynote: Presentation on SASE Technology - Priyanka Aash
Secure Access Service Edge (SASE) solutions are revolutionizing enterprise networks by integrating SD-WAN with comprehensive security services. Traditionally, enterprises managed multiple point solutions for network and security needs, leading to complexity and resource-intensive operations. SASE, as defined by Gartner, consolidates these functions into a unified cloud-based service, offering SD-WAN capabilities alongside advanced security features like secure web gateways, CASB, and remote browser isolation. This convergence not only simplifies management but also enhances security posture and application performance across global networks and cloud environments. Discover how adopting SASE can streamline operations and fortify your enterprise's digital transformation strategy.
Increase Quality with User Access Policies - July 2024 - Peter Caitens
⭐️ Increase Quality with User Access Policies ⭐️, presented by Peter Caitens and Adam Best of Salesforce. View the slides from this session to hear all about “User Access Policies” and how they can help you onboard users faster with greater quality.
Self-Healing Test Automation Framework - Healenium - Knoldus Inc.
Revolutionize your test automation with Healenium's self-healing framework. Automate test maintenance, reduce flakes, and increase efficiency. Learn how to build a robust test automation foundation. Discover the power of self-healing tests. Transform your testing experience.
2. Introduction
How a development department in a well-established enterprise company with no prior knowledge of Apache Kafka® built a real-time data pipeline in Kafka, learning it as we went along. This tells the story of what happened, what we learned and what we did wrong.
3. Who are we and what do we do?
• We are William Hill, one of the oldest and most well-established companies in the gaming industry
• We work in the Trading department of William Hill and we “trade” what happens in a sports event
• We deal with managing odds for c200k sports events a year; we publish odds for the company and result the markets once events have concluded
• We cater for both traditional pre-match markets and in-play markets
• We have been building applications based on messaging technology for a long time, as it suits our event-based use cases
5. Kafka - MOM (Message-Oriented Middleware)
• Message Persistence - messages are not removed when read
• Consumer Position Control - you can replay data (see the replay sketch below)
• Minimal Overhead - consumers read from the same “durable” topic
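A minimal sketch of the consumer-position-control point: rewinding a consumer on assignment replays everything still retained on the topic. Topic and group names here ("prices", "replay-demo") are hypothetical.

    import java.time.Duration;
    import java.util.Collection;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ReplayDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "replay-demo");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("prices"), new ConsumerRebalanceListener() {
                    public void onPartitionsRevoked(Collection<TopicPartition> parts) { }
                    public void onPartitionsAssigned(Collection<TopicPartition> parts) {
                        consumer.seekToBeginning(parts); // rewind: replay everything still retained
                    }
                });
                while (true) {
                    for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                        System.out.printf("p%d offset=%d key=%s%n", rec.partition(), rec.offset(), rec.key());
                    }
                }
            }
        }
    }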
7. Kafka Consumer Groups
• Partitions are distributed evenly amongst the consumers in a consumer group
• A partition can only be consumed by one consumer in a consumer group
• A partition has an offset for each consumer group (see the inspection sketch below)
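As a hedged illustration of the last point (the group name "pricing-service" is hypothetical), the AdminClient can show the offset each consumer group has committed per partition:

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class GroupOffsets {
        public static void main(String[] args) throws Exception {
            Properties conf = new Properties();
            conf.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(conf)) {
                Map<TopicPartition, OffsetAndMetadata> offsets = admin
                        .listConsumerGroupOffsets("pricing-service")
                        .partitionsToOffsetAndMetadata()
                        .get();
                // Each partition carries a committed offset per group, not per consumer.
                offsets.forEach((tp, o) -> System.out.printf("%s -> offset %d%n", tp, o.offset()));
            }
        }
    }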
11. Kafka Development Considerations
• Kafka is relatively new, the community is still growing, and maturity can be an issue
• Know your use case - is it suited to Kafka?
• Know your implementation:
• Native Kafka
• Spring Kafka
• Kafka Streams
• Camel
• Spring Integration
12. 2016 – Our journey begins
• Rapidly evolving industry = new requirements & use cases
• Upgrade the tech stack:
• Kafka
• Microservices
• Docker
• Cloud
13. Java vs Scala
Java
• More mature
• More knowledge of it
• More disciplined
Scala
• More functional
• More flexible
• Better suited to data crunching
15. Standardization
A common approach allows many people to work with any part of the platform
• Language – Java over Scala/Erlang
• Messaging – Kafka over ActiveMQ/RabbitMQ
• Libraries – Spring Boot, Kafka implementation
• Environments – Docker
• Releases – versioning and deployment strategy
• Distributed logging and monitoring – central UI, format, correlation
16. Architectural considerations
• An architectural steer to avoid using persistent data stores, to keep latency short
• We had to think about where to keep or cache data
• We had to start thinking about Kafka as a data store
• This is where we started trying to use Kafka Streams
17. Architectural Options
We looked at a number of ways to solve our problems with data access in apps, given our architectural steer:
• Kafka Streams
• Creating our own abstractions on native Kafka
• Using some kind of data store
18. Kafka Streams
• Use case for historical data to be visible in certain UIs
• UIs would subscribe to a specific event, but topics carry messages for all events
• We needed to be able to read data as if it were a distributed cache
• Streams solved many of those problems
• Fault tolerance was an issue: we had difficulty recovering from a rebalance, and we had problems starting up, mainly caused by not being able to use a persistent data store
• The tech was still in development at the time, and Kafka 1.0 came a little late for us
19. Message Format
• Bad message formats can wreck your system
• Common principles of Kafka messages:
• A message is basically an event
• Messages are of a manageable size
• Messages are simple and easy to process
• Messages are idempotent (i.e. a fact)
• Data should be organised by resources, not by specific service needs
• Backward compatibility
20. Full state messages
Against:
• Big & unwieldy
• Resource heavy
• Can affect latency
• Wasteful
• Lots of boilerplate code
For:
• Resilient – doesn’t matter if you drop it
• Stateless – don’t need to cache anything
• Can gzip big messages (see the configuration sketch below)
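The gzip point maps naturally onto the producer's built-in, per-batch compression; a minimal sketch with an illustrative setting:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class CompressionConfig {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip"); // batches compressed before send
        }
    }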
21. Processing Full State Messages
• Message reading and message processing are done asynchronously
• While the latest message is being processed, subsequent messages are pushed onto a stack
• When the first message has been processed, the next one is taken from the top of the stack, and the rest of the stack is emptied
• Effectively a drop buffer (sketched below)
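A minimal sketch of such a drop buffer (names are illustrative, not the team's code). Because only the newest unprocessed message matters, the stack-with-discard behaviour collapses to a single "latest" slot:

    import java.util.concurrent.atomic.AtomicReference;

    public class DropBuffer<T> {
        private final AtomicReference<T> pending = new AtomicReference<>();

        // Called by the Kafka polling thread: a newer message simply replaces an older one.
        public void offer(T latest) {
            pending.set(latest);
        }

        // Called by the processing thread once it finishes the previous message.
        public T take() {
            return pending.getAndSet(null); // null means nothing new arrived
        }
    }

The polling thread calls offer() for every record and the processing thread loops on take(); intermediate full-state messages are simply discarded, which is safe because each message carries the complete state.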
22. Testing…
• We wanted unit, integration and system level testing
• Unit testing is straightforward
• Integration and system testing with large distributed tools is a challenge
23. The Integration Testing Elephant
• There is a lot of talk in IT about DevOps and shift-left testing
• There is a lot of talk around Big Data style distributed systems
• Doing early integration testing with Big Data tools is difficult, and there is a gap in this area
• Giving developers the tools to do local integration testing is very difficult
• Kafka is not the only framework with this problem
24. Developer Integration Testing
• Embedded Kafka from Spring allows a local ’virtual’ Kafka
• Great for unit tests and low level integration tests (see the test sketch below)
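A minimal test sketch, assuming the current spring-kafka-test API and a Spring Boot test context (at the time of the talk the equivalent was the KafkaEmbedded JUnit rule); the topic name is illustrative:

    import java.util.Map;
    import org.junit.jupiter.api.Test;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.boot.test.context.SpringBootTest;
    import org.springframework.kafka.test.EmbeddedKafkaBroker;
    import org.springframework.kafka.test.context.EmbeddedKafka;
    import org.springframework.kafka.test.utils.KafkaTestUtils;

    @SpringBootTest
    @EmbeddedKafka(partitions = 1, topics = "prices")
    class PriceFlowTest {

        @Autowired
        EmbeddedKafkaBroker broker;   // started before the test, torn down after

        @Test
        void talksToTheVirtualBroker() {
            Map<String, Object> producerProps = KafkaTestUtils.producerProps(broker);
            // build producers/consumers against broker.getBrokersAsString() ...
        }
    }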
25. Using Embedded Kafka
• Proceed with caution when trying to ensure execution order
• Most tests will need to pre-load topics with messages
• Quick & dirty: do it statically
• We built a wrapper for Embedded Kafka with additional utilities, based on JUnit ExternalResource (a sketch follows below)
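A hedged sketch of the kind of JUnit 4 ExternalResource wrapper the slides mention; class, topic and helper names here are illustrative, not the team's actual code:

    import java.util.Map;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;
    import org.junit.rules.ExternalResource;
    import org.springframework.kafka.test.EmbeddedKafkaBroker;
    import org.springframework.kafka.test.utils.KafkaTestUtils;

    public class EmbeddedKafkaResource extends ExternalResource {

        private EmbeddedKafkaBroker broker;

        @Override
        protected void before() {
            broker = new EmbeddedKafkaBroker(1, true, "prices"); // brokers, controlled shutdown, topics
            broker.afterPropertiesSet();                          // boots the in-process broker
        }

        // Utility: statically pre-load a topic with messages before a test runs.
        public void preload(String topic, String... messages) {
            Map<String, Object> props = KafkaTestUtils.producerProps(broker);
            try (KafkaProducer<String, String> p =
                         new KafkaProducer<>(props, new StringSerializer(), new StringSerializer())) {
                for (String m : messages) p.send(new ProducerRecord<>(topic, m));
            }
        }

        @Override
        protected void after() {
            broker.destroy();
        }
    }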
26. Using Kafka in Docker for testing
• An alternative to Embedded Kafka is to spin up a Docker instance which acts as a ‘Kafka-in-a-box’ – we’re still prototyping this
• A single Docker instance hosts 1-n Kafka instances and a Zookeeper instance, so there is no need for a Docker swarm
• Start Docker with a Maven exec on pre-integration tests
• Or start Docker programmatically on test set-up using our JDock utility – this is more configurable
• This approach is better for NFR & resiliency testing than Embedded Kafka (a hedged sketch of the idea follows below)
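The slides describe Maven exec and the team's in-house JDock utility; as a sketch of the same "Kafka-in-a-box" idea using a publicly available library instead (Testcontainers is an assumption here, not what the team used; the image tag is illustrative):

    import org.testcontainers.containers.KafkaContainer;
    import org.testcontainers.utility.DockerImageName;

    public class KafkaBox {
        public static void main(String[] args) {
            try (KafkaContainer kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.4.0"))) {
                kafka.start();
                // Point producers/consumers at the throwaway broker:
                System.out.println(kafka.getBootstrapServers());
            }
        }
    }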
27. Caching Problem
• The source topic and the recovery topic have the same number of partitions
• Data with the same key needs to be in the same partition
• The recovery topic is compacting: only the latest data for a given key is needed
Flow (steps 2-4 are sketched below)
1. Microservices subscribe to the source topic with the same consumer group; the rebalance operation dynamically assigns partitions evenly
2. Each microservice manually assigns itself to the same partitions in the recovery topic
3. The microservice clears its cache
4. The microservice loads all aggregated data in the recovery topic into the cache
5. The microservice consumes data from the source topic; an MD5 check ignores any duplicate data
6. Consumed data is aggregated with that in the cache
7. Aggregated data is stored in the recovery topic
8. Aggregated data is sent to the destination topic
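A minimal sketch of steps 2-4 (the topic name "recovery" is hypothetical): mirror the source-topic assignment onto the recovery topic with a group-less consumer, then read the compacted topic to its end to rebuild the cache.

    import java.time.Duration;
    import java.util.*;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class CacheRecovery {
        static Map<String, String> rebuildCache(KafkaConsumer<String, String> recovery,
                                                Collection<TopicPartition> sourceParts) {
            List<TopicPartition> parts = sourceParts.stream()
                    .map(p -> new TopicPartition("recovery", p.partition()))
                    .collect(Collectors.toList());
            recovery.assign(parts);            // manual assignment: no consumer group, no rebalance
            recovery.seekToBeginning(parts);
            Map<TopicPartition, Long> end = recovery.endOffsets(parts);
            Map<String, String> cache = new HashMap<>();
            while (parts.stream().anyMatch(tp -> recovery.position(tp) < end.get(tp))) {
                for (ConsumerRecord<String, String> rec : recovery.poll(Duration.ofMillis(200))) {
                    cache.put(rec.key(), rec.value()); // compacted topic: latest value per key wins
                }
            }
            return cache;
        }
    }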
28. Considerations
• SLA: 1 second for a message end to end; the message time for each microservice is much less
• Failover and scaling: rebalancing, time to load the cache
• Message ordering
• Idempotence: no duplicates
• Dismissed solutions: dual running stacks; Kafka Streams standby replicas (only for failover)
29. Revised Kafka-Only Solution
• The recovery offset topic has the same number of partitions as the recovery topic
• When data is stored in the recovery topic for a given key, the offset of that data in the recovery topic is stored in the recovery offset topic under the same key
• On a rebalance operation the microservice loads only the data in the recovery offset topic - a much smaller set of data (essentially an index)
• When the microservice consumes data from the source topic, the data it needs to be aggregated with is lazily retrieved from the recovery topic into the cache, directly, using the cached offset (sketched below)
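A sketch of that lazy retrieval step (topic name hypothetical): given the offset cached from the recovery offset topic, fetch exactly that record from the recovery topic with a dedicated, manually assigned consumer.

    import java.time.Duration;
    import java.util.Collections;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class LazyFetch {
        // Fetch the single aggregated record at a known offset in the recovery topic.
        static String fetchByOffset(KafkaConsumer<String, String> consumer, int partition, long offset) {
            TopicPartition tp = new TopicPartition("recovery", partition);
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, offset);
            for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                if (rec.offset() == offset) return rec.value();
            }
            return null; // nothing within the poll window; caller can retry
        }
    }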
30. With Cassandra Solution
• Aggregated data is stored in Cassandra (a key-value store)
• No data is loaded on a rebalance operation
• When the microservice consumes data from the source topic, the data it needs to be aggregated with is lazily retrieved from Cassandra into the cache
Comparison
• The revised Kafka solution and the Cassandra solution have comparable performance
• The Cassandra solution introduces another technology
• The Cassandra solution is less complex
Enhancements (a configuration sketch follows below)
• Sticky assignor (partition assignor): preserves as many existing partition assignments as possible on a rebalance
• Transactions: exactly-once message processing
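A hedged configuration sketch for the two enhancements named above; the StickyAssignor ships with the Java client, and exactly-once needs a transactional producer plus read_committed consumers. Names and values are illustrative.

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.StickyAssignor;
    import org.apache.kafka.clients.producer.ProducerConfig;

    public class EnhancementConfigs {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
                    StickyAssignor.class.getName());                                // keep assignments stable across rebalances
            consumerProps.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed"); // only read committed transactional data

            Properties producerProps = new Properties();
            producerProps.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
            producerProps.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "aggregator-1"); // unique per producer instance
        }
    }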
31. Topic Configuration
• Partitions: Kafka writes messages to a predetermined number of partitions, and only one consumer in a group can read from each partition at a time, so you need to consider the number of consumers you have
• Replication: how durable do you need it to be?
• Retention: how long do you want to keep messages for?
• Compaction: how many updates to specific pieces of data do you need to keep? (A topic-creation sketch follows below.)
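A minimal sketch (hypothetical names and values) of expressing those four choices when creating a topic with the AdminClient:

    import java.util.Collections;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;
    import org.apache.kafka.common.config.TopicConfig;

    public class CreateTopic {
        public static void main(String[] args) throws Exception {
            Properties conf = new Properties();
            conf.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(conf)) {
                NewTopic topic = new NewTopic("prices", 12, (short) 3)   // partitions, replication factor
                        .configs(Map.of(
                                TopicConfig.RETENTION_MS_CONFIG, "604800000",             // retention: 7 days
                                TopicConfig.CLEANUP_POLICY_CONFIG,
                                TopicConfig.CLEANUP_POLICY_COMPACT));                     // compaction: keep latest per key
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }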
32. Operational Management
• Operationally, Kafka can fall between the cracks - DBA & SysAdmin teams generally won’t want to get involved in the configuration
• Kafka is highly configurable - this is great if you know what it all does
• In the early days many of these configurable fields changed between versions, which made it difficult to tune Kafka for optimal performance
• Configuration is heavily dependent on use case, and many settings are inter-dependent
33. Summary
• Getting Kafka right is not one-size-fits-all; you must consider your use case, both developmentally and operationally
• Building systems with Kafka can be done without a lot of prior expertise
• You will need to refactor; it’s a trial-and-error approach
• Don’t be afraid to get it wrong
• Don’t assume that your use case has a well-established best practice
• Remember to focus on the NFRs as well as the functional requirements