Timo Walther
Apache Flink PMC
@twalthr
Flink Forward @ San Francisco - April 11th, 2017
Table & SQL API
unified APIs for batch and stream processing
Motivation
DataStream API is great…
▪ Very expressive stream processing
• Transform data, update state, define windows, aggregate, etc.
▪ Highly customizable windowing logic
• Assigners, Triggers, Evictors, Lateness
▪ Asynchronous I/O
• Improve communication to external systems
▪ Low-level Operations
• ProcessFunction gives access to timestamps and timers
… but it is not for Everyone!
▪ Writing DataStream programs is not always easy
• Stream processing technology spreads rapidly
• New streaming concepts (time, state, windows, ...)
▪ Requires knowledge & skill
• Continuous applications have special requirements
• Programming experience (Java / Scala)
▪ Users want to focus on their business logic

Why not a Relational API?
▪ Relational API is declarative
• User says what is needed, system decides how to compute it
▪ Queries can be effectively optimized
• Fewer black boxes, well-researched field
▪ Queries are efficiently executed
• Let Flink handle state, time, and common mistakes
▪ “Everybody” knows and uses SQL!
Goals
▪ Easy, declarative, and concise relational API
▪ Tool for a wide range of use cases
▪ Relational API as a unifying layer
• Queries on batch tables terminate and produce a finite result
• Queries on streaming tables run continuously and produce a result stream
▪ Same syntax & semantics for both queries
Table API & SQL
▪ Flink features two relational APIs
• Table API: LINQ-style API for Java & Scala (since Flink 0.9.0)
• SQL: Standard SQL (since Flink 1.1.0)
[Stack diagram, top to bottom: SQL; Table API; DataSet API and DataStream API; Flink Dataflow Runtime]

Table API & SQL Example
val tEnv = TableEnvironment.getTableEnvironment(env)
// configure your data source
val customerSource = CsvTableSource.builder()
  .path("/path/to/customer_data.csv")
  .field("name", Types.STRING).field("prefs", Types.STRING)
  .build()
// register as a table
tEnv.registerTableSource("cust", customerSource)
// define your table program, either with the Table API ...
val table = tEnv.scan("cust").select('name.lowerCase(), myParser('prefs))
// ... or, equivalently, with SQL
val sqlTable = tEnv.sql("SELECT LOWER(name), myParser(prefs) FROM cust")
// convert
val ds: DataStream[Customer] = table.toDataStream[Customer]
Windowing in Table API
val sensorData: DataStream[(String, Long, Double)] = ???
// convert DataStream into Table
val sensorTable: Table = sensorData
  .toTable(tableEnv, 'location, 'rowtime, 'tempF)
// define query on Table
val avgTempCTable: Table = sensorTable
  .window(Tumble over 1.day on 'rowtime as 'w)
  .groupBy('location, 'w)
  .select('w.start as 'day,
    'location,
    (('tempF.avg - 32) * 0.556) as 'avgTempC)
  .where('location like "room%")
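To make the semantics of the windowed query above concrete, here is a minimal plain-Scala sketch of what it computes on a finite input. It does not use Flink. The names `TumblingAvgSketch`, `Reading`, `windowStart`, and `avgTempC` are hypothetical, and timestamps are assumed to be epoch milliseconds.

```scala
object TumblingAvgSketch {
  // One sensor reading: (location, event time in epoch millis, temperature in Fahrenheit).
  // The tuple layout mirrors the DataStream type on the slide; the millisecond unit is an assumption.
  type Reading = (String, Long, Double)

  val DayMillis: Long = 24L * 60 * 60 * 1000

  // Start of the one-day tumbling window that contains the timestamp.
  def windowStart(ts: Long): Long = ts - Math.floorMod(ts, DayMillis)

  // Keep "room%" locations, group by (window start, location),
  // average the Fahrenheit values, then convert the average to Celsius.
  def avgTempC(readings: Seq[Reading]): Map[(Long, String), Double] =
    readings
      .filter { case (location, _, _) => location.startsWith("room") }
      .groupBy { case (location, ts, _) => (windowStart(ts), location) }
      .map { case (key, group) =>
        val avgF = group.map(_._3).sum / group.size
        key -> ((avgF - 32) * 0.556)
      }
}
```

On a stream, the same result per window would be emitted once the window is complete. The sketch only covers the batch-style evaluation over a finite `Seq`.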
Windowing in SQL
val sensorData: DataStream[(String, Long, Double)] = ???
// register DataStream
tableEnv.registerDataStream(
  "sensorData", sensorData, 'location, 'rowtime, 'tempF)
// query registered Table
val avgTempCTable: Table = tableEnv.sql("""
  SELECT TUMBLE_START(rowtime, INTERVAL '1' DAY) AS day,
    location,
    AVG((tempF - 32) * 0.556) AS avgTempC
  FROM sensorData
  WHERE location LIKE 'room%'
  GROUP BY location, TUMBLE(rowtime, INTERVAL '1' DAY)
""")
Architecture
2 APIs [SQL, Table API] × 2 backends [DataStream, DataSet] = 4 different translation paths?
Architecture
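[Architecture diagram: Table API queries pass through the Table API Validator, SQL API queries through the Calcite Parser & Validator. Both use the Calcite Catalog (DataSet, DataStream, and Table Sources) and produce a Calcite Logical Plan. The Calcite Optimizer applies DataSet Rules or DataStream Rules to produce a DataSet Plan or a DataStream Plan, which is translated into a DataSet or DataStream program.]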
Translation to Logical Plan
sensorTable
  .window(Tumble over 1.day on 'rowtime as 'w)
  .groupBy('location, 'w)
  .select(
    'w.start as 'day,
    'location,
    (('tempF.avg - 32) * 0.556) as 'avgTempC)
  .where('location like "room%")
Table API Validation → Table Nodes: Catalog Node, Window Aggregate, Project, Filter
Translation → Calcite Logical Plan: Logical Table Scan, Logical Window Aggregate, Logical Project, Logical Filter
Translation to DataStream Plan
Calcite Logical Plan: Logical Table Scan, Logical Window Aggregate, Logical Project, Logical Filter
Optimize → Optimized Plan: Logical Table Scan, Logical Window Aggregate, Logical Calc
Transform → DataStream Plan: DataStream Scan, DataStream Calc, DataStream Aggregate
Translation to Flink Program
DataStream Plan: DataStream Scan, DataStream Calc, DataStream Aggregate
Translate & Code-generate → DataStream Program: (Forwarding), FlatMap Function, Aggregate & Window Function
Current State (in master)
▪ Batch support
• Selection, Projection, Sort, Inner & Outer Joins, Set operations
• Group-Windows for Slide, Tumble, Session
▪ Streaming support
• Selection, Projection, Union
• Group-Windows for Slide, Tumble, Session
• Different SQL OVER-Windows (RANGE/ROWS)
▪ UDFs, UDTFs, custom rules
Use Cases for Streaming SQL
▪ Continuous ETL & Data Import
▪ Live Dashboards & Reports
Outlook: Dynamic Tables
Dynamic Tables Model
▪ Dynamic tables change over time
▪ Dynamic tables are treated like static batch tables
• Dynamic tables are queried with standard SQL / Table API
• Every query returns another Dynamic Table
▪ “Stream / Table Duality”
• Stream ←→ Dynamic Table conversions without information loss
Stream to Dynamic Table
▪ Append Mode:
▪ Update Mode:
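The figures for the two modes are not part of this transcript. As a rough plain-Scala sketch (not Flink API), and assuming the usual meaning of the terms, append mode adds every stream record as a new row, while update mode treats each record as an insert or update on a key. The names `DynamicTableModes`, `Record`, `appendMode`, and `updateMode` are hypothetical.

```scala
object DynamicTableModes {
  // A stream record: (key, value). The layout is an assumption for illustration.
  type Record = (String, Int)

  // Append mode: every stream record becomes a new row; existing rows never change.
  def appendMode(stream: Seq[Record]): Vector[Record] =
    stream.foldLeft(Vector.empty[Record])((table, rec) => table :+ rec)

  // Update mode: the table has a key; a record inserts a new row
  // or replaces the existing row with the same key.
  def updateMode(stream: Seq[Record]): Map[String, Int] =
    stream.foldLeft(Map.empty[String, Int]) { case (table, (k, v)) => table.updated(k, v) }
}
```

For the stream `("a", 1), ("b", 2), ("a", 3)`, the append-mode table holds three rows, while the update-mode table holds two rows with `a` mapped to the latest value.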
Querying Dynamic Tables
▪ Dynamic tables change over time
• A[t]: Table A at specific point in time t
▪ Dynamic tables are queried with relational semantics
• Result of a query changes as input table changes
• q(A[t]): Evaluate query q on table A at time t
▪ Query result is continuously updated as t progresses
• Similar to maintaining a materialized view
• t is current event time
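The q(A[t]) notation can be illustrated with a small plain-Scala sketch (not Flink API). It assumes an append-mode table and, for simplicity, models t as the number of records seen so far rather than event time. The names `ContinuousQuerySketch`, `Click`, `q`, `snapshot`, and `results` are hypothetical, and the count-per-user query is only an example.

```scala
object ContinuousQuerySketch {
  // A click event: (user, url). Hypothetical example schema.
  type Click = (String, String)

  // q: a relational query, here the equivalent of
  // SELECT user, COUNT(url) FROM A GROUP BY user.
  def q(table: Seq[Click]): Map[String, Int] =
    table.groupBy(_._1).map { case (user, rows) => user -> rows.size }

  // A[t]: the dynamic table at time t, modeled as the first t records of the stream.
  def snapshot(stream: Seq[Click], t: Int): Seq[Click] = stream.take(t)

  // The continuously updated result: q(A[t]) for every t.
  def results(stream: Seq[Click]): Seq[Map[String, Int]] =
    (1 to stream.size).map(t => q(snapshot(stream, t)))
}
```

The sketch recomputes q from scratch for every t, which only defines the expected result. Maintaining the result incrementally, as with a materialized view, should produce the same sequence without rescanning the table.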
Querying a Dynamic Table
▪ Can we run any query on Dynamic Tables? No!
▪ State may not grow infinitely as more data arrives
• Set clean-up timeout or key constraints.
▪ Input may only trigger partial re-computation
▪ Queries with possibly unbounded state or computation are rejected
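One way to picture a clean-up timeout is the following plain-Scala sketch of keyed state that drops idle keys. It is an illustration under assumptions, not Flink's mechanism: the names `StateCleanupSketch`, `Entry`, and `update` are hypothetical, and the state is a simple per-key count.

```scala
object StateCleanupSketch {
  // Per-key state: the running count and the time of the last update.
  final case class Entry(count: Long, lastUpdate: Long)

  // Adds one record for `key` at time `now`, then drops every key that has
  // been idle for longer than `timeoutMillis`.
  def update(
      state: Map[String, Entry],
      key: String,
      now: Long,
      timeoutMillis: Long): Map[String, Entry] = {
    val count = state.get(key).map(_.count).getOrElse(0L) + 1
    state
      .updated(key, Entry(count, now))
      .filter { case (_, e) => now - e.lastUpdate <= timeoutMillis }
  }
}
```

The trade-off in this sketch is that a key returning after its state was dropped starts counting from 1 again, so bounded state is bought with potentially inexact results for long-idle keys.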
Dynamic Table to Stream
▪ Convert Dynamic Table modifications into stream messages
▪ Similar to database logging techniques
• Undo: previous value of a modified element
• Redo: new value of a modified element
• Undo+Redo: old and new value of a changed element
▪ For Dynamic Tables: Redo or Undo+Redo
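The two encodings can be sketched in plain Scala (not Flink API) for a keyed table that receives a sequence of row updates. The names `ChangeStreamSketch`, `Msg`, `Redo`, `Undo`, `redoStream`, and `undoRedoStream` are hypothetical, and the sketch assumes that a redo-only stream is interpreted by the receiver as an upsert on a unique key.

```scala
object ChangeStreamSketch {
  sealed trait Msg
  final case class Redo(key: String, value: Int) extends Msg // new value of a row
  final case class Undo(key: String, value: Int) extends Msg // previous value of a row

  // Redo stream: each change emits only the new value; the receiver is assumed
  // to overwrite the row with the same key.
  def redoStream(updates: Seq[(String, Int)]): Seq[Msg] =
    updates.map { case (k, v) => Redo(k, v) }

  // Undo+redo stream: a change to an existing row first retracts the old value,
  // then emits the new one; a new row emits only a redo message.
  def undoRedoStream(updates: Seq[(String, Int)]): Seq[Msg] = {
    val (_, msgs) = updates.foldLeft((Map.empty[String, Int], Vector.empty[Msg])) {
      case ((table, out), (k, v)) =>
        val undo = table.get(k).map(old => Undo(k, old)).toVector
        (table.updated(k, v), (out ++ undo) :+ Redo(k, v))
    }
    msgs
  }
}
```

For the updates `("Mary", 1), ("Bob", 1), ("Mary", 2)`, the redo stream carries three messages, while the undo+redo stream carries four because the second change to `Mary` is preceded by a retraction of the old value.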
Dynamic Table to Stream
▪ Undo+Redo Stream (because A is in Append Mode):
Dynamic Table to Stream
▪ Redo Stream (because A is in Update Mode):
Result computation & refinement
[Timeline figure, labels: First result (end – x); Update rate (every x); Complete result can be computed (end); Complete result (end + x); Late updates (on new data); Last result (end + x), state is purged.]

Contributions welcome!
▪ Huge interest and many contributors
• Adding more window operators
• Introducing dynamic tables
▪ And there is a lot more to do
• New operators and features for streaming and batch
• Performance improvements
• Tooling and integration
▪ Try it out, give feedback, and start contributing!
Thank you!
@twalthr
@ApacheFlink
@dataArtisans

 
AIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on AzureAIRLINE_SATISFACTION_Data Science Solution on Azure
AIRLINE_SATISFACTION_Data Science Solution on Azure
 
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
Cómo hemos implementado semántica de "Exactly Once" en nuestra base de datos ...
 
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
Mumbai Central @Call @Girls 🛴 9930687706 🛴 Aaradhaya Best High Class Mumbai A...
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
 
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
Madurai @Call @Girls Whatsapp 0000000000 With High Profile Offer 25%
 
( Call  ) Girls Nehru Place 9711199012 Beautiful Girls
( Call  ) Girls Nehru Place 9711199012 Beautiful Girls( Call  ) Girls Nehru Place 9711199012 Beautiful Girls
( Call  ) Girls Nehru Place 9711199012 Beautiful Girls
 
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
❻❸❼⓿❽❻❷⓿⓿❼ SATTA MATKA DPBOSS KALYAN FAST RESULTS CHART KALYAN MATKA MATKA RE...
 
[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습
[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습
[D3T2S03] Data&AI Roadshow 2024 - Amazon DocumentDB 실습
 

Flink Forward SF 2017: Timo Walther - Table & SQL API – unified APIs for batch and stream processing

  • 1. Timo Walther Apache Flink PMC @twalthr Flink Forward @ San Francisco - April 11th, 2017 Table & SQL API unified APIs for batch and stream processing
  • 3. DataStream API is great… ▪ Very expressive stream processing • Transform data, update state, define windows, aggregate, etc. ▪ Highly customizable windowing logic • Assigners, Triggers, Evictors, Lateness ▪ Asynchronous I/O • Improve communication to external systems ▪ Low-level Operations • ProcessFunction gives access to timestamps and timers
  • 4. … but it is not for Everyone! ▪ Writing DataStream programs is not always easy • Stream processing technology spreads rapidly • New streaming concepts (time, state, windows, ...) ▪ Requires knowledge & skill • Continuous applications have special requirements • Programming experience (Java / Scala) ▪ Users want to focus on their business logic
  • 5. Why not a Relational API? ▪ Relational API is declarative • User says what is needed, system decides how to compute it ▪ Queries can be effectively optimized • Fewer black boxes, well-researched field ▪ Queries are efficiently executed • Let Flink handle state, time, and common mistakes ▪ "Everybody" knows and uses SQL!
  • 6. Goals ▪ Easy, declarative, and concise relational API ▪ Tool for a wide range of use cases ▪ Relational API as a unifying layer • Queries on batch tables terminate and produce a finite result • Queries on streaming tables run continuously and produce a result stream ▪ Same syntax & semantics for both queries
  • 7. Table API & SQL
  • 8. Table API & SQL ▪ Flink features two relational APIs • Table API: LINQ-style API for Java & Scala (since Flink 0.9.0) • SQL: Standard SQL (since Flink 1.1.0) [Diagram labels: DataSet API, DataStream API, Table API, SQL, Flink Dataflow Runtime]
  • 9. Table API & SQL Example
      val tEnv = TableEnvironment.getTableEnvironment(env)
      // configure your data source
      val customerSource = CsvTableSource.builder()
        .path("/path/to/customer_data.csv")
        .field("name", Types.STRING).field("prefs", Types.STRING)
        .build()
      // register as a table
      tEnv.registerTableSource("cust", customerSource)
      // define your table program with the Table API ...
      val table = tEnv.scan("cust").select('name.lowerCase(), myParser('prefs))
      // ... or with SQL (the slide shows both alternatives; use one of the two definitions)
      val table = tEnv.sql("SELECT LOWER(name), myParser(prefs) FROM cust")
      // convert
      val ds: DataStream[Customer] = table.toDataStream[Customer]
  • 10. Windowing in Table API
      val sensorData: DataStream[(String, Long, Double)] = ???
      // convert DataStream into Table
      val sensorTable: Table = sensorData
        .toTable(tableEnv, 'location, 'rowtime, 'tempF)
      // define query on Table
      val avgTempCTable: Table = sensorTable
        .window(Tumble over 1.day on 'rowtime as 'w)
        .groupBy('location, 'w)
        .select('w.start as 'day, 'location, (('tempF.avg - 32) * 0.556) as 'avgTempC)
        .where('location like "room%")
  • 11. Windowing in SQL
      val sensorData: DataStream[(String, Long, Double)] = ???
      // register DataStream
      tableEnv.registerDataStream(
        "sensorData", sensorData, 'location, 'rowtime, 'tempF)
      // query registered Table
      val avgTempCTable: Table = tableEnv.sql("""
        SELECT TUMBLE_START(TUMBLE(time, INTERVAL '1' DAY)) AS day,
               location,
               AVG((tempF - 32) * 0.556) AS avgTempC
        FROM sensorData
        WHERE location LIKE 'room%'
        GROUP BY location, TUMBLE(time, INTERVAL '1' DAY)
      """)
  • 12. Architecture 2 APIs [SQL, Table API] * 2 backends [DataStream, DataSet] = 4 different translation paths?
  • 13. Architecture [Diagram labels: DataSet Rules, DataSet Plan, DataSet, DataStream, DataStream Plan, DataStream Rules, Calcite Catalog, Calcite Logical Plan, Calcite Optimizer, Calcite Parser & Validator, Table API, SQL API, DataSet, Table Sources, DataStream, Table API Validator]
  • 14. Architecture (same diagram labels as slide 13)
  • 15. Architecture (same diagram labels as slide 13)
  • 16. Architecture (same diagram labels as slide 13)
  • 17. Translation to Logical Plan
      sensorTable
        .window(Tumble over 1.day on 'rowtime as 'w)
        .groupBy('location, 'w)
        .select(
          'w.start as 'day,
          'location,
          (('tempF.avg - 32) * 0.556) as 'avgTempC)
        .where('location like "room%")
      [Diagram: Table API Validation yields Table Nodes (Catalog Node, Window Aggregate, Project, Filter); Translation yields the Calcite Logical Plan (Logical Table Scan, Logical Window Aggregate, Logical Project, Logical Filter)]
  • 18. Translation to DataStream Plan [Diagram: Calcite Logical Plan (Logical Table Scan, Logical Window Aggregate, Logical Project, Logical Filter) → Optimize → Optimized Plan (Logical Table Scan, Logical Window Aggregate, Logical Calc) → Transform → DataStream Plan (DataStream Scan, DataStream Calc, DataStream Aggregate)]
  • 19. Translation to Flink Program [Diagram: DataStream Plan (DataStream Scan, DataStream Calc, DataStream Aggregate) → Translate & Code-generate → DataStream Program ((Forwarding), FlatMap Function, Aggregate & Window Function)]
  • 20. Current State (in master) ▪ Batch support • Selection, Projection, Sort, Inner & Outer Joins, Set operations • Group-Windows for Slide, Tumble, Session ▪ Streaming support • Selection, Projection, Union • Group-Windows for Slide, Tumble, Session • Different SQL OVER-Windows (RANGE/ROWS) ▪ UDFs, UDTFs, custom rules
  • 21. Use Cases for Streaming SQL ▪ Continuous ETL & Data Import ▪ Live Dashboards & Reports
  • 23. Dynamic Tables Model ▪ Dynamic tables change over time ▪ Dynamic tables are treated like static batch tables • Dynamic tables are queried with standard SQL / Table API • Every query returns another Dynamic Table ▪ "Stream / Table Duality" • Stream ←→ Dynamic Table conversions without information loss
  • 24. Stream to Dynamic Table ▪ Append Mode: [figure] ▪ Update Mode: [figure]
  • 25. Querying Dynamic Tables ▪ Dynamic tables change over time • A[t]: Table A at specific point in time t ▪ Dynamic tables are queried with relational semantics • Result of a query changes as input table changes • q(A[t]): Evaluate query q on table A at time t ▪ Query result is continuously updated as t progresses • Similar to maintaining a materialized view • t is current event time
  • 26. Querying a Dynamic Table [figure]
  • 27. Querying a Dynamic Table [figure]
  • 28. Querying a Dynamic Table ▪ Can we run any query on Dynamic Tables? No! ▪ State may not grow infinitely as more data arrives • Set clean-up timeout or key constraints. ▪ Input may only trigger partial re-computation ▪ Queries with possibly unbounded state or computation are rejected
  • 29. Dynamic Table to Stream ▪ Convert Dynamic Table modifications into stream messages ▪ Similar to database logging techniques • Undo: previous value of a modified element • Redo: new value of a modified element • Undo+Redo: old and the new value of a changed element ▪ For Dynamic Tables: Redo or Undo+Redo
  • 30. Dynamic Table to Stream ▪ Undo+Redo Stream (because A is in Append Mode): [figure]
  • 31. Dynamic Table to Stream ▪ Redo Stream (because A is in Update Mode): [figure]
  • 32. Result computation & refinement [Timeline labels: First result (end – x); Last result (end + x); State is purged; Late updates (on new data); Update rate (every x); Complete result (end + x); Complete result can be computed (end)]
  • 33. Contributions welcome! ▪ Huge interest and many contributors • Adding more window operators • Introducing dynamic tables ▪ And there is a lot more to do • New operators and features for streaming and batch • Performance improvements • Tooling and integration ▪ Try it out, give feedback, and start contributing!
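
The Undo+Redo and Redo encodings from the "Dynamic Table to Stream" slides can be illustrated without Flink. The following is a minimal plain-Scala sketch, not the Table API: `DynamicTableSketch`, `Change`, `undoRedo`, `redo`, and `materialize` are hypothetical names, and the continuous query it mimics (a count per key over an append-only input) is an assumed example.

```scala
// Plain-Scala sketch with no Flink dependency; every name here is hypothetical.
object DynamicTableSketch {
  // Messages that encode modifications of a dynamic result table.
  sealed trait Change
  final case class Insert(key: String, count: Long) extends Change
  final case class Delete(key: String, count: Long) extends Change
  final case class Upsert(key: String, count: Long) extends Change

  // Assumed query: SELECT key, COUNT(*) FROM events GROUP BY key.
  // Undo+Redo encoding: an update is a Delete of the old row plus an Insert of the new row.
  def undoRedo(events: Seq[String]): Seq[Change] = {
    val state = scala.collection.mutable.Map.empty[String, Long]
    events.flatMap { key =>
      val old = state.get(key)
      val updated = old.getOrElse(0L) + 1L
      state(key) = updated
      val undo: Seq[Change] = old.map(c => Delete(key, c)).toSeq
      undo :+ Insert(key, updated)
    }
  }

  // Redo encoding: only the new value is emitted; the receiver replaces the row by its key.
  def redo(events: Seq[String]): Seq[Change] = {
    val state = scala.collection.mutable.Map.empty[String, Long]
    events.map { key =>
      val updated = state.getOrElse(key, 0L) + 1L
      state(key) = updated
      Upsert(key, updated)
    }
  }

  // Rebuild the result table from either message stream.
  def materialize(changes: Seq[Change]): Map[String, Long] =
    changes.foldLeft(Map.empty[String, Long]) {
      case (table, Insert(k, c)) => table.updated(k, c)
      case (table, Upsert(k, c)) => table.updated(k, c)
      case (table, Delete(k, _)) => table - k
    }
}
```

Both encodings rebuild the same result table. The Redo stream is shorter, but the receiver has to look up rows by key, which is why it only fits tables with a unique key.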

Editor's Notes

  1. DATASTREAM: event-time semantics; stateful exactly-once processing; high throughput and low latency at the same time; computes exact and deterministic results in real time. ASYNC: enrich stream events with data stored in a database, so that communication delay with the external system does not dominate the streaming application's total work.
  2. There is a talent gap. SKILL: memory-bound; handling of timestamps and watermarks in ProcessFunctions. An API which quickly solves 80% of their use cases, where simple tasks can be defined using little code. BUSINESS: no null support, no timestamps, no common tools for string normalization, etc.
  3. Users do not specify the implementation. UDF: great for expressiveness, bad for optimization; need for manual tuning. ProcessFunctions implemented in Flink handle state and time. SQL is the most widely used language for data analytics. SQL would make stream processing much more accessible.
  4. Flink is a platform for distributed stream and batch data processing. Users only need to learn a single API. A query produces exactly the same result regardless of whether its input is static batch data or streaming data.
  5. TABLE API: Language INtegrated Query (LINQ) API; queries are not embedded as Strings; centered around Table objects; operations are applied on Tables and return a Table. SQL: standard SQL; queries are embedded as Strings into programs; referenced tables must be registered; queries return a Table object; integration with Table API. Equivalent feature set (at the moment). Table API and SQL can be mixed. Both are tightly integrated with Flink's core API. Often referred to as the Table API.
  6. Works for both batch and stream.
  7. Works for both batch and stream.
  8. Stay compliant with standard SQL.
  9. Apache Calcite is a SQL parsing and query optimization framework. It is used by many other projects to parse and optimize SQL queries: Apache Drill, Apache Hive, Apache Kylin, Cascading, …
  10. Tables, columns, and data types are stored in a central catalog. Tables can be created from DataSets, DataStreams, and TableSources (without going through the DataSet/DataStream API).
  11. Table API and SQL queries are translated into common logical plan representation.
  12. Logical plans are translated and optimized depending on execution backend. Plans are transformed into DataSet or DataStream programs.
  13. API calls are translated into logical operators and immediately validated. API operators compose a tree. Before optimization, the API operator tree is translated into a logical Calcite plan.
  14. Calcite provides many optimization rules. Custom rules transform logical nodes into Flink nodes: DataSet rules translate batch queries, DataStream rules translate streaming queries. Constant expression reduction.
  15. Translation into DataStream or DataSet operators, with continuous queries in mind! Janino compiler framework. Arriving data is incrementally aggregated using the given aggregate function. This means that the window function typically has only a single value to process when called.
  16. OVER windows: RANGE UNBOUNDED PRECEDING; ROWS BETWEEN 5 PRECEDING AND CURRENT ROW; RANGE BETWEEN INTERVAL '1' SECOND PRECEDING AND CURRENT ROW. BUT: relational processing of batch data is well defined and understood; not all relational operators can be naively applied on streams; there is no widely accepted definition of the semantics for relational processing of streaming data. All supported operators have in common that they never update result records which have been emitted. This is not an issue for record-at-a-time operators such as projection and filter. It affects operators that collect and process multiple records: emitted results cannot be updated, and late events are discarded.
  17. Emit data to a log-style system where emitted data cannot be updated (results cannot be refined): only append operations, no updates or deletes. Many streaming analytics use cases need to update results: no discarding possible, early results needed, results can be updated and refined. Analyze and explore streaming data in a real-time fashion.
  18. Vastly increase the scope of the APIs and the range of supported use cases. Can be challenging to implement using the DataStream API.
  19. It is important to note that this is only the logical model and does not imply how the query is actually executed. RETURNS: the query runs continuously and produces a table that is continuously updated -> Dynamic Table; result-updating queries. But: we must of course preserve the unified semantics for stream and batch inputs.
  20. We have to specify how the records of a stream modify the dynamic table. APPEND: each stream record is an insert modification; conceptually the dynamic table is ever-growing and infinite in size. UPDATE/REPLACE: specify a unique key attribute; a stream record can represent an insert, update, or delete modification. Append mode is in fact a special case of update mode.
  21. Let’s imagine we take a snapshot of a dynamic table at a specific point in time. Snapshot is like a regular static batch table. PROGRESS: If we repeatedly compute the result of a query on snapshots for progressing points in time, we obtain many static result tables. They are changing over time and effectively constitute a dynamic table. At each point in time t, the result table is equivalent to a batch query on the dynamic table A at time t.
  22. APPENDMODE: the query continuously updates result rows that it had previously emitted instead of only adding new rows. The size of the result table depends on the number of distinct grouping keys of the input table. NOTE: As long as the table is not emitted, these changes are completely internal and not visible to a user. Changes materialize when a dynamic table is emitted.
  23. In contrast to the result of the first example, the resulting table grows relative to the time. While the non-windowed query (mostly) updates rows of the result table, the windowed aggregation query only appends new rows to the result table.
  24. Only those queries that can be continuously, incrementally, and efficiently computed.
  25. Traditional database systems use logs to rebuild tables in case of failures and for replication. UNDO logs contain the previous value of a modified element to revert incomplete transactions. REDO logs contain the new value of a modified element to redo lost changes of completed transactions. UNDO/REDO logs contain the old and the new value of a changed element.
  26. An insert modification is emitted as an insert message with the new row, a delete modification is emitted as a delete message with the old row, and an update modification is emitted as a delete message with the old row and an insert message with the new row. The current processing model is a subset of the dynamic table model. Downstream operators or data sinks need to be able to correctly handle both types of messages.
  27. Only tables with unique keys can have update and delete modifications. The downstream operators need to be able to access previous values by key.
  28. val outputOpts = OutputOptions()
        .firstResult(-15.minutes)    // first result 15 mins early
        .completeResult(+5.minutes)  // complete result 5 mins late
        .updateRate(3.minutes)       // result is updated every 3 mins
        .lateUpdates(true)           // late result updates enabled
        .lastResult(+15.minutes)     // last result 15 mins late -> until state is purged
  29. More TableSources and Sinks. Generated intermediate data types, serializers, comparators. Standalone SQL client. Code-generated aggregate functions.
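
Note 15 says arriving data is aggregated incrementally, so the window function typically sees a single value per window. A minimal plain-Scala sketch of that idea follows, applied to the one-day tumbling average from the windowing slides. It does not use Flink: `IncrementalWindowSketch`, `Reading`, `AvgAcc`, `windowStart`, and `aggregate` are hypothetical names, and `rowtime` is assumed to be non-negative epoch milliseconds.

```scala
// Plain-Scala sketch with no Flink dependency; every name here is hypothetical.
object IncrementalWindowSketch {
  final case class Reading(location: String, rowtime: Long, tempF: Double)

  // Accumulator for AVG: only a running sum and a count are kept per window.
  final case class AvgAcc(sum: Double, count: Long) {
    def add(value: Double): AvgAcc = AvgAcc(sum + value, count + 1)
    def result: Double = sum / count
  }

  val DayMillis: Long = 24L * 60 * 60 * 1000

  // Start of the tumbling one-day window that contains the timestamp
  // (assumes non-negative epoch milliseconds).
  def windowStart(rowtime: Long): Long = rowtime - (rowtime % DayMillis)

  // Fold each reading into the accumulator of its (location, window) group,
  // keeping only locations that match the slides' filter "room%".
  def aggregate(readings: Seq[Reading]): Map[(String, Long), AvgAcc] =
    readings
      .filter(_.location.startsWith("room"))
      .foldLeft(Map.empty[(String, Long), AvgAcc]) { (accs, r) =>
        val key = (r.location, windowStart(r.rowtime))
        accs.updated(key, accs.getOrElse(key, AvgAcc(0.0, 0L)).add(r.tempF))
      }

  // Same conversion as on the slides: (avg(tempF) - 32) * 0.556.
  def avgTempC(acc: AvgAcc): Double = (acc.result - 32) * 0.556
}
```

Because each group keeps only a sum and a count, the state per window stays constant in size no matter how many readings arrive.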