Data Ingest Self-Service and Management
using NiFi and Kafka
Imran Amjad, Principal Engineer
Dave Torok, Principal Architect
June 14, 2017
XFINITY TV
XFINITY Internet
XFINITY Voice
XFINITY Home
Digital & Other
*Minority interest and/or non-controlling interest.
Slide is not comprehensive of all Comcast NBCUniversal assets
Updated: December 22, 2015
Introduction and Background
• Customer Experience UI with 30,000 unique internal users per month
• Ingesting about 2 Billion Events / Month
• Typical “Big Data Analytics” Pipeline
• Data ETL, land in a data lake (e.g. HBase)
• API / several channels of consumers / 14 million requests per day
• Grew from a few dozen to 150+ data sources / feeds in about a year
• Pipeline of 5-10 new data feeds per two week sprint
Data Ingestion Self-Service and Management using NiFi and Kafka3
High Level Architecture
[Diagram: high-level architecture. Labels shown: Real-Time Data Sources; Batch Data Sources; HTTP Gateway; “Pull” Kafka Bridge (NiFi); Kafka (Event Bus); Apache NiFi; Apache Flink; Streaming Compute Pipeline; Standardize; Enrich; Detect; Aggregate; Rules; Store; Event Storage DB; Analytics DB; Filestore; REST API; UI and Other Consumers. A second build of the slide groups the components into three areas: Data Ingestion (“ETL”), Analytics & Active Decisioning, and UI / Services.]
Problem Statement and Motivation for Self Service
Time-to-market
VP: “I'd like to be able to add a new event stream in 10 minutes.”
Manual Processes
Code Deployment
Dimensions of Ingest Variability
Transport Protocol
• Kafka
• Kinesis
• HTTP/S
• Files
• (S)FTP
Format
• JSON
• XML
• AVRO
• CSV / Delimited
• Custom
Timing
• [Near] Real-Time Streaming
• Batch / Periodic
Ingest Control
• Pull from Source
• Push by Producer
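The four dimensions above multiply quickly, which is part of why hand-built ingest does not scale. As a minimal sketch, they could be modeled as enumerations plus one profile per feed. The class and member names here are illustrative assumptions, not names from the talk.

```python
from dataclasses import dataclass
from enum import Enum


class Transport(Enum):
    KAFKA = "kafka"
    KINESIS = "kinesis"
    HTTP = "http/s"
    FILES = "files"
    SFTP = "(s)ftp"


class PayloadFormat(Enum):
    JSON = "json"
    XML = "xml"
    AVRO = "avro"
    DELIMITED = "csv/delimited"
    CUSTOM = "custom"


class Timing(Enum):
    REAL_TIME = "near-real-time streaming"
    BATCH = "batch/periodic"


class IngestControl(Enum):
    PULL = "pull from source"
    PUSH = "push by producer"


@dataclass(frozen=True)
class IngestProfile:
    """One combination of the four dimensions for a single data feed."""
    transport: Transport
    payload_format: PayloadFormat
    timing: Timing
    control: IngestControl


def count_combinations() -> int:
    """Number of distinct profiles the four dimensions allow."""
    return len(Transport) * len(PayloadFormat) * len(Timing) * len(IngestControl)


# Example: a JSON feed pushed by the producer over Kafka in near real time.
example = IngestProfile(Transport.KAFKA, PayloadFormat.JSON,
                        Timing.REAL_TIME, IngestControl.PUSH)
```

With the values listed on the slide, `count_combinations()` gives 5 × 5 × 2 × 2 possible profiles.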
Data Source Onboarding – Before Self-Service
[Diagram labels: Manual Process; Code]
Self-Service Architecture Principles
• Metadata Driven: Data Ingestion, Processing, and Rendering Driven by Metadata
• Automation: Orchestrated Deployment for New Data Feeds
• Rapid Onboarding: Portal for Data Source Management
• Light Data Governance: Schema-backed Data, Schema Registry
• Monitoring and Metrics: Ingestion, Data Quality, and Operational Status
High Level Architecture with Self-Service
[Diagram: high-level architecture with the self-service components added. Labels shown: Self-Service UI; Self Service API; Self-Serve Metadata + Content Management DB; Real-Time Data Sources; Batch Data Sources; HTTP Gateway; “Pull” Kafka Bridge (NiFi); Kafka (Event Bus); Apache NiFi; Apache Flink; Streaming Compute Pipeline; Standardize; Enrich; Detect; Aggregate; Rules; Store; Event Storage DB; Analytics DB; Filestore; REST API; UI and Other Consumers. A second build of the slide groups the components into four areas: Data Ingestion (“ETL”), Analytics & Active Decisioning, UI / Services, and Self-Service.]
Metadata and Data Governance
Models and Metadata to enable Self-Service
Self Service Metadata Management
• CORE METADATA: Data Model and Data Dictionary
• INGEST and ETL Metadata
• PROCESSING Metadata: Lookups, Enrichment, Aggregation, Expressions
• UI / RENDERING METADATA
• BUSINESS CONTENT: Enrichment and Notification Templates and Lookups
Metadata Management – Example “Tags” on individual data fields
UI Rendering
• Display Field (true)
• Category (“Product”)
• Icon File (“x1_tv.svg”)
• Icon Color (“green”)
Security
• Sensitivity (none / encrypt)
• Encryption KeyId (“SomeKeyId”)
Data Source Information
• ShortName (“x1view”)
• Description (“X1 View Program”)
• Format (“json”)
Source Field Information
• Ingest Handling (Ingest / Drop)
• Field Name (“viewcode”)
• Field JsonPath (“$.EVT.VALUE.CODE”)
Target Field Information
• Target Domain Object (“error”)
• Field JsonPath (“$.data.error.viewCode”)
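To make the tag groups concrete, here is a minimal sketch of how one field's metadata might be held as a plain record and queried. The key names (`dataSource`, `sourceField`, and so on) and both helper functions are hypothetical; only the tag values come from the slide.

```python
# Hypothetical record layout; the values are the examples shown on the slide.
VIEWCODE_FIELD = {
    "dataSource": {"shortName": "x1view",
                   "description": "X1 View Program",
                   "format": "json"},
    "sourceField": {"ingestHandling": "ingest",
                    "fieldName": "viewcode",
                    "jsonPath": "$.EVT.VALUE.CODE"},
    "targetField": {"domainObject": "error",
                    "jsonPath": "$.data.error.viewCode"},
    "security": {"sensitivity": "encrypt",
                 "encryptionKeyId": "SomeKeyId"},
    "uiRendering": {"displayField": True,
                    "category": "Product",
                    "iconFile": "x1_tv.svg",
                    "iconColor": "green"},
}


def fields_to_encrypt(fields):
    """Names of source fields whose sensitivity tag asks for encryption."""
    return [f["sourceField"]["fieldName"]
            for f in fields
            if f["security"]["sensitivity"] == "encrypt"]


def path_mappings(fields):
    """(source JsonPath, target JsonPath) pairs for fields that are ingested."""
    return [(f["sourceField"]["jsonPath"], f["targetField"]["jsonPath"])
            for f in fields
            if f["sourceField"]["ingestHandling"] == "ingest"]
```

Because every downstream concern (security, mapping, rendering) reads the same record, adding a field becomes a metadata change rather than a code change.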
Why Data Governance?
Data Quality
• Validate Types against Schema
• Detect Structure Changes
• Backwards/Forwards Compatibility
• Universally Required Information
Data Rationalization
• Standard Syntax and Semantics
• Domains (“Customer”, “Device”)
• Standard Field Names and Types
• Message Data Format
• Support Integration / Correlation
Data Curation
• Metadata Management
• Data Source Registry
• Schema Repository
• Support Lineage Traceability
• Additional Properties (e.g. Security)
Lightweight Data Governance
Why JSON Schema over Avro for our use case?
• Data sources / producers generally aren’t using Avro
• Database Storage, UI, and REST API are not Avro
• Tolerate data change: Detect, Accept, Notify, Correct (Later)
• Don’t “drop data on the floor” – AVRO ignores unknown fields
• [Some] AWS Services – JSON friendly but not Avro friendly
[Comparison diagram: AVRO Schema vs. JSON Schema. Column headings: JSON + JSON Schema; AVRO + AVRO Schema. Labels shown: Validation (under both); JSON; Serialization Framework; Optimized for Data Storage; Upfront Data Governance.]
Lightweight Data Governance
Versioned Data and Schema
• Allow ingestion of multiple versions (particularly from different sources)
Data Quality
• Invalid Data Types
• Non-parseable Payloads
• Missing Required Fields
Data Change Tolerance
• Detect Additional Data / “Unknown Fields”
• Store alongside schema-modeled data
• Allow for display in UI only after updating Schema
Quality Feedback loop to data producers
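The quality checks and the change-tolerance rule above can be sketched in a few lines. This is a simplified illustration using only the standard library, not the validator used in the talk: the schema dictionary is a cut-down stand-in for JSON Schema, and `assess` and `TYPE_MAP` are hypothetical names.

```python
import json

# Simplified type names mapped to Python types (booleans are deliberately not modeled).
TYPE_MAP = {"string": str, "integer": int, "object": dict}


def assess(raw, schema):
    """Check one raw payload; return quality errors and any unknown fields."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return {"errors": ["non-parseable payload"], "unknownData": {}}
    if not isinstance(event, dict):
        return {"errors": ["non-parseable payload"], "unknownData": {}}

    errors = []
    unknown = {}
    properties = schema.get("properties", {})

    # Missing required fields
    for name in schema.get("required", []):
        if name not in event:
            errors.append(f"missing required field: {name}")

    for name, value in event.items():
        if name not in properties:
            # Unknown field: keep it alongside the modeled data instead of dropping it
            unknown[name] = value
            continue
        expected = TYPE_MAP[properties[name]["type"]]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"invalid data type: {name}")

    return {"errors": errors, "unknownData": unknown}
```

The returned `errors` list is what a feedback loop to producers would report, while `unknownData` is stored with the event until the schema is updated.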
Lightweight Data Governance
Reduced schema review and approval process
Generate JSON Schema Artifacts from Field Metadata
Lightweight Metadata Repository
• Considering Apache Atlas and Hortonworks Schema Registry
Pre-defined Core Schema / Domain Objects
• 5-15 (or so) domain object schemas to be included in an event schema
• E.g. Customer, Device, Geolocation
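As a rough sketch of generating a JSON Schema artifact from field metadata, the function below assembles an event schema from a list of field records and a set of pre-defined domain object schemas. The input layout (`name`, `type`, `required`) and the function name are assumptions made for illustration; the talk does not show the generator itself.

```python
import json


def build_event_schema(event_type, fields, domain_objects):
    """Assemble a JSON Schema document for one event type.

    fields:          list of {"name", "type", "required"} records (hypothetical layout)
    domain_objects:  mapping of domain object name -> its JSON Schema fragment
    """
    properties = {}
    required = []
    for field in fields:
        properties[field["name"]] = {"type": field["type"]}
        if field.get("required"):
            required.append(field["name"])

    # Domain objects live under "data" and are referenced from shared definitions
    data_properties = {name: {"$ref": f"#/definitions/{name}"}
                       for name in domain_objects}
    properties["data"] = {"type": "object", "properties": data_properties}

    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": event_type,
        "type": "object",
        "properties": properties,
        "required": required,
        "definitions": domain_objects,
    }


example_schema = build_event_schema(
    "xreinfo",
    [{"name": "billingId", "type": "string", "required": True},
     {"name": "timestamp", "type": "integer", "required": True},
     {"name": "eventSource", "type": "string"}],
    {"device": {"type": "object",
                "properties": {"deviceType": {"type": "string"}}}},
)
```

Calling `json.dumps(example_schema, indent=2)` would produce the artifact to register; reusing the same domain object fragments across event types is what keeps field names and types standard.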
Portal for Data Source Onboarding
Data Source Onboarding UI
Platform Automation for Kafka and NiFi
Data Source Onboarding – With Self-Service
Data Producer
• Create Schema and Metadata
• Register the Schema to generate the artifacts
• Test & Validate with Sample Data
• Publish & Start Ingesting Live Data
Generating NiFi Flows from Metadata
Finding Data with JsonPath (Example from http://petstore.swagger.io/ )
{
  "id": 0,
  "category": {
    "id": 0,
    "name": "dogs" },
  "name": "Fido",
  "photoUrls": [ "http://myimage.com" ],
  "tags": [
    { "id": 0, "name": "friendly" },
    { "id": 1, "name": "housebroken" } ],
  "status": "available"
}
$.id = 0
$.category.name = "dogs"
$.tags[?(@.id == 0)] = { "id": 0, "name": "friendly" }
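For readers new to JsonPath, the lookups above can be approximated in plain Python. This is a minimal sketch that handles dot notation only (`$.a.b`); it is not a JsonPath library, and the filter expression is imitated with a list comprehension. `get_path` and `PET` are illustrative names.

```python
PET = {
    "id": 0,
    "category": {"id": 0, "name": "dogs"},
    "name": "Fido",
    "photoUrls": ["http://myimage.com"],
    "tags": [{"id": 0, "name": "friendly"},
             {"id": 1, "name": "housebroken"}],
    "status": "available",
}


def get_path(document, path):
    """Resolve a dot-notation JsonPath such as '$.category.name'."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    current = document
    for key in path[1:].split("."):
        if key:  # skip the empty segment produced by the leading dot
            current = current[key]
    return current


# Equivalent of $.tags[?(@.id == 0)]: filter the array by a predicate
friendly_tags = [tag for tag in get_path(PET, "$.tags") if tag["id"] == 0]
```

Note that the comprehension yields a list of matches, which is also how JsonPath implementations typically return filter results.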
Example Source Data Payload
{
  "PV": "1.1",
  "APP": {
    "APP_NAME": "XRE",
    "APP_VER": "X1"
  },
  "DEV": {
    "DEVICE_TYPE": "Xi3"
  },
  "ACNT": {
    "BILL_ID": " 1234321234 "
  },
  "LOC": {
    "UTC_OFF": "-05"
  },
  "EVT": {
    "ETS": 1487880967000,
    "NAME": "program",
    "VALUE": {
      "TYPE": "XRE",
      "CODE": "XRE-12345",
      "DESCRIPTION": "Customer started a program"
    }
  },
  "PTS": 1468876861829
}
$.ACNT.BILL_ID
$.LOC.UTC_OFF
$.EVT.ETS
Example Target Schema Payload
{
  "schema" : "comcast/xreinfo/jsonschema/1-0-0",
  "billingId" : "1234321234",
  "eventSource" : "xre",
  "eventType" : "xreinfo",
  "timestamp" : 1487880967000,
  "data" : {
    "device" : {
      "deviceType" : "Xi3"
    },
    "customerAccount" : {
      "billingId" : "1234321234",
      "accountType" : "residential"
    },
    "timeInformation" : {
      "utcOffset" : "-5"
    },
    "info" : {
      "infoCode" : "XRE-12345"
    }
  },
  "unknownData" : {
    "EVT" : {
      "INFO" : {
        "ABXXY" : "MORE DATA"
      }
    }
  }
}
Simple JsonPath Transformation
FROM $.ACNT.BILL_ID
TO $.billingId
TO $.data.customerAccount.billingId
FROM $.EVT.ETS
TO $.data.timestamp
FROM $.LOC.UTC_OFF
TO $.data.timeInformation.utcOffset
{
  "schema" : "comcast/xreinfo/jsonschema/1-0-0",
  "billingId" : "1234321234",
  "eventSource" : "xre",
  "eventType" : "xreinfo",
  "timestamp" : 1487880967000,
  "data" : {
    "device" : {
      "deviceType" : "Xi3"
    },
    "customerAccount" : {
      "billingId" : "1234321234"
    },
    "timeInformation" : {
      "utcOffset" : "-5"
    },
    "info" : {
      "infoCode" : "XRE-12345"
    }
  }
}
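The FROM/TO pairs can be applied mechanically. Below is a minimal sketch, assuming dot-notation paths only and copying values unchanged (it does not trim whitespace or reformat offsets, so any such normalization would need its own step). The helper names and the reduced `SOURCE` sample are illustrative; the mapping pairs are the ones listed on the slide.

```python
MAPPINGS = [
    ("$.ACNT.BILL_ID", "$.billingId"),
    ("$.ACNT.BILL_ID", "$.data.customerAccount.billingId"),
    ("$.EVT.ETS", "$.data.timestamp"),
    ("$.LOC.UTC_OFF", "$.data.timeInformation.utcOffset"),
]

# Reduced sample payload for illustration
SOURCE = {
    "ACNT": {"BILL_ID": "1234321234"},
    "LOC": {"UTC_OFF": "-05"},
    "EVT": {"ETS": 1487880967000},
}


def _keys(path):
    """Split a dot-notation JsonPath such as '$.a.b' into ['a', 'b']."""
    return [key for key in path.lstrip("$").split(".") if key]


def get_path(document, path):
    for key in _keys(path):
        document = document[key]
    return document


def set_path(document, path, value):
    keys = _keys(path)
    for key in keys[:-1]:
        document = document.setdefault(key, {})  # create parents on demand
    document[keys[-1]] = value


def transform(source, mappings):
    """Copy each FROM value to its TO location; skip paths missing in the source."""
    target = {}
    for from_path, to_path in mappings:
        try:
            value = get_path(source, from_path)
        except (KeyError, TypeError):
            continue
        set_path(target, to_path, value)
    return target
```

Since the mapping list is plain data, it can be produced from the field metadata rather than written by hand for each feed.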
Advanced JSON Transformation using JOLT
Rich DSL for transforming and manipulating JSON
• Transform Structure (“Shift”)
• Defaults
• Sort
• Remove
• Expressions
https://github.com/bazaarvoice/jolt
Apache 2.0 License
NiFi “JoltTransformJSON” Processor
JOLT Transformation for X1 Event
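The spec shown on this slide is not preserved in the transcript. As a hedged illustration of how a JOLT "shift" spec could be generated from FROM/TO path pairs, the sketch below nests the source keys and writes each target as a dot-notation output path, using a list when one source feeds several targets. `build_shift_spec` is a hypothetical helper, and it assumes no source path is a prefix of another.

```python
import json


def build_shift_spec(mappings):
    """Build a JOLT chain containing one 'shift' operation from (source, target) pairs."""
    spec = {}
    for source, target in mappings:
        keys = source.lstrip("$").strip(".").split(".")
        output = target.lstrip("$").strip(".")  # JOLT output paths use dot notation
        node = spec
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        leaf = keys[-1]
        if leaf not in node:
            node[leaf] = output
        elif isinstance(node[leaf], list):
            node[leaf].append(output)
        else:
            node[leaf] = [node[leaf], output]  # second target for the same source
    return [{"operation": "shift", "spec": spec}]


chain = build_shift_spec([
    ("$.ACNT.BILL_ID", "$.billingId"),
    ("$.ACNT.BILL_ID", "$.data.customerAccount.billingId"),
    ("$.EVT.ETS", "$.data.timestamp"),
    ("$.LOC.UTC_OFF", "$.data.timeInformation.utcOffset"),
])
jolt_spec_text = json.dumps(chain, indent=2)
```

The resulting text is the kind of value that could be supplied as the specification of a JoltTransformJSON processor.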
Advanced Transform Examples (“JOLT” Processor)
INPUT: {"ChangedFields":["JobStateCd","DispatcherStatusCd"]}
"spec": { "ChangedFields_0": "=elementAt(@(1,ChangedFields),0)",
"ChangedFields_1": "=elementAt(@(1,ChangedFields),1)",
"ChangedFields_2": "=elementAt(@(1,ChangedFields),2)",
"ChangedFields_3": "=elementAt(@(1,ChangedFields),3)",
"ChangedFields_4": "=elementAt(@(1,ChangedFields),4)",
"ChangedFields_5": "=elementAt(@(1,ChangedFields),5)" } },
"spec": { "ChangedReason":
"=concat(@(1,ChangedFields_0),'|',@(1,ChangedFields_1),'|',@(1,ChangedFields_2),'|',@(1,ChangedFields_3),'|',@(1,ChangedFields_4),'|',@(1,ChangedFields_5))" }
OUTPUT: {"ChangedFields_0":"JobStateCd",
"ChangedFields_1":"DispatcherStatusCd",
"ChangedReason":"JobStateCd|DispatcherStatusCd||||"}
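To show what this pair of operations computes, here is a plain-Python equivalent of the input-to-output mapping. It is an illustration of the result only, not of how JOLT evaluates the spec; in particular, treating a missing array position as an empty string is an assumption chosen to reproduce the output shown on the slide.

```python
def element_at(values, index):
    """Return values[index], or an empty string when the position does not exist."""
    return values[index] if 0 <= index < len(values) else ""


def flatten_changed_fields(event, slots=6):
    """Spread the ChangedFields array into numbered keys plus a pipe-joined summary."""
    changed = event.get("ChangedFields", [])
    picked = [element_at(changed, i) for i in range(slots)]

    output = {}
    for i, value in enumerate(picked):
        if value != "":
            output[f"ChangedFields_{i}"] = value  # only positions that exist
    output["ChangedReason"] = "|".join(picked)    # fixed width: always slots - 1 pipes
    return output


result = flatten_changed_fields({"ChangedFields": ["JobStateCd", "DispatcherStatusCd"]})
```

The fixed number of slots is what turns a variable-length array into a flat, column-friendly record.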
Advanced Transform Examples (“JOLT” Processor)
"Devices" : {
"array" : [
{"fulfillmentID": "12345",
"Device": {"schema.core.Device": {
"model" : { "string": "AX013ANM" }}}},
{"fulfillmentID": "23456",
"Device": {"schema": {
"model" : { "string" : "PXD01ANI" }}}}]}
"Devices": {
"array": {
"*": { "fulfillmentID": { "string": "events[0].data.fid_&2"},
"device": { "schema.core.Device": { "model": { "string": "events[0].data.model_&4" } } } } } }
"events" : [ {
"data" : {
"fid_0" : "12345",
"model_0" : "AX013ANM",
"fid_1" : "23456",
"model_1" : "PXD01ANI" } } ]
NiFi REST API – Create a Process Group
POST http://localhost:8080/nifi-api/process-groups/root/process-groups
{
"revision": {
"version" : 0
},
"component": {
"name" : "x1info Process Group"
}
}
NiFi REST API – Create a ConsumeKafka Processor
POST http://localhost:8080/nifi-api/process-groups/{ID}/processors
{
"revision": {
"version": 0
},
"component": {
"config": {
"properties": {
"bootstrap.servers": "localhost:9092",
"topic": "raw.mystream.x1info",
"group.id": "nifi-stage-0522",
"auto.offset.reset": "latest"
}},
"name": "ConsumeKafka - x1info",
"type": "org.apache.nifi.processors.kafka.pubsub.ConsumeKafka"
}}
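The two calls above could be scripted as follows. This sketch only builds the HTTP requests with the standard library and does not send them; the function names are hypothetical, while the URLs and JSON bodies mirror the slides. Authentication and error handling, which a real NiFi deployment would likely need, are left out.

```python
import json
from urllib.request import Request

NIFI_API = "http://localhost:8080/nifi-api"  # local endpoint used on the slides


def _post(url, body):
    """Build (but do not send) a JSON POST request."""
    return Request(url,
                   data=json.dumps(body).encode("utf-8"),
                   headers={"Content-Type": "application/json"},
                   method="POST")


def create_process_group_request(parent_id, name):
    body = {"revision": {"version": 0},
            "component": {"name": name}}
    return _post(f"{NIFI_API}/process-groups/{parent_id}/process-groups", body)


def create_consume_kafka_request(group_id, name, topic, consumer_group,
                                 brokers="localhost:9092"):
    body = {
        "revision": {"version": 0},
        "component": {
            "config": {"properties": {
                "bootstrap.servers": brokers,
                "topic": topic,
                "group.id": consumer_group,
                "auto.offset.reset": "latest",
            }},
            "name": name,
            "type": "org.apache.nifi.processors.kafka.pubsub.ConsumeKafka",
        },
    }
    return _post(f"{NIFI_API}/process-groups/{group_id}/processors", body)


group_request = create_process_group_request("root", "x1info Process Group")
```

Passing a request to `urllib.request.urlopen` would submit it; the ID of the created process group, read from the response, would then be used as `group_id` for the processor call.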
Generated NiFi Flow
Monitoring and Metrics
• Dashboard of Data Quality
• Ingestion Rate Monitoring (and possibly alerting with anomaly detection)
• Alerting to producers (Data Quality)
Ingestion Status Dashboard
[Dashboard screenshot: ingestion status for Data Center 1 and Data Center 2, with rows for Event_type1, Event_type2, MyEvent, and WeekendEvent.]
Anomaly Detection
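The talk does not say which detection method is used. As one simple possibility, an ingestion-rate check could flag a count that deviates strongly from recent history using a z-score. The function name, the threshold of 3, and the sample counts are all assumptions for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` when it lies more than `threshold` standard deviations
    from the mean of recent per-interval event counts."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return current != mu  # flat history: any change is a deviation
    return abs(current - mu) / sigma > threshold


# Hypothetical events-per-minute counts for one feed
recent_counts = [1000, 1020, 980, 1010, 990]
dropped = is_anomalous(recent_counts, 100)    # sudden drop in volume
normal = is_anomalous(recent_counts, 1005)    # within the usual range
```

A feed such as the "WeekendEvent" row on the dashboard suggests that a real check would also need to account for expected weekly patterns, which this sketch does not.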
Self-Service Lessons Learned
Design with automation and configuration in mind
Metadata-driven design reduces code deployments and custom solutions
Make Simple things Simple – But allow hard things to be “code” and not UI-driven
Let Data Producers be accountable for Data Quality
NiFi and JOLT = Powerful Toolkit
Thank You!
Data Ingest Self Service and Management using Nifi and Kafka

More Related Content

What's hot

Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
DataWorks Summit
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
HostedbyConfluent
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
GetInData
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
Flink Forward
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
Databricks
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
Alex Van Boxel
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
DataWorks Summit/Hadoop Summit
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
Julien Le Dem
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - final
Sub Szabolcs Feczak
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Flink Forward
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
 
Apache NiFi Crash Course Intro
Apache NiFi Crash Course IntroApache NiFi Crash Course Intro
Apache NiFi Crash Course Intro
DataWorks Summit/Hadoop Summit
 
BYOP: Custom Processor Development with Apache NiFi
BYOP: Custom Processor Development with Apache NiFiBYOP: Custom Processor Development with Apache NiFi
BYOP: Custom Processor Development with Apache NiFi
DataWorks Summit
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
DataWorks Summit/Hadoop Summit
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
HostedbyConfluent
 
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoHigh Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
Alluxio, Inc.
 
Integrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data LakesIntegrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data Lakes
DataWorks Summit/Hadoop Summit
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
Flink Forward
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
Alluxio, Inc.
 

What's hot (20)

Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Apache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - finalApache Beam and Google Cloud Dataflow - IDG - final
Apache Beam and Google Cloud Dataflow - IDG - final
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Apache NiFi Crash Course Intro
Apache NiFi Crash Course IntroApache NiFi Crash Course Intro
Apache NiFi Crash Course Intro
 
BYOP: Custom Processor Development with Apache NiFi
BYOP: Custom Processor Development with Apache NiFiBYOP: Custom Processor Development with Apache NiFi
BYOP: Custom Processor Development with Apache NiFi
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache ...
 
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3GoHigh Performance Data Lake with Apache Hudi and Alluxio at T3Go
High Performance Data Lake with Apache Hudi and Alluxio at T3Go
 
Integrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data LakesIntegrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data Lakes
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Iceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data AnalyticsIceberg + Alluxio for Fast Data Analytics
Iceberg + Alluxio for Fast Data Analytics
 

Similar to Data Ingest Self Service and Management using Nifi and Kafka

xGem Data Stream Processing
xGem Data Stream ProcessingxGem Data Stream Processing
xGem Data Stream Processing
Jorge Hirtz
 
Arabidopsis Information Portal overview from Plant Biology Europe 2014
Arabidopsis Information Portal overview from Plant Biology Europe 2014Arabidopsis Information Portal overview from Plant Biology Europe 2014
Arabidopsis Information Portal overview from Plant Biology Europe 2014
Matthew Vaughn
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
Guido Schmutz
 
Workshop splunk 6.5-saint-louis-mo
Workshop splunk 6.5-saint-louis-moWorkshop splunk 6.5-saint-louis-mo
Workshop splunk 6.5-saint-louis-mo
Mohamad Hassan
 
Developing for Astoria: ADO.NET Data Services
Developing for Astoria: ADO.NET Data ServicesDeveloping for Astoria: ADO.NET Data Services
Developing for Astoria: ADO.NET Data Services
Harish Ranganathan
 
Cert05 70-487 - developing microsoft azure and web services
Cert05   70-487 - developing microsoft azure and web servicesCert05   70-487 - developing microsoft azure and web services
Cert05 70-487 - developing microsoft azure and web services
DotNetCampus
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
Timothy Spann
 
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Amazon Web Services
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
Gyula Fóra
 
Web Application Frameworks (WAF)
Web Application Frameworks (WAF)Web Application Frameworks (WAF)
Web Application Frameworks (WAF)
Ako Kaman
 
Going FaaSter, Functions as a Service at Netflix
Going FaaSter, Functions as a Service at NetflixGoing FaaSter, Functions as a Service at Netflix
Going FaaSter, Functions as a Service at Netflix
Yunong Xiao
 
What's New in IBM Streams V4.1
What's New in IBM Streams V4.1What's New in IBM Streams V4.1
What's New in IBM Streams V4.1
lisanl
 
Lightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and DruidLightning Fast Analytics with Hive LLAP and Druid
Lightning Fast Analytics with Hive LLAP and Druid
DataWorks Summit
 
Confluent and Elastic
Confluent and ElasticConfluent and Elastic
Confluent and Elastic
Paolo Castagna
 
M meijer api management - tech-days 2015
M meijer   api management - tech-days 2015M meijer   api management - tech-days 2015
M meijer api management - tech-days 2015
Freelance Consultant / Manager / co-CTO
 
Genji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelinesGenji: Framework for building resilient near-realtime data pipelines
Genji: Framework for building resilient near-realtime data pipelines
Swami Sundaramurthy
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
confluent
 
Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017Confluent kafka meetupseattle jan2017
Confluent kafka meetupseattle jan2017
Nitin Kumar
 
AI made easy with Flink AI Flow
AI made easy with Flink AI FlowAI made easy with Flink AI Flow
AI made easy with Flink AI Flow
Jiangjie Qin
 
Toyko azure meetup # 1 azure paa s overview
Toyko azure meetup # 1   azure paa s overviewToyko azure meetup # 1   azure paa s overview
Toyko azure meetup # 1 azure paa s overview
Tokyo Azure Meetup
 

Similar to Data Ingest Self Service and Management using Nifi and Kafka (20)

xGem Data Stream Processing
xGem Data Stream ProcessingxGem Data Stream Processing
xGem Data Stream Processing
 
Arabidopsis Information Portal overview from Plant Biology Europe 2014
Arabidopsis Information Portal overview from Plant Biology Europe 2014Arabidopsis Information Portal overview from Plant Biology Europe 2014
Arabidopsis Information Portal overview from Plant Biology Europe 2014
 
Streaming Visualization
Streaming VisualizationStreaming Visualization
Streaming Visualization
 
Workshop splunk 6.5-saint-louis-mo
Workshop splunk 6.5-saint-louis-moWorkshop splunk 6.5-saint-louis-mo
Workshop splunk 6.5-saint-louis-mo
 
Developing for Astoria: ADO.NET Data Services
Developing for Astoria: ADO.NET Data ServicesDeveloping for Astoria: ADO.NET Data Services
Developing for Astoria: ADO.NET Data Services
 
Cert05 70-487 - developing microsoft azure and web services
Cert05   70-487 - developing microsoft azure and web servicesCert05   70-487 - developing microsoft azure and web services
Cert05 70-487 - developing microsoft azure and web services
 
The Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
 
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Data Ingest Self-Service and Management using NiFi and Kafka

  • 1. Data Ingest Self-Service and Management using NiFi and Kafka. Imran Amjad, Principal Engineer; Dave Torok, Principal Architect. June 14, 2017
  • 2. [Company overview graphic. Labels shown: XFINITY TV; XFINITY Internet; XFINITY Voice; XFINITY Home; Digital & Other; Other.] *Minority interest and/or non-controlling interest. Slide is not comprehensive of all Comcast NBCUniversal assets. Updated: December 22, 2015
  • 3. Introduction and Background
      • Customer Experience UI with 30,000 unique internal users per month
      • Ingesting about 2 billion events per month
      • Typical "Big Data Analytics" pipeline
      • Data ETL, landing in a data lake (e.g. HBase)
      • API with several channels of consumers; 14 million requests per day
      • Grew from a few dozen to 150+ data sources / feeds in about a year
      • Pipeline of 5-10 new data feeds per two-week sprint
  • 4. High Level Architecture [Diagram. Components shown: Real Time Data Sources; BATCH Data Sources; HTTP Gateway; "Pull" Kafka Bridge (NiFi); Kafka (Event Bus); Apache NiFi; Streaming Compute Pipeline; Rules, Enrich, Standardize, Detect, Aggregate; Apache Flink; Store; Filestore; Event Storage DB; Analytics DB; REST API; UI and Other Consumers.]
  • 5. High Level Architecture [Same diagram as slide 4, with the components grouped into three areas: Data Ingestion ("ETL"); Analytics & Active Decisioning; UI / Services.]
  • 6. Problem Statement and Motivation for Self-Service
      • Time-to-market. VP: "I'd like to be able to add a new event stream in 10 minutes."
      • Manual Processes
      • Code Deployment
  • 7. Dimensions of Ingest Variability
      • Transport Protocol: Kafka, Kinesis, HTTP/S, Files, (S)FTP
      • Format: JSON, XML, AVRO, CSV / Delimited, Custom
      • Timing: [Near] Real-Time Streaming, Batch / Periodic
      • Ingest Control: Pull from Source, Push by Producer
  • 8. Data Source Onboarding – Before Self-Service
  • 9. Data Source Onboarding – Before Self-Service: Manual Process; Code
  • 10. Self-Service Architecture Principles
      • Metadata Driven: Data Ingestion, Processing, and Rendering Driven by Metadata
      • Automation: Orchestrated Deployment for New Data Feeds
      • Rapid Onboarding: Portal for Data Source Management
      • Light Data Governance: Schema-backed Data, Schema Registry
      • Monitoring and Metrics: Ingestion, Data Quality, and Operational Status
  • 11. High Level Architecture with Self-Service [Diagram. The components of slide 4 plus: Self-Serve Metadata + Content Management DB; Self Service API; Self-Service UI. Areas: Data Ingestion ("ETL"); Analytics & Active Decisioning; UI / Services; Self-Service.]
  • 12. High Level Architecture with Self-Service [Same diagram as slide 11.]
  • 13. Metadata and Data Governance
  • 14. Models and Metadata to Enable Self-Service: Self-Service Metadata Management
      • CORE METADATA: Data Model and Data Dictionary
      • INGEST and ETL Metadata
      • PROCESSING Metadata: Lookups, Enrichment, Aggregation, Expressions
      • UI / RENDERING METADATA
      • BUSINESS CONTENT: Enrichment and Notification Templates and Lookups
  • 15. Metadata Management – Example "Tags" on individual data fields
      • Data Source Information: ShortName ("x1view"), Description ("X1 View Program"), Format ("json")
      • Source Field Information: Ingest Handling (Ingest / Drop), Field Name ("viewcode"), Field JsonPath ("$.EVT.VALUE.CODE")
      • Target Field Information: Target Domain Object ("error"), Field JsonPath ("$.data.error.viewCode")
      • UI Rendering: Display Field (true), Category ("Product"), Icon File ("x1_tv.svg"), Icon Color ("green")
      • Security: Sensitivity (none / encrypt), Encryption KeyId ("SomeKeyId")
  • 16. Why Data Governance?
      • Data Quality: Validate Types against Schema; Detect Structure Changes; Backwards/Forwards Compatibility; Universally Required Information
      • Data Rationalization: Standard Syntax and Semantics; Domains ("Customer", "Device"); Standard Field Names and Types; Message Data Format; Support Integration / Correlation
      • Data Curation: Metadata Management; Data Source Registry; Schema Repository; Support Lineage Traceability; Additional Properties (e.g. Security)
  • 17. Lightweight Data Governance: Why JSON Schema over Avro for our use case?
      • Data sources / producers generally aren't using Avro
      • Database storage, UI, and REST API are not Avro
      • Tolerate data change: Detect, Accept, Notify, Correct (Later)
      • Don't "drop data on the floor": Avro ignores unknown fields
      • [Some] AWS services are JSON friendly but not Avro friendly
      [Comparison graphic, Avro Schema vs. JSON Schema. Labels shown: JSON + JSON Schema; Avro + Avro Schema; Validation; Validation; JSON; Serialization Framework; Optimized for Data Storage; Upfront Data Governance.]
  • 18. Lightweight Data Governance
      • Versioned Data and Schema
          • Allow ingestion of multiple versions (particularly from different sources)
      • Data Quality
          • Invalid data types
          • Non-parseable payloads
          • Missing required fields
      • Data Change Tolerance
          • Detect additional data / "unknown fields"
          • Store alongside schema-modeled data
          • Allow for display in UI only after updating the schema
      • Quality feedback loop to data producers
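The data quality and change-tolerance checks listed on slide 18 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SCHEMA` table and the `check_event` function are hypothetical, and the real pipeline validates against JSON Schema inside NiFi rather than with hand-written checks.

```python
import json

# Hypothetical, simplified schema: field name -> (expected type, required?)
SCHEMA = {
    "billingId": (str, True),
    "timestamp": (int, True),
    "eventType": (str, False),
}


def check_event(raw, schema=SCHEMA):
    """Return (modeled, unknown, problems) for one raw JSON payload."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError as exc:
        return {}, {}, [f"non-parseable payload: {exc.msg}"]
    if not isinstance(event, dict):
        return {}, {}, ["payload is not a JSON object"]

    modeled, unknown, problems = {}, {}, []
    for name, value in event.items():
        if name not in schema:
            # Unknown field: keep it alongside the modeled data, do not drop it.
            unknown[name] = value
            continue
        expected_type, _ = schema[name]
        if isinstance(value, expected_type):
            modeled[name] = value
        else:
            problems.append(f"invalid type for {name}")
    for name, (_, required) in schema.items():
        if required and name not in event:
            problems.append(f"missing required field: {name}")
    return modeled, unknown, problems
```

The `problems` list is what a feedback loop to data producers could report, while `unknown` corresponds to the data stored next to the schema-modeled fields until the schema is updated.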
  • 19. Lightweight Data Governance
      • Reduced schema review and approval process
      • Generate JSON Schema artifacts from field metadata
      • Lightweight metadata repository
          • Considering Apache Atlas and Hortonworks Schema Registry
      • Pre-defined core schema / domain objects
          • 5-15 (or so) domain object schemas to be included in an event schema
          • E.g. Customer, Device, Geolocation
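Generating a JSON Schema artifact from field metadata, as slide 19 describes, might look roughly like the sketch below. The `FIELDS` records and `build_json_schema` are assumptions made for illustration; the slides do not show the actual metadata model or generator.

```python
import json

# Hypothetical field metadata records; the real metadata model is not shown.
FIELDS = [
    {"name": "billingId", "type": "string", "required": True},
    {"name": "timestamp", "type": "integer", "required": True},
    {"name": "infoCode", "type": "string", "required": False},
]


def build_json_schema(title, fields):
    """Build a draft-07 style JSON Schema dict from field metadata."""
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "title": title,
        "type": "object",
        "properties": {f["name"]: {"type": f["type"]} for f in fields},
        "required": [f["name"] for f in fields if f["required"]],
        # Unknown fields are tolerated rather than rejected.
        "additionalProperties": True,
    }


schema = build_json_schema("xreinfo", FIELDS)
print(json.dumps(schema, indent=2))
```

Leaving `additionalProperties` open is one way to express the "detect, accept, notify" stance on unknown fields; a stricter deployment could set it to `False`.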
  • 20. Portal for Data Source Onboarding
  • 21. Data Source Onboarding UI
  • 22. Data Source Onboarding UI
  • 23. Data Source Onboarding UI
  • 24. Data Source Onboarding UI
  • 25. Data Source Onboarding UI
  • 26. Data Source Onboarding UI
  • 27. Data Source Onboarding UI
  • 28. Platform Automation for Kafka and NiFi
  • 29. Data Source Onboarding – With Self-Service
      • Create Schema and Metadata
      • Register the Schema to generate the artifacts
      • Test & Validate with Sample Data
      • Publish & Start Ingesting Live Data
      [Actor shown: Data Producer]
  • 30. Generating NiFi Flows from Metadata
  • 31. Finding Data with JsonPath (Example from http://petstore.swagger.io/)
      {
        "id": 0,
        "category": { "id": 0, "name": "dogs" },
        "name": "Fido",
        "photoUrls": [ "http://myimage.com" ],
        "tags": [ { "id": 0, "name": "friendly" }, { "id": 1, "name": "housebroken" } ],
        "status": "available"
      }
      $.id = 0
      $.category.name = "dogs"
      $.tags[?(@.id == 0)] = [ { "id": 0, "name": "friendly" } ]
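To make the path lookups on slide 31 concrete, here is a minimal Python sketch of a resolver for the simplest JsonPath forms (`$.a.b` and `[n]` indexes). The helper `json_path_get` is hypothetical and deliberately tiny; it does not implement filter expressions such as `[?(@.id == 0)]`, which a full JsonPath library handles.

```python
import re

PET = {
    "id": 0,
    "category": {"id": 0, "name": "dogs"},
    "name": "Fido",
    "photoUrls": ["http://myimage.com"],
    "tags": [{"id": 0, "name": "friendly"}, {"id": 1, "name": "housebroken"}],
    "status": "available",
}

# A path step is either ".name" or "[index]".
_STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[(\d+)\]")


def json_path_get(doc, path):
    """Resolve a simple JsonPath such as $.category.name or $.tags[1].name."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    current = doc
    for name, index in _STEP.findall(path[1:]):
        current = current[name] if name else current[int(index)]
    return current


# The filter expression on the slide corresponds to a list comprehension.
friendly = [tag for tag in PET["tags"] if tag["id"] == 0]
```

With the sample document, `json_path_get(PET, "$.category.name")` resolves to the string `dogs`, and `friendly` holds the single matching tag object.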
  • 32. Example Source Data Payload
      {
        "PV": "1.1",
        "APP": { "APP_NAME": "XRE", "APP_VER": "X1" },
        "DEV": { "DEVICE_TYPE": "Xi3" },
        "ACNT": { "BILL_ID": " 1234321234 " },
        "LOC": { "UTC_OFF": "-05" },
        "EVT": {
          "ETS": 1487880967000,
          "NAME": "program",
          "VALUE": { "TYPE": "XRE", "CODE": "XRE-12345", "DESCRIPTION": "Customer started a program" }
        },
        "PTS": 1468876861829
      }
      Paths highlighted: $.ACNT.BILL_ID, $.LOC.UTC_OFF, $.EVT.ETS
  • 33. Example Target Schema Payload
      {
        "schema": "comcast/xreinfo/jsonschema/1-0-0",
        "billingId": "1234321234",
        "eventSource": "xre",
        "eventType": "xreinfo",
        "timestamp": 1487880967000,
        "data": {
          "device": { "deviceType": "Xi3" },
          "customerAccount": { "billingId": "1234321234", "accountType": "residential" },
          "timeInformation": { "utcOffset": "-5" },
          "info": { "infoCode": "XRE-12345" }
        },
        "unknownData": { "EVT": { "INFO": { "ABXXY": "MORE DATA" } } }
      }
  • 34. Simple JsonPath Transformation
      FROM $.ACNT.BILL_ID TO $.billingId and TO $.data.customerAccount.billingId
      FROM $.EVT.ETS TO $.timestamp
      FROM $.LOC.UTC_OFF TO $.data.timeInformation.utcOffset
      {
        "schema": "comcast/xreinfo/jsonschema/1-0-0",
        "billingId": "1234321234",
        "eventSource": "xre",
        "eventType": "xreinfo",
        "timestamp": 1487880967000,
        "data": {
          "device": { "deviceType": "Xi3" },
          "customerAccount": { "billingId": "1234321234" },
          "timeInformation": { "utcOffset": "-5" },
          "info": { "infoCode": "XRE-12345" }
        }
      }
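The FROM/TO mapping on slide 34 amounts to copying values between dotted paths. The Python sketch below shows one way to express that; `MAPPINGS`, `get_path`, `set_path`, and `transform` are hypothetical names, and the real flow does this with NiFi processors driven by metadata. Two assumptions are worth flagging: the timestamp is written to `$.timestamp`, matching the target payload shown on slide 33, and string values are trimmed because the sample billing ID carries surrounding spaces. Normalizing "-05" to "-5" is not attempted here.

```python
SOURCE = {
    "ACNT": {"BILL_ID": " 1234321234 "},
    "LOC": {"UTC_OFF": "-05"},
    "EVT": {"ETS": 1487880967000},
}

# (source path, [target paths]); one source value may feed several targets.
MAPPINGS = [
    ("$.ACNT.BILL_ID", ["$.billingId", "$.data.customerAccount.billingId"]),
    ("$.EVT.ETS", ["$.timestamp"]),
    ("$.LOC.UTC_OFF", ["$.data.timeInformation.utcOffset"]),
]


def _keys(path):
    """Split '$.a.b.c' into ['a', 'b', 'c'] (simple dotted paths only)."""
    return path.lstrip("$").strip(".").split(".")


def get_path(doc, path):
    for key in _keys(path):
        doc = doc[key]
    return doc


def set_path(doc, path, value):
    *parents, leaf = _keys(path)
    for key in parents:
        doc = doc.setdefault(key, {})  # create intermediate objects on demand
    doc[leaf] = value


def transform(source, mappings=MAPPINGS):
    target = {}
    for src, destinations in mappings:
        value = get_path(source, src)
        if isinstance(value, str):
            value = value.strip()  # assumption: surrounding whitespace is noise
        for dest in destinations:
            set_path(target, dest, value)
    return target
```

Because the mapping is plain data, it can be generated from the per-field source and target JsonPath tags rather than coded by hand, which is the point of the metadata-driven approach.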
  • 35. Advanced JSON Transformation using JOLT
      • Rich DSL for transforming and manipulating JSON
          • Transform Structure ("Shift")
          • Defaults
          • Sort
          • Remove
          • Expressions
      • https://github.com/bazaarvoice/jolt (Apache 2.0 License)
      • NiFi "JoltTransformJSON" Processor
  • 36. JOLT Transformation for X1 Event
  • 37. Advanced Transform Examples (“JOLT” Processor) INPUT: {"ChangedFields":["JobStateCd","DispatcherStatusCd"]} OUTPUT:{“ChangedFields_0”:“JobStateCd”, “ChangedFields_1”:“DispatcherStatusCd”, “ChangedReason”, “JobStateCd|DispatcherStatusCd||||” } Data Ingestion Self-Service and Management using NiFi and Kafka37
  • 38. Advanced Transform Examples (“JOLT” Processor) INPUT: {"ChangedFields":["JobStateCd","DispatcherStatusCd"]} "spec": { "ChangedFields_0": "=elementAt(@(1,ChangedField),0)", "ChangedFields_1": "=elementAt(@(1,ChangedField),1)", "ChangedFields_2": "=elementAt(@(1,ChangedField),2)", "ChangedFields_3": "=elementAt(@(1,ChangedField),3)", "ChangedFields_4": "=elementAt(@(1,ChangedField),4)", "ChangedFields_5": "=elementAt(@(1,ChangedField),5)" } }, "spec": { "ChangedReason": "=concat(@(1,ChangedFields_0),'|',@(1,ChangedFields_1),'|',@(1,ChangedFields _2),'|',@(1,ChangedFields_3),'|',@(1,ChangedFields_4),'|',@(1,ChangedFields_ 5))" } OUTPUT:{“ChangedFields_0”:“JobStateCd”, “ChangedFields_1”:“DispatcherStatusCd”, “ChangedReason”, “JobStateCd|DispatcherStatusCd||||” } Data Ingestion Self-Service and Management using NiFi and Kafka38
  • 40. Advanced Transform Examples (“JOLT” Processor) INPUT: "Devices" : { "array" : [ {"fulfillmentID": "12345", "Device": {"schema.core.Device": { "model" : { "string": "AX013ANM" }}}}, {"fulfillmentID": "23456", "Device": {"schema": { "model" : { "string" : "PXD01ANI" }}}}]} SPEC: "Devices": { "array": { "*": { "fulfillmentID": { "string": "events[0].data.fid_&2"}, "Device": { "schema.core.Device": { "model": { "string": "events[0].data.model_&4" } } } } } } OUTPUT: "events" : [ { "data" : { "fid_0" : "12345", "model_0" : "AX013ANM", "fid_1" : "23456", "model_1" : "PXD01ANI" } } ]
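The shift on this slide flattens an array of nested device records into index-suffixed keys (`fid_0`, `model_0`, ...) under a single event. The following Python sketch shows the same reshaping under two assumptions that the slide leaves open: `fulfillmentID` is taken as a plain string as in the INPUT, and each `Device` object is assumed to hold exactly one wrapper key (`schema.core.Device` or `schema`) around `model.string`. The function name `flatten_devices` is hypothetical.

```python
def flatten_devices(record):
    """Turn Devices.array[i] into fid_<i> / model_<i> keys under events[0].data."""
    data = {}
    for i, item in enumerate(record["Devices"]["array"]):
        data[f"fid_{i}"] = item["fulfillmentID"]
        # Assumption: Device contains a single wrapper object, whatever its key.
        wrapper = next(iter(item["Device"].values()))
        data[f"model_{i}"] = wrapper["model"]["string"]
    return {"events": [{"data": data}]}


if __name__ == "__main__":
    record = {
        "Devices": {
            "array": [
                {"fulfillmentID": "12345",
                 "Device": {"schema.core.Device": {"model": {"string": "AX013ANM"}}}},
                {"fulfillmentID": "23456",
                 "Device": {"schema": {"model": {"string": "PXD01ANI"}}}},
            ]
        }
    }
    print(flatten_devices(record))
```

In the JOLT spec, `&2` and `&4` play the role of the loop index `i` here: each walks back up the matched path to the `*` level and substitutes the array position.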
  • 41. NiFi REST API – Create a Process Group POST http://localhost:8080/nifi-api/process-groups/root/process-groups { "revision": { "version" : 0 }, "component": { "name" : "x1info Process Group" } }
  • 42. NiFi REST API – Create a ConsumeKafka Processor POST http://localhost:8080/nifi-api/process-groups/{ID}/processors { "revision": { "version": 0 }, "component": { "config": { "properties": { "bootstrap.servers": "localhost:9092", "topic": "raw.mystream.x1info", "group.id": "nifi-stage-0522", "auto.offset.reset": "latest" }}, "name": "ConsumeKafka - x1info", "type": "org.apache.nifi.processors.kafka.pubsub.ConsumeKafka" }}
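The two REST calls above can be generated from feed metadata rather than written by hand. The sketch below builds both requests with Python's standard library, using the URLs and JSON bodies from the slides. The helper names (`build_post`, `process_group_request`, `consume_kafka_request`) are hypothetical, and the requests are only constructed here, not sent; authentication and error handling would depend on the NiFi deployment.

```python
import json
import urllib.request

NIFI_API = "http://localhost:8080/nifi-api"  # base URL as shown on the slides


def build_post(url, body):
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_group_request(name, parent_id="root"):
    """Request that creates a process group under parent_id."""
    body = {"revision": {"version": 0}, "component": {"name": name}}
    return build_post(f"{NIFI_API}/process-groups/{parent_id}/process-groups", body)


def consume_kafka_request(group_id, feed, topic, consumer_group,
                          bootstrap="localhost:9092"):
    """Request that creates a ConsumeKafka processor inside process group group_id."""
    body = {
        "revision": {"version": 0},
        "component": {
            "name": f"ConsumeKafka - {feed}",
            "type": "org.apache.nifi.processors.kafka.pubsub.ConsumeKafka",
            "config": {
                "properties": {
                    "bootstrap.servers": bootstrap,
                    "topic": topic,
                    "group.id": consumer_group,
                    "auto.offset.reset": "latest",
                }
            },
        },
    }
    return build_post(f"{NIFI_API}/process-groups/{group_id}/processors", body)


if __name__ == "__main__":
    req = process_group_request("x1info Process Group")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req), then read the new
    # group's ID from the JSON response and pass it to consume_kafka_request.
```

Chaining the calls this way (create the group, take its ID from the response, then create processors inside it) is one plausible way a self-service tool could produce a flow like the generated one shown on the next slide.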
  • 43. Generated NiFi Flow
  • 44. Monitoring and Metrics • Dashboard of Data Quality • Ingestion Rate Monitoring (and possibly alerting with anomaly detection) • Alerting to producers (Data Quality)
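The deck does not say which anomaly detection method is used for ingestion rates. As one simple possibility, and purely as an illustrative assumption, a per-feed check could compare the latest event count against recent history using a z-score. The function name `is_anomalous` and the threshold of 3 standard deviations are hypothetical choices.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Flag `current` if it lies more than `threshold` sample standard
    deviations from the mean of `history` (recent per-interval counts)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any change at all counts as a deviation.
        return current != mu
    return abs(current - mu) / sigma > threshold


if __name__ == "__main__":
    recent_counts = [100, 102, 98, 101, 99]  # made-up events per interval
    for count in (100, 10):
        print(count, is_anomalous(recent_counts, count))
```

A sudden drop toward zero is the case that matters most for ingestion monitoring, since it usually means a producer or a flow has stopped; feeds with strong daily or weekly patterns would need a seasonality-aware baseline instead of a flat window.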
  • 45. Ingestion Status Dashboard [Figure: dashboard screenshot; visible labels: Data Center 1, Data Center 2, Event_type1, Event_type2, MyEvent, WeekendEvent]
  • 46. Anomaly Detection
  • 47. Self-Service Lessons Learned • Design with automation and configuration in mind • Metadata-driven design reduces code deployments and custom solutions • Make simple things simple, but allow hard things to be “code” and not UI-driven • Let Data Producers be accountable for Data Quality • NiFi and JOLT = Powerful Toolkit