Jim Peregord, Venu Palvai
Element Fleet Management
Building a Pluggable Analytics Stack with Cassandra as the Foundation
1 Background on Element Fleet Management
2 Key Use Cases Supported
3 Architecture
4 Our Journey
5 Lessons Learned
© DataStax, All Rights Reserved.
A Little About Us
Jim Peregord: VP – Analytics, BI, Data Mgt
Venu Palvai: Lead Architect
Background on Element Fleet Management
• Full lifecycle of fleet management services
• Data consolidation and advanced analytics
• Maximize customer ROI on fleet assets via data and advanced analytics
• 2,600 employees
• 1+ million vehicles
• $18 billion in total finance assets
• 2 billion rows of data and growing
Greenfield Opportunity to Build Analytics Platform
• Element acquired GE Fleet Management September 1, 2015
• Now the largest publicly held Fleet Management company in the world
• Pre-acquisition Element had limited data warehouse and Big Data tech
• Greenfield Opportunity to build next gen BI and Advanced Analytics platform
High-level Options Considered
#1 – Build a separate data warehouse and Big Data/Advanced Analytics platform
#2 – Build a single, unified architecture that supports both
Our Decision
#2 – Build a single, unified platform using DataStax
Key Use Cases Supported on New Platform
• High availability out of the box
• Linear and elastic scalability
• High concurrency and low latency
• Real-time ingestion of data streams: Vehicle (location, diagnostics), weather, traffic
• Expose data and analytics via RESTful APIs
• Advanced Analytics (Predictive, Prescriptive, Streaming)
• Data warehouse and traditional reporting
Advanced Analytics Hardware Architecture
• Purpose-Built Hardware for Advanced Analytics
• NUMA/NVMe hardware is not commodity; it is highly specialized for very high performance. Tens of millions of IOPS.
• Architected to scale 10x or even 100x current capacity, a must for telematics and IoT data.
• H/W Specs – 256GB, 4 X 2 TB SSD, dedicated C*/Spark instance per SSD
• Active-Active clustering means very high availability
• C* / Spark / SOLR / FiloDB / DSE Graph + NUMA – High performance analytics platform
• Cassandra + Spark: 32 nodes
• Cassandra + SOLR: 8 nodes
Analytics Logical Architecture
[Diagram components: Thrift Server, Spark SQL, Job Server]
Pluggable Architecture - Overview
Element’s pluggable Analytics stack gives us the ability to plug into multiple analytics tools
and choose the right tool depending on the questions we are asking. This gives us the
ability to add new analytics capabilities on top of Cassandra as they become available.
[Diagram labels: Columnar Data, Fast Reads; SQL, Streaming; Search, Custom; DSE Graph; Future Tools]
Pluggable Architecture - FiloDB
• FiloDB uses Cassandra for storage and Spark for computation
• Optimized for:
• Low latency queries and streaming
• Interactive ad-hoc analysis on Big Data
• Complex analytics and machine learning
• Efficient Columnar Storage (20-40X less storage)
• All queries are distributed and run in parallel in Spark
• Integrates with existing BI tools via JDBC/ODBC
• Horizontally scalable, fault tolerant
• Future enhancements include Geo Spatial Analysis
Recent blog post by Evan Chan, renowned C* / Spark Expert
Pluggable Architecture – Apache Spark
• In-memory, fast SQL
• Easily blend data from multiple sources
• Connect to BI tools
• Ingest streaming data sources like telematics, weather, engine diagnostics
• Library of machine learning algorithms for advanced analytics
Pluggable Architecture – Lucene / SOLR
• Powerful search algorithms
• Geospatial indexing and geo-queries
• Custom dictionaries
• Efficient metric calculations
Pluggable Architecture – DSE Enterprise Graph
• Graph databases store data as a network of relationships
• Provides optimized analytics for any data where relationships are most important
• Can improve query/analytics performance 1000X
Example use cases:
• IoT time series on streaming data
• Vehicle routing
• Visualize clusters of well- and under-performing assets
• Recommend optimal actions
• Fraud detection
Pluggable Architecture – Cassandra
• High performance NoSQL database
• Flexible schema allows new data attributes to be easily added
• Peer-to-Peer, distributed architecture results in no single point of failure – different than traditional
• Elastic scalability to add more servers as workload increases
What our Platform Means to Customers
• 20x CPU Speed
• 10x Memory
• 70x Disk Performance
Cassandra database framework has been adopted by companies running some of the world’s largest and most sophisticated real-time
Data:
• Maintenance history
• Fuel purchases
• Miles driven
• GPS location
• Points of Interest
• Weather
• Traffic
• Online repair reviews
• Fuel price geo-indexing
Insights:
• Predict Operating Costs
• Fraud Detection
• Business Rule Exceptions
• Accident Predictors
• Optimal Replacement
• High risk DTC codes
• Repair sentiment analysis
Action:
• Vehicle Replacement
• Fraud actions
• Safe driving interventions
• Non-standard maintenance schedule
• Recommend fueling and maintenance facilities
Sifting through the data “noise” must be as fast as possible
in order to create actionable recommendations
Our Journey
Journey to Build a Unified BI and Analytics Platform
• Creating flexible data models that work for both BI and analytics
• Achieving high concurrency and low latency required for enterprise reporting platforms
• Optimizing software installation and configuration for performance
• Workload management
Dimensional Modeling for BI and Analytics
• BI tools are designed to work with dimensional models
• Dimensional models are proven and easy to understand and model
• Dimensional models are flexible and can answer many questions
• OLAP use cases require slicing and dicing data across multiple dimensions
• JOIN capability is critical for achieving data models that can answer various questions
Limitations of Spark SQL
• Cassandra + Spark cluster provides JOIN functionality
• Spark SQL is not able to pass a filter applied on one table to another table, even when both tables are joined on the filtering column
• Predicate pushdown does not work for outer JOINs
• Pushing predicates down to Cassandra (the data source) generally gives better performance, because less data is read into Spark
[Figure: sample DAG plan for a JOIN SQL with 5 tables]
SQL Example:
Select c.customer_id, c.customer_name, i.invoice_amount
From customer c, invoice i
Where c.customer_id = i.customer_id
And c.customer_id = 123;
Spark splits the above SQL into two independent scans; the filter on customer is not applied to invoice:
Select c.customer_id, c.customer_name
From customer c
Where c.customer_id = 123;
Select i.customer_id, i.invoice_amount
From invoice i;
Custom Thrift Server to Optimize SQL Statements
• Adds predicates to joining tables based on matching join columns
• Converts IN conditions to = conditions whenever IN List has only one value
• Adds IN predicate on partition column based on the range predicates supplied on non-partition key
Original SQL:
Select c.customer_id, c.customer_name, i.invoice_amount
From customer c,
invoice i
Where c.customer_id = i.customer_id
And c.customer_id IN (123)
Rewritten SQL:
Select c.customer_id, c.customer_name, i.invoice_amount
From customer c,
invoice i
Where c.customer_id = i.customer_id
And c.customer_id = 123
And i.customer_id = 123
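The first two rewrite rules above can be sketched in a few lines of Python. This is a minimal illustration, not the custom Thrift server's actual code: the `Filter` class and the `simplify_in`, `propagate`, and `render` helpers are hypothetical names, and the sketch works on already-parsed predicates rather than on a Spark logical plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Filter:
    column: str    # qualified column name, e.g. "c.customer_id"
    op: str        # "=" or "IN"
    values: tuple  # literal values the column is compared against


def simplify_in(f):
    """Rewrite an IN list that holds a single value as an equality."""
    if f.op == "IN" and len(f.values) == 1:
        return Filter(f.column, "=", f.values)
    return f


def propagate(filters, join_pairs):
    """Copy each filter onto every column joined (directly or transitively) to it."""
    parent = {}

    def find(col):
        # Union-find lookup with path halving; unseen columns become their own root.
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]
            col = parent[col]
        return col

    for left, right in join_pairs:
        parent[find(left)] = find(right)

    result, seen = [], set()
    for f in map(simplify_in, filters):
        root = find(f.column)
        peers = sorted(c for c in list(parent) if find(c) == root)
        for col in peers:
            g = Filter(col, f.op, f.values)
            if g not in seen:
                seen.add(g)
                result.append(g)
    return result


def render(f):
    """Format a filter as a SQL predicate string."""
    if f.op == "=":
        return f"{f.column} = {f.values[0]}"
    return f"{f.column} IN ({', '.join(str(v) for v in f.values)})"


rewritten = propagate(
    [Filter("c.customer_id", "IN", (123,))],
    [("c.customer_id", "i.customer_id")],
)
predicates = [render(f) for f in rewritten]
```

For the customer/invoice example, `predicates` holds `c.customer_id = 123` and `i.customer_id = 123`, so both table scans receive a filter that can be pushed down to the data source.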
[Diagram labels: Thrift Server; Spark thrift server with Custom Hive; Logical Plan; Logical Plan (if needed); Submit plan]
• Cassandra 2.1 has several restrictions on predicate pushdowns
• FiloDB is a true columnar store
• Provides ~20 – 30 times compression over Cassandra
• Very efficient for single and multiple partition scans
• Partial Predicate Pushdown support
• Provides ~20 - 30 times better read performance over straight
Cassandra Storage of FiloDB: rows of data get converted to compressed columnar chunks
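The row-to-columnar conversion can be illustrated with a small Python sketch. It is an assumption-laden stand-in, not FiloDB's real on-disk encoding: JSON plus `zlib` substitutes for FiloDB's own column encoders, and `to_columnar_chunks` and `read_column` are hypothetical helper names.

```python
import json
import zlib


def to_columnar_chunks(rows, columns):
    """Pivot a list of row dicts into one compressed chunk per column."""
    chunks = {}
    for col in columns:
        values = [row[col] for row in rows]
        chunks[col] = zlib.compress(json.dumps(values).encode("utf-8"))
    return chunks


def read_column(chunks, col):
    """Decompress and decode only the chunk for the requested column."""
    return json.loads(zlib.decompress(chunks[col]).decode("utf-8"))


# Hypothetical sample data: 1,000 readings with a highly repetitive column.
rows = [{"vehicle_type": "TRUCK", "odometer": 1000 + i} for i in range(1000)]
chunks = to_columnar_chunks(rows, ["vehicle_type", "odometer"])
odometers = read_column(chunks, "odometer")
```

Because each chunk holds values of a single column, similar values sit next to each other and compress well, and a query that needs only `odometer` never has to decode `vehicle_type`.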
Dimensional Data Modeling for Cassandra + Spark
• Simple STAR schema models as much as possible (eliminate snowflakes, outer joins, etc.)
• De-normalized dimensions and facts (avoid duplicating dimensions into facts)
• Minimize the number of tables involved in joins
• Common partitioning strategy across dimensions and facts (easy predicate handling)
• Limit max partition sizes to ~1 GB
• Reduce the number of partitions for efficient Spark execution; limit partition sizes for efficient Cassandra read operations
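One common way to keep partitions under a size cap is to add a bucket number to the partition key. The slides do not say how Element did this, so the following Python sketch is only an assumption: `buckets_needed` and `bucket_for` are hypothetical helpers, the 1 GB cap is taken as 10^9 bytes, and the row counts and sizes are made-up inputs.

```python
import zlib


def buckets_needed(rows_per_key, avg_row_bytes, max_partition_bytes=1_000_000_000):
    """Number of buckets required so that each partition stays under the cap."""
    total_bytes = rows_per_key * avg_row_bytes
    # Integer ceiling division; always use at least one bucket.
    return max(1, (total_bytes + max_partition_bytes - 1) // max_partition_bytes)


def bucket_for(row_id, buckets):
    """Stable bucket assignment; CRC32 is used because Python's hash() is salted per process."""
    return zlib.crc32(str(row_id).encode("utf-8")) % buckets


# Hypothetical key with 10 million rows of about 500 bytes each (about 5 GB).
n_buckets = buckets_needed(10_000_000, 500)
bucket = bucket_for("invoice-42", n_buckets)
```

The trade-off matches the last bullet above: more buckets keep Cassandra reads cheap, but every extra partition is one more unit of work for Spark, so the bucket count should be no larger than the size cap requires.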
Example: ETL Incremental Load Strategy
• ODS is truncated and loaded daily.
• ADS is a complete replica of the source system, maintained with an incremental ETL strategy.
• ODS tables are used to load FiloDB tables (incremental) using Spark jobs.
[Diagram labels: C*, Power BI]
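The incremental idea can be sketched as a key-based upsert of a daily delta into a full replica. This is a simplified in-memory model under stated assumptions, not the actual Spark jobs: `apply_increment`, the `id` key, and the sample rows are all hypothetical.

```python
def apply_increment(target, delta, key="id"):
    """Upsert delta rows into target (a dict keyed by primary key).

    Returns (inserted, updated) counts for the batch.
    """
    inserted = updated = 0
    for row in delta:
        k = row[key]
        if k in target:
            updated += 1
        else:
            inserted += 1
        target[k] = dict(row)  # last write wins, like a Cassandra upsert
    return inserted, updated


# Hypothetical replica and daily delta.
replica = {1: {"id": 1, "miles": 100}}
daily_delta = [{"id": 1, "miles": 150}, {"id": 2, "miles": 20}]

first_run = apply_increment(replica, daily_delta)
second_run = apply_increment(replica, daily_delta)  # replaying the same batch
```

Because writes are keyed upserts, replaying a batch after a failed job leaves the replica in the same state, which is the property that makes incremental loads safe to retry.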
Results & Opportunities
• Successfully completed 300 concurrent user load test from Business Objects
• <1 second response from thrift servers for 90% of queries
• Average of 50 columns & 50 - 500k rows returned
• Single-partition and multi-partition scans, with joins involving 5-10 FiloDB tables per query
• Limitations on the maximum result size that can be collected using Spark SQL
• Limitations on the total concurrent result size requested from Spark thrift server
• These are tunable limitations
Lessons Learned
• Cassandra has limitations for fast analytics and may require custom development
• Have a strategy to handle growth of Cassandra partitions
• Throttle read and write workloads for the size of the cluster
• Tombstone management
• Pick the right ETL tool for the job
• Turn off the numad service
• Lack of monitoring tools for Spark
• Spark’s lazy evaluation makes debugging very difficult
© DataStax, All Rights Reserved. 26

More Related Content

What's hot

C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/Kudu
Chris George
DataStax | Distributing the Enterprise, Safely (Thomas Valley) | Cassandra Su...
DataStax | Distributing the Enterprise, Safely (Thomas Valley) | Cassandra Su...DataStax | Distributing the Enterprise, Safely (Thomas Valley) | Cassandra Su...
DataStax | Distributing the Enterprise, Safely (Thomas Valley) | Cassandra Su...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
Gwen (Chen) Shapira
Cloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLCloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQL
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
Victor Coustenoble
Apache kudu
Apache kuduApache kudu
Apache kudu
Asim Jalis
DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016
DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016
DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra MigrationInfosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
DataStax Academy
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
Is hadoop for you
Is hadoop for youIs hadoop for you
Is hadoop for you
Gwen (Chen) Shapira
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
Yiwei Ma
Apache HAWQ Architecture
Apache HAWQ ArchitectureApache HAWQ Architecture
Apache HAWQ Architecture
Alexey Grishchenko
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
Evan Chan
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
Yahoo Developer Network
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
Rajesh Nadipalli
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
Yahoo Developer Network
Cassandra Tuning - above and beyond
Cassandra Tuning - above and beyondCassandra Tuning - above and beyond
Cassandra Tuning - above and beyond
Matija Gobec

What's hot (20)

C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
C* Keys: Partitioning, Clustering, & CrossFit (Adam Hutson, DataScale) | Cass...
High concurrency,
Low latency analytics
using Spark/Kudu
 High concurrency,
Low latency analytics
using Spark/Kudu High concurrency,
Low latency analytics
using Spark/Kudu
High concurrency,
Low latency analytics
using Spark/Kudu
DataStax | Distributing the Enterprise, Safely (Thomas Valley) | Cassandra Su...
DataStax | Distributing the Enterprise, Safely (Thomas Valley) | Cassandra Su...DataStax | Distributing the Enterprise, Safely (Thomas Valley) | Cassandra Su...
DataStax | Distributing the Enterprise, Safely (Thomas Valley) | Cassandra Su...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
DataStax | Building a Spark Streaming App with DSE File System (Rocco Varela)...
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
Cloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQLCloudera Impala + PostgreSQL
Cloudera Impala + PostgreSQL
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
Apache kudu
Apache kuduApache kudu
Apache kudu
DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016
DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016
DataStax | DataStax Tools for Developers (Alex Popescu) | Cassandra Summit 2016
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra MigrationInfosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
Is hadoop for you
Is hadoop for youIs hadoop for you
Is hadoop for you
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
Apache HAWQ Architecture
Apache HAWQ ArchitectureApache HAWQ Architecture
Apache HAWQ Architecture
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
January 2015 HUG: Using HBase Co-Processors to Build a Distributed, Transacti...
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
February 2016 HUG: Apache Kudu (incubating): New Apache Hadoop Storage for Fa...
Cassandra Tuning - above and beyond
Cassandra Tuning - above and beyondCassandra Tuning - above and beyond
Cassandra Tuning - above and beyond

Viewers also liked

Zimbra propulsé par le n°1 de l'hébergement critique
Zimbra propulsé par le n°1 de l'hébergement critiqueZimbra propulsé par le n°1 de l'hébergement critique
Zimbra propulsé par le n°1 de l'hébergement critique
Cloud Temple
Oracle Code Keynote with Thomas Kurian
Oracle Code Keynote with Thomas KurianOracle Code Keynote with Thomas Kurian
Oracle Code Keynote with Thomas Kurian
Oracle Developers
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Infinit: Modern Storage Platform for Container Environments
Infinit: Modern Storage Platform for Container EnvironmentsInfinit: Modern Storage Platform for Container Environments
Infinit: Modern Storage Platform for Container Environments
Docker, Inc.
Building Modern Applications Using APIs, Microservices and Chatbots
Building Modern Applications Using APIs, Microservices and ChatbotsBuilding Modern Applications Using APIs, Microservices and Chatbots
Building Modern Applications Using APIs, Microservices and Chatbots
Oracle Developers
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkTensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache Spark
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
MongoDB, Hadoop and Humongous Data
MongoDB, Hadoop and Humongous DataMongoDB, Hadoop and Humongous Data
MongoDB, Hadoop and Humongous Data
Steven Francia
Introduction to MongoDB and Hadoop
Introduction to MongoDB and HadoopIntroduction to MongoDB and Hadoop
Introduction to MongoDB and Hadoop
Steven Francia
Building Awesome CLI apps in Go
Building Awesome CLI apps in GoBuilding Awesome CLI apps in Go
Building Awesome CLI apps in Go
Steven Francia
Implementación de un sistema 3D de información de servicios en el subsuelo en...
Implementación de un sistema 3D de información de servicios en el subsuelo en...Implementación de un sistema 3D de información de servicios en el subsuelo en...
Implementación de un sistema 3D de información de servicios en el subsuelo en...
Carles Colás
Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strata
Patrick McFadin
An introduction to the MicroProfile
An introduction to the MicroProfileAn introduction to the MicroProfile
An introduction to the MicroProfile
Alex Soto
C* Summit 2013: Ground Traffic Control - Logistics with Cassandra by Jesse Young
C* Summit 2013: Ground Traffic Control - Logistics with Cassandra by Jesse YoungC* Summit 2013: Ground Traffic Control - Logistics with Cassandra by Jesse Young
C* Summit 2013: Ground Traffic Control - Logistics with Cassandra by Jesse Young
DataStax Academy
NoSQL into E-Commerce: lessons learned
NoSQL into E-Commerce: lessons learnedNoSQL into E-Commerce: lessons learned
NoSQL into E-Commerce: lessons learned
La FeWeb
Cassandra + Spark + Elk
Cassandra + Spark + ElkCassandra + Spark + Elk
Cassandra + Spark + Elk
Vasil Remeniuk
Cloud operating system
Cloud operating systemCloud operating system
Cloud operating system
sadak pramodh
7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)
Steven Francia
Mesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run CassandraMesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run Cassandra
DataStax Academy

Viewers also liked (20)

Zimbra propulsé par le n°1 de l'hébergement critique
Zimbra propulsé par le n°1 de l'hébergement critiqueZimbra propulsé par le n°1 de l'hébergement critique
Zimbra propulsé par le n°1 de l'hébergement critique
Oracle Code Keynote with Thomas Kurian
Oracle Code Keynote with Thomas KurianOracle Code Keynote with Thomas Kurian
Oracle Code Keynote with Thomas Kurian
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Elassandra: Elasticsearch as a Cassandra Secondary Index (Rémi Trouville, Vin...
Infinit: Modern Storage Platform for Container Environments
Infinit: Modern Storage Platform for Container EnvironmentsInfinit: Modern Storage Platform for Container Environments
Infinit: Modern Storage Platform for Container Environments
Building Modern Applications Using APIs, Microservices and Chatbots
Building Modern Applications Using APIs, Microservices and ChatbotsBuilding Modern Applications Using APIs, Microservices and Chatbots
Building Modern Applications Using APIs, Microservices and Chatbots
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkTensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache Spark
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
Lambda Architecture with Cassandra (Vaibhav Puranik, GumGum) | C* Summit 2016
MongoDB, Hadoop and Humongous Data
MongoDB, Hadoop and Humongous DataMongoDB, Hadoop and Humongous Data
MongoDB, Hadoop and Humongous Data
Introduction to MongoDB and Hadoop
Introduction to MongoDB and HadoopIntroduction to MongoDB and Hadoop
Introduction to MongoDB and Hadoop
Building Awesome CLI apps in Go
Building Awesome CLI apps in GoBuilding Awesome CLI apps in Go
Building Awesome CLI apps in Go
Implementación de un sistema 3D de información de servicios en el subsuelo en...
Implementación de un sistema 3D de información de servicios en el subsuelo en...Implementación de un sistema 3D de información de servicios en el subsuelo en...
Implementación de un sistema 3D de información de servicios en el subsuelo en...
Time series with apache cassandra strata
Time series with apache cassandra   strataTime series with apache cassandra   strata
Time series with apache cassandra strata
An introduction to the MicroProfile
An introduction to the MicroProfileAn introduction to the MicroProfile
An introduction to the MicroProfile
C* Summit 2013: Ground Traffic Control - Logistics with Cassandra by Jesse Young
C* Summit 2013: Ground Traffic Control - Logistics with Cassandra by Jesse YoungC* Summit 2013: Ground Traffic Control - Logistics with Cassandra by Jesse Young
C* Summit 2013: Ground Traffic Control - Logistics with Cassandra by Jesse Young
NoSQL into E-Commerce: lessons learned
NoSQL into E-Commerce: lessons learnedNoSQL into E-Commerce: lessons learned
NoSQL into E-Commerce: lessons learned
Cassandra + Spark + Elk
Cassandra + Spark + ElkCassandra + Spark + Elk
Cassandra + Spark + Elk
Cloud operating system
Cloud operating systemCloud operating system
Cloud operating system
7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)
Mesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run CassandraMesosphere and Contentteam: A New Way to Run Cassandra
Mesosphere and Contentteam: A New Way to Run Cassandra

Similar to Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Corp.) | C* Summit 2016

ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
Cloudera, Inc.
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Benchmark Showdown: Which Relational Database is the Fastest on AWS?
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & TableauBig Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Big Data Analytics on the Cloud Oracle Applications AWS Redshift & Tableau
Sam Palani
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Luan Moreno Medeiros Maciel
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.
Amazon Web Services
Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...
Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...
Cerebro: Bringing together data scientists and bi users - Royal Caribbean - S...
Thomas W. Fry
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
Cloudera, Inc.
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...Cassandra Summit 2014: Internet of Complex Things Analytics with Apache Cassa...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Corp.) | C* Summit 2016

  • 1. Building a Pluggable Analytics Stack with Cassandra as the Foundation. Jim Peregord, Venu Palvai, Element Fleet Management
  • 2. 1. Background on Element Fleet Management; 2. Key Use Cases Supported; 3. Architecture; 4. Our Journey; 5. Lessons Learned. © DataStax, All Rights Reserved.
  • 3. A Little About Us: Jim Peregord (VP – Analytics, BI, Data Mgt), Venu Palvai (Lead Architect)
  • 4. Background on Element Fleet Management
      • Full lifecycle of fleet management services
      • Data consolidation and advanced analytics services
      • Maximize customer ROI on fleet assets via data and advanced analytics
      • 2,600 employees
      • 1+ million vehicles managed
      • $18 billion in total finance assets
      • 2 billion rows of data and growing
  • 5. Greenfield Opportunity to Build Analytics Platform
      • Element acquired GE Fleet Management on September 1, 2015
      • Now the largest publicly held fleet management company in the world
      • Before the acquisition, Element had limited data warehouse and Big Data technology
      • Greenfield opportunity to build a next-generation BI and Advanced Analytics platform
      High-level options considered:
      • #1 – Build a separate data warehouse and a separate Big Data/Advanced Analytics platform
      • #2 – Build a single, unified architecture that supports both
      Our decision:
      • #2 – Build a single, unified platform using DataStax
  • 6. Key Use Cases Supported on New Platform
      • High availability out of the box
      • Linear and elastic scalability
      • High concurrency and low latency
      • Real-time ingestion of data streams: vehicle (location, diagnostics), weather, traffic
      • Expose data and analytics via RESTful APIs
      • Advanced Analytics (predictive, prescriptive, streaming)
      • Data warehouse and traditional reporting
  • 8. Advanced Analytics Hardware Architecture
      • Purpose-built hardware for Advanced Analytics
      • NUMA/NVMe hardware is not commodity hardware; it is highly specialized for very high performance (tens of millions of IOPS)
      • Architected to scale to 10x or even 100x current capacity, a must for telematics and IoT data
      • H/W specs: 256GB, 4 x 2 TB SSD, dedicated C*/Spark instance per SSD
      • Active-active clustering means very high availability
      • C* / Spark / Solr / FiloDB / DSE Graph + NUMA: a high-performance analytics platform
      • Cassandra + Spark: 32 nodes
      • Cassandra + Solr: 8 nodes
  • 9. Analytics Logical Architecture (diagram; labels in reading order: Events, Streaming Sources, Amazon SQS, Kafka, FiloDB, Internal, Batch Sources, External, Thrift Server, Spark SQL, Job Server, RESTful, Packages (PySpark), MLlib, Consumers)
  • 10. Pluggable Architecture - Overview
      Element’s pluggable analytics stack lets us plug into multiple analytics tools and choose the right tool for the question we are asking. It also lets us add new analytics capabilities on top of Cassandra as they become available.
      • FiloDB: columnar data, fast reads
      • Spark: SQL, streaming analytics, PySpark
      • Lucene: search, custom dictionaries
      • DSE Graph: graph-based analytics
      • Future tools: TBD
  • 11. Pluggable Architecture - FiloDB (columnar data, fast reads)
      • FiloDB uses Cassandra for storage and Spark for computation
      • Optimized for:
          • Low-latency queries and streaming
          • Interactive ad-hoc analysis on Big Data
          • Complex analytics and machine learning
      • Efficient columnar storage (20-40X less storage)
      • All queries are distributed and run in parallel in Spark
      • Integrates with existing BI tools via JDBC/ODBC
      • Horizontally scalable, fault tolerant
      • Future enhancements include geospatial analysis
      Recent blog post by Evan Chan, renowned C* / Spark expert: building-a-data-warehouse-using-spark-cassandra-and-filodb
  • 12. Pluggable Architecture – Apache Spark (SQL, streaming analytics)
      Spark SQL
      • In-memory, fast SQL processing
      • Easily blend data from multiple sources
      • Connect to BI tools
      Spark Streaming
      • Ingest streaming data sources such as telematics, weather, and engine diagnostics
      Spark MLlib
      • Library of machine learning algorithms for advanced analytics
  • 13. Pluggable Architecture – Lucene / Solr (search, custom dictionaries)
      • Powerful search algorithms
      • Geospatial indexing and geo-queries
      • Custom dictionaries
      • Efficient metric calculations
  • 14. Pluggable Architecture – DSE Graph (graph data analytics)
      • Graph databases store data as a network of relationships
      • Provides optimized analytics for any data where relationships matter most
      • Can improve query/analytics performance 1000X
      Example use cases:
      • IoT time series on streaming data
      • Vehicle routing
      • Visualize clusters of well-performing and underperforming assets
      • Recommend optimal actions
      • Fraud detection
  • 15. Pluggable Architecture – Cassandra
      • High-performance NoSQL database
      • Flexible schema allows new data attributes to be added easily
      • Peer-to-peer, distributed architecture means no single point of failure, unlike traditional databases
      • Elastic scalability: add more servers as the workload increases
  • 16. What our Platform Means to Customers
      Infrastructure improvements:
      • 20x CPU speed
      • 10x memory
      • 70x disk performance
      All running on Cassandra: the Cassandra database framework has been adopted by companies running some of the world’s largest and most sophisticated real-time analytics.
      Data:
      • Maintenance history
      • Fuel purchases
      • Miles driven
      • GPS location
      • Points of interest
      • Weather
      • Traffic
      • Online repair reviews
      • Fuel price geo-indexing
      Insights:
      • Predict operating costs
      • Fraud detection
      • Business rule exceptions
      • Accident predictors
      • Optimal replacement
      • High-risk DTC codes
      • Repair sentiment analysis
      Action:
      • Vehicle replacement schedule
      • Fraud actions
      • Safe driving interventions
      • Non-standard maintenance schedule
      • Recommend fueling and maintenance facilities
      Sifting through the data “noise” must be as fast as possible in order to create actionable recommendations.
  • 18. Journey to Build a Unified BI and Analytics Platform
      • Creating flexible data models that work for both BI and analytics
      • Achieving the high concurrency and low latency required for enterprise reporting platforms
      • Optimizing software installation and configuration for performance
      • Workload management
  • 19. Dimensional Modeling for BI and Analytics • BI tools are designed to work with dimensional models • Dimensional models are proven and easy to understand and model • Dimensional models are flexible and can answer many questions • OLAP use cases require slicing and dicing data across multiple dimensions • JOIN capability is critical for data models that can answer a variety of questions • (Diagram: star schema with one Fact table surrounded by four Dim tables.)
  • 20. Limitations of Spark SQL • A Cassandra + Spark cluster provides JOIN functionality • Spark SQL cannot pass a filter applied to one table on to another table, even when the two tables are joined on the filtered column • Predicate pushdown does not work for outer JOINs • Pushing predicates down to the data source (Cassandra) gives better performance • SQL example: SELECT c.customer_id, c.customer_name, i.invoice_amount FROM customer c, invoice i WHERE c.customer_id = i.customer_id AND c.customer_id = 123; • Spark splits the above SQL into: SELECT c.customer_id, c.customer_name FROM customer c WHERE c.customer_id = 123; and SELECT i.customer_id, invoice_amount FROM invoice i; (the invoice scan receives no filter) • (Figure: sample DAG plan for a JOIN query across 5 tables.)
  • 21. Custom Thrift Server to Optimize SQL Statements • Adds predicates to joined tables based on matching join columns • Converts IN conditions to = conditions whenever the IN list has only one value • Adds an IN predicate on the partition column based on the range predicates supplied on non-partition-key columns • Example input: SELECT c.customer_id, c.customer_name, i.invoice_amount FROM customer c, invoice i WHERE c.customer_id = i.customer_id AND c.customer_id IN (123) • Rewritten as: SELECT c.customer_id, c.customer_name, i.invoice_amount FROM customer c, invoice i WHERE c.customer_id = i.customer_id AND c.customer_id = 123 AND i.customer_id = 123 • (Diagram: Custom Thrift Server, a Spark Thrift Server with a custom Hive context: inspect logical plan, modify logical plan if needed, submit plan for execution.)
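The first two rewrite rules on this slide can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the authors' Thrift server code: the `Predicate` class, `simplify_in`, `propagate`, and `to_sql` are hypothetical names, predicates are modeled as simple tuples rather than a Spark logical plan, and the propagation step assumes inner equi-joins only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """Hypothetical stand-in for a filter node in a logical plan."""
    column: str    # qualified column name, e.g. "c.customer_id"
    op: str        # "=" or "IN"
    values: tuple  # literal values (numeric literals only in this sketch)


def simplify_in(pred):
    """Rule: rewrite a single-value IN list as an equality predicate."""
    if pred.op == "IN" and len(pred.values) == 1:
        return Predicate(pred.column, "=", pred.values)
    return pred


def propagate(predicates, join_pairs):
    """Rule: copy each predicate across matching join columns.

    join_pairs is a list of (left_column, right_column) equi-join pairs.
    Assumes inner joins; copying filters across outer joins is not safe
    in general and is not attempted here.
    """
    result = [simplify_in(p) for p in predicates]
    seen = set(result)
    changed = True
    while changed:  # repeat so filters flow through chains of joins
        changed = False
        for left, right in join_pairs:
            for pred in list(result):
                for src, dst in ((left, right), (right, left)):
                    if pred.column == src:
                        copy = Predicate(dst, pred.op, pred.values)
                        if copy not in seen:
                            seen.add(copy)
                            result.append(copy)
                            changed = True
    return result


def to_sql(pred):
    """Render a predicate as a SQL fragment."""
    if pred.op == "=":
        return f"{pred.column} = {pred.values[0]}"
    return f"{pred.column} IN ({', '.join(str(v) for v in pred.values)})"


joins = [("c.customer_id", "i.customer_id")]
filters = [Predicate("c.customer_id", "IN", (123,))]
rewritten = propagate(filters, joins)
print(" AND ".join(to_sql(p) for p in rewritten))
```

With the filter copied onto `i.customer_id`, the scan of the invoice table carries its own predicate, which is what lets it be pushed down to the data source.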
  • 22. FiloDB • Cassandra 2.1 has several restrictions on predicate pushdowns • FiloDB is a true columnar store • Provides ~20-30x compression over Cassandra • Very efficient for single- and multiple-partition scans • Partial predicate pushdown support • Provides ~20-30x better read performance than straight Cassandra • (Diagram: Cassandra storage of FiloDB data. Rows of data get converted to compressed columnar chunks, labeled Ck1 and Ck2.)
  • 23. Dimensional Data Modeling for Cassandra + Spark • Use simple star schema models as much as possible (eliminate snowflakes, outer joins, etc.) • De-normalize dimensions and facts (avoid duplicating dimensions into facts) • Minimize the number of tables involved in joins • Use a common partitioning strategy across dimensions and facts (easier predicate handling) • Limit maximum partition sizes to ~1 GB • Reduce the number of partitions for efficient Spark execution, but limit partition sizes for efficient Cassandra read operations
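The trade-off in the last two bullets (few partitions for Spark, capped partition size for Cassandra) can be sketched as a back-of-the-envelope calculation: pick the smallest number of buckets per logical key that keeps each partition under the ~1 GB cap. The function names, the CRC32 bucketing, and the row counts and row sizes below are illustrative assumptions, not figures from the talk.

```python
import math
import zlib

MAX_PARTITION_BYTES = 1024 ** 3  # the ~1 GB cap mentioned on the slide


def buckets_needed(rows_per_key, avg_row_bytes, max_bytes=MAX_PARTITION_BYTES):
    """Smallest bucket count that keeps each partition under max_bytes.

    Using the smallest count keeps the total number of partitions low,
    which is what the slide recommends for Spark execution.
    """
    estimated_bytes = rows_per_key * avg_row_bytes
    return max(1, math.ceil(estimated_bytes / max_bytes))


def bucket_for(row_key, n_buckets):
    """Deterministically assign a row to a bucket (stable across runs)."""
    return zlib.crc32(str(row_key).encode("utf-8")) % n_buckets


# Hypothetical fact table: 50 million rows per customer, ~200 bytes per row.
n = buckets_needed(rows_per_key=50_000_000, avg_row_bytes=200)
print(n)
print(bucket_for("invoice-000123", n))
```

A composite partition key such as (customer_id, bucket) would then spread one large customer across `n` partitions; that key layout is likewise an assumption for illustration.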
  • 24. Example: ETL Incremental Load Strategy • ODS is truncated and reloaded daily • ADS is a complete replica of the source system (incremental ETL strategy) • ODS tables are used to load the FiloDB tables incrementally using Spark jobs • (Diagram labels: SPIN, ODS, ADS, FiloDB, Cassandra/Spark, JDBC, Spark, C*, ETL - Talend, Reload, Incremental, Thrift, SSRS, Power BI.)
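The slide does not say how the incremental loads decide which rows are new. One common approach is a high-water mark on a last-modified column, sketched below in plain Python. The `modified_at` and `id` field names, the `incremental_load` function, and the in-memory dictionary standing in for the target table are all assumptions for illustration, not details of the Talend or Spark jobs described in the talk.

```python
from datetime import date


def incremental_load(source_rows, target, watermark):
    """Upsert rows modified after `watermark` into `target`.

    Returns (new_watermark, rows_loaded). `target` is a dict keyed by row id,
    standing in for the destination table.
    """
    new_watermark = watermark
    loaded = 0
    for row in source_rows:
        if row["modified_at"] > watermark:
            target[row["id"]] = row  # insert or overwrite (upsert)
            loaded += 1
            new_watermark = max(new_watermark, row["modified_at"])
    return new_watermark, loaded


# Hypothetical source rows.
source = [
    {"id": 1, "modified_at": date(2016, 1, 1), "miles": 100},
    {"id": 2, "modified_at": date(2016, 1, 2), "miles": 250},
]
target = {}

# First run: everything is newer than the initial watermark.
watermark, first_count = incremental_load(source, target, date.min)

# A source row changes; the second run picks up only that row.
source[0] = {"id": 1, "modified_at": date(2016, 1, 3), "miles": 180}
watermark, second_count = incremental_load(source, target, watermark)
print(first_count, second_count, watermark)
```

A truncate-and-reload table, by contrast, needs no watermark at all: the target is emptied and every source row is copied on each run.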
  • 25. Results & Opportunities • Results: • Successfully completed a 300-concurrent-user load test from Business Objects • <1 second response from the Thrift servers for 90% of queries • Average of 50 columns and 50-500k rows returned • Single-partition and multi-partition scans, with joins involving 5-10 FiloDB tables per query • Opportunities: • Limitations on the maximum result size that can be collected using Spark SQL • Limitations on the total concurrent result size requested from the Spark Thrift server • These are tunable limitations
  • 26. Lessons Learned • Cassandra has limitations for fast analytics that may require custom development • Have a strategy to handle the growth of Cassandra partitions • Throttle read and write workloads to suit the size of the cluster • Manage tombstones • Pick the right ETL tool for the job • Turn off the NUMAD service • Spark lacks monitoring tools • Spark’s lazy evaluation makes debugging very difficult