An Introduction to the MapR Converged Data Platform

© 2017 MapR Technologies
Welcome
• Please use your computer's audio to listen to this webcast.
• Country call in numbers are available online at:
– https://www.readytalk.com/rt/an.php?tfnum=8667401260
– UK toll-free: 0800 279 4827
– Germany toll-free: 0800 589 1848
– PASSCODE: 762604

An Introduction to the
MapR Converged Data Platform
Antje Barth
EMEA Solutions Architect
MapR
Tony Young
EMEA Alliances & Channels
MapR

Agenda
• MapR Technologies
• The MapR Converged Data Platform
• MapR-FS
• MapR-DB
• MapR-Streams
• Use Cases for the Converged Data Platform
• How to get started with MapR
• MapR Converged Partner Program
• Q&A

MapR: The Company

MapR is Transforming Business with Data
WHAT
WE DO
Bring together
analytics and operations
into next-generation
Converged Applications
for the business
WHY
IT MATTERS
Empowers companies to
grow margins through
innovation and cutting
costs
HOW
WE DO IT
Patented technology
architecture with the
world’s only complete
Converged Data Platform
Leading companies around the world are transforming their
business with the industry’s only Converged Data Platform

MapR Corporate Timeline
MapR in
Stealth Mode
2009
2013
2014
2015
2016
MapR Becomes
the Hadoop
Technology
Leader
MapR-DB: The
First In-Hadoop
Database
Apache Drill: First
Schema-Free
Analytics
MapR Streams:
Global Event
Processing
2011
Converged Data
Platform
2017+
Rapid Innovation
Continues
$194M in Equity Funding

MapR Financial Strength
High Growth: 88% Revenue and Billings Growth
High Expansion: 130% $ Based Net Expansion
High Retention: 99% Customer Retention ($ Based)

WORLDWIDE
PRESENCE &
CUSTOMER
SUPPORT
HQ

MapR Worldwide Community
200K +
Participants
50K +
Customers
& Consultants
Registered
On-Demand Training
Forum
Support

Community Participant, Contributor, Leader
• MapR actively contributes
– Bug fixes
– Improvements
• MapR leads projects
– Apache Drill
– Apache Myriad
• MapR supports the community
– Free Code Fridays
– High quality free on-demand training
– Sponsorships, Meet-ups, and more
Arrow

MapR in the News
Internet of Things SAP

Question:
“How do you take
operational data, move it to
analytics and then use
those insights to change
customer experiences?”

The MapR Converged Data
Platform

Customers Are Pressured As Never Before
Pressure of
technology waves
Pressure to innovate
while cutting cost
Developer
Executive
IT Administrator

“The explosion of data, changing application
requirements, and key infrastructure &
technology trends have created the need for
a new data platform.”

RDBMS
Data Was Structured & Shackled

Data Got Into The Driver's Seat!
Audio, Billing Data, Call Detail Records, Clickstream, CSV Data, Documents, Emails, JSON,
Medical Records, Merchant Listings, Meta Data, Mobile Data, Network Data, PDF, Product Catalog,
Sensor Data, Server Logs, Set Top Box, Social Media, Text Files, Text Messages, Video, XML

More Data Means Applications Can Become Smarter

Streaming
Analytics
NoSQL Batch
Analytics
Storage
Messaging Processing
Engines
RDBMS
Next-gen Applications Have Complex
Requirements

Application & Data Model Has Radically Changed
• Each application solved one problem and created its own data type
• Diverse data assets must be accessible from anywhere by microservices

Technology Will Drive Intelligent App Evolution
• Commodity scale-out hardware
• Container virtualization
• Clouds
• Machine Learning
• Microservices
• Smarter edge

Point Products Impede Adoption And Create Complexity
• Hadoop & Spark Cluster
• Document DB
• Classic Data Warehouse
• NoSQL
• Application Server
• Message Middleware
• Search Server
Expensive to Stitch | Fragile | Limitations for Speed, Scale, Reliability

It's not the circles,
it's the lines that are hard

Putting It In One Distribution Does Not Converge
Anything!

Database
MapR-DB
Event Streaming
MapR Streams
High Availability
Web-Scale Storage
MapR-FS
Real Time Unified Security Multi-tenancy Disaster Recovery Global Namespace
A Different Approach: Converged Data Platform
Files, Tables, Streams
together on same platform
Shared Services
Supports Open-Source APIs
On-Premise, In the Cloud, Hybrid
Patented Architecture

HDFS API POSIX, NFS HBase API JSON API Kafka API
Database
MapR-DB
Event Streaming
MapR Streams
Enterprise-Grade
Platform Services
High Availability
Web-Scale Storage
MapR-FS
Open Source Runs Better with Scale, Speed & Reliability

A software platform for
operationalizing data
to enable intelligent applications

ANALYTICS
Business insight
OPERATIONS
Business performance
Convergence Enables Operationalizing The Data
Better
Operationalize
the data

MapR Architected Specifically For Convergence
NoSQL Web scale
Storage
Messaging Processing
Engines
Real Time Unified Security Multi-tenancy Disaster Recovery
Streaming
• Extreme scale with ultra low latency for speed
• “In place” updates for greater speed and no silos
• Real time ingest & low latency processing
• Rich Data Models & APIs
• Built-in Analytics including ML
• Development Agility & Deployment Flexibility
• Global mission critical foundation
• Single security model

Database
MapR-DB
Event Streaming
MapR Streams
Web-Scale Storage
MapR-FS
Real
Time
The Architecture Of The Foundation Matters
High
Availability
Data
Protection
Disaster
Recovery
Performance Replication Scalability
Mirroring
Multi
Tenancy
Security
Self Healing
Snapshots

MapR Platform Security
Flexible Authentication
• Wire-level authentication for all services in the cluster
• Integration with LDAP, Active Directory and other third party directory services
• Kerberos or username/password authentication
Granular Authorization
• Access Control Expressions
• Protect files, tables, column families, columns, and management objects
• Extend to role-based access control (RBAC) with custom role functions
• Drill Views
Robust Auditing
• All events recorded immediately in JSON log files
• Includes data access and administrative actions
• Ad-hoc queries and custom reports on audit logs via SQL and standard BI tools
Ubiquitous Data Protection
• Encryption for Data in Motion
– Within a Cluster
– Between Clusters
– Between Client and Cluster
• Encryption for Data at Rest
– LUKS
– Self-Encrypting Disk
– Partners
• AES-256 Encryption in GCM Mode
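
The security slide names Access Control Expressions without showing one. As a rough illustration only, the sketch below assumes the commonly documented ACE form: u:, g: and r: name a user, group or role, p means public, and !, & and | combine terms (parentheses group them). The functions tokenize and allowed are hypothetical helpers written for this example; they are not MapR code, and treating an empty expression as "deny" is an assumption.

```python
import re

# Assumed ACE grammar: u:<user>, g:<group>, r:<role>, p (public), with ! & | and ( ).
TOKEN = re.compile(r"\s*([()&|!]|[ugr]:[\w.-]+|p)")


def tokenize(expr):
    """Split an expression into tokens, rejecting anything outside the assumed grammar."""
    expr = expr.strip()
    tokens, pos = [], 0
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if not match:
            raise ValueError(f"unexpected input at position {pos}: {expr[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def allowed(expr, user, groups=(), roles=()):
    """Evaluate the expression for one user. Precedence: ! binds tightest, then &, then |."""
    tokens = tokenize(expr)
    if not tokens:
        return False  # assumption: an empty expression grants nothing
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the right side so tokens are consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        if peek() is None:
            raise ValueError("unexpected end of expression")
        token = take()
        if token == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return value
        if token == "p":
            return True
        kind, _, name = token.partition(":")
        if kind == "u":
            return name == user
        if kind == "g":
            return name in groups
        if kind == "r":
            return name in roles
        raise ValueError(f"unexpected token {token!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

For example, allowed("g:analysts & !u:bob", "carol", groups={"analysts"}) grants access to any analyst except bob. The same expression style is what the slide describes applying to files, tables, column families and columns.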

MapR Cluster Architecture
Rack 1
Node 1
Node 2
Node 3
Node N
Node …
Node …
Rack 2 Rack .. Rack ..
Select Processing and Platform Services (Varies by Node)
Enterprise Storage
MapR-FS
Database
MapR-DB
Event Streaming
MapR Streams
Core MapR Data Services (Every Node)
Horizontal scaling for files, tables, documents, streams, and compute. 5 nodes or thousands.

MapR-FS
A real distributed file system

Architecture: Built for Speed, Scale, Reliability
• Data & metadata fully distributed
[Diagram: blocks labeled A to E, three of each]
• Hierarchical organization of data: 32 GB, 256 MB, 8 KB
• No single point of failure
• Fast parallel access
• Exabyte scale
• Full read-write

MapR Innovations Enable Speed, Scale, Reliability
1. Patented on-disk structures for multiple workloads
• Containers, chunks, and blocks
2. Optimized resource consumption
• No JVMs, single process space
3. Data and job placement control
• Explicitly define nodes for data and jobs
Single MapR Cluster
Storage Hardware
MapR-FS + MapR-DB + MapR Streams
Fast, efficient, direct I/O
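
To make the "containers, chunks, and blocks" hierarchy concrete, here is a small arithmetic sketch. It assumes the three sizes shown on the architecture slide (32 GB, 256 MB, 8 KB) correspond to container, chunk and block respectively, and that the units are binary (1 KB = 1024 bytes). The locate function is a hypothetical helper for illustration, not part of any MapR API.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

# Assumed mapping of the slide's sizes onto the on-disk hierarchy.
BLOCK_SIZE = 8 * KB
CHUNK_SIZE = 256 * MB
CONTAINER_SIZE = 32 * GB

blocks_per_chunk = CHUNK_SIZE // BLOCK_SIZE          # 32768
chunks_per_container = CONTAINER_SIZE // CHUNK_SIZE  # 128


def locate(byte_offset):
    """Map a byte offset within a file to (chunk index, block index in chunk, byte in block)."""
    chunk_index, offset_in_chunk = divmod(byte_offset, CHUNK_SIZE)
    block_index, byte_in_block = divmod(offset_in_chunk, BLOCK_SIZE)
    return chunk_index, block_index, byte_in_block


print(blocks_per_chunk, chunks_per_container)
print(locate(300 * MB))  # (1, 5632, 0): second chunk, block 5632, first byte
```

Under these assumptions a chunk holds 32,768 blocks and a container can hold up to 128 full chunks, which is one way to read "hierarchical organization of data": small blocks for I/O, larger chunks for parallelism, and containers as the unit that is distributed across nodes.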

Transparent: The NFS-Enabled MapR File System
Easy for scientists to use, easy for IT staff to administer, easy for systems & apps to integrate
Standard OS Utilities
POSIX-compliant file system supports familiar Linux commands and tools
$ find . | grep log
$ cp /mapr/cluster
$ scp /mapr/cluster
$ vi results
$ tail -f part-00000
Drag-n-Drop User Data Files
Easily transfer data in and out of a MapR cluster using standard file browsers
Log Directly to a MapR Cluster
Write system log files directly to a MapR cluster for instant analysis and long-term retention
Connect Applications without Customization
Fully read/write file system supports virtually unlimited number of files of any size
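
Because the cluster is exposed as an ordinary mounted file system, application code needs nothing cluster-specific either. The sketch below uses only the Python standard library to append to a log file, list log files, and read the last lines, mirroring the find, cp and tail commands on the slide. The helper names are hypothetical, and any mount point you pass (for example a path under /mapr) is an assumption about your environment; the functions work against any directory, so they can be tried locally.

```python
from pathlib import Path
import tempfile


def append_log(mount_root, relative_path, message):
    """Append one line to a file under the mount, creating parent directories as needed."""
    target = Path(mount_root) / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", encoding="utf-8") as handle:
        handle.write(message.rstrip("\n") + "\n")
    return target


def find_logs(mount_root):
    """Rough equivalent of `find . | grep log`, limited to regular files."""
    root = Path(mount_root)
    matches = []
    for path in root.rglob("*"):
        relative = path.relative_to(root).as_posix()
        if path.is_file() and "log" in relative:
            matches.append(relative)
    return sorted(matches)


def tail(path, count=10):
    """Return the last `count` lines of a text file (no follow mode)."""
    with Path(path).open(encoding="utf-8") as handle:
        lines = [line.rstrip("\n") for line in handle]
    return lines[-count:]


if __name__ == "__main__":
    # A temporary directory stands in for the NFS mount so the example runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        log_file = append_log(root, "logs/app/syslog.txt", "service started")
        append_log(root, "logs/app/syslog.txt", "service ready")
        print(find_logs(root))
        print(tail(log_file, 1))
```

The design point is the one the slide makes: existing tools and applications that already read and write files can be pointed at the mount without customization.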

MapR POSIX Client: Multiple cluster access
Redundant gateways for
high availability
CLIENT NODE(S)
NFS
Gateway
NFS
Gateway
NFS client
(included in OS)
Native applications
HDFS API
(hadoop-core-*.jar)
MapR POSIX Client
MapR cluster
Hadoop applications
(e.g. “hadoop fs -put”)
File-based apps/utils
(e.g. cp, emacs)
NFS
Gateway
2
3
1
POSIX Client can work with multiple clusters
simultaneously unifying namespace and easing
universal data access
- Full Wire Level Encryption
- Inline Compression
- High Performance Ingest multiple write/read
E-Series
MapR cluster MapR cluster

MapR-DB
A Converged NoSQL Database

Relational Databases Were Not Designed for Big Data
• RDBMSs are the default
choice for applications
– But large, rapidly changing,
and/or diverse data sets add
cost/time pressures
• This forces trade-offs with
your data
• Or significant costs
RDBMS
$$$
Throwing extra money
at the problem?
Throwing away data to
preserve performance?

Current Challenges with Other NoSQL Databases
• Coarse grained access controls
– “All or nothing” per record
• Unreliable multi-master replication
• Modeling of complex data
– Longer app development cycles
– Higher chance of coding errors
• Data loss‡ and inconsistency
• Cluster/silo sprawl
– Maintenance pains
– Complexity, more error prone
• Constant data movement between
database and analytics cluster
– Excessive bandwidth utilization
– Delays in accessing data
• Long maintenance downtime
(e.g., compactions, anti-entropy)
‡ See Jepsen tests at https://aphyr.com/tags/Jepsen

How MapR Resolves These NoSQL Challenges
• Tighter analytics integration
• Automatic optimizations
• Fine grained access controls
• Global multi-master deployment capability
• JSON document model for rapid application
development
• Strong consistency and proven data integrity
{
"model": "JSON"
}
Converged Data Platform
✓
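
As a loose illustration of why a JSON document model can shorten development, the sketch below stores a nested record as one document and reads sub-document fields by a dotted path. The customer document, its field names, and the get_path helper are all hypothetical; this is plain Python, not the MapR-DB or OJAI client API.

```python
import json


def get_path(document, path, default=None):
    """Walk a dotted path such as 'address.city' or 'orders.1.sku' through nested dicts and lists."""
    current = document
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdigit() and int(key) < len(current):
            current = current[int(key)]
        else:
            return default
    return current


# One document holds what a relational design might spread over several tables.
customer = json.loads("""
{
  "_id": "c-1001",
  "name": "Example Customer",
  "address": {"city": "London", "country": "UK"},
  "orders": [
    {"sku": "A-1", "qty": 2},
    {"sku": "B-7", "qty": 1}
  ]
}
""")

print(get_path(customer, "address.city"))
print(get_path(customer, "orders.1.sku"))
print(get_path(customer, "address.postcode", default="unknown"))
```

New fields can be added to individual documents without a schema migration, which is the sense in which the slide pairs the JSON model with rapid application development.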

Example Use Cases for MapR-DB
• Enterprise data hubs (or “data lakes”)
• Predictive analytics
• Internet-of-things / time series data analysis

Single Cluster Data Lake Capabilities
MapR-DB: relational,
time series,
structured data
MapR-FS: emails,
blogs, tweets, log
files, unstructured
data
MapR Streams:
event data, IoT data
Agile, self-
service data
exploration
ETL into operational
reporting formats (e.g.,
Parquet)
Multi-tenancy:
job/data placement
control, volumes
Access controls:
file, table, column,
column family, doc,
sub-doc levels
Sources
RELATIONAL,
SAAS,
MAINFRAME
DOCUMENTS,
EMAILS
LOG FILES,
CLICKSTREAM
SENSORS
BLOGS,
TWEETS,
LINK DATA
DATA
WAREHOUSES,
DATA MARTS
Auditing:
compliance, analyze
user accesses
Snapshots:
track data lineage
and history
Table Replication:
global multi-master,
business continuity
Enterprise Storage Database Event Streaming
MapR-FS MapR-DB MapR Streams

MapR Advantages for Predictive Analytics
MapR-DB MapR-FS
MapR Data Platform
Distribution including
Apache Hadoop
MapR-DB: load 100s
of millions of data
points per second in
JSON format from
millions of sources
Interactive,
human-driven
analytics
Multi-tenancy:
colocate distinct data
sets in same cluster
Access controls:
file, table, column,
column family, doc,
sub-doc levels
Sources
SENSOR DATA
High Availability:
ensure continuity
despite system
component failure
Snapshots:
static view for
repeatability for
machine learning
Table Replication:
global multi-master,
business continuity
Real-time
applications
Machine-driven analytics:
predictive analytics,
machine learning, etc.

MapR Streams
A global pub-sub event streaming system for big data

Database
MapR-DB
Event Streaming
MapR Streams
High Availability
Web-Scale Storage
MapR-FS
Global Pub-Sub Streaming Engine With Persistence
Producers
Publish Billions of messages/sec
to a topic
Consumers
Reliable delivery to all consumers.
Immediately
Global
Tie together geo-dispersed
clusters. Worldwide

Converged
Continuous
Global
• Native, global data and metadata replication with arbitrary topology
• Millions of streams, 100K topics/stream
• Billions of events per second
• Millions of producers & consumers
• Converged platform with file storage and database
• OJAI API - Direct access from analytics tools
• Unified security framework with files and database tables
• Multi-tenant - topic isolation, quotas, data placement control
• Integrated with Spark Streaming, Flink, Apex, others
• Message persistence for up to infinite time span
• Guaranteed delivery (at least once)
• Consistent, synchronous replication & no single point of failure
MapR Streams - Converged, Continuous, Global
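
"Guaranteed delivery (at least once)" has a practical consequence for consumers: a message can arrive more than once, so processing should be idempotent. The sketch below simulates this with an in-memory, append-only topic and per-group committed offsets. Every name here (Topic, consume, DedupingHandler, fail_after) is a hypothetical stand-in for illustration; it does not use or reproduce the MapR Streams or Kafka client APIs.

```python
class Topic:
    """In-memory stand-in for a persisted topic with per-group read positions."""

    def __init__(self):
        self._log = []        # append-only message log
        self._committed = {}  # consumer group -> next offset to read

    def publish(self, message):
        self._log.append(message)
        return len(self._log) - 1

    def poll(self, group, max_messages=10):
        start = self._committed.get(group, 0)
        batch = self._log[start:start + max_messages]
        return list(enumerate(batch, start))

    def commit(self, group, offset):
        self._committed[group] = offset + 1


def consume(topic, group, handler, fail_after=None):
    """Process first, commit second. A crash between the two causes redelivery."""
    handled = 0
    for offset, message in topic.poll(group):
        handler(message)
        handled += 1
        if fail_after is not None and handled == fail_after:
            return handled  # simulated crash: processed but never committed
        topic.commit(group, offset)
    return handled


class DedupingHandler:
    """Idempotent handler: records every delivery but applies each message id once."""

    def __init__(self):
        self.deliveries = []
        self.applied = {}

    def __call__(self, message):
        self.deliveries.append(message["id"])
        self.applied.setdefault(message["id"], message)


if __name__ == "__main__":
    topic = Topic()
    for event_id in (1, 2, 3):
        topic.publish({"id": event_id})
    handler = DedupingHandler()
    consume(topic, "alerts", handler, fail_after=2)  # crashes after handling id 2
    consume(topic, "alerts", handler)                # restarts from the last commit
    print(handler.deliveries)
    print(sorted(handler.applied))
```

In this run the second message is delivered twice because the simulated crash happened before its offset was committed, yet the deduplicating handler applies each id only once. Committing before processing would instead give at-most-once behavior, which is the trade-off the slide's wording implies.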

Source Capture Store Process Serve
Flume
NFS
MapR
Streams
MapR-FS
MapR-DB
Spark
Streaming
Spark
Drill
Elasticsearch
Search
Dashboard
Ops
Dashboard
MQTT
Gateway
Part of a Converged Reference Architecture

Example Use Cases for MapR-Streams
• IoT: Global Data Transport & Processing
• Retail: Customer Location Optimization
• Finance: Real-time Transaction Processing

IoT: Global Data Transport & Processing
USE CASE
Business Results
● New revenue streams from collecting and
processing data from “things”.
● Low response times by placing collection and
processing near users.
Why Streaming
● IoT is event-based, and needs an event
streaming architecture.
Why MapR
● Converged platform gives single cluster, single
security model for data in motion and at rest.
● Reliable global replication for distributed
collection, analysis, and DR.
Global Dashboards, Alerts, Processing
Local Collection, Filtering, Aggregation

Retail: Customer Location Optimization
Business Results
● Improved customer satisfaction by
responding to traffic spikes in real time.
● Tighter security by providing real-time alerts
of anomalous user locations or patterns.
Why Streaming
● Real-time collection and processing of user
location data provided by wireless APs.
Why MapR
● Global topics for cross-location monitoring
● Converged platform providing whole solution
Machine learning of historical patterns
Real-time processing & alerting pipeline
SQL engine for historical queries & exploration
USE CASE

Finance: Real-time Transaction Processing
Business Results
● Improved user satisfaction with real-time mobile
notifications of purchases.
● More fraud detected in real-time.
● More productive staff with data exploration.
Why Streaming
● Seamless, real-time connection between
mainframe RDBMS and ETL/processing.
Why MapR
● Utility-grade reliability.
● Converged platform provides end-to-end
application services - streaming, ML, DB.
● Converged security gives unified
authentication, authorization, encryption.
USE CASE
Transactions
Fraud
Detection
App
Streaming
Mobile
Push
App
Data Exploration

A Cloud-Agnostic Platform For Global Delivery
Application Execution

Three Key “Agilities” Drive Our Priorities
Data Agility
• Unified Files, Tables, Streams
• Support for schemas that change
• Multi-model support in a DBMS
Application Agility
• Microservices support
• No-copy access to Files, Streams, DB
• Multiple compute engines + key ecosystem
components
• Consistent security model
Infrastructure Agility
• Multi-dimensional elasticity
• Global multi-cloud
• Container apps with data persistence
Database
MapR-DB
Event Streaming
MapR Streams
High Availability
Web-Scale Storage
MapR-FS

MapR Innovates Continuously
• 2011: Industrial grade data platform for big data analytics 1.0
• 2012: Industry’s first visual big data ops dashboard in MapR control system
• 2013: Industrial grade NoSQL Key Value Store DBMS
• 2014: Global multi datacenter replication; Fast Ingest 1.0
• 2015: Schema free SQL engine for big data; Global table replication
• 2016: Global streaming; JSON Document DB; Fast Ingest 2.0; Spyglass Monitoring
• 2017: Persisted data access for Docker containers

Use Cases for the Converged
Data Platform

The Big Data Journey to As-it-Happens Business
Real-time
Batch
IT Focused Business Focused
Big Data Spectrum
Legacy
Offload
• Mainframe
• Data Warehouse
• RDBMS
• SAN/NAS
Platform Update
• BI/Analytics
• Data Lake/Hub
• File Management
Process Analysis
• Clickstream Analysis
• Log Analytics
• Security Analytics
• Social Analytics
Predictive Operations
• Preventative Maintenance
• Yield Optimization
• Machine Learning
• Assembly Line Optimization
Agile Business
• Fraud Prevention
• Ad Targeting
• Transportation Logistics
• Smart Cities
Process Optimization
• Customer 360
• Recommendation Engine
• Drug Discovery
• Credit Scoring
• Genomics

MapR is faster and more
mature than other distros that we
have used. They are innovating
faster than others.
Mike Brown, CTO, comScore

MapR by Industry

MapR is Helping to Transform Businesses
$1B
Additional Revenue
Fortune 50
Retailer
Over 50 Applications
10%+
Increased Conversion
$40M
Revenue Driven
From 1 of 15 use cases
Amex Offers
$180M
Driven by Targeting
$10M+
Cost Savings
Claim payment integrity
Largest Biometric DB
$4B
Yearly Savings
Shopping on HP.com

World’s Largest Biometric Database
South Asian country creates biometric backed identification system for all citizens
OBJECTIVES
• Increase % of citizens who have bank accounts and can access benefits
• Reduce corruption and fraud in government aid programs
CHALLENGES
• Issues with data replication and loss across clusters in competing distribution
• Weak disaster recovery strategy in competitive distribution
• Complicated upgrade process and high availability issues
SOLUTION
• Complete data backup: Snapshots and mirroring
• Lower maintenance overhead: Rolling upgrades
• Fingerprints and retina scans with 200 millisecond response: MapR-DB
Business Impact
• Approximately 20% reduction in fraud and leakage of government aid programs ($50B)
• Average citizen’s life is transformed as they can get access to various stipulated benefits
• Over 1 billion citizens currently enrolled providing identity for approximately 80% of the population

MapR gives me the reliability
to keep our online service up
and running 24x7x365.
CTO, International Government Program (Aadhaar)

Fraud Detection & Recommendations
104 Million Card Members
• Dozens of use cases, multi-PB scale
• 100s of PhDs and data scientists
• Machine learning to support MyOffers
• Machine learning to support credit card fraud detection,
protecting over $1T in spending each year
• Fraudulent transactions automatically trigger alerts to
phone, email, text for the cardholder

How to get started with MapR

On-Demand Training
- Academy Essentials
- Academy Pro
- Partner Discounts

Try MapR - https://mapr.com/solutions
• Quick Start Solutions
• Solutions by Industry
• Big Data Use Cases

MapR Converged Partner
Program

MapR Converge Partner Program

Key MapR Advantage Partners
Business Services
INFRASTRUCTURE
& CLOUD
ANALYTICS &
BUSINESS INTELLIGENCE
APPLICATIONS
& OS
CONSULTANTS
& INTEGRATORS
DATA WAREHOUSE
& INTEGRATION

Why Partner with MapR?
• Join the Re-Platforming of the enterprise
• Enterprise Software Business
• Hyper Growth
• A platform to Innovate ON
• Increase revenue – market opportunity, referral and reseller

MapR Converge Network Partner Program
The Converge Partner categories are:
• Consulting Partners
• Platform Partners
• Software Partners
• Resellers
• Distributors
The Converge Partner achievement levels are:
• Elite (invite only)
• Preferred
• Affiliate

Converged Partner Program continued…
Example Benefits Submit Application
• Include world-class enablement and strategic support
• Marketing and sales alignment for maximum joint ROI
• World-class training and implementation programs
• Joint strategic business and GTM planning and execution
• Featured App Gallery

Upcoming Events & Resources

Dates for the diary
• Connected Cars
– June 13th – 15th, London, UK
• Autonomous Machines World
– June 26th – 27th, Berlin, DE
• EMEA Partner Summit
– TBC September 2017, London UK
• Convergence
– October 19th, London, UK

Resources
BLOG
The MapR blog provides how-to advice, insights, best practices, and useful resources to help your executives, enterprise architects, and developers more effectively leverage data to grow your business.
• Go to the blog
CONVERGE COMMUNITY
Whether you're an admin, architect, developer or analyst, the Converge Community is the one place where you can find all you need to know about the technology behind MapR Products. Come learn about, discuss, and use MapR products and services, along with other related technologies.
• Find Answers in the Community
BIG DATA TRAINING
Learn big data your way: On demand, anytime, anywhere. Take interactive e-learning courses, with custom sandboxes and lab exercises from the data and analytics experts at MapR.
• Start Learning
MapR CERTIFICATIONS
Prove your skills: Get certified and flash your MapR credentials. The learning curve is the earning curve.
• Get Certified
