Slides from the breakfast briefing of December 11, 2013
In a challenging economic climate, "big data" tools deliver the speed, flexibility, and scalability needed to run enterprise projects that exploit large volumes of information. These technologies are now a reality to be integrated into IT projects.
Klee Group is hosting this themed breakfast with speakers from the Big Data ecosystem:
- MongoDB
- Elasticsearch
- Rubedo CMS
The document discusses various applications of data mining, including financial data analysis, retail industry analysis, telecommunications analysis, and biological data analysis. It provides examples of how data mining is used for tasks like customer segmentation, marketing campaign analysis, fraud detection, and gene sequence analysis. The document also covers trends in data mining, such as visual data mining and audio data mining.
Big Data
Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy. The term "big data" often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that’s not the most relevant characteristic of this new data ecosystem."[2] Analysis of data sets can find new correlations to "spot business trends, prevent diseases, combat crime and so on."[3] Scientists, business executives, practitioners of medicine, advertising and governments alike regularly meet difficulties with large data-sets in areas including Internet search, fintech, urban informatics, and business informatics. Scientists encounter limitations in e-Science work, including meteorology, genomics,[4] connectomics, complex physics simulations, biology and environmental research.[5]
Data sets grow rapidly - in part because they are increasingly gathered by cheap and numerous information-sensing Internet of things devices such as mobile devices, aerial (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks.[6][7] The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s;[8] as of 2012, every day 2.5 exabytes (2.5×10^18 bytes) of data are generated.[9] One question for large enterprises is determining who should own big-data initiatives that affect the entire organization.[10]
Relational database management systems and desktop statistics- and visualization-packages often have difficulty handling big data. The work may require "massively parallel software running on tens, hundreds, or even thousands of servers".[11] What counts as "big data" varies depending on the capabilities of the users and their tools, and expanding capabilities make big data a moving target. "For some organizations, facing hundreds of gigabytes of data for the first time may trigger a need to reconsider data management options. For others, it may take tens or hundreds of terabytes before data size becomes a significant consideration."
This document provides an introduction to big data, including its key characteristics of volume, velocity, and variety. It describes different types of big data technologies like Hadoop, MapReduce, HDFS, Hive, and Pig. Hadoop is an open source software framework for distributed storage and processing of large datasets across clusters of computers. MapReduce is a programming model used for processing large datasets in a distributed computing environment. HDFS provides a distributed file system for storing large datasets across clusters. Hive and Pig provide data querying and analysis capabilities for data stored in Hadoop clusters using SQL-like and scripting languages respectively.
1. The document discusses Big Data analytics using Hadoop. It defines Big Data and explains the 3Vs of Big Data - volume, velocity, and variety.
2. It then describes Hadoop, an open-source framework for distributed storage and processing of large data sets across clusters of commodity hardware. Hadoop uses HDFS for storage and MapReduce for distributed processing.
3. The core components of Hadoop are the NameNode, which manages file system metadata, and DataNodes, which store data blocks. It explains the write and read operations in HDFS.
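The write path described above can be sketched in a few lines. This is a deliberately tiny, hypothetical model of the idea only: the client splits the file into blocks, and NameNode-style metadata records which DataNodes hold each replica (real HDFS uses 128 MB blocks, rack-aware placement, and a pipeline of DataNodes).

```python
# Conceptual sketch of an HDFS-style write: the file is split into
# fixed-size blocks, and (hypothetical) NameNode metadata assigns each
# block to several DataNodes so the data survives node failures.
BLOCK_SIZE = 4    # bytes per block (real HDFS defaults to 128 MB)
REPLICATION = 3   # copies of each block

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split raw file bytes into fixed-size blocks, as an HDFS client would."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, datanodes, replication: int = REPLICATION):
    """Round-robin placement: record which DataNodes hold each block."""
    placement = {}  # block index -> list of DataNode names (NameNode metadata)
    for idx, _ in enumerate(blocks):
        placement[idx] = [datanodes[(idx + r) % len(datanodes)]
                          for r in range(replication)]
    return placement

blocks = split_into_blocks(b"hello big data!")
metadata = place_blocks(blocks, ["dn1", "dn2", "dn3", "dn4"])
```

A read works in reverse: the client asks the metadata for a block's locations, then fetches the block bytes from any DataNode holding a replica.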
The 3Vs of Big Data: Variety, Velocity, and Volume - from Structure:Data 2012 (Gigaom)
The document discusses the 3 V's of big data: volume, velocity, and variety. It provides examples of how each V impacts data analysis and storage. It also discusses how text data has been a major driver of big data growth and challenges. The key challenges are processing large and diverse datasets quickly enough to keep up with real-time data streams and demands.
Disclaimer:
The images, company, product and service names used in this presentation are for illustration purposes only. All trademarks and registered trademarks are the property of their respective owners.
Data and images were collected from various sources on the Internet.
The intention was to present the big picture of Big Data & Hadoop.
Guest Lecture: Introduction to Big Data at the Indian Institute of Technology - Nishant Gandhi
This document provides an introduction to big data, including definitions of big data and why it is important. It discusses characteristics of big data like volume, velocity, variety and veracity. It provides examples of big data applications in various industries like GE, Boeing, social media, finance, CERN, journalism, politics and more. It also introduces NoSQL and the CAP theorem, and concludes that big data is changing business and technology by enabling new insights from data to reduce costs and optimize operations.
Analytics, machine and deep learning, data/event streaming
Big data streaming: enabling the time machine
Real-time event streaming and new conceptual paradigms:
- Distributed transactions
- Eventual consistency
- Materialized projections
Real-time event streaming and new architectural paradigms:
- Enterprise service bus
- Event store
- Projection database
An introduction to Domain-Driven Design: a strategic view of modeling your own business domain in the Big Data era.
Big Data Analysis Patterns - TriHUG 6/27/2013 - boorad
Big Data Analysis Patterns: Tying real world use cases to strategies for analysis using big data technologies and tools.
Big data is ushering in a new era for analytics with large scale data and relatively simple algorithms driving results rather than relying on complex models that use sample data. When you are ready to extract benefits from your data, how do you decide what approach, what algorithm, what tool to use? The answer is simpler than you think.
This session tackles big data analysis with a practical description of strategies for several classes of application types, identified concretely with use cases. Topics include new approaches to search and recommendation using scalable technologies such as Hadoop, Mahout, Storm, Solr, & Titan.
This document provides an overview of big data. It defines big data as large volumes of diverse data that are growing rapidly and require new techniques to capture, store, distribute, manage, and analyze. The key characteristics of big data are volume, velocity, and variety. Common sources of big data include sensors, mobile devices, social media, and business transactions. Tools like Hadoop and MapReduce are used to store and process big data across distributed systems. Applications of big data include smarter healthcare, traffic control, and personalized marketing. The future of big data is promising with the market expected to grow substantially in the coming years.
Big data refers to the large and complex data sets that are difficult to analyze and process using traditional data processing applications. Retailers can leverage big data analytics to gain insights from customer data on social media and other sources to make better business decisions and stay competitive. Walmart analyzes over 2 million daily consumer insights and comments to better understand customers and manage inventory and logistics in a cost-effective way, helping ensure the best prices and customer service.
The document summarizes a report on the global big data and data analytics market for homeland security and public safety from 2017-2022. It forecasts that the market will grow at a compound annual rate of 17.5% to reach $11 billion by 2022. It also finds that government intelligence agencies and police/law enforcement will increase spending the most during this period. Finally, it identifies key drivers of growth for this market, including new data sources from smartphones/IoT, cloud adoption, and the need to address evolving security threats.
The document discusses big data, its history, technologies, and uses. It begins with an introduction to big data and defines it using the 3Vs/4Vs model, describing the volume, velocity, variety and increasingly veracity of data. It then discusses big data technologies like Hadoop, databases, reporting, dashboards and real-time analytics. Examples are given of how big data is used, such as understanding customers, optimizing business processes, improving health outcomes, and improving security and law enforcement. Requirements for big data analytics are also mentioned, including data management, analytics applications, and business interpretation.
The document discusses big data, including the different units used to measure data size like bytes, kilobytes, megabytes, etc. It notes that big data is difficult to store and process using traditional tools due to its large size and complexity. Big data is growing rapidly in volume, velocity and variety. Some challenges in analyzing big data include its unstructured nature, size that exceeds capabilities of conventional tools, and need for real-time insights. Security, access control, data classification and performance impacts must be considered when protecting big data.
A high-level overview of common Cassandra use cases, adoption reasons, Big Data trends, DataStax Enterprise, and the future of Big Data, given at the 7th Advanced Computing Conference in Seoul, South Korea.
MapReduce allows distributed processing of large datasets across clusters of computers. It works by splitting the input data into independent chunks which are processed by the map function in parallel. The map function produces intermediate key-value pairs which are grouped by the reduce function to form the output data. Fault tolerance is achieved through replication of data across nodes and re-executing failed tasks. This makes MapReduce suitable for efficiently processing very large datasets in a distributed environment.
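The map/shuffle/reduce phases described above can be shown in miniature. This is an in-process sketch of the programming model only; a real framework such as Hadoop runs the map and reduce tasks on many machines and performs the shuffle itself.

```python
from collections import defaultdict
from itertools import chain

def map_fn(line: str):
    """Map: emit an intermediate (key, value) pair for every word."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_fn(key, values):
    """Reduce: combine the grouped values for one key."""
    return key, sum(values)

lines = ["big data is big", "data sets grow"]
intermediate = chain.from_iterable(map_fn(l) for l in lines)   # map phase
result = dict(reduce_fn(k, v) for k, v in shuffle(intermediate).items())
# result == {'big': 2, 'data': 2, 'is': 1, 'sets': 1, 'grow': 1}
```

Fault tolerance in the real system comes from re-running any failed map or reduce task on another node holding a replica of its input split.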
This document comes from Les Brigades du Marketing, a marketing consulting and services agency, which you can discover at www.lesbrigadesdumarketing.com.
A telecom operator wanted to make better use of its data through big data.
We proposed a big data deployment program, combined with a gamification mechanism, as part of a broader change-management effort.
By the happiest of accidents, butterflies kindly landed wherever sensitive information appears (brand, client, vendor, or contributor names, figures, etc.), helping us preserve confidentiality.
To contact the author of this document, please send an e-mail to: contact@lesbrigadesdumarketing.com.
Big Data, or data science, is a complex and fascinating subject. Increasingly regarded as a new black gold, Big Data allows companies to develop their economic potential. From this perspective, it is essential to ask what the value of Big Data is and what its future holds in Brussels.
Talking about Big Data generates a lot of questions; however, most of the focus is on the technologies and skills required to collect and store this volume of information as opposed to the insight that companies need to derive from it. What factors should organizations consider in order to ensure that they are capitalizing on their investments with these technologies? How do you break through business silos to enable sharing of data to increase organizational value? Leveraging his cross-industry experience at companies like The Walt Disney Company, Travelers Insurance and Demand Media, Brendan Aldrich will discuss the question of “big value” with industry examples and a particular focus on his current work to deploy a “data democracy” within the City Colleges of Chicago.
Session Discovery Topics:
• Big value - keeping an eye on the forest (assumptions, judgment and bias)
• Data democracy - increasing productivity with data transparency and open access
Using the customer profile and big data to improve sales in real time - Jean-Michel Franco
Delivered at the Nice Interactions conference with Jason McFall of Nice Systems, this presentation shows how collecting customer-journey data and serving real-time recommendations personalized to the profile, journey, and context improves sales, whether direct (website, mobile apps) or indirect (call centers, sales visits, in-store sales...).
Do you really need Big Data to personalize the customer experience... - Sparkow
Webinar, January 28, 2015
Do you really need Big Data to personalize the customer experience on the web?
Presented by:
Pascal Morvan, Solution Selling Director, Sparkow
Jérémy Viault, Product Marketing Manager, Sparkow
Recommending the right products to each customer requires a large amount of data: about products, customers, behaviors...
Does that necessarily mean Big Data?
This webinar offers an overview of the state of the art in product and content recommendation to help you see things more clearly.
Digital has engendered a fundamental shift in the way we behave, think, and do business. One of the most essential transformations for today's organisations is adapting to how the customer has changed. This obviously has a massive impact on the salesforce and its methods. The customer journey has changed dramatically, becoming far more digitized, and requires consistent use of many tools, technologies, and methods to reach the target audience effectively.
An introduction to Elasticsearch and what the tool makes possible.
Lessons learned, recommendations, and demonstrations of the surrounding tools: Kibana, Rivers, Logstash...
Download the file to view the animations!
Data-Ed Webinar: Demystifying Big Data - DATAVERSITY
We are in the middle of a data flood and we need to figure out how to tame it without drowning. Most of what has been written about Big Data is focused on selling hardware and services. But what about a Big Data strategy that guides hardware and software decisions? While virtually every major organization is faced with the challenge of figuring out the approach for and the requirements of this new development, jumping into the fray hastily and unprepared will only reproduce the same dismal IT project results as previously experienced. Join Dr. Peter Aiken as he debunks a number of misconceptions about Big Data as anything but a typical IT project. He will provide guidance on how to establish realistic Big Data management plans and expectations, and help demonstrate the value of such actions to both internal and external decision makers without getting lost in the hype.
Takeaways:
- The means by which Big Data techniques can complement existing data management practices
- The prototyping nature of practicing Big Data techniques
- The distinct ways in which utilizing Big Data can generate business value
- Bigger Data isn’t always Better Data
[Fr] Customer personalization makes a comeback thanks to Big Data - Yann Gourvennec
Personalized marketing has been on the marketing-innovation agenda for quite some time, a perfect example of what Bernard Cova called "marketing panaceas." It resurfaces from time to time on the back of technological innovations. In 2015, Big Data is the main driver of this Risorgimento. Is that justified, and is personalization the future of marketing, a mere bourgeois-bohemian dream, a minor innovation, or even an obligatory rite of passage for nostalgic innovators? You be the judge.
Big Data: InterConnect 2016 Session on Getting Started with Big Data Analytics - Cynthia Saracco
Learn how to get started with Big Data using a platform based on Apache Hadoop, Apache Spark, and IBM BigInsights technologies. The emphasis here is on free or low-cost options that require modest technical skills.
Collective Intelligence: Man, Machine and Internet | Converge Chennai 2015 - Thoughtworks
This document discusses the convergence of man, machine, and the internet through various technologies over the 19th, 20th, and 21st centuries including implantable pacemakers, GPS trackers, baby monitors, activity trackers, and intelligent assistants like JARVIS. It then discusses predictions for the future in 2029 with vehicles that can self-diagnose and communicate with service centers, public transportation that communicates to optimize traffic flow, and smart cities where vehicles and infrastructure can communicate to optimize services. Finally, it discusses current Internet of Things technologies including a device to help farmers predict milk production and a cycling navigation tool.
Big Data: beyond the proof of concept and experimentation (Matinale busi...) - Jean-Michel Franco
Delivering on the promises of Big Data with Hadoop, self-service, data lakes, and machine learning. Which use cases, which lessons learned, which platform?
Creating an end-to-end Recommender System with Apache Spark and Elasticsearch... - sparktc
At the sold-out Spark & Machine Learning Meetup in Brussels on October 27, 2016, Nick Pentreath of the Spark Technology Center teamed up with Jean-François Puget of IBM Analytics to deliver a talk called Creating an end-to-end Recommender System with Apache Spark and Elasticsearch.
Jean-François and Nick started with a look at the workflow for recommender systems and machine learning, then moved on to data modeling and using Spark ML for collaborative filtering. They closed with a discussion of deploying and scoring the recommender models, including a demo.
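To make the collaborative-filtering idea concrete, here is a deliberately tiny, pure-Python sketch. Note the assumptions: Spark ML's ALS factorizes the rating matrix on a cluster, whereas this simplified user-based cosine-similarity version (with made-up users and ratings) only illustrates the underlying intuition of scoring unseen items by the tastes of similar users.

```python
from math import sqrt

ratings = {                       # user -> {item: rating} (invented data)
    "alice": {"book": 5, "film": 3},
    "bob":   {"book": 4, "film": 2, "game": 5},
    "carol": {"game": 4, "film": 1},
}

def cosine(u, v):
    """Cosine similarity between two users' rating vectors."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    return dot / (sqrt(sum(x * x for x in u.values())) *
                  sqrt(sum(x * x for x in v.values())))

def recommend(user):
    """Rank items the user hasn't rated by similarity-weighted ratings."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, r in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)

recs = recommend("alice")   # "game" is the only item alice hasn't rated
```

In the talk's architecture, the scoring side of such a model is what gets pushed toward Elasticsearch so recommendations can be served as queries.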
A presentation on the e-transformation of the customer journey, given as part of the MBA in Marketing and E-Commerce (#MBAMCI), Institut Leonard de Vinci, La Defense-Paris, part-time class of 2014/2015.
- Introduction
- The customer side
- Tools of the e-transformation
- Impact on companies
- An example: the car-buying customer journey
- Conclusion
How can you use the power of data in e-commerce? Applying Big Data solutions makes it possible to analyse data in real time, so the data can be translated into action rather than used for reports only.
Enabling Telcos to Build and Run Modern Applications - Tugdual Grall
This document discusses how MongoDB can help enable businesses to build and run modern applications. It begins with an overview of Tugdual Grall and his background. It then discusses how industries and data have changed, driving the need for a next generation database. The rest of the document provides an overview of MongoDB, including the company, technology, and community. Examples are given of how MongoDB has helped companies in the telecommunications industry achieve a single customer view, improve product catalogs and personalization, and build mobile and open data APIs.
Solr Under the Hood at S&P Global - Sumit Vadhera, S&P Global - Lucidworks
This document summarizes S&P Global's use of Solr for search capabilities across their large datasets. It discusses how S&P Global indexes over 50 million documents into Solr monthly and handles over 5 million queries per week. It outlines challenges faced with an on-premise Solr deployment and how migrating to Solr Cloud helped address issues like performance, availability, and scalability. Next steps discussed include improving relevancy through data science, continuing to leverage new Solr features, and exploring ways to integrate machine learning into search capabilities.
Alter Way Big Data Seminar - Elasticsearch - October 2014 - ALTER WAY
This document discusses Elasticsearch and how it can be used to search, analyze, and make sense of large amounts of data. It provides examples of how Elasticsearch is being used by large companies to handle petabytes of data and gain insights. Implementations in France are highlighted. The document concludes by demonstrating how easily Elasticsearch can be deployed and used to ingest and search sample data.
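Elasticsearch is queried over HTTP using a JSON "Query DSL" body, which is part of why it deploys and demos so easily. The sketch below only constructs such a body; the index name (`logs`) and field name (`message`) are hypothetical, and no cluster is contacted.

```python
import json

def match_query(field: str, text: str, size: int = 10) -> str:
    """Build the JSON body for a basic full-text match query."""
    body = {
        "size": size,                       # max hits to return
        "query": {"match": {field: text}},  # full-text match on one field
    }
    return json.dumps(body)

body = match_query("message", "connection timeout")
# A client would send this as: POST /logs/_search with this JSON body.
```

The same request shape scales from the sample-data demo in the talk to the petabyte-scale deployments it describes, which is much of Elasticsearch's appeal.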
SQL Start! 2020 - SQL Server Lift & Shift on Azure - Marco Obinu
Slides from the session delivered at SQL Start! 2020, illustrating different approaches to determining the best landing zone for your SQL Server workloads.
Video (ITA): https://youtu.be/1hqT_xHs0Qs
The document discusses MongoDB and data treatment. It covers how MongoDB can help with data integrity, confidentiality, correctness and reliability. It also discusses how MongoDB supports dynamic schemas, replication for high availability, security features and can be used as part of a modern enterprise technology stack including integration with Hadoop. MongoDB can be deployed on Azure as a fully managed service.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture - DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
At the 2014 edition of OpenWorld, Oracle rolled out a new database public cloud service with its DBaaS offerings, but this is just one piece of each company's technology architecture. Businesses still need to create a private cloud and identify the driver for doing so; whether it is measured service, consolidation, or rapid provisioning, finding this driver will be the initial building block. This presentation gives you insight into how a private cloud is architected, why the service catalog is its most important brick, and how to benefit from this coming era of databases.
Gab Genai, Cloudera - Going Beyond Traditional Analytics - IntelAPAC
This document discusses Intel and Cloudera's partnership in helping organizations leverage big data analytics. It provides an overview of Cloudera's history and capabilities in supporting enterprises with Hadoop-based solutions. It then contrasts traditional analytics approaches that brought data to compute with Cloudera's approach of bringing compute to data using their Enterprise Data Hub. Several case studies are presented of organizations achieving new insights and business value through Cloudera's platform. The document emphasizes that Cloudera offers an open, scalable and cost-effective platform for various analytics workloads and enables a thriving ecosystem of partners.
The document discusses new rules and strategies for retailers in an evolving customer relationship landscape. It notes there are now 56 touchpoints between a customer's moment of inspiration and transaction. It then discusses components of digital transformation like customer experience management, cross-channel order orchestration, and building a single customer view. The document outlines how retailers can create customer connections and profiles by leveraging enterprise data. It also discusses the need for customer engagement in stores through technologies like self-scanning and mobile payments. Finally, it discusses how front-end store technologies can empower associates and optimize processes.
Serverless SQL provides a serverless analytics platform that allows users to analyze data stored in object storage without having to manage infrastructure. Key features include seamless elasticity, pay-per-query consumption, and the ability to analyze data directly in object storage without having to move it. The platform includes serverless storage, data ingest, data transformation, analytics, and automation capabilities. It aims to create a sharing economy for analytics by allowing various users like developers, data engineers, and analysts flexible access to data and analytics.
Webinar: Expanding Retail Frontiers with MongoDB - MongoDB
Twenty-first century retailers are facing an increasingly challenging and competitive environment. Given the rise of ecommerce and pressure on margins, retailers are looking for innovative services as well as ways to improve customer service, loyalty and engagement. Leading organizations in retail are choosing MongoDB because of its ability to help them compete, providing superior customer experience and accelerated time to market. In this webinar, hear how MongoDB enables retailers to develop:
Enriched Product Catalog Management
Distribution and Logistics Management Solutions
Real-Time Analysis of Customer Behavior
The use cases are specific to retail, but the patterns of usage - agility, scale, and global distribution - will be applicable across many industries.
MongoDB World 2019: Near Real-Time Analytical Data Hub with MongoDB - MongoDB
Attendees will learn how to build an operational data hub that can be used as a silo-buster. In this session, I will show how we developed a data hub at TD Ameritrade to provide actionable 360 views of client data using MongoDB. I will also explain why this approach suited our use case better than a Hadoop-based data lake.
Data Science and Enterprise Engineering with Michael Finger and Chris Robison - Databricks
1) Initially, the data science and engineering teams at Overstock worked independently and were not regularly delivering business value or solving problems in real-time.
2) They came together to solve problems like real-time bidding, where they needed to score users and bid on ads within 10 milliseconds.
3) Over the next 6 months, they improved from scoring users daily to hourly to within minutes by streamlining processes and moving from batch to micro-batch processing. However, they still needed to get faster to enable real-time personalization on the site.
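The batch-to-micro-batch shift described above can be sketched in a few lines: instead of scoring users in one huge daily job, the event stream is consumed in small fixed-size batches that can each be processed in seconds. The batch size and the `range` stand-in for a click stream are illustrative, not Overstock's actual pipeline.

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Yield lists of up to batch_size events from an event stream."""
    it = iter(stream)
    while batch := list(islice(it, batch_size)):
        yield batch

events = range(10)                        # stand-in for a click stream
batches = list(micro_batches(events, 4))  # [[0..3], [4..7], [8, 9]]
```

Shrinking the batch size moves the pipeline from hourly toward near-real-time; true real-time personalization ultimately means processing each event as it arrives (a batch size of one, or a streaming engine).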
In this webinar, Michael Nash of BoldRadius explores the Typesafe Reactive Platform.
The Typesafe Reactive Platform is a suite of technologies and tools that support the creation of reactive applications, that is, applications that handle the kind of responsiveness requirements, data volume, and user load that was out of practical reach only a few years ago.
From analysis of the human genome to wearable technology to communications at a massive scale, BoldRadius has the premier team of experts with decades of collective experience in designing and building these types of applications, and in helping teams adopt these tools.
According to a recent Harvard Business Review study, there’s only a 43% chance that customers who have a poor experience will stick with you for the next 12 months. Contrast that to the 74% that will remain your customer if they have a great experience. Learn how Macy’s, a leading American department store chain founded in 1858 with over 750 stores in North America, is transforming their customer experience with DataStax Enterprise.
Webinar recording: https://youtu.be/CiUVxh6Ov_E
View current and past DataStax webinars: http://www.datastax.com/resources/webinars
ADV Slides: Platforming Your Data for Success - Databases, Hadoop, Managed Ha... - DATAVERSITY
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We’ll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it’s a bad fit.
Drop the herd mentality. In reality, there is no “one size fits all” right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
Applications need data, but the legacy approach of n-tiered application architecture doesn’t solve for today’s challenges. Developers aren’t empowered to build and iterate their code quickly without lengthy review processes from other teams. New data sources cannot be quickly adopted into application development cycles, and developers are not able to control their own requirements when it comes to data platforms.
Part of the challenge here is the existing relationship between two groups: developers and DBAs. Developers are trying to go faster, automating build/test/release cycles with CI/CD, and thrive on the autonomy provided by microservices architectures. DBAs are stewards of data protection, governance, and security. Both of these groups are critically important to running data platforms, but many organizations deal with high friction between these teams. As a result, applications get to market more slowly, and it takes longer for customers to see value.
What if we changed the orientation between developers and DBAs? What if developers consumed data products from data teams? In this session, Pivotal’s Dormain Drewitz and Solstice’s Mike Koleno will speak about:
- Product mindset and how balanced teams can reduce internal friction
- Creating data as a product to align with cloud-native application architectures, like microservices and serverless
- Getting started bringing lean principles into your data organization
- Balancing data usability with data protection, governance, and security
Presenter : Dormain Drewitz, Pivotal & Mike Koleno, Solstice
By 2020, 50% of all new software will process machine-generated data of some sort (Gartner). Historically, machine data use cases have required non-SQL data stores like Splunk, Elasticsearch, or InfluxDB.
Today, new SQL DB architectures rival the non-SQL solutions in ease of use, scalability, cost, and performance. Please join this webinar for a detailed comparison of machine data management approaches.
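The comparison above comes down to whether plain SQL can express machine-data workloads. As a minimal illustration of the query model (not the webinar's benchmark), the sketch below uses Python's standard-library sqlite3 with an invented schema; Splunk, Elasticsearch, InfluxDB, and the newer SQL engines run the same kind of aggregation at vastly larger scale.

```python
import sqlite3

# Store a few machine-generated log events in an in-memory SQL table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, host TEXT, level TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "web1", "INFO"), (2, "web1", "ERROR"), (3, "web2", "ERROR")],
)

# Count errors per host -- the kind of aggregation machine-data tools run.
rows = conn.execute(
    "SELECT host, COUNT(*) FROM events "
    "WHERE level = 'ERROR' GROUP BY host ORDER BY host"
).fetchall()
# rows == [('web1', 1), ('web2', 1)]
```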
White paper on Rubedo, the digital content management and e-commerce platform.
This document presents the full set of built-in features, from multi-site web management to marketing automation.
Rubedo is a digital platform that is:
✓ Multichannel: get the most out of your data by using it across all your digital distribution channels: web / mobile / iOS & Android apps...
✓ Behavior-driven: resolutely user-centric, Rubedo is the only tool to natively integrate advanced personalization and marketing automation techniques.
✓ Agile: create sites, layouts, and data structures; in short, shape your digital presence entirely on your own.
✓ Big Data: accelerate your digital transformation, break down the silos in your IT system, and boost your competitiveness with Big Data.
Rubedo est une plateforme digitale open-source professionnelle de gestion de contenus et de e-commerce. Le socle big data de Rubedo intègre l’analyse prédictive pour offrir la personnalisation en temps réel des sites aux centres d’intérêt des visiteurs anonymes ou connectés.
Une gamme complète de fonctionnalités permet de mutualiser sur un même socle technique de multiples sites internet, intranet, ecommerce ou applications métier.
Ce livre blanc présente les possibilités offertes aux administrateurs pour créer et gérer des sites web avec Rubedo.
1) The document discusses digital experience trends for 2016, including retargeting, new engaging content formats, optimizing the mobile experience, and personalizing the digital experience.
2) It also discusses the importance of customer experience, with 89% of businesses competing on customer experience by 2020, and how companies that know their customers well can build loyalty.
3) Marketing automation is mentioned as a trend, with the goal of allowing customers to build, track, and manage digital campaigns and monitor lead flow from marketing to sales.
Create a manual or automated content list, or create contextual content lists, with Rubedo. This tutorial presents the advanced configuration options for building content lists to fit your needs.
List of the features available in version 3.3 of the Rubedo digital platform.
These features are built into Rubedo and require no external modules, except for online surveys.
The Rubedo CMS lets you create and model, live, the fields needed to define content types.
This tutorial walks through the steps to create and modify content types.
From product catalog creation to order management, Rubedo Commerce offers a complete e-commerce solution integrated with the Rubedo CMS. Using the personalized recommendation engine, take advantage of automatic, real-time personalization with Magic Queries and offer the right product to the right person to convert prospects into customers.
Tutorial for importing and automatically updating contents in the Rubedo CMS.
Description of the steps for automatic import of monolingual and multilingual contents.
The module automatically creates the content types, the classification taxonomies and the contents; all that remains is to display them on pages.
Rubedo CMS: designing content and user layouts.
For each of your content types, you can create your own custom content layout by selecting the fields (title, summary, date, image...) you want to display. To go even further, you can decide whether each element is shown or hidden on each device (desktop, tablet or smartphone).
I. Taxonomy involves classifying contents using hierarchical vocabularies of terms to describe and organize them. This allows contents to be associated with multiple classifications beyond just the website structure.
II. The navigation vocabulary represents the website structure and defines where content is displayed. Other vocabularies are used for searching, filtering, and secondary navigation.
III. Vocabularies can be associated with workspaces to define which contributors can use them, and which are allowed depends on the content and media types.
Website personalization: toward the end of mass communication?
Facing fierce multi-channel competition, the new challenge is to retain users by offering content and products matched to their interests. The advent of predictive solutions and new data-analysis capabilities (Big Data) now makes it possible to deliver interactive communications personalized for each individual. Will segmentation analysis for optimizing the marketing mix soon be a thing of the past?
The new era of automatic, real-time personalization, delivering the right information to the right user at the right moment to stay ahead of your competitors, is no longer reserved for large companies.
Administrator guide for the Rubedo CMS. This guide covers configuring multiple websites and/or e-commerce sites and enabling automatic, real-time personalization of contents and products for all visitors.
Rubedo 2.2 implements a new e-commerce module to complete its multi-site functionalities.
From product management to order management, Rubedo Commerce provides a collection of blocks to manage multiple online stores.
Rubedo Commerce is a solution for managing single- and multi-site e-commerce.
Built on the content and media management features of the Rubedo CMS, Rubedo Commerce offers a complete environment for managing several internet, intranet, extranet and e-commerce sites.
Rubedo Commerce thus benefits from all of Rubedo's features: e-mailing, rights management, media libraries, studios, surveys, and more.
To optimize sales, a recommendation/personalization engine based on behavioral targeting is built into the Rubedo solution.
Beyond raw performance, a Big Data CMS such as Rubedo brings many benefits to all users: fast development, scalability, flexible data modeling, and a richer user experience.
To push user-facing innovation further, Rubedo CMS 2.2 will integrate a behavioral analysis tool (behavioral/content targeting) to personalize websites and e-commerce sites to user preferences. Magic Queries analyze visitor journeys in real time and serve personalized contents.
Administration guide for Rubedo CMS version 2.1.0.
This guide presents the concepts and details the steps for building a site with Rubedo, the open-source big data CMS.
The open-source Rubedo CMS is a French NoSQL CMS built on MongoDB and Zend Framework 2 (PHP).
Rubedo offers multilingual and multi-site features from a single administration interface: a super administrator can manage several sites and delegate site management to other administrators (a multi-administrator mode).
This multi-site capability also makes it possible to offer sites in different languages.
9. Ecotaxe
§ Inbound flow, 24/7
• 2,000 data points per second
• 200 packets per second
§ Outbound flow, 24/7
• 3 × 200 packets per second
§ 3-month retention
• 1.5 billion packets
• 7 terabytes
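The retention figures on this slide can be sanity-checked with a little arithmetic. A sketch (the 90-day month count and the 1 TB = 10^12 bytes convention are assumptions, not stated on the slide):

```python
# Sanity-check the Ecotaxe capacity figures from the slide.
PACKETS_PER_SECOND = 200   # inbound flow, 24/7
RETENTION_DAYS = 90        # "3-month retention", assumed to mean 90 days
TOTAL_STORAGE_TB = 7       # stated storage footprint

seconds = RETENTION_DAYS * 24 * 3600
total_packets = PACKETS_PER_SECOND * seconds
print(f"packets retained: {total_packets:,}")  # about 1.55 billion

# Implied average packet size, assuming 1 TB = 10**12 bytes
avg_bytes = TOTAL_STORAGE_TB * 10**12 / total_packets
print(f"average packet size: {avg_bytes / 1024:.1f} KiB")
```

The computed 1.55 billion packets matches the slide's "1.5 billion" figure, so the three numbers are mutually consistent.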
11. Big Data
The "3 Vs" rule
"Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making." (gartner.com)
12. Big Data
The "3 Vs" rule: Volume, Velocity, Variety
29. Ecotaxe
§ Inbound flow, 24/7
• 2,000 data points per second
• 200 packets per second
§ Outbound flow, 24/7
• 3 × 200 packets per second
§ 3-month retention
• 1.5 billion packets
• 7 terabytes
#Volume #Velocity
31. MongoDB lessons learned (RETEX)
A paradigm shift
§ In the upstream phase
Overcome decision-makers' fears and team resistance
§ In the specification/development phase
Adopt the document-oriented approach vs. the relational approach
Train the development teams
Example: transactional logic
§ In the production phase
Push back against traditional hosting / SAN storage
Favor the horizontal approach over the vertical one
32. Vertical / Horizontal
Vertical "scalability"
When more power is needed:
• add memory...
• then replace the machine with a higher-end server
Corollary: machines are oversized to absorb a potential increase in load
34. Vertical / Horizontal
Horizontal "scalability"
When more power is needed:
• add servers
Corollary: cost scales linearly with usage
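Horizontal scaling works because data can be routed across however many servers are in the cluster. A minimal sketch of hash-based shard routing with hypothetical keys (real MongoDB sharding uses range- or hash-partitioned chunks managed by the cluster, not a bare modulo):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Route a document key to a shard by hashing (stable across runs)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Scaling out means raising num_shards as servers are added; capacity
# (and cost) then grows linearly with the number of servers.
keys = [f"user:{i}" for i in range(1000)]
counts = [0, 0, 0, 0]
for k in keys:
    counts[shard_for(k, 4)] += 1
print(counts)  # roughly 250 documents per shard
```

Note that naive modulo routing reshuffles most keys when `num_shards` changes, which is exactly why production systems use chunk migration or consistent hashing instead.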
36. MongoDB
Don't use MongoDB if your system is transactional; for everything else...
§ Advantages
• Quality of the documentation
• Fast implementation
• Versatility
§ Drawback
• Sharding is not so simple!
§ Benefits
• Functional agility
• Easy model evolution / native versioning
• Technical agility
• Hardware aligned with actual usage
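The transactional caveat above is softened by one guarantee MongoDB does make: a write to a single document is atomic. Data that must change together can therefore be embedded in one document. A sketch with plain Python dicts (field names are illustrative, not a real schema):

```python
import json

# Embedding the order lines inside the order document means one write
# updates everything atomically -- no multi-row transaction needed.
order = {
    "_id": "order-1042",
    "customer": "ACME",
    "status": "pending",
    "lines": [
        {"sku": "A-100", "qty": 2, "unit_price": 19.90},
        {"sku": "B-210", "qty": 1, "unit_price": 45.00},
    ],
}

# A denormalized total kept in the same document stays consistent,
# because it is written in the same (atomic) document update.
order["total"] = round(sum(l["qty"] * l["unit_price"] for l in order["lines"]), 2)
print(json.dumps(order, indent=2))
```

This embedding style is the "document approach vs. relational approach" mindset shift the RETEX slide mentions.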
43. Rubedo lessons learned (RETEX)
The first open-source CMS built on a NoSQL foundation,
in a world where LAMP is THE norm.
NoSQL, but what for?
44. NoSQL and content management
§ CMSs manage contents...
... that are structured
and
classified
45. Rubedo: comparing the approaches
Relational approach (e.g. MySQL):
• one content type: 6 tables
• 10 content types: 29 tables
• one unit query: 6 tables and 2 joins
Document-oriented NoSQL approach (e.g. MongoDB):
• one content type: 1 collection
• 10 content types: 1 collection
• one unit query: 1 collection
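The contrast on this slide can be made concrete: in the document approach, one content item with its fields, taxonomy terms and metadata is a single self-describing record. A sketch with illustrative field names (not Rubedo's actual schema):

```python
import json

# Relational modelling would spread this item across ~6 tables
# (content, fields, translations, taxonomy links, workspaces, ...).
# As a document, it is one record in one collection:
content = {
    "_id": "news-2013-12-11",
    "type": "news",
    "title": "Big Data breakfast seminar",
    "fields": {"summary": "MongoDB, Elasticsearch, Rubedo", "author": "Klee Group"},
    "taxonomy": ["big-data", "nosql"],
    "workspace": "public",
}

# "One unit query = one collection": fetching the item needs no joins.
fetched = json.loads(json.dumps(content))  # round-trip, as a driver would
print(fetched["title"], fetched["taxonomy"])
```

Everything the page needs to render the item arrives in one read, which is where the slide's 6-tables-and-2-joins vs. 1-collection comparison comes from.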
46. Rubedo: the strengths of NoSQL
§ Functional strengths
• Modeling flexibility
• Evolvability over time
• Search features
§ Technical strengths
• Read/write performance
• Storage of large volumes
• Linear scale-out
• Integrated file storage (MongoDB)
• Centralized security
§ Limits & precautions
• No transactions
• Business rules move into the application layer
• A development framework is essential!
• Some project types may require a hybrid architecture (a complex e-commerce site, for example)
47. Rubedo: use cases
Around the use cases: performance & volume, mobility, ergonomics, flexibility, search & geolocation, openness & extensibility.
§ High-traffic or high-volume portals
§ Geolocated contents & mapping
§ Vertical search engines
§ Multi-site platforms
§ Decentralized contribution platforms
§ Mobile sites
52. Agenda
• Purpose of Elasticsearch
• Product Features
• Customer Examples
• Company Overview
• Commercial Offerings
• Resources
53. Purpose of Elasticsearch
• Organize data and make it easily accessible
– Through powerful search and analytics
– Easily consumable (even for non-data scientists)
– Elegantly handles extremely large data volumes
– Delivers results in real time
• Technology stack agnostic
• Used across all market verticals
54. Features of Elasticsearch
• Structured & unstructured search
• Advanced analytics capabilities
• Unmatched performance
• Real-time results
• Highly scalable
• User friendly installation and maintenance
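A flavour of how such structured-plus-unstructured searches are expressed: Elasticsearch queries are JSON documents sent over HTTP. Below, a minimal query body combining a full-text match with a structured date filter, in the style of modern versions of the query DSL (index and field names are hypothetical; this only builds and prints the body and does not contact a server):

```python
import json

# Full-text search on "title" combined with a structured date filter,
# expressed in Elasticsearch's JSON query DSL (bool query).
query = {
    "query": {
        "bool": {
            "must": [{"match": {"title": "big data"}}],
            "filter": [{"range": {"published": {"gte": "2013-01-01"}}}],
        }
    },
    "size": 10,
}
body = json.dumps(query)
print(body)  # this JSON would be POSTed to /<index>/_search
```

The same JSON-in, JSON-out shape is what makes the results "easily consumable" by any technology stack.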
62. Company Overview
• More than 5 million downloads
• 400,000 new downloads per month
• 1000s of mission-critical implementations
• Top investors: Benchmark Capital, Index Ventures
• Seasoned executive team
– Founded by the creator of Elasticsearch
– Seasoned executives from SpringSource
64. User Raves
Chris Cowan @uhduh: "I'm in love with @elasticsearch! I want to use it for everything right now!"
Alain Richardt @alaincxs: "Moving from #solr to #Elasticsearch is like upgrading from a Reliant Robin to a McLaren F1"
Pete Connolly @peteconnolly: "Two really useful and productive days of training from @kimchy and @uboness all about #elasticsearch. Best training course in years"
Cyril Lacôte @clacote: "#ElasticSearch is the s*&t. Amazingly simple and powerful. Open source is awesome. That's made my day."
Logan Lowell @fractaloop: "Tweaking @elasticsearch for huge indexes can be fun. I'm very glad the IRC channel is so helpful too."
65. Product Offerings: Support Throughout Your Project
1. Core Elasticsearch Training
2. Development and Production Support
3. Technical Account Manager
66. 1: Training
Core Elasticsearch Training
• Two day classroom training
• Delivered by Elasticsearch developers
1. Worldwide Public Courses
2. Onsite Training Course
68. 3: Technical Account Manager
• Named technical resource
• Single point of contact into Elasticsearch
• Onboarding call to assess your goals
• Four health checks per year
• Go-to expert to drive success with your Elasticsearch deployment
72. Top Big Data Challenges?
Translation: most struggle to know what Big Data is, how to manage it, and who can manage it.
Source: Gartner
73. Understanding Big Data – It's Not Very "Big"
• 64%: ingest diverse, new data in real time
• 15%: more than 100 TB of data
• 20%: less than 100 TB (average across all: under 20 TB)
From a Big Data executive summary of 50+ top executives from government and F500 firms
75. Enterprise Big Data Stack
• Applications: CRM, ERP, Collaboration, Mobile, BI
• Data Management:
– Online data: RDBMS
– Offline data: Hadoop, EDW
• Infrastructure: OS & virtualization, compute, storage, network
• Cross-cutting: Security & Auditing, Management & Monitoring
76. Consideration – Online vs. Offline
Online: real-time, low-latency, high availability
vs.
Offline: long-running, high-latency, availability is a lower priority
78. MongoDB/NoSQL Is Good For:
• 360° view of the customer
• Fraud detection
• User data management
• Content management & delivery
• Reference data
• Product catalogs
• Mobile & social apps
• Machine-to-machine apps
• Data hub
79. Hadoop Is Good For:
• Risk modeling
• Recommendation engine
• Ad targeting
• Transaction analysis
• Trade surveillance
• Network failure prediction
• Churn analysis
• Search quality
• Data lake
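The workloads above share Hadoop's batch pattern: a map step emits key/value pairs over all records, then a reduce step aggregates per key. A toy word count in plain Python mimics the shape of a MapReduce job (the real framework distributes both phases across the cluster; the sample data is invented):

```python
from collections import defaultdict

def map_phase(records):
    """Emit (key, 1) pairs -- here, one pair per word."""
    for record in records:
        for word in record.split():
            yield word, 1

def reduce_phase(pairs):
    """Sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

logs = ["churn churn risk", "risk ad ad ad"]
print(reduce_phase(map_phase(logs)))  # {'churn': 2, 'risk': 2, 'ad': 3}
```

Everything in the list above (churn analysis, ad targeting, risk modeling) boils down to variants of this map-then-aggregate pattern over very large offline datasets.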
81. Case Study
Insurance leader generates coveted 360-degree view of customers in 90 days ("The Wall")
Problem:
• No single view of the customer
• 145 years of policy data, 70+ systems, 15+ apps
• 2 years and $25M trying to aggregate in an RDBMS: failed
Why MongoDB:
• Agility: prototype in 5 days; production in 90 days
• Dynamic schema & rich querying: combine disparate data into one data store
• Hot tech to attract top talent
Results:
• Increased call center productivity
• Better customer experience, reduced churn, more upsell opportunities
• Dozens more projects in the works to leverage this data platform
85. MongoDB Vision
To provide the best database for how we build and run apps today
Build:
– New and complex data
– Flexible
– New languages
– Faster development
Run:
– Big Data scalability
– Real-time
– Commodity hardware
– Cloud
86. Fortune 500 & Global 500
• 10 of the Top Financial Services Institutions
• 10 of the Top Electronics Companies
• 10 of the Top Media and Entertainment Companies
• 8 of the Top Retailers
• 6 of the Top Telcos
• 5 of the Top Technology Companies
• 4 of the Top Healthcare Companies
88. MongoDB Features
• JSON document model with dynamic schemas
• Full, flexible index support and rich queries
• Auto-sharding for horizontal scalability
• Built-in replication for high availability
• Text search
• Advanced security
• Aggregation framework and MapReduce
• Large media storage with GridFS
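Among these features, the aggregation framework is worth a concrete illustration: pipelines are lists of JSON stages, run server-side with `db.collection.aggregate(pipeline)`. The pipeline below uses real `$match`/`$group` syntax; since no server is assumed here, the snippet also simulates the two stages in plain Python on invented sample documents (collection and field names are illustrative):

```python
from collections import defaultdict

# A MongoDB aggregation pipeline: filter, then group-and-count.
# db.events.aggregate(pipeline) would execute this server-side.
pipeline = [
    {"$match": {"status": "ok"}},
    {"$group": {"_id": "$country", "n": {"$sum": 1}}},
]

# Stdlib simulation of the same two stages on sample documents:
docs = [
    {"status": "ok", "country": "FR"},
    {"status": "ok", "country": "FR"},
    {"status": "error", "country": "DE"},
    {"status": "ok", "country": "DE"},
]
matched = [d for d in docs if d["status"] == "ok"]  # $match stage
groups = defaultdict(int)
for d in matched:                                   # $group stage
    groups[d["country"]] += 1
print(dict(groups))  # {'FR': 2, 'DE': 1}
```

Because stages are just JSON, pipelines can be built, stored and composed by applications, which is part of the "rich queries" claim above.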
92. MongoDB Products and Services
• Subscriptions: MongoDB Enterprise, MMS (On-Prem), professional support, commercial license
• Consulting: expert resources for all phases of MongoDB implementations
• Training: online and in-person, for developers and administrators
• MongoDB Management Service (MMS): cloud-based suite of services for managing MongoDB deployments
94. MongoDB Enterprise
Enterprise build with value-added capabilities
• Advanced security with Kerberos
• On-prem management
– Visualization and alerts on 100+ system metrics
– Backup features coming soon
– On-premise version of MongoDB Monitoring Services (MMS)
• Enterprise software integration via SNMP
• Private, on-demand MongoDB University training
• Certified OS support
95. MongoDB Management Service
Cloud-based suite of services for managing MongoDB deployments
• Monitoring, with charts, dashboards and alerts on 100+ metrics
• Backup and restore, with point-in-time recovery and support for sharded clusters
• MMS On-Prem included with MongoDB Enterprise (backup coming soon)
96. Consulting
• Technical Account Manager: a named MongoDB expert who assists with all phases of the project and provides advisory services on an ongoing basis
• Custom Consulting: e.g. configuration, testing, optimization, best practices
• Health Check: assess the overall status and health of an existing MongoDB deployment
Lightning Consults also available
97. Training
• Public: dev, admin and combined courses available; North America and EMEA
• Private: customized to your needs; for devs and admins; on-site
• Online: free; for devs and admins; 7 weeks; weekly lectures, homework, final exam
Private, on-demand MongoDB University training included with a MongoDB Enterprise subscription
98. For More Information
• MongoDB downloads: mongodb.com/download
• Free online training: education.mongodb.com
• Webinars and events: mongodb.com/events
• White papers: mongodb.com/white-papers
• Case studies: mongodb.com/customers
• Presentations: mongodb.com/presentations
• Documentation: docs.mongodb.org
• Additional info: info@mongodb.com