SlideShare a Scribd company logo
November 2011 – Hadoop World NYC

Advanced HBase Schema Design
Lars George, Solutions Architect
Agenda

1    Intro to HBase Architecture
2    Schema Design
3    Examples
4    Wrap up




2                    ©2011 Cloudera, Inc. All Rights Reserved. Confidential.
                    Reproduction or redistribution without written permission is
                                            prohibited.
About Me

• Solutions Architect @ Cloudera
• Apache HBase & Whirr Committer
• Author of
  HBase – The Definitive Guide
• Working with HBase since end
  of 2007
• Organizer of the Munich OpenHUG
• Speaker at Conferences (Fosdem, Hadoop
  World)
Overview

• Schema design is vital
• Needs to be done eventually
  – Same for RDBMS
• Exposes architecture and implementation
  “features”
• Might be handled in storage layer
  – eg. MegaStore, Percolator
Configuration Layers   (aka “OSI for HBase”)
HBase Architecture
HBase Architecture
• HBase uses HDFS (or similar) as its reliable
  storage layer
  – Handles checksums, replication, failover
• Native Java API, Gateway for REST, Thrift,
  Avro
• Master manages cluster
• RegionServer manage data
• ZooKeeper is used the “neural network”
  – Crucial for HBase
  – Bootstraps and coordinates cluster
Auto Sharding
Distribution
Auto Sharding and Distribution

• Unit of scalability in HBase is the Region
• Sorted, contiguous range of rows
• Spread “randomly” across RegionServer
• Moved around for load balancing and
  failover
• Split automatically or manually to scale
  with growing data
• Capacity is solely a factor of cluster nodes
  vs. regions per node
Column Families
Storage Separation

• Column Families allow for separation of data
  – Used by Columnar Databases for fast analytical
    queries, but on column level only
  – Allows different or no compression depending on
    the content type
• Segregate information based on access
  pattern
• Data is stored in one or more storage file,
  called HFiles
Merge Reads
Bloom Filter

• Bloom Filters are generated when HFile is
  persisted
  – Stored at the end of each HFile
  – Loaded into memory
• Allows check on row or row+column level
• Can filter entire store files from reads
  – Useful when data is grouped
• Also useful when many misses are
  expected during reads (non existing keys)
Bloom Filter
Fold, Store, and Shift
Fold, Store, and Shift

• Logical layout does not match physical
  one
• All values are stored with the full
  coordinates, including: Row Key, Column
  Family, Column Qualifier, and Timestamp
• Folds columns into “row per column”
• NULLs are cost free as nothing is stored
• Versions are multiple “rows” in folded table
Key Cardinality
Key Cardinality

• The best performance is gained from using
  row keys
• Time range bound reads can skip store files
  – So can Bloom Filters
• Selecting column families reduces the
  amount of data to be scanned
• Pure value based filtering is a full table scan
  – Filters often are too, but reduce network traffic
Tall-Narrow vs. Flat-Wide Tables
• Rows do not split
  – Might end up with one row per region
• Same storage footprint
• Put more details into the row key
  – Sometimes dummy column only
  – Make use of partial key scans
• Tall with Scans, Wide with Gets
  – Atomicity only on row level
• Example: Large graphs, stored as adjacency
  matrix
Example: Mail Inbox

        <userId> : <colfam> : <messageId> : <timestamp> : <email-message>

12345   :   data   :   5fc38314-e290-ae5da5fc375d      :   1307097848   :   "Hi Lars, ..."
12345   :   data   :   725aae5f-d72e-f90f3f070419      :   1307099848   :   "Welcome, and ..."
12345   :   data   :   cc6775b3-f249-c6dd2b1a7467      :   1307101848   :   "To Whom It ..."
12345   :   data   :   dcbee495-6d5e-6ed48124632c      :   1307103848   :   "Hi, how are ..."


                                              or
12345-5fc38314-e290-ae5da5fc375d        :   data   :   :   1307097848   :   "Hi Lars, ..."
12345-725aae5f-d72e-f90f3f070419        :   data   :   :   1307099848   :   "Welcome, and ..."
12345-cc6775b3-f249-c6dd2b1a7467        :   data   :   :   1307101848   :   "To Whom It ..."
12345-dcbee495-6d5e-6ed48124632c        :   data   :   :   1307103848   :   "Hi, how are ..."


                              Same Storage Requirements
Partial Key Scans
Key                                          Description
<userId>                                     Scan over all
                                             messages for a given
                                             user ID
<userId>-<date>                              Scan over all
                                             messages on a given
                                             date for the given user
                                             ID
<userId>-<date>-<messageId>                  Scan over all parts of a
                                             message for a given
                                             user ID and date
<userId>-<date>-<messageId>-<attachmentId>   Scan over all
                                             attachments of a
                                             message for a given
                                             user ID and date
Sequential Keys
    <timestamp><more key>: {CF: {CQ: {TS : Val}}}

• Hotspotting on Regions: bad!
• Instead do one of the following:
  – Salting
     • Prefix <timestamp> with distributed value
     • Binning or bucketing rows across regions
  – Key field swap/promotion
     • Move <more key> before the timestamp (see
       OpenTSDB later)
  – Randomization
     • Move <timestamp> out of key
Key Design
Key Design

• Based on access pattern, either use
  sequential or random keys
• Often a combination of both is needed
  – Overcome architectural limitations
• Neither is necessarily bad
  – Use bulk import for sequential keys and reads
  – Random keys are good for random access
    patterns
Example: Facebook Insights

• > 20B Events per Day
• 1M Counter Updates per Second
  – 100 Nodes Cluster
  – 10K OPS per Node
• ”Like” button triggers AJAX request
• Event written to log file
• 30mins current for website owner
  Web ➜ Scribe ➜ Ptail ➜ Puma ➜ HBase
HBase Counters
• Store counters per Domain and per URL
  – Leverage HBase increment (atomic read-modify-
    write) feature
• Each row is one specific Domain or URL
• The columns are the counters for specific
  metrics
• Column families are used to group counters
  by time range
  – Set time-to-live on CF level to auto-expire
    counters by age to save space, e.g., 2 weeks on
    “Daily Counters” family
Key Design
• Reversed Domains
  – Examples: “com.cloudera.www”, “com.cloudera.blog”
  – Helps keeping pages per site close, as HBase efficiently
    scans blocks of sorted keys
• Domain Row Key =
MD5(Reversed Domain) +
  Reversed Domain
  – Leading MD5 hash spreads keys randomly across all regions
    for load balancing reasons
  – Only hashing the domain groups per site (and per
    subdomain if needed)
• URL Row Key =
MD5(Reversed Domain) + Reversed
  Domain + URL ID
  – Unique ID per URL already available, make use of it
Insights Schema
Example: OpenTSDB




• Metric Type, Tags are stored as IDs
• Periodically rolled up
Summary

• Design for Use-Case
    – Read, Write, or Both?
•   Avoid Hotspotting
•   Consider using IDs instead of full text
•   Leverage Column Family to HFile relation
•   Shift details to appropriate position
    – Composite Keys
    – Column Qualifiers
Summary (cont.)

• Schema design is a combination of
  – Designing the keys (row and column)
  – Segregate data into column families
  – Choose compression and block sizes
• Similar techniques are needed to scale most
  systems
  – Add indexes, partition data, consistent hashing
• Denormalization, Duplication, and Intelligent
  Keys (DDI)
Questions?




Email:     lars@cloudera.com
Twitter:   @larsgeorge

More Related Content

What's hot

Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
alexbaranau
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
DataWorks Summit
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
Anil Gupta
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
mumrah
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
DataWorks Summit
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
kbajda
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
lucenerevolution
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
alexbaranau
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
DataStax Academy
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
MIJIN AN
 
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
MongoDB
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
Yoshinori Matsunobu
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
Ravi Teja
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
Saurav Haloi
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)
NAVER D2
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
Dvir Volk
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
DataWorks Summit
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
DataWorks Summit
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxData
 

What's hot (20)

Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Introduction To HBase
Introduction To HBaseIntroduction To HBase
Introduction To HBase
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
Apache Tez – Present and Future
Apache Tez – Present and FutureApache Tez – Present and Future
Apache Tez – Present and Future
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 
Intro to HBase
Intro to HBaseIntro to HBase
Intro to HBase
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
Naver속도의, 속도에 의한, 속도를 위한 몽고DB (네이버 컨텐츠검색과 몽고DB) [Naver]
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)[211] HBase 기반 검색 데이터 저장소 (공개용)
[211] HBase 기반 검색 데이터 저장소 (공개용)
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing Apache Tez: Accelerating Hadoop Query Processing
Apache Tez: Accelerating Hadoop Query Processing
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 

Viewers also liked

20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
Evan Liu
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
Nick Dimiduk
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
Lars Hofhansl
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for Architects
Nick Dimiduk
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
Jurriaan Persyn
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 
Hadoop World 2011: Storing and Indexing Social Media Content in the Hadoop Ec...
Hadoop World 2011: Storing and Indexing Social Media Content in the Hadoop Ec...Hadoop World 2011: Storing and Indexing Social Media Content in the Hadoop Ec...
Hadoop World 2011: Storing and Indexing Social Media Content in the Hadoop Ec...
Cloudera, Inc.
 
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLHands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Piotr Pruski
 
NoSQL with Hadoop and HBase
NoSQL with Hadoop and HBaseNoSQL with Hadoop and HBase
NoSQL with Hadoop and HBase
NGDATA
 
Low Latency “OLAP” with HBase - HBaseCon 2012
Low Latency “OLAP” with HBase - HBaseCon 2012Low Latency “OLAP” with HBase - HBaseCon 2012
Low Latency “OLAP” with HBase - HBaseCon 2012
Cosmin Lehene
 
Hadoop and HBase in the Real World
Hadoop and HBase in the Real WorldHadoop and HBase in the Real World
Hadoop and HBase in the Real World
Cloudera, Inc.
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBase
DataWorks Summit
 
HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table design
phanleson
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
HBaseCon
 
Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3
Markus Klems
 
HBase杂谈
HBase杂谈HBase杂谈
HBase杂谈
Joseph Pan
 
Hbase Nosql
Hbase NosqlHbase Nosql
Hbase Nosql
elliando dias
 
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
DataWorks Summit
 
Introducción a Apache HBase
Introducción a Apache HBaseIntroducción a Apache HBase
Introducción a Apache HBase
Marcos Ortiz Valmaseda
 
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
tatsuya6502
 

Viewers also liked (20)

20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies20090713 Hbase Schema Design Case Studies
20090713 Hbase Schema Design Case Studies
 
HBase for Architects
HBase for ArchitectsHBase for Architects
HBase for Architects
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
Apache HBase for Architects
Apache HBase for ArchitectsApache HBase for Architects
Apache HBase for Architects
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
Hadoop World 2011: Storing and Indexing Social Media Content in the Hadoop Ec...
Hadoop World 2011: Storing and Indexing Social Media Content in the Hadoop Ec...Hadoop World 2011: Storing and Indexing Social Media Content in the Hadoop Ec...
Hadoop World 2011: Storing and Indexing Social Media Content in the Hadoop Ec...
 
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQLHands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
Hands-on-Lab: Adding Value to HBase with IBM InfoSphere BigInsights and BigSQL
 
NoSQL with Hadoop and HBase
NoSQL with Hadoop and HBaseNoSQL with Hadoop and HBase
NoSQL with Hadoop and HBase
 
Low Latency “OLAP” with HBase - HBaseCon 2012
Low Latency “OLAP” with HBase - HBaseCon 2012Low Latency “OLAP” with HBase - HBaseCon 2012
Low Latency “OLAP” with HBase - HBaseCon 2012
 
Hadoop and HBase in the Real World
Hadoop and HBase in the Real WorldHadoop and HBase in the Real World
Hadoop and HBase in the Real World
 
The Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBaseThe Evolution of a Relational Database Layer over HBase
The Evolution of a Relational Database Layer over HBase
 
HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table design
 
Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase Update on OpenTSDB and AsyncHBase
Update on OpenTSDB and AsyncHBase
 
Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3Apache Cassandra Lesson: Data Modelling and CQL3
Apache Cassandra Lesson: Data Modelling and CQL3
 
HBase杂谈
HBase杂谈HBase杂谈
HBase杂谈
 
Hbase Nosql
Hbase NosqlHbase Nosql
Hbase Nosql
 
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
Building a geospatial processing pipeline using Hadoop and HBase and how Mons...
 
Introducción a Apache HBase
Introducción a Apache HBaseIntroducción a Apache HBase
Introducción a Apache HBase
 
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
Tokyo HBase Meetup - Realtime Big Data at Facebook with Hadoop and HBase (ja)
 

Similar to Hadoop World 2011: Advanced HBase Schema Design

Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
Chris Huang
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
SudheerKumar499932
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
Keeyong Han
 
REDIS327
REDIS327REDIS327
REDIS327
Rajan Bhatt
 
Dynamodb Presentation
Dynamodb PresentationDynamodb Presentation
Dynamodb Presentation
advaitdeo
 
Making Session Stores More Intelligent
Making Session Stores More IntelligentMaking Session Stores More Intelligent
Making Session Stores More Intelligent
Kyle Davis
 
Cassandra
CassandraCassandra
Cassandra
exsuns
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
vijayapraba1
 
No sql databases
No sql databasesNo sql databases
No sql databases
swathika rajan
 
Zing Database – Distributed Key-Value Database
Zing Database – Distributed Key-Value DatabaseZing Database – Distributed Key-Value Database
Zing Database – Distributed Key-Value Database
zingopen
 
Zing Database
Zing Database Zing Database
Zing Database
Long Dao
 
La big datacamp-2014-aws-dynamodb-overview-michael_limcaco
La big datacamp-2014-aws-dynamodb-overview-michael_limcacoLa big datacamp-2014-aws-dynamodb-overview-michael_limcaco
La big datacamp-2014-aws-dynamodb-overview-michael_limcaco
Data Con LA
 
NoSql
NoSqlNoSql
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
Andriy Zabavskyy
 
HBaseCon 2015: HBase @ Flipboard
HBaseCon 2015: HBase @ FlipboardHBaseCon 2015: HBase @ Flipboard
HBaseCon 2015: HBase @ Flipboard
HBaseCon
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
Matthew Blair
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
Joe Alex
 
How & When to Use NoSQL at Websummit Dublin
How & When to Use NoSQL at Websummit DublinHow & When to Use NoSQL at Websummit Dublin
How & When to Use NoSQL at Websummit Dublin
Amazon Web Services
 
Socialite, the Open Source Status Feed
Socialite, the Open Source Status FeedSocialite, the Open Source Status Feed
Socialite, the Open Source Status Feed
MongoDB
 
Gfs google-file-system-13331
Gfs google-file-system-13331Gfs google-file-system-13331
Gfs google-file-system-13331
Fengchang Xie
 

Similar to Hadoop World 2011: Advanced HBase Schema Design (20)

Hbase schema design and sizing apache-con europe - nov 2012
Hbase schema design and sizing   apache-con europe - nov 2012Hbase schema design and sizing   apache-con europe - nov 2012
Hbase schema design and sizing apache-con europe - nov 2012
 
L6.sp17.pptx
L6.sp17.pptxL6.sp17.pptx
L6.sp17.pptx
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
 
REDIS327
REDIS327REDIS327
REDIS327
 
Dynamodb Presentation
Dynamodb PresentationDynamodb Presentation
Dynamodb Presentation
 
Making Session Stores More Intelligent
Making Session Stores More IntelligentMaking Session Stores More Intelligent
Making Session Stores More Intelligent
 
Cassandra
CassandraCassandra
Cassandra
 
HDFS_architecture.ppt
HDFS_architecture.pptHDFS_architecture.ppt
HDFS_architecture.ppt
 
No sql databases
No sql databasesNo sql databases
No sql databases
 
Zing Database – Distributed Key-Value Database
Zing Database – Distributed Key-Value DatabaseZing Database – Distributed Key-Value Database
Zing Database – Distributed Key-Value Database
 
Zing Database
Zing Database Zing Database
Zing Database
 
La big datacamp-2014-aws-dynamodb-overview-michael_limcaco
La big datacamp-2014-aws-dynamodb-overview-michael_limcacoLa big datacamp-2014-aws-dynamodb-overview-michael_limcaco
La big datacamp-2014-aws-dynamodb-overview-michael_limcaco
 
NoSql
NoSqlNoSql
NoSql
 
A Closer Look at Apache Kudu
A Closer Look at Apache KuduA Closer Look at Apache Kudu
A Closer Look at Apache Kudu
 
HBaseCon 2015: HBase @ Flipboard
HBaseCon 2015: HBase @ FlipboardHBaseCon 2015: HBase @ Flipboard
HBaseCon 2015: HBase @ Flipboard
 
HBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ FlipboardHBaseCon 2015- HBase @ Flipboard
HBaseCon 2015- HBase @ Flipboard
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
 
How & When to Use NoSQL at Websummit Dublin
How & When to Use NoSQL at Websummit DublinHow & When to Use NoSQL at Websummit Dublin
How & When to Use NoSQL at Websummit Dublin
 
Socialite, the Open Source Status Feed
Socialite, the Open Source Status FeedSocialite, the Open Source Status Feed
Socialite, the Open Source Status Feed
 
Gfs google-file-system-13331
Gfs google-file-system-13331Gfs google-file-system-13331
Gfs google-file-system-13331
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
Cloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
Cloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
Cloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
amitchopra0215
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Mydbops
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
huseindihon
 
Why do You Have to Redesign?_Redesign Challenge Day 1
Why do You Have to Redesign?_Redesign Challenge Day 1Why do You Have to Redesign?_Redesign Challenge Day 1
Why do You Have to Redesign?_Redesign Challenge Day 1
FellyciaHikmahwarani
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
ArgaBisma
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
Safe Software
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
SynapseIndia
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
HackersList
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
Matthew Sinclair
 
HTTP Adaptive Streaming – Quo Vadis (2024)
HTTP Adaptive Streaming – Quo Vadis (2024)HTTP Adaptive Streaming – Quo Vadis (2024)
HTTP Adaptive Streaming – Quo Vadis (2024)
Alpen-Adria-Universität
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
Yevgen Sysoyev
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
Kief Morris
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
Eric D. Schabell
 
Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024
BookNet Canada
 
How to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory ModelHow to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory Model
ScyllaDB
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
Enterprise Wired
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
Larry Smarr
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
ScyllaDB
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
Mark Billinghurst
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
UiPathCommunity
 

Recently uploaded (20)

@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
@Call @Girls Pune 0000000000 Riya Khan Beautiful Girl any Time
 
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - MydbopsScaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
Scaling Connections in PostgreSQL Postgres Bangalore(PGBLR) Meetup-2 - Mydbops
 
find out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challengesfind out more about the role of autonomous vehicles in facing global challenges
find out more about the role of autonomous vehicles in facing global challenges
 
Why do You Have to Redesign?_Redesign Challenge Day 1
Why do You Have to Redesign?_Redesign Challenge Day 1Why do You Have to Redesign?_Redesign Challenge Day 1
Why do You Have to Redesign?_Redesign Challenge Day 1
 
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdfWhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
WhatsApp Image 2024-03-27 at 08.19.52_bfd93109.pdf
 
Coordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar SlidesCoordinate Systems in FME 101 - Webinar Slides
Coordinate Systems in FME 101 - Webinar Slides
 
How RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptxHow RPA Help in the Transportation and Logistics Industry.pptx
How RPA Help in the Transportation and Logistics Industry.pptx
 
How Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdfHow Social Media Hackers Help You to See Your Wife's Message.pdf
How Social Media Hackers Help You to See Your Wife's Message.pdf
 
20240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 202420240704 QFM023 Engineering Leadership Reading List June 2024
20240704 QFM023 Engineering Leadership Reading List June 2024
 
HTTP Adaptive Streaming – Quo Vadis (2024)
HTTP Adaptive Streaming – Quo Vadis (2024)HTTP Adaptive Streaming – Quo Vadis (2024)
HTTP Adaptive Streaming – Quo Vadis (2024)
 
DealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 editionDealBook of Ukraine: 2024 edition
DealBook of Ukraine: 2024 edition
 
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
[Talk] Moving Beyond Spaghetti Infrastructure [AOTB] 2024-07-04.pdf
 
Observability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetryObservability For You and Me with OpenTelemetry
Observability For You and Me with OpenTelemetry
 
Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024Details of description part II: Describing images in practice - Tech Forum 2024
Details of description part II: Describing images in practice - Tech Forum 2024
 
How to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory ModelHow to Avoid Learning the Linux-Kernel Memory Model
How to Avoid Learning the Linux-Kernel Memory Model
 
7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf7 Most Powerful Solar Storms in the History of Earth.pdf
7 Most Powerful Solar Storms in the History of Earth.pdf
 
The Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU CampusesThe Increasing Use of the National Research Platform by the CSU Campuses
The Increasing Use of the National Research Platform by the CSU Campuses
 
Performance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy EvertsPerformance Budgets for the Real World by Tammy Everts
Performance Budgets for the Real World by Tammy Everts
 
Research Directions for Cross Reality Interfaces
Research Directions for Cross Reality InterfacesResearch Directions for Cross Reality Interfaces
Research Directions for Cross Reality Interfaces
 
UiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs ConferenceUiPath Community Day Kraków: Devs4Devs Conference
UiPath Community Day Kraków: Devs4Devs Conference
 

Hadoop World 2011: Advanced HBase Schema Design

  • 1. November 2011 – Hadoop World NYC Advanced HBase Schema Design Lars George, Solutions Architect
  • 2. Agenda 1 Intro to HBase Architecture 2 Schema Design 3 Examples 4 Wrap up 2 ©2011 Cloudera, Inc. All Rights Reserved. Confidential. Reproduction or redistribution without written permission is prohibited.
  • 3. About Me • Solutions Architect @ Cloudera • Apache HBase & Whirr Committer • Author of HBase – The Definitive Guide • Working with HBase since end of 2007 • Organizer of the Munich OpenHUG • Speaker at Conferences (Fosdem, Hadoop World)
  • 4. Overview • Schema design is vital • Needs to be done eventually – Same for RDBMS • Exposes architecture and implementation “features” • Might be handled in storage layer – eg. MegaStore, Percolator
  • 5. Configuration Layers (aka “OSI for HBase”)
  • 7. HBase Architecture • HBase uses HDFS (or similar) as its reliable storage layer – Handles checksums, replication, failover • Native Java API, Gateway for REST, Thrift, Avro • Master manages cluster • RegionServer manage data • ZooKeeper is used the “neural network” – Crucial for HBase – Bootstraps and coordinates cluster
  • 10. Auto Sharding and Distribution • Unit of scalability in HBase is the Region • Sorted, contiguous range of rows • Spread “randomly” across RegionServer • Moved around for load balancing and failover • Split automatically or manually to scale with growing data • Capacity is solely a factor of cluster nodes vs. regions per node
  • 12. Storage Separation • Column Families allow for separation of data – Used by Columnar Databases for fast analytical queries, but on column level only – Allows different or no compression depending on the content type • Segregate information based on access pattern • Data is stored in one or more storage file, called HFiles
  • 14. Bloom Filter • Bloom Filters are generated when HFile is persisted – Stored at the end of each HFile – Loaded into memory • Allows check on row or row+column level • Can filter entire store files from reads – Useful when data is grouped • Also useful when many misses are expected during reads (non existing keys)
  • 17. Fold, Store, and Shift • Logical layout does not match physical one • All values are stored with the full coordinates, including: Row Key, Column Family, Column Qualifier, and Timestamp • Folds columns into “row per column” • NULLs are cost free as nothing is stored • Versions are multiple “rows” in folded table
  • 19. Key Cardinality • The best performance is gained from using row keys • Time range bound reads can skip store files – So can Bloom Filters • Selecting column families reduces the amount of data to be scanned • Pure value based filtering is a full table scan – Filters often are too, but reduce network traffic
  • 20. Tall-Narrow vs. Flat-Wide Tables • Rows do not split – Might end up with one row per region • Same storage footprint • Put more details into the row key – Sometimes dummy column only – Make use of partial key scans • Tall with Scans, Wide with Gets – Atomicity only on row level • Example: Large graphs, stored as adjacency matrix
  • 21. Example: Mail Inbox <userId> : <colfam> : <messageId> : <timestamp> : <email-message> 12345 : data : 5fc38314-e290-ae5da5fc375d : 1307097848 : "Hi Lars, ..." 12345 : data : 725aae5f-d72e-f90f3f070419 : 1307099848 : "Welcome, and ..." 12345 : data : cc6775b3-f249-c6dd2b1a7467 : 1307101848 : "To Whom It ..." 12345 : data : dcbee495-6d5e-6ed48124632c : 1307103848 : "Hi, how are ..." or 12345-5fc38314-e290-ae5da5fc375d : data : : 1307097848 : "Hi Lars, ..." 12345-725aae5f-d72e-f90f3f070419 : data : : 1307099848 : "Welcome, and ..." 12345-cc6775b3-f249-c6dd2b1a7467 : data : : 1307101848 : "To Whom It ..." 12345-dcbee495-6d5e-6ed48124632c : data : : 1307103848 : "Hi, how are ..."  Same Storage Requirements
  • 22. Partial Key Scans Key Description <userId> Scan over all messages for a given user ID <userId>-<date> Scan over all messages on a given date for the given user ID <userId>-<date>-<messageId> Scan over all parts of a message for a given user ID and date <userId>-<date>-<messageId>-<attachmentId> Scan over all attachments of a message for a given user ID and date
  • 23. Sequential Keys <timestamp><more key>: {CF: {CQ: {TS : Val}}} • Hotspotting on Regions: bad! • Instead do one of the following: – Salting • Prefix <timestamp> with distributed value • Binning or bucketing rows across regions – Key field swap/promotion • Move <more key> before the timestamp (see OpenTSDB later) – Randomization • Move <timestamp> out of key
  • 25. Key Design • Based on access pattern, either use sequential or random keys • Often a combination of both is needed – Overcome architectural limitations • Neither is necessarily bad – Use bulk import for sequential keys and reads – Random keys are good for random access patterns
  • 26. Example: Facebook Insights • > 20B Events per Day • 1M Counter Updates per Second – 100 Nodes Cluster – 10K OPS per Node • ”Like” button triggers AJAX request • Event written to log file • 30mins current for website owner Web ➜ Scribe ➜ Ptail ➜ Puma ➜ HBase
  • 27. HBase Counters • Store counters per Domain and per URL – Leverage HBase increment (atomic read-modify- write) feature • Each row is one specific Domain or URL • The columns are the counters for specific metrics • Column families are used to group counters by time range – Set time-to-live on CF level to auto-expire counters by age to save space, e.g., 2 weeks on “Daily Counters” family
  • 28. Key Design • Reversed Domains – Examples: “com.cloudera.www”, “com.cloudera.blog” – Helps keeping pages per site close, as HBase efficiently scans blocks of sorted keys • Domain Row Key =
MD5(Reversed Domain) + Reversed Domain – Leading MD5 hash spreads keys randomly across all regions for load balancing reasons – Only hashing the domain groups per site (and per subdomain if needed) • URL Row Key =
MD5(Reversed Domain) + Reversed Domain + URL ID – Unique ID per URL already available, make use of it
  • 30. Example: OpenTSDB • Metric Type, Tags are stored as IDs • Periodically rolled up
  • 31. Summary • Design for Use-Case – Read, Write, or Both? • Avoid Hotspotting • Consider using IDs instead of full text • Leverage Column Family to HFile relation • Shift details to appropriate position – Composite Keys – Column Qualifiers
  • 32. Summary (cont.) • Schema design is a combination of – Designing the keys (row and column) – Segregate data into column families – Choose compression and block sizes • Similar techniques are needed to scale most systems – Add indexes, partition data, consistent hashing • Denormalization, Duplication, and Intelligent Keys (DDI)
  • 33. Questions? Email: lars@cloudera.com Twitter: @larsgeorge