CHAPTER 04: HBASE TABLE DESIGN
HBase IN ACTION
by Nick Dimiduk et al.
Overview: HBase table design
• HBase schema design concepts
• Mapping relational modeling knowledge to the HBase world
• Advanced table definition parameters
• HBase Filters to optimize read performance
4.1 How to approach schema design
• When we say schema, we include the following considerations:
  ◦ How many column families should the table have?
  ◦ What data goes into what column family?
  ◦ How many columns should be in each column family?
  ◦ What should the column names be?
  ◦ What information should go into the cells?
  ◦ How many versions should be stored for each cell?
  ◦ What should the rowkey structure be, and what should it contain?
HBase Course
• Data Manipulation at Scale: Systems and Algorithms
• Using HBase for Real-time Access to Your Big Data
4.1.1 Modeling for the questions
• A table stores data about which users a particular user follows. It must support two operations:
  ◦ Read the entire list of followed users.
  ◦ Query for the presence of a specific user in that list.
4.1.1 Modeling for the questions (cont.)
• Thinking further along those lines, you can come up with the following questions:
  1. Whom does TheFakeMT follow?
  2. Does TheFakeMT follow TheRealMT?
  3. Who follows TheFakeMT?
  4. Does TheRealMT follow TheFakeMT?
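The four questions can be tried against a toy model of a "follows" table. This is a minimal sketch in plain Python, not the HBase client API: the dictionary stands in for a table whose rowkey is the follower and whose column qualifiers are the followed users, and the sample data and function names are assumptions made for illustration.

```python
# Illustrative in-memory model of a "follows" table (not the HBase client API).
# Rowkey: the follower's user ID. Column qualifier: the followed user's ID.
follows = {
    "TheFakeMT": {"TheRealMT": 1, "HRogers": 1},  # hypothetical sample data
    "TheRealMT": {"HRogers": 1},
}


def whom_does_user_follow(table, user):
    """Question 1: read one row (comparable to a single Get)."""
    return sorted(table.get(user, {}))


def does_user_follow(table, user, other):
    """Questions 2 and 4: check one column in one row."""
    return other in table.get(user, {})


def who_follows(table, user):
    """Question 3: with this design every row must be examined (a full scan)."""
    return sorted(row for row, cols in table.items() if user in cols)
```

In this model, questions 1, 2 and 4 touch a single row, while question 3 has to walk the whole table. That asymmetry is the reason the access questions should drive the design.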
4.1.2 Defining requirements: more work up front always pays
• From the perspective of TwitBase, you expect data to be written to HBase when the following things happen:
  ◦ A user follows someone
  ◦ A user unfollows someone they were following
4.1.2 Defining requirements: more work up front always pays (cont.)
• How does designing tables in HBase differ from designing tables in relational systems?
4.1.3 Modeling for even distribution of data and load
4.1.3 Modeling for even distribution of data and load (cont.)
4.1.4 Targeted data access
• Only the keys are indexed in HBase tables.
• There are two ways to retrieve data from a table: Get and Scan.
• HBase tables are flexible, and you can store anything in the form of byte[].
• Store everything with similar access patterns in the same column family.
• Indexing is done on the Key portion of the KeyValue objects, consisting of the rowkey, qualifier, and timestamp in that order.
• Tall tables can potentially allow you to move toward O(1) operations, but you trade away atomicity, because HBase guarantees atomic operations only within a single row.
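Because only the key is indexed and rows are kept sorted, a tall table turns "does A follow B?" into a point lookup and "whom does A follow?" into a short range scan. The sketch below models that with a sorted Python list and `bisect`. The `<follower>+<followed>` rowkey layout and the helper names are assumptions for illustration, not HBase API.

```python
import bisect

# Hypothetical tall-table rowkeys "<follower>+<followed>", kept in sorted order
# the way HBase keeps rows sorted by rowkey.
rowkeys = sorted([
    "TheFakeMT+HRogers",
    "TheFakeMT+TheRealMT",
    "TheRealMT+HRogers",
])


def get(keys, rowkey):
    """Point lookup on the indexed key (binary search in this toy model)."""
    i = bisect.bisect_left(keys, rowkey)
    return i < len(keys) and keys[i] == rowkey


def scan_prefix(keys, prefix):
    """Range scan: start at the first key >= prefix, stop when the prefix stops matching."""
    i = bisect.bisect_left(keys, prefix)
    matches = []
    while i < len(keys) and keys[i].startswith(prefix):
        matches.append(keys[i])
        i += 1
    return matches
```

Each relationship is now its own row, so adding or removing several of one user's relationships is no longer a single-row (atomic) operation.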
4.1.4 Targeted data access (cont.)
• De-normalizing is the way to go when designing HBase schemas.
• Think about how you can accomplish your access patterns in single API calls rather than multiple API calls.
• Hashing allows for fixed-length keys and better distribution but takes away ordering.
• Column qualifiers can be used to store data, just like cells.
• The length of column qualifiers impacts the storage footprint because you can put data in them.
• The length of the column family name impacts the size of data sent over the wire to the client (in KeyValue objects).
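The hashing point can be made concrete with a small sketch. MD5 is used here only as an example of a fixed-length hash, and the function name and two-part key layout are assumptions for illustration. Each digest is 16 bytes, so the composite key is always 32 bytes regardless of how long the user IDs are.

```python
import hashlib


def hashed_rowkey(follower: str, followed: str) -> bytes:
    """Concatenate two MD5 digests; each is 16 bytes, so the key is always 32 bytes."""
    return (hashlib.md5(follower.encode("utf-8")).digest()
            + hashlib.md5(followed.encode("utf-8")).digest())


short_key = hashed_rowkey("a", "b")
long_key = hashed_rowkey("TheFakeMT", "AVeryLongUserNameIndeed")
```

All keys for one follower still share the same 16-byte prefix, so a prefix scan for that follower remains possible. What is lost is the natural (for example alphabetical) ordering of the original IDs.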
4.2 De-normalization is the word in HBase land
• One of the key concepts when designing HBase tables is de-normalization.
4.3 Heterogeneous data in the same table
• HBase schemas are flexible, and you’ll use that flexibility now to avoid doing scans every time you want a list of followers for a given user.
• Isolate different access patterns as much as possible.
• The way to improve the load distribution in this case is to have separate tables for the two types of relationships you want to store.
4.4 Rowkey design strategies
• In designing HBase tables, the rowkey is the single most important thing.
• Your rowkeys determine the performance you get while interacting with HBase tables.
• Unlike relational databases, where you can index on multiple columns, HBase indexes only on the key.
4.5 I/O considerations
• The sorted nature of HBase tables can turn out to be a great thing for your application, or not.
• Optimized for writes
  ◦ HASHING
  ◦ SALTING
• Optimized for reads
• Cardinality and rowkey structure
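Salting trades read simplicity for write distribution. Below is a minimal sketch that assumes a salt derived from a CRC32 of the key modulo a bucket count. The bucket count, the `|` separator and the function names are hypothetical choices, not something HBase prescribes.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical; often chosen in relation to the number of RegionServers


def salted_key(rowkey: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix the key with a deterministic salt so sequential keys spread across buckets."""
    salt = zlib.crc32(rowkey.encode("utf-8")) % buckets
    return f"{salt}|{rowkey}"


def keys_to_scan(prefix: str, buckets: int = NUM_BUCKETS):
    """A reader must fan out: one scan start key per salt bucket."""
    return [f"{salt}|{prefix}" for salt in range(buckets)]
```

Writes with monotonically increasing keys (timestamps, for example) no longer land in one region, but a range read now needs one scan per bucket, with the results merged on the client.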
4.6 From relational to non-relational
• There is no simple way to map your relational database knowledge to HBase. It’s a different paradigm of thinking.
• Things don’t necessarily map 1:1, and these concepts are evolving and being defined as the adoption of NoSQL systems increases.
4.6.1 Some basic concepts
• ENTITIES
  ◦ These map to tables.
  ◦ In both relational databases and HBase, the default container for an entity is a table, and each row in the table should represent one instance of that entity.
• ATTRIBUTES
  ◦ These map to columns.
  ◦ Identifying attribute: This is the attribute that uniquely identifies exactly one instance of an entity (that is, one row). In HBase it becomes part of the rowkey.
  ◦ Non-identifying attribute: Non-identifying attributes are easier to map; they typically become column qualifiers.
4.6.1 Some basic concepts (cont.)
• RELATIONSHIPS
  ◦ In a relational database, these map to foreign-key relationships.
  ◦ There is no direct mapping of these in HBase, and often it comes down to denormalizing the data.
  ◦ HBase, not having any built-in joins or constraints, has little use for explicit relationships.
4.6.2 Nested entities
• In HBase, the columns (also known as column qualifiers) aren’t predefined at design time.
4.6.2 Nested entities (cont.)
• It’s possible to model a nested entity in HBase as a single row.
• There are some limitations to this:
  ◦ This technique only works one level deep: your nested entities can’t themselves have nested entities.
  ◦ It’s not as efficient to access an individual value stored as a nested column qualifier inside a row.
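A nested entity in a single row can be sketched as follows. This is an illustrative model, not HBase API: the `child:` qualifier prefix, the JSON serialization and the helper names are all assumptions. The child ID becomes the column qualifier, and the child's attributes are serialized into the cell value.

```python
import json


def nest_children(parent_attrs, children):
    """Build one row: parent attributes plus one column per child entity."""
    row = dict(parent_attrs)
    for child_id, child_attrs in children.items():
        # The child ID becomes the qualifier; its attributes are serialized into the value.
        row[f"child:{child_id}"] = json.dumps(child_attrs, sort_keys=True)
    return row


def read_child(row, child_id):
    """Fetch and deserialize one nested child, or None if it is absent."""
    raw = row.get(f"child:{child_id}")
    return None if raw is None else json.loads(raw)


row = nest_children(
    {"name": "Fake Mark"},                   # hypothetical parent attributes
    {"TheRealMT": {"since": "2011-01-01"}},  # hypothetical child entity
)
```

The child's attributes end up as an opaque value, which is why the technique stops at one level: anything nested deeper would live inside the serialized blob and could not be addressed by qualifier.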
4.6.3 Some things don’t map
• COLUMN FAMILIES
• (LACK OF) INDEXES
• VERSIONING
4.7 Advanced column family configurations
• HBase has a few advanced features that you can use when designing your tables.
  ◦ Configurable block size (in bytes)
      hbase(main):002:0> create 'mytable', {NAME => 'colfam1', BLOCKSIZE => '65536'}
  ◦ Block cache
      hbase(main):002:0> create 'mytable', {NAME => 'colfam1', BLOCKCACHE => 'false'}
  ◦ Aggressive caching
      hbase(main):002:0> create 'mytable', {NAME => 'colfam1', IN_MEMORY => 'true'}
4.7 Advanced column family configurations (cont.)
  ◦ Bloom filters
      hbase(main):007:0> create 'mytable', {NAME => 'colfam1', BLOOMFILTER => 'ROWCOL'}
      In the HBase release the book covers, the default value for the BLOOMFILTER parameter is NONE; newer releases default to ROW.
  ◦ TTL (in seconds; 18000 is five hours)
      hbase(main):002:0> create 'mytable', {NAME => 'colfam1', TTL => '18000'}
  ◦ Compression
      hbase(main):002:0> create 'mytable', {NAME => 'colfam1', COMPRESSION => 'SNAPPY'}
  ◦ Cell versioning
      hbase(main):002:0> create 'mytable', {NAME => 'colfam1', VERSIONS => 1}
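To show what the Bloom filter setting buys, here is a toy Bloom filter in plain Python. It is a sketch of the general data structure, not HBase's implementation: the bit-array size, the hash construction and the `bloom_key` helper are assumptions. The only HBase-specific idea modeled is what gets indexed: the rowkey alone for ROW, or the rowkey plus qualifier for ROWCOL.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: answers 'definitely absent' or 'possibly present'."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item: bytes):
        # Derive num_hashes positions by hashing the item with a different seed byte.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def bloom_key(rowkey: bytes, qualifier: bytes, mode: str) -> bytes:
    """ROW indexes the rowkey only; ROWCOL indexes rowkey plus qualifier."""
    return rowkey if mode == "ROW" else rowkey + b"\x00" + qualifier


row_filter = BloomFilter()
rowcol_filter = BloomFilter()
row_filter.add(bloom_key(b"TheFakeMT", b"TheRealMT", "ROW"))
rowcol_filter.add(bloom_key(b"TheFakeMT", b"TheRealMT", "ROWCOL"))
```

A negative answer lets a read skip a store file entirely. A positive answer may be a false positive, so the file still has to be checked. ROWCOL is more precise for single-column lookups but has more entries to track.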
4.8 Filtering data
• Filters are a powerful feature that can come in handy when a read would otherwise return more data than the client needs.
• HBase provides an API you can use to implement custom filters.
4.8.1 Implementing a filter
• Implement a custom filter by extending the FilterBase abstract class.
  ◦ The filtering logic goes in the filterKeyValue(..) method.
• To install custom filters:
  ◦ You have to compile them into a JAR and put them in the HBase classpath so they get picked up by the RegionServers at startup time.
  ◦ To compile the JAR, in the top-level directory of the project, do the following:
      mvn install
      cp target/twitbase-1.0.0.jar /my/folder/
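The shape of the filtering logic can be sketched without the Java API. The model below is a plain-Python stand-in, not HBase code: `ReturnCode`, `MinLengthFilter`, `filter_key_value` and the sample rows are hypothetical names that only mirror the idea of a per-cell decision to include or skip.

```python
from enum import Enum


class ReturnCode(Enum):
    INCLUDE = 1  # keep this cell in the result
    SKIP = 2     # drop this cell and keep evaluating the row


class MinLengthFilter:
    """Hypothetical filter: skip cells in one qualifier whose value is too short."""

    def __init__(self, qualifier: str, min_length: int):
        self.qualifier = qualifier
        self.min_length = min_length

    def filter_key_value(self, qualifier: str, value: bytes) -> ReturnCode:
        if qualifier == self.qualifier and len(value) < self.min_length:
            return ReturnCode.SKIP
        return ReturnCode.INCLUDE


def scan(rows, flt):
    """Apply the filter to every cell; rows left with no cells are omitted."""
    result = {}
    for rowkey, cells in rows.items():
        kept = {q: v for q, v in cells.items()
                if flt.filter_key_value(q, v) is ReturnCode.INCLUDE}
        if kept:
            result[rowkey] = kept
    return result


rows = {
    "user1": {"password": b"abc"},            # hypothetical sample data
    "user2": {"password": b"correct-horse"},
}
filtered = scan(rows, MinLengthFilter("password", 4))
```

In HBase the equivalent decision runs on the RegionServers, so cells that are skipped never travel over the network to the client.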
4.8.2 Prebundled filters
• RowFilter
• PrefixFilter
• QualifierFilter
• ValueFilter
• TimestampsFilter
• FilterList
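How a filter list combines other filters can be modeled in a few lines. This is a simplified sketch, not the HBase classes: the real filters take comparators and operate on cells server-side, whereas the predicates, the `must_pass_all` flag (standing in for the MUST_PASS_ALL / MUST_PASS_ONE operators) and the sample cells here are assumptions for illustration.

```python
def prefix_filter(prefix):
    """Pass cells whose rowkey starts with the given prefix."""
    return lambda rowkey, qualifier, value: rowkey.startswith(prefix)


def qualifier_filter(name):
    """Pass cells whose column qualifier equals the given name."""
    return lambda rowkey, qualifier, value: qualifier == name


def filter_list(filters, must_pass_all=True):
    """Combine filters with AND (must pass all) or OR (must pass one)."""
    combine = all if must_pass_all else any
    return lambda rowkey, qualifier, value: combine(
        f(rowkey, qualifier, value) for f in filters)


cells = [  # hypothetical (rowkey, qualifier, value) triples
    ("TheFakeMT", "name", b"Fake Mark"),
    ("TheFakeMT", "email", b"fake@example.com"),
    ("TheRealMT", "name", b"Mark"),
]

both = filter_list([prefix_filter("TheFake"), qualifier_filter("name")], must_pass_all=True)
either = filter_list([prefix_filter("TheFake"), qualifier_filter("name")], must_pass_all=False)

matched_all = [c for c in cells if both(*c)]
matched_any = [c for c in cells if either(*c)]
```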
4.9 Summary
• It’s about the questions, not the relationships.
• Design is never finished.
• Scale is a first-class entity.
• Every dimension is an opportunity.

Lecture 1 - Getting to know XML
 
Lecture 4 - Adding XTHML for the Web
Lecture  4 - Adding XTHML for the WebLecture  4 - Adding XTHML for the Web
Lecture 4 - Adding XTHML for the Web
 
Lecture 2 - Using XML for Many Purposes
Lecture 2 - Using XML for Many PurposesLecture 2 - Using XML for Many Purposes
Lecture 2 - Using XML for Many Purposes
 
SOA Course - SOA governance - Lecture 19
SOA Course - SOA governance - Lecture 19SOA Course - SOA governance - Lecture 19
SOA Course - SOA governance - Lecture 19
 
Lecture 18 - Model-Driven Service Development
Lecture 18 - Model-Driven Service DevelopmentLecture 18 - Model-Driven Service Development
Lecture 18 - Model-Driven Service Development
 

HBase In Action - Chapter 04: HBase table design

  • 1. CHAPTER 04: HBASE TABLE DESIGN. HBase IN ACTION, by Nick Dimiduk et al.
  • 2. Overview: HBase table design
      • HBase schema design concepts
      • Mapping relational modeling knowledge to the HBase world
      • Advanced table definition parameters
      • HBase filters to optimize read performance
  • 3. 4.1 How to approach schema design
      • When we say schema, we include the following considerations:
          • How many column families should the table have?
          • What data goes into what column family?
          • How many columns should be in each column family?
          • What should the column names be?
          • What information should go into the cells?
          • How many versions should be stored for each cell?
          • What should the rowkey structure be, and what should it contain?
  • 4. HBase Course
      • Data Manipulation at Scale: Systems and Algorithms
      • Using HBase for Real-time Access to Your Big Data
  • 5. 4.1.1 Modeling for the questions
      • A table stores data about which users a particular user follows. It must support two operations:
          • reading the entire list of followed users
          • querying for the presence of a specific user in that list
  • 6. 4.1.1 Modeling for the questions (cont.)
  • 7. 4.1.1 Modeling for the questions (cont.)
      • Thinking further along those lines, you can come up with the following questions:
          1. Whom does TheFakeMT follow?
          2. Does TheFakeMT follow TheRealMT?
          3. Who follows TheFakeMT?
          4. Does TheRealMT follow TheFakeMT?
  • 8. 4.1.2 Defining requirements: more work up front always pays
      • From the perspective of TwitBase, you expect data to be written to HBase when the following things happen:
          • A user follows someone
          • A user unfollows someone they were following
  • 9. 4.1.2 Defining requirements: more work up front always pays (cont.)
  • 10. 4.1.2 Defining requirements: more work up front always pays (cont.)
      • How does designing tables in HBase differ from designing tables in relational systems?
  • 11. 4.1.3 Modeling for even distribution of data and load
  • 12. 4.1.3 Modeling for even distribution of data and load (cont.)
  • 13. 4.1.4 Targeted data access
      • Only the keys are indexed in HBase tables.
      • There are two ways to retrieve data from a table: Get and Scan.
      • HBase tables are flexible, and you can store anything in the form of byte[].
      • Store everything with similar access patterns in the same column family.
      • Indexing is done on the Key portion of the KeyValue objects, consisting of the rowkey, qualifier, and timestamp in that order.
      • Tall tables can potentially allow you to move toward O(1) operations, but you trade atomicity.
  • 14. 4.1.4 Targeted data access (cont.)
      • De-normalizing is the way to go when designing HBase schemas.
      • Think how you can accomplish your access patterns in single API calls rather than multiple API calls.
      • Hashing allows for fixed-length keys and better distribution but takes away ordering.
      • Column qualifiers can be used to store data, just like cells.
      • The length of column qualifiers impacts the storage footprint because you can put data in them.
      • The length of the column family name impacts the size of data sent over the wire to the client (in KeyValue objects).
  • 15. 4.2 De-normalization is the word in HBase land
      • One of the key concepts when designing HBase tables is de-normalization.
  • 16. 4.3 Heterogeneous data in the same table
      • HBase schemas are flexible, and you’ll use that flexibility now to avoid doing scans every time you want a list of followers for a given user.
      • Isolate different access patterns as much as possible.
      • The way to improve the load distribution in this case is to have separate tables for the two types of relationships you want to store.
  • 17. 4.4 Rowkey design strategies
      • In designing HBase tables, the rowkey is the single most important thing.
      • Your rowkeys determine the performance you get while interacting with HBase tables.
      • Unlike relational databases, where you can index on multiple columns, HBase indexes only on the key.
  • 18. 4.5 I/O considerations
      • The sorted nature of HBase tables can turn out to be a great thing for your application, or not.
      • Optimized for writes
          • Hashing
          • Salting
      • Optimized for reads
      • Cardinality and rowkey structure
  • 19. 4.6 From relational to non-relational
      • There is no simple way to map your relational database knowledge to HBase. It’s a different paradigm of thinking.
      • Things don’t necessarily map 1:1, and these concepts are evolving and being defined as the adoption of NoSQL systems increases.
  • 20. 4.6.1 Some basic concepts
      • ENTITIES
          • These map to tables.
          • In both relational databases and HBase, the default container for an entity is a table, and each row in the table should represent one instance of that entity.
      • ATTRIBUTES
          • These map to columns.
          • Identifying attribute: the attribute that uniquely identifies exactly one instance of an entity (that is, one row).
          • Non-identifying attribute: these are easier to map.
  • 21. 4.6.1 Some basic concepts (cont.)
      • RELATIONSHIPS
          • These map to foreign-key relationships.
          • There is no direct mapping of these in HBase, and often it comes down to denormalizing the data.
          • HBase, not having any built-in joins or constraints, has little use for explicit relationships.
  • 22. 4.6.2 Nested entities
      • In HBase, the columns (also known as column qualifiers) aren’t predefined at design time.
  • 23. 4.6.2 Nested entities (cont.)
      • A parent entity and its nested entities can be modeled in HBase as a single row.
      • There are some limitations to this:
          • The technique only works one level deep: your nested entities can’t themselves have nested entities.
          • It’s not as efficient to access an individual value stored as a nested column qualifier inside a row.
  • 24. 4.6.3 Some things don’t map
      • COLUMN FAMILIES
      • (LACK OF) INDEXES
      • VERSIONING
  • 25. 4.7 Advanced column family configurations
      • HBase has a few advanced features that you can use when designing your tables.
      • Configurable block size (in bytes)
          hbase(main):002:0> create 'mytable', {NAME => 'colfam1', BLOCKSIZE => '65536'}
      • Block cache
          hbase(main):002:0> create 'mytable', {NAME => 'colfam1', BLOCKCACHE => 'false'}
      • Aggressive caching
          hbase(main):002:0> create 'mytable', {NAME => 'colfam1', IN_MEMORY => 'true'}
  • 26. 4.7 Advanced column family configurations (cont.)
      • Bloom filters
          hbase(main):007:0> create 'mytable', {NAME => 'colfam1', BLOOMFILTER => 'ROWCOL'}
          • The default value for the BLOOMFILTER parameter is NONE in the HBase version the book covers (newer HBase releases default to ROW).
      • TTL (in seconds)
          hbase(main):002:0> create 'mytable', {NAME => 'colfam1', TTL => '18000'}
      • Compression
          hbase(main):002:0> create 'mytable', {NAME => 'colfam1', COMPRESSION => 'SNAPPY'}
      • Cell versioning
          hbase(main):002:0> create 'mytable', {NAME => 'colfam1', VERSIONS => 1}
  • 27. 4.8 Filtering data
      • Filters are a powerful feature that can come in handy in such cases.
      • HBase provides an API you can use to implement custom filters.
  • 28. 4.8.1 Implementing a filter
      • Implement a custom filter by extending the FilterBase abstract class.
      • The filtering logic goes in the filterKeyValue(..) method.
      • To install custom filters, compile them into a JAR and put it on the HBase classpath so that the RegionServers pick them up at startup time.
      • To compile the JAR, run the following in the top-level directory of the project:
          mvn install
          cp target/twitbase-1.0.0.jar /my/folder/
  • 29. 4.8.2 Prebundled filters
      • ROWFILTER
      • PREFIXFILTER
      • QUALIFIERFILTER
      • VALUEFILTER
      • TIMESTAMPFILTER
      • FILTERLIST
  • 30. HBase Course
      • Data Manipulation at Scale: Systems and Algorithms
      • Using HBase for Real-time Access to Your Big Data
  • 31. 4.9 Summary
      • It’s about the questions, not the relationships.
      • Design is never finished.
      • Scale is a first-class entity.
      • Every dimension is an opportunity.
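
The hashing and salting strategies named in slides 14 and 18 can be sketched in plain Java, without any HBase dependency. This is only an illustration under stated assumptions: the class `RowKeySketch`, its method names, the choice of MD5, the one-byte salt derived from `String.hashCode()`, and the follower/followed key layout are all hypothetical and are not taken from the slides. The sketch shows the trade-off the slides describe: a hashed key has a fixed length and spreads evenly but loses the natural sort order, while a salted key keeps the original key readable behind a small bucket prefix.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowKeySketch {

    // Hashing: replace the natural key with its MD5 digest (always 16 bytes).
    // Keys become fixed-length and evenly distributed, but lose their ordering.
    public static byte[] hashedKey(String naturalKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return md5.digest(naturalKey.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // Assumed layout for a "follows" relationship: hash(follower) + hash(followed).
    // All keys for one follower share the same 16-byte prefix.
    public static byte[] followsKey(String follower, String followed) {
        byte[] key = new byte[32];
        System.arraycopy(hashedKey(follower), 0, key, 0, 16);
        System.arraycopy(hashedKey(followed), 0, key, 16, 16);
        return key;
    }

    // Salting: prepend a one-byte bucket number computed from the key itself,
    // so the same key always lands in the same bucket and can be found again.
    public static byte[] saltedKey(String naturalKey, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in 1..256");
        }
        byte[] key = naturalKey.getBytes(StandardCharsets.UTF_8);
        byte[] salted = new byte[key.length + 1];
        salted[0] = (byte) Math.floorMod(naturalKey.hashCode(), buckets);
        System.arraycopy(key, 0, salted, 1, key.length);
        return salted;
    }

    public static void main(String[] args) {
        System.out.println(hashedKey("TheRealMT").length);
        System.out.println(followsKey("TheFakeMT", "TheRealMT").length);
        System.out.println(saltedKey("TheRealMT", 4)[0]);
    }
}
```

With a layout like `followsKey`, the question "Does TheFakeMT follow TheRealMT?" becomes a single Get on a computed key, and "Whom does TheFakeMT follow?" becomes a Scan over the 16-byte follower prefix. With `saltedKey`, a read that needs all buckets has to issue one scan per bucket, which is the read-side cost of spreading writes.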

Editor's Notes

  1. http://ouo.io/uaiKO
  2. Spark is a “computational engine” that is responsible for scheduling, distributing, and monitoring applications consisting of many computational tasks across many worker machines, or a computing cluster. First, all libraries and higher-level components in the stack benefit from improvements at the lower layers. Second, the costs associated with running the stack are minimized, because instead of running 5–10 independent software systems, an organization needs to run only one. Finally, one of the largest advantages of tight integration is the ability to build applications that seamlessly combine different processing models.