Social networks by their nature deal with large amounts of user-generated data that must be processed and presented in a time-sensitive manner. Much more write-intensive than previous generations of websites, social networks have been on the leading edge of non-relational persistence technology adoption. This talk presents how Germany's leading social networks schülerVZ, studiVZ and meinVZ are incorporating Redis and Project Voldemort into their platform to run features like activity streams.
1. Social Networks and the Richness of Data
Getting Distributed Web Services Done with NoSQL
Fabrizio Schmidt, Lars George
VZnet Netzwerke Ltd.
Wednesday, March 10, 2010
5. Unique Challenges
• 16 Million Users
• 1 Billion Relationships
• 3 Billion Photos
• 150 TB Data
• 13 Million Messages per Day
• 17 Million Logins per Day
• 15 Billion Requests per Month
• 120 Million Emails per Week
6. Old System - Phoenix
• LAMP
• Apache + PHP + APC (50 req/s)
• Sharded MySQL Multi-Master Setup
• Memcache with 1 TB+
Monolithic Single Service, Synchronous
7. Old System - Phoenix
• 500+ Apache Frontends
• 60+ Memcaches
• 150+ MySQL Servers
12. Phoenix - RabbitMQ
1. PHP implementation of the AMQP client
Too slow!
2. PHP C extension (php-amqp, http://code.google.com/p/php-amqp/)
Fast enough
3. IPC - AMQP dispatcher C daemon
That's it! But not released yet.
15. Old Activity Stream
We cheated!
• Memcache only - no persistence
• Status updates only
• #fail on users with >1000 friends
• #fail on memcache restart
18. Social Network Problem = Twitter Problem???
• >15 different Events
• Timelines
• Aggregation
• Filters
• Privacy
20. Do the Math!
18M Events/day sent to ~150 friends
=> 2700M timeline inserts/day
20% during peak hour
=> 3.6M event inserts/hour - 1000/s
=> 540M timeline inserts/hour - 150000/s
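The arithmetic is easy to verify; a minimal Java sketch using the slide's assumed figures (18M events/day, ~150 recipients, 20% of daily volume in the peak hour):
long eventsPerDay = 18000000L;
int avgRecipients = 150;
long timelineInsertsPerDay = eventsPerDay * avgRecipients;               // 2,700M per day
long peakHourEvents = eventsPerDay / 5;                                  // 3.6M event inserts in the peak hour
long peakEventsPerSecond = peakHourEvents / 3600;                        // ~1,000/s
long peakTimelineInsertsPerSecond = peakEventsPerSecond * avgRecipients; // ~150,000/s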
26. New Activity Stream
Do it right!
• Social Network Problem
• Architecture
• NoSQL Systems
31. FAS - Federated Autonomous Services
32. Activity Stream as a Service
Requirements:
• Endless scalability
• Storage & cloud independent
• Fast
• Flexible & extensible data model
37. NoSQL Schema
The write path builds up in four steps:
Event - the event is sent in by piggybacking on the request.
Generate ID - generate the itemID, the unique ID of the event.
Save Item - itemID => stream_entry; save the event with its meta information.
Update Indexes - insert into the timeline of each recipient
(recipient → [[itemId, time, type], …])
and into the timeline of the event originator
(sender → [[itemId, time, type], …]).
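Seen end to end, these steps map onto the systems introduced later in this deck. A minimal sketch in Java, assuming hypothetical helper objects and key prefixes (cig, voldemortClient, jredis, "ori:"/"mri:"), not the actual VZ code:
// Write path: generate ID (Hazelcast CIG), save item (Voldemort),
// update originator and recipient indexes (Redis ZSets).
public void writeEvent(Event event) throws Exception { // JRedis calls may throw
    long itemId = cig.nextId();                                   // Generate ID
    voldemortClient.put(String.valueOf(itemId), event.toJson());  // Save Item
    String member = String.valueOf(itemId);
    double score = event.getTime();
    jredis.zadd("ori:" + event.getSenderId(), score, member);     // originator timeline
    for (long recipientId : event.getRecipientIds()) {            // recipient timelines
        jredis.zadd("mri:" + recipientId, score, member);
    }
}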
44. Architecture: Push
Message Recipient Index (MRI)
Push the message directly to all MRIs
➡ {number of recipients, ~150} updates
Special profiles and some users have >500 recipients
➡ >500 pushes to recipient timelines => stresses the system!
45. ORI (Voldemort/Redis)
47. Architecture: Pull
Originator Index (ORI)
NO push to MRIs at all
➡ 1 message + 1 originator index entry
Special profiles and some users have >500 friends
➡ read >500 ORIs on every fetch => stresses the system
48. Architecture: PushPull
ORI + MRI
• Identify users with recipient lists >{limit}
• Only push updates with recipients <{limit} to the MRI
• Pull special profiles and users with >{limit} from the ORI
• Identify active users with a bloom/bit filter for pull (see the sketch below)
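A minimal sketch of that decision, assuming a fan-out limit constant (FANOUT_LIMIT) and the JRedis handle shown later; key names are illustrative, not the actual VZ code:
// Fan out small recipient lists (push); large ones are served from the ORI (pull).
public void deliver(Event event, List<Long> recipients) throws Exception {
    String member = String.valueOf(event.getItemId());
    // always record the event in the originator index (ORI)
    jredis.zadd("ori:" + event.getSenderId(), event.getTime(), member);
    if (recipients.size() <= FANOUT_LIMIT) {
        // push path: one MRI update per recipient
        for (long recipientId : recipients) {
            jredis.zadd("mri:" + recipientId, event.getTime(), member);
        }
    }
    // pull path: readers merge in the ORIs of their high fan-out friends at read time
}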
49. Activity Filter (Lars)
• Reduce read operations on storage
• Distinguish user activity levels
• In memory and shared across keys and types
• Scan a full day of updates for 16M users at per-minute granularity for 1000 friends in <100 ms
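One plausible shape for such a filter, as a minimal sketch: one bit per user per minute of the day, held in memory (roughly 180 bytes per active user, about 3 GB for 16M users). All names here are illustrative assumptions, not the actual VZ implementation.
import java.util.BitSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ActivityFilter {
    private static final int MINUTES_PER_DAY = 1440;
    private final ConcurrentMap<Long, BitSet> activity = new ConcurrentHashMap<Long, BitSet>();

    // mark that a user produced an event in the given minute of the day
    public void markActive(long userId, int minuteOfDay) {
        BitSet bits = activity.get(userId);
        if (bits == null) {
            BitSet fresh = new BitSet(MINUTES_PER_DAY);
            BitSet prev = activity.putIfAbsent(userId, fresh);
            bits = (prev != null) ? prev : fresh;
        }
        synchronized (bits) {
            bits.set(minuteOfDay);
        }
    }

    // true if any friend was active in [fromMinute, toMinute); used to decide
    // whether a pull against the storage layer is needed at all
    public boolean anyActive(Iterable<Long> friendIds, int fromMinute, int toMinute) {
        for (long friendId : friendIds) {
            BitSet bits = activity.get(friendId);
            if (bits == null) {
                continue;
            }
            synchronized (bits) {
                int next = bits.nextSetBit(fromMinute);
                if (next >= 0 && next < toMinute) {
                    return true;
                }
            }
        }
        return false;
    }
}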
52. NoSQL: Redis
ORI + MRI on Steroids
• Fast in-memory data-structure server
• Easy protocol
• Asynchronous persistence
• Master-slave replication
• Virtual memory
• JRedis - the Java client
53. NoSQL: Redis
ORI + MRI on Steroids
Data-Structure Server
• Data types: Strings, Lists, Sets, ZSets
• We use ZSets (sorted sets) for the push recipient indexes
Insert (score = timestamp, member = itemID):
for (Recipient recipient : recipients) {
    jredis.zadd(recipient.id, streamEntry.time, streamEntry.itemId);
}
Get (by position, or by score range for a time window):
jredis.zrange(streamOwnerId, from, to);
jredis.zrangebyscore(streamOwnerId, someScoreBegin, someScoreEnd);
54. NoSQL: Redis
ORI + MRI on Steroids
Persistence - AOF and Bgsave
AOF - append-only file
- appends on every operation
Bgsave - asynchronous snapshot
- configurable (time period or every n operations)
- can also be triggered directly
We use AOF as it's less memory hungry,
combined with bgsave for additional backups
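For reference, the corresponding redis.conf directives; a minimal sketch with illustrative values, not the VZ production settings:
appendonly yes        # AOF: log every write operation
appendfsync everysec  # fsync policy: always | everysec | no
save 900 1            # bgsave after 900 s if at least 1 change
save 300 10           # bgsave after 300 s if at least 10 changes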
55. NoSQL: Redis
ORI + MRI on Steroids
Virtual Memory
Storing recipient indexes for 16M users at ~500 entries each
would need >250 GB of RAM
With virtual memory activated, Redis swaps less frequently
accessed values to disk
➡ Only your hot dataset stays in memory
➡ 40% logins per day / only 20% of those in peak
~ 20 GB needed for the hot dataset
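The matching redis.conf directives for the (then experimental) virtual memory feature; values are illustrative assumptions:
vm-enabled yes
vm-swap-file /var/lib/redis/redis.swap
vm-max-memory 21474836480   # ~20 GB: size of the hot dataset to keep in RAM
vm-page-size 32             # bytes per VM page
vm-pages 134217728          # number of pages in the swap file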
56. NoSQL: Redis
ORI + MRI on Steroids
JRedis - the Redis Java client
• Pipelining support (sync and async semantics)
• Redis 1.2.3 compliant
The missing parts:
• No consistent hashing
• No rebalancing
57. Message Store (Voldemort)
60. NoSQL: Voldemort
No #fail Messagestore (MS)
Configuring replication, reads and writes:
<store>
  <name>stream-ms</name>
  <persistence>bdb</persistence>
  <routing>client</routing>
  <replication-factor>3</replication-factor>
  <required-reads>2</required-reads>
  <required-writes>2</required-writes>
  <preferred-reads>3</preferred-reads>
  <preferred-writes>3</preferred-writes>
  <key-serializer><type>string</type></key-serializer>
  <value-serializer><type>string</type></value-serializer>
  <retention-days>8</retention-days>
</store>
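To talk to this store from Java, the usual Voldemort client bootstrap looks roughly like this (host and thread count are illustrative):
import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;

// bootstrap a client for the "stream-ms" store defined above
ClientConfig config = new ClientConfig()
    .setBootstrapUrls("tcp://voldemort1:6666")
    .setMaxThreads(40);
StoreClientFactory factory = new SocketStoreClientFactory(config);
StoreClient<String, String> client = factory.getStoreClient("stream-ms");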
61. NoSQL: Voldemort
No #fail Messagestore (MS)
Write:
client.put(key, myVersionedValue);
Update (read-modify-write), applied via
client.applyUpdate(new MriUpdateAction(key, index)),
which retries on version conflicts:
public class MriUpdateAction extends UpdateAction<String, String> {
    private final String key;
    private final ItemIndex index;

    public MriUpdateAction(String key, ItemIndex index) {
        this.key = key;
        this.index = index;
    }

    @Override
    public void update(StoreClient<String, String> client) {
        Versioned<String> versionedJson = client.get(this.key);
        versionedJson.setObject("my value"); // e.g. the re-serialized index
        client.put(this.key, versionedJson);
    }
}
62. NoSQL: Voldemort
No #fail Messagestore (MS)
Eventual Consistency - Read
public class MriInconsistencyResolver implements InconsistencyResolver<Versioned<String>> {
    public List<Versioned<String>> resolveConflicts(List<Versioned<String>> items) {
        Versioned<String> vers0 = items.get(0);
        Versioned<String> vers1 = items.get(1);
        if (vers0 == null && vers1 == null) {
            return null;
        }
        List<Versioned<String>> li = new ArrayList<Versioned<String>>(1);
        if (vers0 == null) {
            li.add(vers1);
            return li;
        }
        if (vers1 == null) {
            li.add(vers0);
            return li;
        }
        // resolve the inconsistency here, e.g. merge the two timelines
        // into a single Versioned value and add it to li
        return li;
    }
}
The default inconsistency resolver automatically takes the newer version.
63. NoSQL: Voldemort
No #fail Messagestore (MS)
Configuration
• Choose a large number of partitions
• Reduce the size of the BDB append log
• Balance client and server thread pools
64. Concurrent ID Generator
66. NoSQL: Hazelcast
Concurrent ID Generator (CIG)
• In-memory data grid
• Dynamically scales
• Distributed java.util.{Queue|Set|List|Map} and more
• Dynamic partitioning with backups
• Configurable eviction
• Persistence
67. NoSQL: Hazelcast
Concurrent ID Generator (CIG)
Cluster-wide ID generation:
• No UUIDs because of an architecture constraint
• IDs of stream entries are generated via Hazelcast
• Replication to avoid losing the count
• Background persistence used for disaster recovery
68. NoSQL: Hazelcast
Concurrent ID Generator (CIG)
Generate unique sequential numbers (distributed autoincrement), as sketched below:
• Nodes get ranges assigned (node1: IDs 10000-19999, node2: IDs 20000-29999)
• IDs within a range are incremented locally on the node (thread-safe/atomic)
• Distributed locks secure the range assignment across nodes
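A minimal sketch of this scheme, assuming Hazelcast's 1.x-era static API (Hazelcast.getMap, Hazelcast.getLock); the class and key names are illustrative, not the actual CIG code:
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.Lock;
import com.hazelcast.core.Hazelcast;

public class RangeIdGenerator {
    private static final long RANGE_SIZE = 10000;
    private long nextId;    // next ID to hand out locally
    private long rangeEnd;  // exclusive end of the locally owned range

    // hand out IDs from the local range; claim a new one when exhausted
    public synchronized long nextId() {
        if (nextId >= rangeEnd) {
            claimNextRange();
        }
        return nextId++;
    }

    // a distributed lock secures the range assignment across nodes; the map
    // is backed by a MapStore (see the configuration below) for recovery
    private void claimNextRange() {
        ConcurrentMap<String, Long> ranges = Hazelcast.getMap("cig-ranges");
        Lock lock = Hazelcast.getLock("cig-range-lock");
        lock.lock();
        try {
            Long start = ranges.get("nextRangeStart");
            long s = (start == null) ? RANGE_SIZE : start;
            ranges.put("nextRangeStart", s + RANGE_SIZE);
            nextId = s;
            rangeEnd = s + RANGE_SIZE;
        } finally {
            lock.unlock();
        }
    }
}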
69. NoSQL: Hazelcast
Concurrent ID Generator (CIG)
Example configuration:
<map name=".vz.stream.CigHazelcast">
  <map-store enabled="true">
    <class-name>net.vz.storage.CigPersister</class-name>
    <write-delay-seconds>0</write-delay-seconds>
  </map-store>
  <backup-count>3</backup-count>
</map>
70. NoSQL: Hazelcast
Concurrent ID Generator (CIG)
Future use cases:
• Advanced pre-aggregated cache
• Distributed executions
71. Lessons Learned
• Start benchmarking and profiling your app early!
• A fast and easy deployment keeps motivation up
• Configure Voldemort carefully (especially on large-heap machines)
• Read the mailing lists of the #nosql system you use
• No solution in the docs? Read the sources!
• At some point stop discussing and just do it!
72. In Progress
• Network Stream
• Global Public Stream
• Stream per Location
• Hashtags
73. Future
• Geo Location Stream
• Third Party API