Andrew Cheng
Track 3: Applications
https://open.mi.com/conference/hbasecon-asia-2019
THE COMMUNITY EVENT FOR APACHE HBASE™
July 20th, 2019 - Sheraton Hotel, Beijing, China
https://hbase.apache.org/hbaseconasia-2019/
In this talk, Ian will talk about Amazon Redshift, a managed petabyte-scale data warehouse, give an overview of its integration with Amazon Elastic MapReduce, a managed Hadoop environment, and cover some exciting new developments in the analytics space.
HBaseCon 2015: HBase Operations in a Flurry (HBaseCon)
With multiple clusters of 1,000+ nodes replicated across multiple data centers, Flurry has learned many operational lessons over the years. In this talk, you'll explore the challenges of maintaining and scaling Flurry's cluster, how we monitor, and how we diagnose and address potential problems.
HBaseConAsia2018 Keynote 2: Recent Development of HBase in Alibaba and Cloud (Michael Stack)
New Journey of HBase in Alibaba and Cloud discusses Alibaba's use of HBase over 8 years and improvements made. Key points discussed include:
- Alibaba began using HBase in 2010 and has since contributed to the open source community while developing internal improvements.
- Challenges addressed include JVM garbage collection pauses, separating computing and storage, and adding cold/hot data tiering. A diagnostic system was also created.
- Alibaba uses HBase across many core scenarios and has integrated it with other databases in a multi-model approach to support different workloads.
- Benefits of running HBase on cloud include flexibility and cost savings.
Volodymyr Tsap, "Constraint driven infrastructure - scale or tune?" (Fwdays)
Volodymyr Tsap discusses how to save money on infrastructure through constraint driven design. He provides examples of hardware configurations on AWS, bare metal servers, and PaaS platforms to demonstrate how costs can be optimized. Tsap also outlines ways to reduce software costs through choices in operating system, virtualization, databases, and orchestration. Infrastructure support costs depend on the complexity of the environment, with basic setups costing $500-800 per month while more advanced architectures are $4,000-6,000 per month. The overall message is that money saved through optimization can be invested in people.
Vitaliy Bondarenko, "Fast Data Platform for Real-Time Analytics. Architecture ..." (Fwdays)
We will start by understanding how real-time analytics can be implemented on enterprise-level infrastructure, then go into the details and discover how different business intelligence cases can be applied in real time to streaming data. We will cover different stream data processing architectures and discuss their benefits and disadvantages. I'll show with live demos how to build a Fast Data Platform in the Azure cloud using open source projects: Apache Kafka, Apache Cassandra, Mesos. I'll also show examples and code from real projects.
Rolling Out Apache HBase for Mobile Offerings at Visa (HBaseCon)
Partha Saha and CW Chung (Visa)
Visa has embarked on an ambitious multi-year redesign of its entire data platform that powers its business. As part of this plan, the Apache Hadoop ecosystem, including HBase, will now become a staple in many of its solutions. Here, we will describe our journey in rolling out a high-availability NoSQL solution based on HBase behind some of our prominent mobile offerings.
We’ll present details about Argus, a time-series monitoring and alerting platform developed at Salesforce to provide insight into the health of infrastructure as an alternative to systems such as Graphite and Seyren.
HBaseCon 2015: State of HBase Docs and How to Contribute (HBaseCon)
In this session, learn about the move to Asciidoc in HBase docs, some of the other notable changes lately, and things we've done to make it easier for you to contribute to the docs.
HBaseConAsia2018 Track3-7: The application of HBase in New Energy Vehicle Mon... (Michael Stack)
This document discusses the use of HBase in a vehicle monitoring system. It describes challenges including handling huge amounts of vehicle data from 100k vehicles generating 2TB of data daily. It outlines decisions around using Java, Kafka, HBase, and microservices. The system architecture is shown storing vehicle data in HBase with data backup. Challenges with HBase like query speed are discussed. Prospects include rewriting components in Go, splitting to microservices, and data analysis.
HBaseConAsia2018 Track2-2: Apache Kylin on HBase: Extreme OLAP for big data (Michael Stack)
This document discusses Apache Kylin, an OLAP engine for big data. It provides 3 key points:
1. Apache Kylin is designed to provide fast, interactive queries on large datasets stored in Hadoop. It uses pre-calculated cube structures stored in HBase to enable sub-second query performance on trillion row datasets.
2. HBase was selected as the storage engine because it is natively integrated with Hadoop, supports high throughput and low-latency queries, and can store very large datasets. Kylin leverages HBase's capabilities for cube storage, metadata storage, online calculation pushing, and caching.
3. Apache Kylin is used by over 1,000 global companies.
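Kylin's pre-calculation idea can be shown in miniature: compute SUM(measure) for every combination of dimensions (each combination is one "cuboid") ahead of time, so an aggregate query becomes a simple lookup. The following is a toy Python sketch of the cuboid concept, not Kylin's actual implementation:

```python
from collections import defaultdict
from itertools import combinations

def build_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of dimensions (each subset
    is one 'cuboid'), so a later aggregate query is a dictionary lookup."""
    dimensions = sorted(dimensions)
    cube = defaultdict(float)
    for row in rows:
        for r in range(len(dimensions) + 1):
            for dims in combinations(dimensions, r):
                cube[(dims, tuple(row[d] for d in dims))] += row[measure]
    return cube

def query(cube, filters):
    """Answer SELECT SUM(measure) ... WHERE dim = value AND ... from the cube."""
    dims = tuple(sorted(filters))
    return cube[(dims, tuple(filters[d] for d in dims))]

rows = [
    {"city": "Beijing", "year": 2018, "sales": 10.0},
    {"city": "Beijing", "year": 2019, "sales": 20.0},
    {"city": "Shanghai", "year": 2019, "sales": 5.0},
]
cube = build_cube(rows, ["city", "year"], "sales")
print(query(cube, {"city": "Beijing"}))               # 30.0
print(query(cube, {"city": "Beijing", "year": 2019})) # 20.0
```

The trade-off is the same one Kylin makes at scale: cube size grows exponentially with the number of dimensions, in exchange for constant-time answers.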
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. You can start small for just $0.25 per hour with no commitment or upfront costs and scale to a petabyte or more for $1,000 per terabyte per year, less than a tenth the cost of most other data warehousing solutions.
See a recording of the webinar based on this presentation here on YouTube: https://youtu.be/GgLKodmL5xE
Masterclass series webinars, including on-demand access to all of this years recorded webinars: http://aws.amazon.com/campaigns/emea/masterclass/
Journey Through the Cloud webinar series, including on-demand access to all webinars so far this year: http://aws.amazon.com/campaigns/emea/journey/
HBaseCon 2015: Optimizing HBase for the Cloud in Microsoft Azure HDInsight (HBaseCon)
Microsoft Azure's Hadoop cloud service, HDInsight, offers Hadoop, Storm, and HBase as fully managed clusters. In this talk, you'll explore the architecture of HBase clusters in Azure, which is optimized for the cloud, and a set of unique challenges and advantages that come with that architecture. We'll also talk about common patterns and use cases utilizing HBase on Azure.
Amazon Redshift is a data warehouse service that runs on AWS. It has a leader node that coordinates queries and compute nodes that store and process the data in parallel. The compute nodes can use either HDD storage optimized for large datasets or SSD storage optimized for fast queries. Data is stored in columns and compressed to reduce I/O. Queries are optimized using statistics on the data distribution, sort keys and other metadata. The EXPLAIN command and STL tables provide visibility into query plans and performance.
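The EXPLAIN-plus-STL workflow mentioned above can be sketched as plain statement templates. `redshift_diagnostics` is a hypothetical helper (not an AWS API); it relies on the documented Redshift behavior that `SET query_group` populates the `label` column of the STL_QUERY system table:

```python
def redshift_diagnostics(user_query: str, label: str) -> dict:
    """Build the statements you would run to inspect a Redshift query:
    tag the session with a query group (which shows up as `label` in
    stl_query), EXPLAIN the plan, then look up the observed runtime in
    the STL_QUERY system table. Pure string building; no connection here."""
    return {
        "set_label": f"SET query_group TO '{label}';",
        "explain": f"EXPLAIN {user_query};",
        "runtime": (
            "SELECT query, datediff(ms, starttime, endtime) AS elapsed_ms "
            "FROM stl_query "
            f"WHERE label = '{label}' ORDER BY starttime DESC LIMIT 1;"
        ),
    }

stmts = redshift_diagnostics("SELECT count(*) FROM sales", "nightly_check")
print(stmts["explain"])  # EXPLAIN SELECT count(*) FROM sales;
```

The exact columns worth inspecting vary by workload; STL_QUERY is only the entry point, with more detail in other STL and SVL tables.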
Introduction to streaming and messaging: Flume, Kafka, SQS, Kinesis (Omid Vahdaty)
Big data making you a bit confused? Messaging? Batch processing? Data streaming? In-flight analytics? Cloud? Open source? Flume? Kafka? Flafka (both)? SQS? Kinesis? Firehose?
This document summarizes a presentation on Amazon Redshift. Redshift is a fully managed data warehouse service that makes it easy to analyze large amounts of data for less than $1,000 per terabyte per year. The presentation covers how to get started with Redshift, best practices for table design and data loading, using Redshift for analytics, and upgrading and scaling a Redshift data warehouse over time.
Best Practices for Migrating Your Data Warehouse to Amazon Redshift (Amazon Web Services)
by Darin Briskman, Technical Evangelist, AWS
You can gain substantially more business insights and save costs by migrating your existing data warehouse to Amazon Redshift. This session will cover the key benefits of migrating to Amazon Redshift, migration strategies, and tools and resources that can help you in the process. We’ll learn about AWS Database Migration Service and the AWS Schema Conversion Tool, which were recently enhanced to import data from six common data warehouse platforms. Level: 200
At DiDi Chuxing, China’s most popular ride-sharing company, we use HBase to serve our big data workloads.
We run three clusters which serve different business needs. We backported the Region Grouping feature to our internal HBase version so we could isolate the different use cases.
We built the Didi HBase Service platform which is popular amongst engineers at our company. It includes a workflow and project management function as well as a user monitoring view.
Internally, we recommend that users go through Phoenix to simplify access. We also used row timestamps and multidimensional table schemas to solve multi-dimension query problems.
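The two Phoenix techniques mentioned, a multidimensional (composite) primary key and a row timestamp, can be illustrated with a hypothetical DDL sketch. The table and column names are invented for illustration; only the `ROW_TIMESTAMP` clause and composite-key syntax are Phoenix's own:

```python
# Hypothetical Phoenix DDL: a composite primary key supports multi-dimension
# queries, and declaring one PK column as ROW_TIMESTAMP maps event time onto
# the underlying HBase cell timestamp for efficient time-range scans.
ddl = """
CREATE TABLE IF NOT EXISTS trip_metric (
    driver_id   VARCHAR   NOT NULL,
    metric_name VARCHAR   NOT NULL,
    created     TIMESTAMP NOT NULL,
    value       DOUBLE,
    CONSTRAINT pk PRIMARY KEY (driver_id, metric_name, created ROW_TIMESTAMP)
)
"""
print("ROW_TIMESTAMP" in ddl)  # True
```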
C++, Go, Python, and PHP clients get to HBase via thrift2 proxies and QueryServer.
We run many important business applications on our HBase clusters, such as ETA, GPS, history orders, API metrics monitoring, and Traffic in the Cloud. If you are interested in any of the aspects listed above, please come to our talk. We would like to share our experiences with you.
In this presentation, you will get a look under the covers of Amazon Redshift, a fast, fully-managed, petabyte-scale data warehouse service for less than $1,000 per TB per year. Learn how Amazon Redshift uses columnar technology, optimized hardware, and massively parallel processing to deliver fast query performance on data sets ranging in size from hundreds of gigabytes to a petabyte or more. We'll also walk through techniques for optimizing performance, and you'll hear from a customer about how their use case takes advantage of fast performance on enormous datasets, leveraging economies of scale on the AWS platform.
Ameya Kanitkar: Using Hadoop and HBase to Personalize Web, Mobile and Email E... (WebExpo)
This document discusses using Hadoop and HBase to build content relevance and personalization systems for big data applications. It provides an overview of Hadoop and HBase, and how they can be used together. As a case study, it describes how Groupon uses Hadoop and HBase for their deal relevance and personalization systems, including storing user data in HBase and running recommendation algorithms using MapReduce.
This session is recommended for anyone interested in building real-time streaming applications using AWS. In this session, you will get a deep understanding of how data can be ingested by Amazon Kinesis and made available for real-time analysis and processing. We’ll also show how you can leverage the Kinesis client to make your applications highly available and fault tolerant. We’ll explore various design considerations in implementing real-time solutions and explain key concepts against the backdrop of an actual use case. Finally, we’ll situate stream processing in the broader context of your big data applications.
This document provides an overview of big data architecture, the Hadoop ecosystem, and NoSQL databases. It discusses common big data use cases, characteristics, and tools. It describes the typical 3-tier traditional architecture compared to the big data architecture using Hadoop. Key components of Hadoop like HDFS, MapReduce, Hive, Pig, Avro/Thrift, HBase are explained. The document also discusses stream processing tools like Storm, Spark and real-time query with Impala. It notes how NoSQL databases can integrate with Hadoop/MapReduce for both batch and real-time processing.
This document discusses big data technologies for enterprise analytics. It begins by defining big data and classifying big data technologies into three groups: Apache Hadoop, NoSQL databases, and extended RDBMS. It then provides examples of using different technologies for enterprise data warehouse extensions, website clickstream analysis, and real-time analytics. The document also discusses Hadoop distributions and Pentaho's support for big data and provides some big data success stories.
1. Hadoop is used extensively at Twitter to handle large volumes of data from logs and other sources, totaling 7TB per day. Tools like Scribe and Crane are used to ingest data, with Elephant Bird and HBase for storage.
2. Pig is used for data analysis on these large datasets to perform tasks like counting, correlating, and researching trends in users and tweets.
3. The results of these analyses are used to power various internal and external Twitter products and keep the business agile through ad-hoc analyses.
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac... (Cloudera, Inc.)
Michael Sun presented on CBS Interactive's use of Hadoop for web analytics processing. Some key points:
- CBS Interactive processes over 1 billion web logs daily from hundreds of websites on a Hadoop cluster with over 1PB of storage.
- They developed an ETL framework called Lumberjack in Python for extracting, transforming, and loading data from web logs into Hadoop and databases.
- Lumberjack uses streaming, filters, and schemas to parse, clean, lookup dimensions, and sessionize web logs before loading into a data warehouse for reporting and analytics.
- Migrating to Hadoop provided significant benefits including reduced processing time, fault tolerance, scalability, and cost effectiveness compared to their previous system.
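The sessionize step mentioned above can be sketched in a few lines: group one user's time-ordered hits into sessions, starting a new session whenever the inactivity gap exceeds a cutoff. The 30-minute gap is a common industry convention assumed here, not necessarily what Lumberjack uses:

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed cutoff; pipelines tune this

def sessionize(events):
    """Split one visitor's sorted hit timestamps into sessions: a new
    session begins whenever the gap since the previous hit exceeds
    SESSION_GAP. A generic sketch of the technique, not Lumberjack code."""
    sessions, current = [], []
    last_ts = None
    for ts in sorted(events):
        if last_ts is not None and ts - last_ts > SESSION_GAP:
            sessions.append(current)
            current = []
        current.append(ts)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions

hits = [datetime(2011, 1, 1, 10, 0), datetime(2011, 1, 1, 10, 10),
        datetime(2011, 1, 1, 12, 0)]
print([len(s) for s in sessionize(hits)])  # [2, 1]
```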
Gaming SEC Filings: Using Machine Learning to Detect Vectors and Sentiment in ... (Safe Software)
Using FME we build an API to collect and clean the US federal Security & Exchange Commission quarterly filings from the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) website. Using FME to quickly pool the filing data we perform sentiment analysis on the cleaned unstructured Management Discussion & Analysis (MD&A) data. We implement word to vector strategies to tokenize the fairly boilerplate text and assign the companies into groupings of changer and non-changer companies. This is done mainly graphing deltas in cosine similarity in the tokenized word vectors and also using word count vector strategies to flag language unattractive to investment. The end goal for this analysis is to forecast abnormal returns and find diversification opportunities which align with our existing clients.
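The delta-in-cosine-similarity idea described above can be shown with word-count vectors in a few lines. This is the generic technique on toy sentences, not the FME workflow or the EDGAR data itself:

```python
import math
from collections import Counter

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the word-count vectors of two documents.
    A drop in similarity between a company's filings from one year to the
    next is the kind of delta used to flag 'changer' companies."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

same = cosine_similarity("revenue grew due to strong demand",
                         "revenue grew due to strong demand")
changed = cosine_similarity("revenue grew due to strong demand",
                            "substantial doubt about going concern")
print(round(same, 2))   # 1.0
print(changed < same)   # True
```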
Two years ago, if someone had claimed they could stand up a petabyte-scale data warehouse in under an hour and then have a non-technical business user querying it live 30 minutes later without knowing any SQL or coding language, they would have been laughed out of the room. These days, that's called taking advantage of disruptive technology. Amazon Web Services and Tableau Software have shifted the entire paradigm by which organizations not only store and access their data, but ultimately how they innovate with it. The fast, scalable, and inexpensive services that AWS provides for housing data, combined with Tableau's unbelievably flexible and user-friendly visual analytics solution, mean that within hours an organization can securely put the power of their massive data assets into the hands of their domain experts without expensive overhead or lengthy ramp-up time.
Attend this webinar to learn how Amazon Web Services and Tableau Software are leveraged together every day to:
- Empower visual ad-hoc data discovery against big data
- Revolutionize corporate reporting and dashboards
- Promote data-driven decision making at every level
The presentation will include:
- A live demonstration of AWS and Tableau working together
- A real customer case study focused on fraud detection and online video metrics
- Live Q&A and an opportunity to trial both solutions
by Mikhail Prudnikov, Sr. Solutions Architect, AWS
In-memory data stores, such as ElastiCache for Redis, enable applications where response times are measured in microseconds. We’ll look at how to design and deploy high-performance applications using ElastiCache, Aurora, DynamoDB, DAX, and Lambda, then we’ll do a hands-on lab to do it ourselves. You’ll need a laptop with a Firefox or Chrome browser.
Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea... (Impetus Technologies)
Traditional databases and batch ETL operations have not been able to serve the growing data volumes and the need for fast and continuous data processing.
How can modern enterprises provide their business users real-time access to the most up-to-date and complete data?
In our upcoming webinar, our experts will talk about how real-time CDC improves data availability and fast data processing through incremental updates in the big data lake, without modifying or slowing down source systems. Join this session to learn:
What is CDC and how it impacts business
The various methods for CDC in the enterprise data warehouse
The key factors to consider while building a next-gen CDC architecture:
Batch vs. real-time approaches
Moving from just capturing and storing, to capturing enriching, transforming, and storing
Avoiding stopgap silos to state-through processing
Implementation of CDC through a live demo and use-case
You can view the webinar here - https://www.streamanalytix.com/webinar/planning-your-next-gen-change-data-capture-cdc-architecture-in-2019/
For more information visit - https://www.streamanalytix.com
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution (Etu Solution)
Speaker: Informatica Senior Product Consultant | 尹寒柏
Session abstract: In the Big Data era, the competition is not about the quantity of data but about the depth at which you understand it. Now that big data technology has matured, even CXOs without an IT background can turn CI (Customer Intelligence), once little more than a buzzword, into a verb: moving from BI to CI, connecting with the pulse of the consumer economy, and gaining insight into customer intent. One mindset matters in the Big Data era: in the end, the winner is not the one with the fastest-growing data volume but the one who understands the data most deeply, and Informatica is the best answer to that challenge. Informatica relieves the enormous pressure on enterprises to deliver trustworthy data on time; as data volume and complexity grow, it can also aggregate data ever faster, making the data meaningful and usable for improving efficiency, quality, and certainty and for exploiting strengths. Informatica provides a faster and more effective way to reach this goal and is SYSTEX Group's (精誠集團) best tool for the Big Data era.
Hive is used at Facebook for data warehousing and analytics tasks on a large Hadoop cluster. It allows SQL-like queries on structured data stored in HDFS files. Key features include schema definitions, data summarization and filtering, extensibility through custom scripts and functions. Hive provides scalability for Facebook's rapidly growing data needs through its ability to distribute queries across thousands of nodes.
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath (Yahoo Developer Network)
Offline and stream processing of big data sets can be done with tools such as Hadoop, Spark, and Storm, but what if you need to process big data at the time a user is making a request? Vespa (http://www.vespa.ai) allows you to search, organize, and evaluate machine-learned models from, e.g., TensorFlow over large, evolving data sets with latencies in the tens of milliseconds. Vespa is behind the recommendation, ad targeting, and search at Yahoo, where it handles billions of daily queries over billions of documents.
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi... (Amazon Web Services)
Batch querying and reporting is no longer enough for many organizations. Reducing time to insight – the time it takes to turn data into actionable insights – is becoming increasingly important to remain competitive. That’s why organizations are quickly evolving their data applications to support a broader set of real-time analytic use cases.
In this webinar, we will review some of the common use cases for real-time analytics such as click-stream analysis, event data processing, and real-time analytics. We will show proven architectures for collecting, storing, and processing real-time data using a combination of AWS managed services, including Amazon Kinesis Streams, Amazon Kinesis Firehose, Amazon EMR, and AWS Lambda, as well open source tools, such as Apache Spark. Then, we will discuss common approaches and best practices to incorporate real-time analytics into your existing batch applications.
Learning Objectives:
• Understand how to incorporate real-time analytics into existing applications
• Best practices to combine batch with real-time data flows
• Learn common architectures and use cases for real-time analytics
Admin Tech Clash: Discussing Best (and Worst) Administration Practices from ... (Christoph Adler)
This document summarizes a presentation on analyzing administrative data from Domino servers and clients. It discusses sources of usage data like activity trends, server-side logs in log.nsf, and client-side logging. Examples are given of visualizing this data to understand usage patterns, database demand, and network traffic. Third-party tools are also mentioned for enhanced analysis across multiple databases and servers.
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud (Amazon Web Services)
FINRA’s Data Lake unlocks the value in its data to accelerate analytics and machine learning at scale. FINRA's Technology group has changed its customer's relationship with data by creating a Managed Data Lake that enables discovery on Petabytes of capital markets data, while saving time and money over traditional analytics solutions. FINRA’s Managed Data Lake includes a centralized data catalog and separates storage from compute, allowing users to query from petabytes of data in seconds. Learn how FINRA uses Spot instances and services such as Amazon S3, Amazon EMR, Amazon Redshift, and AWS Lambda to provide the 'right tool for the right job' at each step in the data processing pipeline. All of this is done while meeting FINRA’s security and compliance responsibilities as a financial regulator.
How to create an enterprise data lake for enterprise-wide information storage and sharing? The data lake concept, architecture principles, support for data science and some use case review.
Two of the most frequently asked questions about Pinot’s history are “Why did LinkedIn build Pinot?”, “How is it different from Druid, ElasticSearch, Kylin?”. In this talk, we will go over the use cases that motivated us to build Pinot and how it has changed the analytics landscape at LinkedIn, Uber, and other companies.
SUMMARY:
We all have the contradictory feeling of delivering not-so-bad projects with not-so-bad performance.
But what really is a perfectly optimized project?
For you: optimized PHP code & SQL queries
For your boss: the customer who never complains
For the customer: their own experience on their workstation
For the business: who really knows and cares?
For the end user: who can really know the end-user experience (there could be millions of users)?
Without losing interest in the technical aspects (PHP, MySQL, Solr, Varnish, CDN, etc.) & software (New Relic, JMeter, etc.), this presentation will share feedback from real projects on:
How to integrate performance within the project scope?
What & how to measure & collect smart metrics?
Enlarging the scope: from your dev workstation to the end user… in China!
Experience level: Intermediate
Session Track: Performance
hbaseconasia2019 HBase Table Monitoring and Troubleshooting System on Cloud (Michael Stack)
Long Chen
Track 3: Applications
hbaseconasia2019 Phoenix Practice in China Life Insurance Co., Ltd (Michael Stack)
Yechao Chen
Track 3: Applications
TianHang Tang
Track 3: Applications
hbaseconasia2019 The Practice in trillion-level Video Storage and billion-lev... (Michael Stack)
Xu Ming
Track 3: Applications
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and... (Michael Stack)
Fei Xiao of Alibaba
Track 2: Ecology and Solutions
hbaseconasia2019 Bridging the Gap between Big Data System Software Stack and ... (Michael Stack)
Huan-Ping Su (蘇桓平), Yi-Sheng Lien (連奕盛) National Cheng Kung University
Track 2: Ecology and Solutions
hbaseconasia2019 Pharos as a Pluggable Secondary Index Component (Michael Stack)
Lei Wang, China Everbright Bank
Track 2: Ecology and Solutions
hbaseconasia2019 Phoenix Improvements and Practices on Cloud HBase at Alibaba (Michael Stack)
Yun Zhang
Track 2: Ecology and Solutions
Junhong Xu of Xiaomi
Track 2: Ecology and Solutions
hbaseconasia2019 BigData NoSQL System: ApsaraDB, HBase and Spark (Michael Stack)
Wei Li of Alibaba
Track 2: Ecology and Solutions
hbaseconasia2019 Test-suite for Automating Data-consistency checks on HBase (Michael Stack)
Pradeep S, Mallikarjun V of Flipkart
Track 1: Internals
hbaseconasia2019 Distributed Bitmap Index Solution (Michael Stack)
Xingjun Hao of Huawei
Track 1: Internals
hbaseconasia2019 HBase Bucket Cache on Persistent Memory (Michael Stack)
Anoop Sam John, Ramkrishna S Vasudevan, and Xu Kai of Intel
Track 1: Internals
hbaseconasia2019 The Procedure v2 Implementation of WAL Splitting and ACL (Michael Stack)
Mei Yi of Xiaomi
Track 1: Internals
hbaseconasia2019 BDS: A data synchronization platform for HBase (Michael Stack)
熊嘉男
Track 1: Internals
hbaseconasia2019 Further GC optimization for HBase 2.x: Reading HFileBlock in... (Michael Stack)
Anoop Sam John of Intel and Zheng Hu of Alibaba
Track 1: Internals
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB... (Michael Stack)
The document discusses HBCK2, a tool for fixing issues in HBase 2. Some key points:
1. HBCK2 is simpler than HBCK1, with fewer fix commands and no diagnosis commands. It requires a deeper understanding of HBase internals.
2. HBCK2 commands are master-oriented and fix issues one at a time. Common issues include regions not online, stuck procedures, and tables in the wrong state.
3. Recipes are provided to fix specific issues like missing meta regions or regions in transition using HBCK2 commands like assigns and bypass.
4. HBCK2 is still a work in progress, but contributions are welcome.
Keynote given by Duo Zhang of Xiaomi and Chunhui Shen of Alibaba
Track 1: Internals
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies (Michael Stack)
This document discusses how Bloomberg uses HBase to serve billions of queries with millisecond latency. It covers HBase principles like being an ordered key-value store and providing ACID transactions. It also discusses modeling data for HBase, including dealing with data and query skew. Implementation details covered include caching, block size tuning, column families, and compaction. The overall goal is to optimize HBase for Bloomberg's low-latency data storage and retrieval needs.
HBaseConAsia2018 Track1-3: HBase at Xiaomi (Michael Stack)
This document summarizes Xiaomi's implementation and use of HBase for data storage. It discusses Xiaomi's HBase clusters across multiple public cloud providers and data centers. It also describes Xiaomi's approaches to multi-tenancy, quota and throttling, synchronous replication between clusters, and high availability in the case of node or cluster failures. Synchronous replication provides stronger consistency guarantees but with some performance overhead compared to asynchronous replication.
How Can Microsoft Office 365 Improve Your Productivity?Digital Host
Microsoft Office 365 is a cloud-based subscription service offering essential productivity tools. It includes Word for documents, Excel for data analysis, PowerPoint for presentations, Outlook for email, OneDrive for cloud storage, and Teams for collaboration. Key benefits are accessibility from any device, advanced security, and regular updates. Office 365 enhances collaboration with real-time co-authoring and Teams, streamlines communication with Outlook and Teams Chat, and improves data management with OneDrive and SharePoint. For reliable office 365 hosting, Digital Host offers various subscription plans, setup support, and training resources. Visit https://www.digitalhost.com/email-office/office-365/
Choosing the right web hosting provider can be a daunting task, especially with the plethora of options available. To help you make an informed decision, we’ve compiled comprehensive reviews of some of the top web hosting providers for 2024, with a special focus on Hosting Mastery Hub. This guide will cover the features, pros, cons, and unique offerings of each provider. By the end, you’ll have a clearer understanding of which hosting service best suits your needs.
Do it again anti Republican shirt Do it again anti Republican shirtexgf28
Do it again anti Republican shirt
https://www.pinterest.com/youngtshirt/do-it-again-anti-republican-shirt/
Do it again anti Republican shirt,Do it again anti Republican t shirts,Do it again anti Republican sweatshirts Grabs yours today. tag and share who loves it.
In today's digital world, digital marketers are indispensable. They play a crucial role in helping businesses connect with their audiences effectively through various online channels. Whether you're considering a career change or aiming to advance in the field, here’s a detailed guide to thriving as a digital marketer in 2024.
Why Choose Digital Marketing?
Digital marketing encompasses a wide array of strategies aimed at engaging and converting online audiences. From optimizing websites for search engines to crafting compelling social media campaigns and leveraging data analytics, digital marketers drive business growth and enhance brand visibility in the digital sphere.
Essential Skills for Success
To excel in digital marketing, mastering a diverse skill set is essential:
1. SEO (Search Engine Optimization)
Understanding Search Engine Optimization principles is vital for enhancing a website's visibility in search engine results. This includes keyword research, on-page optimization techniques, and building authoritative backlinks to boost organic traffic.
2. PPC (Pay-Per-Click) Advertising
PPC advertising involves placing targeted ads on search engines and social media platforms, paying only when users click. Proficiency in platforms like Google Ads and Facebook Ads, along with strategic bidding and ad copywriting skills, is crucial for maximizing campaign ROI.
3. Social Media Marketing
Social media platforms serve as powerful tools for engaging with audiences and building brand loyalty. Effective social media marketers understand platform nuances, create engaging content, and utilize analytics to refine strategies and drive meaningful engagement.
4. Content Marketing
Content marketing revolves around creating valuable, relevant content that attracts and retains target audiences. This includes blog posts, videos, infographics, and eBooks tailored to resonate with audience interests and needs.
5. Email Marketing
Email marketing remains an effective channel for nurturing leads and maintaining customer relationships. Skills in crafting personalized campaigns, segmenting audiences, and analyzing email performance metrics are essential for optimizing campaign effectiveness.
6. Analytics and Data Interpretation
Data-driven decision-making is pivotal in digital marketing success. Proficiency in tools like Google Analytics enables marketers to track website traffic, user behavior, and campaign performance, providing actionable insights to drive continuous improvement.
Java Training in Chandigarh.Mastering Java: From Fundamentals to Advanced App...aryan4bhardwaj37
Excel in Java Programming with Excellence Academy‘s top-notch Best Java training & Certification in Chandigarh. Immerse yourself in 100% practical training on live projects from global clients in the USA, UK, France, and Germany. Our comprehensive program covers the development of dynamic web applications, emphasizing Java, Servlets, JSP, Spring, and more. Whether pursuing a full-time one-year diploma or a short-term course, Excellence Academy offers a 2-year validity for your Java programming journey. Our Java training is the gateway to mastering programming languages and building robust, scalable applications. So enroll now the Java Complete Course For Beginners.
The Money Wave 2024 Review: Is It the Key to Financial Success?nirahealhty
What is The Money Wave?
The Money Wave is a wealth manifestation software designed to help individuals attract financial abundance through audio tracks. Created by James Rivers, this program uses scientifically-backed methods to improve cognitive functions and reduce stress, thereby enhancing one's ability to manifest wealth.
How Does The Money Wave Audio Program Work?
The Cash Wave program works by utilizing the force of sound frequencies to overhaul your cerebrum. These audio tracks are designed to promote deep relaxation and improve cognitive functions. The underlying science suggests that specific sound waves can influence brain activity, leading to enhanced problem-solving abilities and reduced stress levels.
How to Use The Money Wave Program?
Using The Money Wave program is straightforward:
Download the Audio Tracks: Once purchased, you can download the audio files from the official website.
Listen Daily: For best results, listen to the tracks daily. Consistency is key.
Relax and Visualize: Find a quiet place, relax, and visualize your financial goals as you listen.
Follow the Guide: The program includes a detailed guide to help you maximize the benefits.
5. HBase Story in Tencent
- Began using HBase in 2013
- Versions used: 0.94.17 -> 0.98.6 -> 1.2.5 -> 2.2.0 (in progress)
- Largest cluster: more than 500 nodes
- Overall scale: 90+ clusters, 4000+ nodes, 10PB+ of data, 3 trillion+ requests per day (RPD)
13. Practices - Table
- Create a table per day
  - Large amount of data
  - Short TTL
- Benefits
  - Reduces the amount of data involved in compaction
  - Makes expired data easy to delete
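The table-per-day practice can be sketched in the HBase shell; the table names, column family, and TTL below are illustrative, not from the talk:

```
# Create one table per day. A short TTL (here 2 days, in seconds) lets
# expired cells be reclaimed cheaply, and an entire day's table can be
# dropped once it falls out of the retention window.
hbase> create 'events_20190720', {NAME => 'd', TTL => 172800}
# The next day's ingest goes to a fresh table:
hbase> create 'events_20190721', {NAME => 'd', TTL => 172800}
```

Because each compaction only ever sees one day's data, the compaction working set stays bounded regardless of total retained history.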
14. Optimization - Bandwidth
[Diagram: write amplification across region servers RS1/RS2/RS3 — ① input data arrives; ② WAL data is replicated to RS2 and RS3; ③ RS2 and RS3 flush data; ④ RS2 and RS3 run small compactions; ⑤ RS2 and RS3 run large compactions.]
15. Optimization - Bandwidth
- Enable compression of CellBlocks
- Enable WAL compression
- Increase the size of the memstore
- Reduce the number of compaction threads
- Turn off major compaction
- Create tables by day
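These knobs map onto standard HBase configuration properties; a sketch of the corresponding `hbase-site.xml` entries is below. The values are illustrative, not the ones Tencent used:

```xml
<!-- Compress RPC CellBlocks (client must set the same codec) -->
<property>
  <name>hbase.client.rpc.compressor</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<!-- Compress the WAL -->
<property>
  <name>hbase.regionserver.wal.enablecompression</name>
  <value>true</value>
</property>
<!-- Give the memstore a larger share of the heap -->
<property>
  <name>hbase.regionserver.global.memstore.size</name>
  <value>0.45</value>
</property>
<!-- Fewer compaction threads -->
<property>
  <name>hbase.regionserver.thread.compaction.small</name>
  <value>1</value>
</property>
<property>
  <name>hbase.regionserver.thread.compaction.large</name>
  <value>1</value>
</property>
<!-- Disable periodic major compaction (0 = off) -->
<property>
  <name>hbase.hregion.majorcompaction</name>
  <value>0</value>
</property>
```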
16. Optimization - Online filtering of dirty data
- Problem: a large amount of data shares the same rowkey
- How to find the rowkeys to filter?
  - From ResponseTooSlow logs
- How to set the filter rowkeys?
  - hbase.hregion.filter.rowkeys
- How to refresh the filter rowkeys?
  - update_config
[Flowchart: for each incoming write, if filtering is enabled and the rowkey matches the filter list, the write is dropped; otherwise it is written.]
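Usage might look like the following in the HBase shell. Note that `hbase.hregion.filter.rowkeys` is Tencent's internal patch, not an upstream property, so the exact syntax here is a guess; the online-refresh commands (`update_config` / `update_all_config`) are standard HBase shell commands:

```
# Hypothetical: set the dirty rowkeys to drop, per table
hbase> alter 'events', CONFIGURATION => {'hbase.hregion.filter.rowkeys' => 'hot_key_1,hot_key_2'}
# Reload configuration online on all region servers, no restart needed
hbase> update_all_config
```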
17. Optimization - Prefix Bloom Filter (HBASE-20636)
- ROWPREFIX_FIXED_LENGTH
- ROWPREFIX_DELIMITED
[Diagram: a rowkey composed of uin + ts + action, with the Bloom filter built on the prefix (uin); the slide shows the corresponding "Create Table" statement and the Bloom filter entry in the StoreFile's file info.]
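A prefix Bloom filter is declared at table-creation time. The shell syntax below follows the HBase reference guide for HBASE-20636; the table name, family, and prefix length are illustrative:

```
# Build the Bloom filter over the first 8 bytes of each rowkey
# (e.g. the uin field) rather than the whole key.
hbase> create 'user_actions', {NAME => 'd',
  BLOOMFILTER => 'ROWPREFIX_FIXED_LENGTH',
  CONFIGURATION => {'RowPrefixBloomFilter.prefix_length' => '8'}}
```

This lets prefix scans (not just point gets) skip StoreFiles that cannot contain the scanned prefix.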
18. Optimization - Prefix Bloom Filter (HBASE-20636)
[Flowchart, write path: for each rowkey written, get the prefix key by prefix_length, compute its hash value, and set it in the Bloom filter; after the last row, write the Bloom filter information to the StoreFile metadata.]
[Flowchart, scan path: if the scan's prefix length is >= prefix_length and the StartKey and EndKey share the same prefix, compute the hash value of the prefix and test the Bloom filter; on a miss the StoreFile is filtered out, on a hit it is read. If the preconditions do not hold, the StoreFile is not filtered.]
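The scan-side decision in the flowchart above can be sketched in Python. This is an illustrative model, not HBase code, and a plain set stands in for the Bloom filter (a real Bloom filter can only say "definitely absent" or "maybe present"):

```python
def can_skip_storefile(start_key: bytes, end_key: bytes,
                       prefix_length: int, bloom: set) -> bool:
    """Return True if the StoreFile can be filtered out for this scan."""
    # The scan keys must be at least prefix_length long...
    if len(start_key) < prefix_length or len(end_key) < prefix_length:
        return False  # cannot use the prefix Bloom filter
    prefix = start_key[:prefix_length]
    # ...and StartKey/EndKey must share the same prefix.
    if end_key[:prefix_length] != prefix:
        return False
    # Skip the file only when the Bloom filter reports "definitely absent".
    return prefix not in bloom

# Prefixes recorded in this StoreFile's Bloom filter at write time:
stored_prefixes = {b"uin00001", b"uin00042"}

# Scan for a prefix present in the file: the StoreFile must be read.
print(can_skip_storefile(b"uin00042#001", b"uin00042#999", 8, stored_prefixes))  # False
# Scan for a prefix absent from the file: the StoreFile can be skipped.
print(can_skip_storefile(b"uin99999#001", b"uin99999#999", 8, stored_prefixes))  # True
```

The two precondition checks are why the feature only helps scans whose start and end keys pin down a single prefix of at least `prefix_length` bytes.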
21. Optimization - RestServer
- Only one configuration to maintain
- Uses resources effectively
- User-friendly access
22. HBase Community
- 1 committer, 2 contributors
- Total commits: 80+
- Features
  - HBASE-20636 Introduce two bloom filter types: ROWPREFIX_FIXED_LENGTH and ROWPREFIX_DELIMITED
  - HBASE-19799 Add web UI to rsgroup
  - HBASE-20243 [Shell] Add shell command to create a new table by cloning the existent table
  - HBASE-19483 Add proper privilege check for rsgroup commands
  - ………