Apache HBase

Architecture

Comprehensive guide to HBase architecture, including the client-server model, regions, the WAL, compaction, and advanced features.

In this section:

Overview

Introduction to HBase as a NoSQL distributed database, key features, scalability characteristics, and when to use HBase.

Catalog Tables

Understanding hbase:meta catalog table structure, location tracking, and how HBase maintains region metadata.
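As a rough illustration of the lookup that hbase:meta makes possible: each region of a table covers a contiguous, sorted key range beginning at its start key, so the region that owns a row is the one with the greatest start key less than or equal to that row key. The sketch below models that rule with an in-memory `TreeMap`. The class name, method names, and server names are hypothetical and are not part of the HBase client API. It ignores the real hbase:meta row format and client-side location caching.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of region location metadata; not the HBase API. */
public class MetaLookupSketch {
    // Maps each region's start key to the server hosting it.
    // An empty start key ("") marks the first region of the table.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    /** Returns the server hosting the region whose key range contains rowKey. */
    public String locate(String rowKey) {
        // floorEntry returns the entry with the greatest start key <= rowKey.
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        MetaLookupSketch meta = new MetaLookupSketch();
        meta.addRegion("", "rs1.example.com");
        meta.addRegion("g", "rs2.example.com");
        meta.addRegion("p", "rs3.example.com");
        System.out.println(meta.locate("apple")); // rs1.example.com
        System.out.println(meta.locate("grape")); // rs2.example.com
        System.out.println(meta.locate("zebra")); // rs3.example.com
    }
}
```

A sorted map fits this rule because region boundaries never overlap, so a single floor lookup is enough to find the owning region.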

Client

HBase client architecture, connection management, metadata caching, and client-side configuration for optimal performance.

Client Request Filters

Using filters with Get and Scan operations to efficiently query HBase data, including comparison, column, row, and utility filters.

Master

HBase Master server responsibilities including RegionServer monitoring, metadata operations, load balancing, and failover behavior.

RegionServer

HBase RegionServer implementation, interfaces, read/write paths, block cache, memstore management, and performance tuning.

Regions

Understanding HBase regions, stores, memstore, write-ahead log (WAL), compaction, splits, and region management strategies.
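The write path touched on above can be sketched as a tiny log-structured store. Each edit is appended to a write-ahead log first, then applied to a sorted in-memory MemStore. When the MemStore reaches a size limit, it is flushed to a new immutable sorted file, and a compaction merges those files. This is a simplified model under stated assumptions: the class and method names are hypothetical, the flush threshold counts entries rather than bytes, and versions, deletes, and WAL rolling are omitted. It is not HBase's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical, simplified model of one store's write and read path. */
public class WritePathSketch {
    private final List<String> wal = new ArrayList<>();          // append-only log, written first
    private TreeMap<String, String> memstore = new TreeMap<>();   // sorted in-memory buffer
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // flushed files, oldest first
    private final int flushThreshold; // entry count, standing in for a byte-size limit

    public WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    public void put(String rowKey, String value) {
        wal.add(rowKey + "=" + value); // 1. log the edit for durability
        memstore.put(rowKey, value);   // 2. apply it to the MemStore
        if (memstore.size() >= flushThreshold) {
            flush();                   // 3. flush once the MemStore is full
        }
    }

    private void flush() {
        // Persist the sorted MemStore contents as a new immutable file.
        storeFiles.add(memstore);
        memstore = new TreeMap<>();
    }

    /** Merges all files into one; newer values overwrite older ones. */
    public void compact() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) { // oldest first
            merged.putAll(file);
        }
        storeFiles.clear();
        storeFiles.add(merged);
    }

    public String get(String rowKey) {
        // Newest data wins: check the MemStore, then files from newest to oldest.
        if (memstore.containsKey(rowKey)) {
            return memstore.get(rowKey);
        }
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            String value = storeFiles.get(i).get(rowKey);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    public int storeFileCount() { return storeFiles.size(); }

    public int walSize() { return wal.size(); }
}
```

Logging before applying the edit is what allows a crashed server's unflushed MemStore contents to be replayed from the log. Compaction keeps the number of files a read has to consult small.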

Bulk Loading

Efficient methods for loading large datasets into HBase by using MapReduce to generate HFiles and then loading those files directly into the cluster.

HDFS

How HBase leverages HDFS for distributed storage, including NameNode and DataNode architecture and file replication.

Timeline-consistent Highly Available Reads

Using region replicas to achieve high availability for reads with timeline consistency, reducing read unavailability during failures.

Storing Medium-sized Objects (MOB)

Optimized storage and handling of medium-sized objects (100KB-10MB) in HBase using the MOB feature for improved performance.
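Conceptually, a MOB-enabled column family applies a size threshold to each value. Values at or below it are stored inline as usual, and larger values are written to separate MOB files, with the regular store keeping only a reference. The sketch below shows that routing decision. It is not HBase's implementation: the class, enum, and method names are hypothetical, and the 100 KB threshold is an assumption taken from the lower end of the size range above (in HBase the cutoff is configured per column family).

```java
/** Hypothetical illustration of MOB routing by value size; not HBase's code. */
public class MobRoutingSketch {
    // Assumed threshold: 100 KB, the lower bound of the MOB size range.
    static final long THRESHOLD_BYTES = 100L * 1024;

    enum Destination { STORE_FILE, MOB_FILE }

    /** Values larger than the threshold go to MOB files; others stay inline. */
    static Destination route(long valueLengthBytes) {
        return valueLengthBytes > THRESHOLD_BYTES
                ? Destination.MOB_FILE
                : Destination.STORE_FILE;
    }

    public static void main(String[] args) {
        long[] sizes = {4L * 1024, 100L * 1024, 512L * 1024, 10L * 1024 * 1024};
        for (long size : sizes) {
            System.out.println(size + " bytes -> " + route(size));
        }
    }
}
```

Keeping large values out of the regular store files means flushes and compactions there rewrite far less data. This is the main motivation for treating medium-sized objects separately.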

Scan Over Snapshot

Using TableSnapshotScanner to scan HBase snapshots directly from HDFS, bypassing RegionServers for better performance.