HDFS
How HBase leverages HDFS for distributed storage, including NameNode and DataNode architecture and file replication.
Because HBase runs on HDFS (each StoreFile is written as a file on HDFS), it is important to understand the HDFS architecture, especially how it stores files, handles failover, and replicates blocks.
See the Hadoop documentation on HDFS Architecture for more information.
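To see this layout in practice, you can browse HBase's storage directly with the HDFS shell. A minimal sketch, assuming the default `hbase.rootdir` of `/hbase` and a table named `mytable` in the `default` namespace (both names are illustrative):

```shell
# List the regions of a table under HBase's root directory on HDFS.
# Each region directory contains one subdirectory per column family,
# and each family directory holds the StoreFiles (HFiles).
hdfs dfs -ls /hbase/data/default/mytable

# Drill into one region/family directory to see the StoreFiles themselves.
hdfs dfs -ls -R /hbase/data/default/mytable
```

From HBase's point of view these are ordinary HDFS files, so everything HDFS provides (block placement, replication, failover) applies to them transparently.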
NameNode
The NameNode is responsible for maintaining the filesystem metadata. See the above HDFS Architecture link for more information.
DataNode
The DataNodes are responsible for storing HDFS blocks. See the above HDFS Architecture link for more information.
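One way to observe how the NameNode and DataNodes cooperate for HBase's files is `hdfs fsck`, which reports the blocks backing a file and the DataNodes holding each replica. A sketch, again assuming the illustrative `/hbase` root directory:

```shell
# Show, for every file under HBase's root directory:
#   -files     the file paths and sizes
#   -blocks    the HDFS blocks each file is split into
#   -locations the DataNodes storing each block replica
hdfs fsck /hbase/data/default/mytable -files -blocks -locations
```

The output makes the division of labor concrete: the NameNode knows the file-to-block mapping (metadata), while the listed DataNodes hold the actual block replicas.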
Bulk Loading
Efficient methods for loading large datasets into HBase using MapReduce to generate HFiles and directly load them into the cluster.
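A common bulk-loading workflow uses the `ImportTsv` MapReduce job to write HFiles directly, followed by `completebulkload` to move them into the running cluster. A minimal sketch; the table name, column mapping, and paths are illustrative:

```shell
# Step 1: run a MapReduce job that parses TSV input and writes HFiles
# to an HDFS staging directory instead of issuing Puts to the cluster.
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1 \
  -Dimporttsv.bulk.output=/tmp/mytable-hfiles \
  mytable /path/to/input.tsv

# Step 2: atomically move the generated HFiles into the regions of the
# target table. This bypasses the write path (WAL and MemStore) entirely.
hbase completebulkload /tmp/mytable-hfiles mytable
```

Because the HFiles are handed to the RegionServers as finished files, bulk loading avoids the per-row overhead of the normal write path, which is what makes it efficient for large datasets.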
Timeline-consistent Highly Available Reads
Using region replicas to achieve high availability for reads with timeline consistency, reducing read unavailability during failures.
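Region replicas are enabled per table, and clients opt in to timeline consistency per request. A minimal sketch using the HBase shell (the table and row names are illustrative); run these commands inside `hbase shell`:

```shell
# Create a table with two replicas per region: one primary (which accepts
# writes) and one read-only secondary hosted on a different RegionServer.
create 'mytable', 'cf', {REGION_REPLICATION => 2}

# A default read uses STRONG consistency and must go to the primary replica.
get 'mytable', 'row1'

# A TIMELINE read may be served by a secondary replica if the primary is
# slow or unavailable, so the result can be slightly stale.
get 'mytable', 'row1', {CONSISTENCY => 'TIMELINE'}
```

The trade-off is explicit: timeline reads keep the table readable while a primary region is recovering, at the cost of possibly returning data that lags the latest writes.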