HBase Configuration

Apache HBase uses the same configuration system as Apache Hadoop. All configuration files are located in the conf/ directory, which must be kept in sync across every node in your cluster. The files are:

  • backup-masters – Not present by default. A plain-text file which lists hosts on which the Master should start a backup Master process, one host per line.
  • hadoop-metrics2-hbase.properties – Used to connect HBase to Hadoop’s Metrics2 framework.
  • hbase-env.cmd and hbase-env.sh – Scripts for Windows and Linux/Unix environments, respectively, that set up the working environment for HBase, including the location of Java, Java options, and other environment variables. The files contain many commented-out examples to provide guidance.
  • hbase-policy.xml – The default policy configuration file used by RPC servers to make authorization decisions on client requests. Only used if HBase security is enabled.
  • hbase-site.xml – The main HBase configuration file. This file specifies configuration options which override HBase’s default configuration. You can view (but do not edit) the default configuration file at docs/hbase-default.xml. You can also view the entire effective configuration for your cluster (defaults and overrides) in the HBase Configuration tab of the HBase Web UI.
  • log4j.properties – Configuration file for HBase logging via log4j.
  • regionservers – A plain-text file containing a list of hosts which should run a RegionServer in your HBase cluster. By default this file contains the single entry localhost. It should contain a list of hostnames or IP addresses, one per line, and should only contain localhost if each node in your cluster will run a RegionServer on its localhost interface.
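The way hbase-site.xml overrides the defaults can be sketched in a few lines of Python. This is only an illustration of the layering, not HBase's own loader: the SITE_XML content, the 60000 ms override, and the helper names parse_properties and effective_config are hypothetical, and DEFAULTS holds just a handful of the default values quoted on this page rather than the full hbase-default.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical hbase-site.xml content; the values are placeholders.
SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.org:9000/hbase</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
</configuration>
"""

# A few defaults quoted in this article (not the complete default file).
DEFAULTS = {
    "hbase.master.port": "16000",
    "hbase.regionserver.port": "16020",
    "zookeeper.session.timeout": "90000",
}


def parse_properties(xml_text):
    """Return {name: value} for each <property> in a Hadoop-style config file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def effective_config(defaults, site_xml):
    """Overlay the site file's overrides on top of the defaults."""
    merged = dict(defaults)
    merged.update(parse_properties(site_xml))
    return merged


config = effective_config(DEFAULTS, SITE_XML)
for name in sorted(config):
    print(name, "=", config[name])
```

Properties that appear only in the defaults keep their default value, while anything named in the site file wins. The HBase Configuration tab of the Web UI shows the same kind of merged view for a running cluster.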

Some important configuration parameters are:

  • If you have a cluster with a lot of regions, it is possible that one RegionServer checks in shortly after the Master starts while all the remaining RegionServers lag behind. This first server to check in will be assigned all regions, which is not optimal. To prevent this, raise the hbase.master.wait.on.regionservers.mintostart property from its default value of 1.
  • If the primary Master loses its connection with ZooKeeper, it will fall into a loop where it keeps trying to reconnect. Disable this behavior if you are running more than one Master, that is, if you have a backup Master (see the fail.fast.expired.active.master property). Otherwise, the dying Master may continue to receive RPCs even though another Master has assumed the role of primary.
  • zookeeper.session.timeout – The default timeout is 90 seconds (specified in milliseconds). This means that if a server crashes, it will be 90 seconds before the Master notices the crash and starts recovery. You may want to tune the timeout down to a minute or less so that the Master notices failures sooner. Before changing this value, make sure your JVM garbage collection configuration is under control; otherwise, a garbage collection pause that outlasts the ZooKeeper session timeout will take out your RegionServer. (You might be fine with this: you probably want recovery to start if a RegionServer has been in GC for a long period of time.) To change this configuration, edit hbase-site.xml, copy the changed file around the cluster, and restart.
  • hbase.zookeeper.property.initLimit – Property from ZooKeeper’s config zoo.cfg. The number of ticks that the initial synchronization phase can take. The default is 10.
  • hbase.zookeeper.property.syncLimit – Property from ZooKeeper’s config zoo.cfg. The number of ticks that can pass between sending a request and getting an acknowledgment. The default is 5.
  • hbase.zookeeper.property.dataDir – Property from ZooKeeper’s config zoo.cfg. The directory where the snapshot is stored. The default is ${hbase.tmp.dir}/zookeeper.
  • hbase.zookeeper.property.clientPort – Property from ZooKeeper’s config zoo.cfg. The port at which clients will connect. The default is 2181.
  • hbase.zookeeper.property.maxClientCnxns – Property from ZooKeeper’s config zoo.cfg. Limit on the number of concurrent connections (at the socket level) that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble. Set it high to avoid ZooKeeper connection issues when running standalone or pseudo-distributed. The default is 300.
  • hbase.client.write.buffer – Default size of the HTable client write buffer in bytes. A bigger buffer takes more memory on both the client and the server side, since the server instantiates the passed write buffer to process it, but a larger buffer also reduces the number of RPCs made. For an estimate of server-side memory used, evaluate hbase.client.write.buffer * hbase.regionserver.handler.count. The default is 2097152.
  • dfs.datanode.failed.volumes.tolerated – This is the “…number of volumes that are allowed to fail before a DataNode stops offering service. By default any volume failure will cause a datanode to shutdown” from the hdfs-default.xml description. You might want to set this to about half the number of your available disks.
  • hbase.regionserver.handler.count – This setting defines the number of threads that are kept open to answer incoming requests to user tables. The rule of thumb is to keep this number low when the payload per request approaches a megabyte (big puts, scans using a large cache) and high when the payload is small (gets, small puts, ICVs, deletes). The total size of the queries in progress is limited by the setting hbase.ipc.server.max.callqueue.size. It is safe to set the handler count to the maximum number of incoming clients if their payload is small; the typical example is a cluster that serves a website, since puts aren’t typically buffered and most of the operations are gets.
  • hbase.tmp.dir – Temporary directory on the local filesystem. Change this setting to point to a location more permanent than ‘/tmp’, the usual value of java.io.tmpdir, as the ‘/tmp’ directory is cleared on machine restart. The default is ${java.io.tmpdir}/hbase-${user.name}.
  • hbase.rootdir – The directory shared by region servers and into which HBase persists. The URL should be ‘fully-qualified’ to include the filesystem scheme. For example, to specify the HDFS directory ‘/hbase’ where the HDFS instance’s namenode is running at namenode.example.org on port 9000, set this value to hdfs://namenode.example.org:9000/hbase. By default, HBase writes to whatever ${hbase.tmp.dir} is set to, which is usually under /tmp, so change this configuration or else all data will be lost on machine restart. The default is ${hbase.tmp.dir}/hbase.
  • hbase.master.port – The port the HBase Master should bind to. The default is 16000.
  • hbase.master.logcleaner.ttl – Maximum time, in milliseconds, that a WAL can stay in the .oldlogdir directory, after which it will be cleaned by a Master thread. The default is 600000.
  • hbase.master.infoserver.redirect – Whether or not the Master listens to the Master web UI port (hbase.master.info.port) and redirects requests to the web UI server shared by the Master and RegionServer. The default is true.
  • hbase.regionserver.port – The port the HBase RegionServer binds to. The default is 16020.
  • hbase.regionserver.msginterval – Interval between messages from the RegionServer to the Master, in milliseconds. The default is 3000.
  • hbase.regionserver.global.memstore.size – Maximum size of all memstores in a region server before new updates are blocked and flushes are forced. Defaults to 40% of heap (0.4). Updates are blocked and flushes are forced until the size of all memstores in a region server drops to hbase.regionserver.global.memstore.size.lower.limit. The default value in this configuration has been intentionally left empty in order to honor the old hbase.regionserver.global.memstore.upperLimit property if present. The default is none.
  • zookeeper.session.timeout – ZooKeeper session timeout in milliseconds. It is used in two different ways. First, this value is used in the ZK client that HBase uses to connect to the ensemble. Second, it is used by HBase when it starts a ZK server, where it is passed as the ‘maxSessionTimeout’. For example, if an HBase region server connects to a ZK ensemble that is also managed by HBase, then the session timeout will be the one specified by this configuration. However, a region server that connects to an ensemble managed with a different configuration will be subject to that ensemble’s maxSessionTimeout. So, even though HBase might propose using 90 seconds, the ensemble can have a lower maximum timeout, and that lower value takes precedence. The current default that ZK ships with is 40 seconds, which is lower than HBase’s. The default is 90000.
  • hbase.client.max.total.tasks – The maximum number of concurrent tasks a single HTable instance will send to the cluster. The default is 100.
  • hbase.client.max.perserver.tasks – The maximum number of concurrent tasks a single HTable instance will send to a single region server. The default is 5.
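Several of the parameters above come with a rule of thumb that is easy to check with a little arithmetic. The following Python sketch works through three of them: the server-side write buffer estimate, the session timeout that actually applies when the ensemble's maximum is lower than HBase's request, and the "about half your disks" suggestion for tolerated volume failures. The handler count of 30 and the disk count of 12 are assumptions chosen for illustration, not values from this page, and the function names are hypothetical.

```python
# Values quoted in this article.
WRITE_BUFFER_BYTES = 2097152        # hbase.client.write.buffer default
HBASE_SESSION_TIMEOUT_MS = 90000    # zookeeper.session.timeout default
ZK_MAX_SESSION_TIMEOUT_MS = 40000   # maximum timeout ZK ships with (40 seconds)

# Assumptions for illustration only; substitute your own cluster's values.
HANDLER_COUNT = 30   # hypothetical hbase.regionserver.handler.count
DATA_DISKS = 12      # hypothetical number of volumes per DataNode


def write_buffer_server_memory(write_buffer_bytes, handler_count):
    """Estimate: hbase.client.write.buffer * hbase.regionserver.handler.count."""
    return write_buffer_bytes * handler_count


def effective_session_timeout(requested_ms, ensemble_max_ms):
    """The ensemble's maximum caps whatever timeout HBase proposes."""
    return min(requested_ms, ensemble_max_ms)


def suggested_failed_volumes(disk_count):
    """Roughly half of the available disks, rounded down."""
    return disk_count // 2


server_bytes = write_buffer_server_memory(WRITE_BUFFER_BYTES, HANDLER_COUNT)
timeout_ms = effective_session_timeout(HBASE_SESSION_TIMEOUT_MS,
                                       ZK_MAX_SESSION_TIMEOUT_MS)
tolerated = suggested_failed_volumes(DATA_DISKS)

print(f"write buffer memory per RegionServer: {server_bytes / (1024 * 1024):.0f} MiB")
print(f"effective session timeout: {timeout_ms} ms")
print(f"dfs.datanode.failed.volumes.tolerated suggestion: {tolerated}")
```

With these inputs the write buffer estimate comes to 60 MiB per RegionServer, and the effective session timeout is 40000 ms rather than the 90000 ms HBase proposes, which is why tuning zookeeper.session.timeout alone may have no effect on an externally managed ensemble.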