HBase

HBase is a column-oriented database management system that runs on top of the Hadoop Distributed File System (HDFS). We have compiled a list of interview questions for HBase professionals to help them in their job interviews.

Q.1 What is HBase?
HBase is a column-oriented database management system that runs on top of the Hadoop Distributed File System (HDFS). HBase is not a relational data store, and it does not support a structured query language like SQL. In HBase, a master node manages the cluster, while region servers store portions of the tables and perform operations on the data.
Q.2 Explain HBaseFsck class.
The tool called hbck is available in HBase; it is implemented by the HBaseFsck class. It provides several command-line switches that affect its behavior.
Q.3 Can you list some operational commands that are available in HBase?
Yes, the operational commands I know are: Get, Put, Delete, Scan, and Increment.
Q.4 Where do we use HBase?
We use HBase for many purposes, such as: large-capacity storage, column-oriented stores, horizontal scalability, and high performance and availability. HBase is designed for tables with billions of rows, millions of columns, and thousands of versions. Unlike HDFS, it supports arbitrary real-time CRUD operations.
Q.5 What are some Hallmark Features of HBase?
The Hallmark Features of HBase are: Schema Flexibility Scalability High Reliability
Q.6 What do you mean by Thrift?
Apache Thrift is written in C++, but it provides schema compilers for various programming languages, including Java, Python, PHP, Ruby, Perl, C++, and more.
Q.7 Can we create an HBase table without assigning a column family?
No. The column family determines how the data is physically stored in the HDFS file system, so a table must always have at least one column family. Column families can also be altered after the table is built.
Q.8 Tell us the key components of HBase.
The key components that I know are:
Zookeeper: does the coordination work between the HBase Master and the client.
HBase Master: monitors the Region Servers.
RegionServer: monitors the Regions.
Region: contains an in-memory data store (MemStore) and HFiles.
Catalog Tables: consist of META and ROOT.
Q.9 Define REST.
REST (Representational State Transfer) defines the semantics so that the protocol can be used in a generic way to access remote resources. To interact with the server, it supports various message formats, giving a client application many options.
Q.10 Which data type is used to store data in an HBase table column?
Byte array. All data in HBase is stored as raw byte arrays. For example:
Put p = new Put(Bytes.toBytes("John Smith"));
This creates a Put instance, keyed on the encoded row key, which can then be added to the HBase users table.
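Because everything in HBase is raw bytes, the client is responsible for encoding and decoding values. The sketch below illustrates that idea in Python; it is a conceptual simulation, not the HBase client API, and the helper name `to_bytes` is made up for this example.

```python
# Illustrative sketch (not the HBase client API): HBase stores every row key,
# qualifier, and value as raw bytes, so the client must encode and decode.
import struct

def to_bytes(value):
    """Encode a str or int into bytes, as an HBase client would before a Put."""
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, int):
        return struct.pack(">q", value)  # big-endian 8-byte long, like Bytes.toBytes(long)
    raise TypeError(type(value))

row_key = to_bytes("John Smith")   # b'John Smith'
age     = to_bytes(35)             # 8 raw bytes
assert row_key == b"John Smith"
assert len(age) == 8
```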
Q.11 What do you understand by WAL and HLog in HBase?
WAL (Write Ahead Log) is similar to the MySQL binlog: it records all the changes that happen to the data. It is a standard Hadoop sequence file that stores HLogKeys. These keys carry a sequence number as well as the actual data, and they are used to replay data that had not yet been persisted after a server crash. So, in case of server failure, the WAL operates as a lifeline and recovers the lost data.
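The replay idea can be shown with a minimal simulation: log each edit before applying it in memory, then rebuild the in-memory state from the log after a "crash". This is a conceptual sketch, not HBase internals.

```python
# A minimal simulation of the write-ahead-log idea: every edit is appended
# to a durable log before it is applied in memory, so after a crash the
# in-memory state can be rebuilt by replaying the log.
wal = []            # durable, append-only log
memstore = {}       # volatile in-memory store

def put(row, value):
    wal.append(("put", row, value))  # write-ahead: log first
    memstore[row] = value            # then apply in memory

put("row1", "a")
put("row2", "b")

memstore.clear()                     # simulate a server crash losing memory

for op, row, value in wal:           # recovery: replay the log
    if op == "put":
        memstore[row] = value

assert memstore == {"row1": "a", "row2": "b"}
```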
Q.12 According to you, what are the strongest reasons to choose HBase as the DBMS?
One of the greatest things about HBase is that it is scalable in all features and modules. Users can easily serve a very large number of tables in a short period of time. In addition, it has broad support for all CRUD operations. It can store very large volumes of data, scaling to billions of rows and millions of columns, which lets users keep up the pace all the time.
Q.13 Which three coordinates are used to find the HBase data cell?

Generally, HBase uses three coordinates to locate a piece of data within a table. Together, the three coordinates identify the exact location of a cell:

1. RowKey

2. Column Family (Group of columns)

3. Column Qualifier (Name of the columns or column itself e.g. Name, Email, Address)
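The three coordinates above can be pictured as nested lookups. The sketch below is a conceptual model only (nested dicts stand in for HBase storage, and the row/family/qualifier names are made up), not real HBase code.

```python
# Conceptual sketch: a cell is addressed by (row key, column family,
# column qualifier). Nested dicts stand in for HBase storage here.
table = {
    "user1": {                       # row key
        "info": {                    # column family
            "name": "Alice",         # column qualifier -> value
            "email": "alice@example.com",
        }
    }
}

def get_cell(table, row, family, qualifier):
    return table[row][family][qualifier]

assert get_cell(table, "user1", "info", "name") == "Alice"
```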

Q.14 When do we need to disable a table in HBase?
In HBase, a table is disabled so that it can be changed or its settings modified. When a table is disabled, it cannot be accessed through the scan command.
Q.15 Name the tombstone markers that exist in HBase.
There are three tombstone markers in HBase: Version delete, Column delete, and Family delete.
Q.16 Define Nagios.
Nagios is a very common tool for gathering qualitative data regarding cluster status. It polls popular metrics on a regular basis and compares them with given thresholds.
Q.17 Tell us about the compaction in HBase.
Compaction is the process of merging several HFiles into a single file; once the merged file is generated, the old files are removed.
Q.18 What do you think will happen if you change the block size of a column family on an already populated database?
When we change the block size of the column family, new data uses the new block size while the old data remains within the old block size. During compaction, old data will adopt the new block size. New files, as they are flushed, use the new block size, whereas existing data continues to be read correctly. After the next major compaction, all data is converted to the new block size.
Q.19 Explain the use of HColumnDescriptor class.
Information about a column family, such as the number of versions and compression settings, is stored in the HColumnDescriptor.
Q.20 Tell us some basic applications of HBase.
Some applications of HBase that I know are: Apache HBase is suitable for write-heavy applications, and it is a great choice for quick random access to large datasets. Companies such as Facebook, Twitter, Yahoo, and Adobe use HBase internally.
Q.21 What do you mean by the CAP theorem, and which features of the CAP theorem does HBase support?
CAP stands for Consistency, Availability, and Partition Tolerance.
Consistency: at a given point in time, all nodes in a cluster are able to see the same set of data.
Availability: every request receives a response, regardless of whether it is a success or a failure.
Partition Tolerance: the system continues to work even if part of the system fails or messages are intermittently lost.
HBase is a column-oriented database that provides partition tolerance and consistency.
Q.22 Explain standalone mode in the HBase.
Standalone mode is the default mode of HBase. In this mode, HBase does not use HDFS; it uses the local filesystem instead, and it runs all HBase daemons and a local ZooKeeper in the same JVM process.
Q.23 Can you describe data versioning?
In addition to being a schema-less database, HBase is also versioned. Every time we perform an operation on a cell, HBase implicitly stores a new version. Creating, modifying, and deleting a cell are all handled identically; they all create new versions. When a cell exceeds the maximum number of versions, the extra records are dropped during the next major compaction. Rather than deleting a whole cell, we can operate on a specific version within that cell. Values within a cell are versioned by timestamp. If a version is not specified, the current timestamp is used to retrieve the version. The default number of cell versions is 3.
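The versioning behavior described above can be modeled in a few lines. This is an illustrative simulation under simplified assumptions (a single cell, integer timestamps), not HBase code.

```python
# Illustrative model of cell versioning: each write adds a (timestamp, value)
# version; reads return the newest version unless a timestamp is given;
# only the latest max_versions survive a "major compaction".
cell = []          # list of (timestamp, value)
MAX_VERSIONS = 3   # HBase's historical default

def put(ts, value):
    cell.append((ts, value))

def get(ts=None):
    versions = sorted(cell, reverse=True)      # newest first
    if ts is None:
        return versions[0][1]                  # latest version
    return next(v for t, v in versions if t <= ts)

def major_compact():
    cell[:] = sorted(cell, reverse=True)[:MAX_VERSIONS]

for i in range(5):
    put(i, f"v{i}")
assert get() == "v4"          # latest version wins by default
assert get(ts=2) == "v2"      # read "as of" timestamp 2
major_compact()
assert len(cell) == 3         # older versions dropped at compaction
```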
Q.24 Tell us some situations in which you would choose HBase?
I would opt for HBase when a whole database needs to be migrated, and for data processing workloads that are too large to handle otherwise. However, when features like inner joins and transaction support are required regularly, HBase is not a good fit; a relational database should be considered instead.
Q.25 Is HBase a scale-out or scale-up system?
HBase runs on top of Hadoop, which is a distributed system. Hadoop can only scale out, as and when required, by adding more machines on the fly. So HBase is a scale-out system.
Q.26 What do you understand by MemStore?
The MemStore is a write buffer where HBase accumulates data in memory before a permanent write. Its contents are flushed to disk to form an HFile when the MemStore fills up. It does not append to an existing HFile; instead, each flush produces a new file. There is one MemStore per column family.
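The flush behavior can be sketched as follows. This is a simplified simulation (a row-count threshold stands in for the real size-based threshold), not HBase internals.

```python
# Simplified simulation of MemStore flushing: writes accumulate in memory,
# and when a threshold is reached the contents are flushed to a brand-new
# file; each flush creates a new "HFile", never appending to an old one.
FLUSH_THRESHOLD = 3
memstore = {}
hfiles = []        # each element stands for one immutable HFile

def put(row, value):
    memstore[row] = value
    if len(memstore) >= FLUSH_THRESHOLD:
        hfiles.append(dict(memstore))   # a new file per flush
        memstore.clear()

for i in range(7):
    put(f"row{i}", i)

assert len(hfiles) == 2        # two flushes of 3 rows each
assert len(memstore) == 1      # one row still buffered in memory
```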
Q.27 How can you say that HBase is able to provide high availability?
There is a dedicated feature called region replication. Multiple replicas are maintained that represent a region in a table. The load balancer in HBase ensures that replicas of the same region are not placed on servers that already hold it. This is exactly what ensures the high availability of HBase.
Q.28 Tell us the steps a client follows to write something into HBase?
In HBase, the client does not write directly into the HFile. The client first writes to the WAL (Write Ahead Log), and the data then goes to the MemStore. The MemStore flushes the data into permanent storage from time to time.
Q.29 According to you, what is the importance of data management?
Frequently, organizations have to operate on bulk data. When that data is well managed and structured, it is straightforward to use and to extend for any task, and it cuts down the overall time needed to perform the task. Users can work quickly only with structured, properly managed data, which also helps ensure error-free outcomes.
Q.30 Explain the work of the Master Server?
The Master server assigns regions to the region servers. It also helps us handle load balancing.
Q.31 Is HBase an OS-independent system?
Yes, it is completely independent of the operating system; users are free to run it on Linux, Windows, Unix, etc. The only basic requirement is that Java support must be installed.
Q.32 Explain Bloom filter.
HBase supports Bloom filters, which help increase the overall throughput of the cluster. An HBase Bloom filter is a space-efficient mechanism to test whether an HFile contains a specific row or row-column cell.
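To show why a Bloom filter is space-efficient, here is a tiny toy implementation of the general idea. It is only a concept sketch (HBase's real Bloom filters live inside HFiles and use different hashing); the sizes and seeds are arbitrary choices for the example.

```python
# A tiny Bloom filter sketch: a space-efficient, probabilistic set that can
# answer "definitely absent" or "possibly present" without storing the keys.
import hashlib

SIZE = 256
bits = [False] * SIZE

def _positions(key):
    for seed in (b"a", b"b", b"c"):                       # three hash functions
        h = hashlib.sha256(seed + key.encode()).digest()
        yield int.from_bytes(h[:4], "big") % SIZE

def add(key):
    for p in _positions(key):
        bits[p] = True

def might_contain(key):
    return all(bits[p] for p in _positions(key))

add("row-1")
add("row-2")
assert might_contain("row-1")   # a Bloom filter never gives false negatives
# misses are usually reported as absent (false positives possible but rare)
```

In HBase, a negative answer lets a read skip an HFile entirely, which is where the throughput gain comes from.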
Q.33 Is it possible to iterate over the rows of an HBase table in reverse order?
No. Column values are stored on disk with the length of the value written first, followed by the actual value. To iterate over these values in reverse order, the bytes of the actual value would have to be written twice.
Q.34 Which technique can we use to access an HFile directly without the assistance of HBase?
To access an HFile directly without going through HBase, we can use the HFile.main() method.
Q.35 How can we ensure the logical grouping of cells in HBase?
This is controlled through the row key. Cells with similar row keys are stored next to each other and reside on the same server. If a particular grouping is required, the row key can be designed accordingly.
Q.36 While reading data from HBase, from which three places will data be read before returning the value?
Reading a row from HBase first involves checking the MemStore for any pending modifications. Then the BlockCache is examined to see whether the block containing this row has recently been accessed. Finally, the relevant HFiles on disk are read. To read a complete row, HBase must read across all HFiles that might contain data for that row in order to assemble the complete record.
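The three-stage lookup can be sketched as a simple cascade. This is an illustrative model only (plain dicts stand in for the real structures, and "newest HFile wins" is a simplification of HBase's merge logic).

```python
# Illustrative read path: a read consults the MemStore first, then the
# BlockCache, and finally the HFiles on disk, newest file first.
memstore   = {"row1": "pending-edit"}
blockcache = {"row2": "cached-value"}
hfiles     = [{"row3": "old"}, {"row3": "new", "row4": "x"}]  # newest last

def read(row):
    if row in memstore:                  # 1. pending in-memory edits
        return memstore[row]
    if row in blockcache:                # 2. recently accessed blocks
        return blockcache[row]
    for hfile in reversed(hfiles):       # 3. scan HFiles, newest first
        if row in hfile:
            return hfile[row]
    return None

assert read("row1") == "pending-edit"
assert read("row2") == "cached-value"
assert read("row3") == "new"             # newest HFile wins
```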
Q.37 Assume that your data is stored in collections; for instance, some binary data, message data, or metadata is all keyed on an appropriate value. Would you use HBase for this?
Yes, it is ideal to use HBase whenever key-based access to data is needed for storing and retrieving.
Q.38 Explain LZO?
LZO (Lempel-Ziv-Oberhumer) is a lossless data compression algorithm that is focused on decompression speed and written in ANSI C.
Q.39 What is the best practice for deciding the number of column families for an HBase table?
It is best not to exceed 15 column families per HBase table, because each column family in HBase is stored as its own set of files; a large number of column families means reads must open and merge many files.
Q.40 What are the major advantages of using HBase?

The major advantages of using HBase:

1. Horizontally scalable

2. Can store large data sets on top of HDFS file storage and will aggregate and analyze billions of rows present in the HBase tables.

3. In HBase, the database can be sharded

4. Random read and write operations

Q.41 Can the region server be placed on all DataNodes?
Yes, Region Servers operate on the same servers as DataNodes.
Q.42 What is the default ordering of data pairs in HBase?
Lexicographical ordering is the default ordering of data pairs in HBase.
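Lexicographic (byte-wise) ordering has a well-known pitfall with numeric keys, which the snippet below demonstrates, along with the usual zero-padding fix. The `row-N` key names are made up for the example.

```python
# Lexicographic row key ordering: keys compare byte by byte, so "row-10"
# sorts before "row-2". Zero-padding makes sort order match numeric order.
keys = [b"row-10", b"row-2", b"row-1"]
assert sorted(keys) == [b"row-1", b"row-10", b"row-2"]   # "10" before "2"!

padded = [b"row-%03d" % n for n in (10, 2, 1)]
assert sorted(padded) == [b"row-001", b"row-002", b"row-010"]
```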
Q.43 Define tombstone record.
The Delete command does not remove the value immediately. Instead, it marks the record for deletion: a new "tombstone" record is written for that value, marking it as deleted. The tombstone is used to indicate that the deleted value must no longer be included in Get or Scan results.
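The tombstone mechanism can be simulated in a few lines: deletes only set a flag, reads honor the flag, and only a major compaction physically removes the data. A conceptual sketch, not HBase code.

```python
# Conceptual sketch of tombstones: a delete writes a marker rather than
# removing data; reads skip marked cells, and the data plus its tombstone
# disappear only at major compaction.
store = {}   # row -> (value, deleted_flag)

def put(row, value):
    store[row] = (value, False)

def delete(row):
    value, _ = store[row]
    store[row] = (value, True)      # tombstone: still on disk, but hidden

def get(row):
    value, deleted = store.get(row, (None, True))
    return None if deleted else value

def major_compact():
    for row in [r for r, (_, d) in store.items() if d]:
        del store[row]              # tombstoned cells physically removed

put("row1", "a")
delete("row1")
assert get("row1") is None          # hidden from reads...
assert "row1" in store              # ...but still physically present
major_compact()
assert "row1" not in store          # gone after major compaction
```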
Q.44 What is the major advantage of sharding?
The major advantage of sharding is faster data writes.
Q.45 You might have previously used a relational database; can you tell us some major differences you noticed in it as compared to HBase?
Well, the primary difference is that HBase is schema-less whereas a relational database is schema-based. Also, automated partitioning is built into HBase, while relational databases lack this feature. HBase tables can grow far larger than typical relational tables. Furthermore, a relational database is a row-oriented data store, whereas HBase is a column-oriented data store.
Q.46 What is the default time of major compaction of all HStoreFiles?
7 days (the hbase.hregion.majorcompaction default in recent HBase versions)
Q.47 Can major compaction be triggered manually?
Yes, major compactions can be triggered manually for an entire table (or a particular region) from the shell. This is a relatively expensive operation and isn't done often. Minor compactions, on the other hand, are relatively lightweight and occur more frequently.
Q.48 What is the default log level of HBase processes?
INFO is the default log level of HBase processes; it can be lowered to DEBUG for troubleshooting.
Q.49 Explain TTL in HBase.
TTL (time to live) is essentially a technique used for data retention. It allows users to keep a version of a cell only for a limited time period; the version is deleted automatically once that time expires.
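A simple way to picture TTL is a read that filters out values older than the configured lifetime. This is an illustrative simulation with made-up row names and integer timestamps, not HBase code.

```python
# Simple TTL simulation: cells older than the configured time-to-live are
# filtered out of reads (and would be purged at the next compaction).
TTL_SECONDS = 60
cells = {"row1": ("a", 0), "row2": ("b", 100)}   # row -> (value, write_time)

def get(row, now):
    value, written = cells.get(row, (None, None))
    if written is None or now - written > TTL_SECONDS:
        return None                               # expired or missing
    return value

now = 120
assert get("row1", now) is None    # written at t=0, older than 60s: expired
assert get("row2", now) == "b"     # written at t=100, still fresh
```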
Q.50 Which component is responsible for managing the HBase RegionServers?
HMaster is the implementation of the Master Server. It is responsible for monitoring all RegionServer instances in the cluster and is the interface for all metadata changes. In a distributed cluster, the Master typically runs on the NameNode.
Q.51 Tell us the usage of HColumnDescriptor?
An HColumnDescriptor contains information about a column family, such as compression settings, the number of versions, and so on. It is used as input when creating a table or adding a column. Once set, the parameters that define a column family cannot be modified without deleting the column family and recreating it. If there is data stored in the column family, it will be removed when the column family is deleted.
Q.52 Tell us the Java code snippet used to open a connection in HBase?
To open a connection with the help of the Java API, the following code can be used:
Configuration myConf = HBaseConfiguration.create();
HTableInterface usersTable = new HTable(myConf, "users");
Note that in HBase 1.0 and later the HTable constructor is deprecated; the preferred approach is ConnectionFactory.createConnection(myConf) followed by connection.getTable(TableName.valueOf("users")).
Q.53 Why is MultiWAL required?
With a single WAL per RegionServer, the RegionServer must write to the WAL serially, because HDFS files must be written sequentially; this can make the WAL a performance bottleneck. MultiWAL allows a RegionServer to write several WAL streams in parallel by partitioning its regions across them, which increases overall write throughput.
Q.54 In which situation should we consider creating a short and wide HBase table?
The short and wide table design is considered when there is a small number of rows and a large number of columns.
Q.55 What are the various Block Caches in Hbase?
HBase provides two separate BlockCache implementations: the default on-heap LruBlockCache and the BucketCache, which is (usually) off-heap.