Test Name: Cloudera Certified Specialist in Apache HBase
Current Version: CCB-400
Number of Questions: 45
Time Limit: 90 minutes
Passing Score: 69%
Languages: English, Japanese
Core HBase Concepts
Recognize the fundamental characteristics of Apache HBase and its role in a big data ecosystem. Identify differences between Apache HBase and a traditional RDBMS. Describe the relationship between Apache HBase and HDFS. Given a scenario, identify the application characteristics that make it an appropriate use case for Apache HBase.
Data Model
Describe how an Apache HBase table is physically stored on disk. Identify the differences between a Column Family and a Column Qualifier. Given a data loading scenario, identify how Apache HBase will version the rows. Describe how Apache HBase cells store data. Detail what happens to data when it is deleted.
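The versioning and delete behavior is easiest to see from the client side. Below is a minimal sketch, assuming a hypothetical "blog" table with a "data" column family and the pre-1.0 Java client API that CCB-400 targets: it writes three versions of one cell, reads them back, and then deletes the row. The delete only writes a tombstone marker; the masked cells are physically removed at the next major compaction, and how many versions are retained depends on the column family's VERSIONS setting.

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersioningSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "blog");          // hypothetical table

        byte[] row  = Bytes.toBytes("post-1");
        byte[] cf   = Bytes.toBytes("data");              // hypothetical column family
        byte[] qual = Bytes.toBytes("title");

        // Each put on the same row/family/qualifier creates a new cell
        // version, keyed by timestamp; older versions remain until a
        // compaction drops those beyond the family's VERSIONS limit.
        for (int i = 1; i <= 3; i++) {
            Put put = new Put(row);
            put.add(cf, qual, Bytes.toBytes("title-v" + i));
            table.put(put);
        }

        // By default a Get returns only the newest version; ask explicitly
        // for more.
        Get get = new Get(row);
        get.setMaxVersions(3);
        Result result = table.get(get);
        List<KeyValue> versions = result.getColumn(cf, qual);
        for (KeyValue kv : versions) {
            System.out.println(kv.getTimestamp() + " -> "
                + Bytes.toString(kv.getValue()));
        }

        // The delete writes a tombstone; the underlying cells disappear
        // from disk only after the next major compaction.
        Delete delete = new Delete(row);
        table.delete(delete);

        table.close();
    }
}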
Architecture
Identify the major components of an Apache HBase cluster. Recognize how regions work and their benefits under various scenarios. Describe how a client finds a row in an HBase table. Understand the function and purpose of minor and major compactions. Given a region server crash scenario, describe how Apache HBase fails over to another region server. Describe RegionServer splits.
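The row-lookup path can also be observed from the client: the library consults the catalog table(s), caches the region location, and routes subsequent requests directly to the owning RegionServer. A rough sketch, assuming a hypothetical "weblog" table and the pre-1.0 client API:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionLookupSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "weblog");   // hypothetical table

        // The client resolves the hosting region via the catalog table and
        // caches the answer; later reads and writes for keys in the same
        // region go straight to that RegionServer.
        HRegionLocation location =
            table.getRegionLocation(Bytes.toBytes("75.67.12.146"));

        System.out.println("Region: "
            + location.getRegionInfo().getRegionNameAsString());
        System.out.println("Served by: " + location.getHostname());

        table.close();
    }
}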
Schema Design
Describe the factors to be considered when creating Column Families. Given an access pattern, define the row keys for optimal read performance. Given an access pattern, define the row keys for locality.
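One common way to serve a "most recent first" access pattern (the same idea behind Sample Question 1 below) is a composite row key ending in a reversed timestamp, so that a plain forward scan returns the newest entries first. A minimal sketch, assuming a hypothetical comments table keyed by blog post id with a "data" column family:

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class RowKeyDesign {

    // Composite key: <blog_post_id><Long.MAX_VALUE - timestamp>.
    // The leading post id keeps all comments for one post contiguous
    // (locality for scans); the fixed-width reversed timestamp makes the
    // newest comment sort first within that post.
    static byte[] commentKey(String blogPostId, long timestampMillis) {
        return Bytes.add(Bytes.toBytes(blogPostId),
                         Bytes.toBytes(Long.MAX_VALUE - timestampMillis));
    }

    // Scan all comments of one post, most recent first, with no
    // client-side sorting required.
    static Scan commentsForPost(String blogPostId) {
        byte[] prefix = Bytes.toBytes(blogPostId);
        Scan scan = new Scan(prefix);
        scan.setFilter(new PrefixFilter(prefix));
        return scan;
    }

    public static void main(String[] args) {
        Put put = new Put(commentKey("post-42", System.currentTimeMillis()));
        put.add(Bytes.toBytes("data"),               // hypothetical column family
                Bytes.toBytes("body"),
                Bytes.toBytes("Nice article!"));
        // table.put(put) would store the comment under the composite key.
    }
}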
API
Describe the functions and purpose of the HBaseAdmin class. Given a table and rowkey, use the get() operation to return specific versions of that row. Describe the behavior of the checkAndPut() method.
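A rough illustration of those three points, assuming a hypothetical "users" table with an "info" family and the pre-1.0 client API:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class ApiSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();

        // HBaseAdmin handles DDL and cluster-level operations
        // (create/disable/delete tables, check table existence, etc.).
        HBaseAdmin admin = new HBaseAdmin(conf);
        System.out.println("users exists: " + admin.tableExists("users"));

        HTable table = new HTable(conf, "users");        // hypothetical table
        byte[] row  = Bytes.toBytes("alice");
        byte[] cf   = Bytes.toBytes("info");             // hypothetical family
        byte[] qual = Bytes.toBytes("email");

        // get(): request specific versions of a row.
        Get get = new Get(row);
        get.setMaxVersions(2);                           // two newest versions
        Result result = table.get(get);
        System.out.println("cells returned: " + result.size());

        // checkAndPut(): the put is applied only if the current value of
        // row/cf/qual equals the expected value; it returns true when the
        // put was applied and false otherwise. The check and the write are
        // atomic on the RegionServer.
        Put update = new Put(row);
        update.add(cf, qual, Bytes.toBytes("alice@new.example.com"));
        boolean applied = table.checkAndPut(
            row, cf, qual, Bytes.toBytes("alice@old.example.com"), update);
        System.out.println("update applied: " + applied);

        table.close();
        admin.close();
    }
}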
Administration
Recognize how to create, describe, and access data in tables from the shell. Describe how to bulk load data into Apache HBase. Recognize the benefits of managed region splits.
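A sketch of the programmatic equivalents, again with hypothetical names ("weblog" table, "stats" family, /tmp/weblog-hfiles as the prepared-HFile directory) and the pre-1.0 API. The bulk-load step assumes the HFiles were already produced by a job such as importtsv or one using HFileOutputFormat:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;

public class AdminSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Programmatic equivalent of the shell's
        //   create 'weblog', 'stats'
        // followed by describe 'weblog'.
        HTableDescriptor desc = new HTableDescriptor("weblog");   // hypothetical table
        desc.addFamily(new HColumnDescriptor("stats"));           // hypothetical family
        admin.createTable(desc);
        System.out.println(admin.getTableDescriptor(Bytes.toBytes("weblog")));

        // Bulk load: HFiles prepared beforehand are handed directly to the
        // RegionServers, bypassing the normal write path.
        HTable table = new HTable(conf, "weblog");
        new LoadIncrementalHFiles(conf)
            .doBulkLoad(new Path("/tmp/weblog-hfiles"), table);   // hypothetical path

        table.close();
        admin.close();
    }
}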
Sample Questions
Question 1
You want to store the comments from a blog post in HBase. Your data consists of the following:
a. the blog post id
b. the name of the comment author
c. the body of the comment
d. the timestamp for each comment
Which rowkey would you use if you wanted to retrieve the comments with a scan, most recent first?
A. <(Long)timestamp>
B. <blog_post_id><Long.MAX_VALUE - (Long)timestamp>
C. <timestamp><Long.MAX_VALUE>
D. <Long.MAX_VALUE><timestamp>
Question 2
Your application needs to retrieve 200 to 300 non-sequential rows from a table with one billion rows. You know the rowkey of each of the rows you need to retrieve. Which does your application need to implement?
A. Scan without range
B. Scan with start and stop row
C. HTable.get(Get get)
D. HTable.get(List<Get> gets)
Question 3
You perform a check and put operation from within an HBase application using the following:
table.checkAndPut(Bytes.toBytes("rowkey"),
                  Bytes.toBytes("colfam"),
                  Bytes.toBytes("qualifier"),
                  Bytes.toBytes("barvalue"), newrow);
Which describes this check and put operation?
A. Check if rowkey/colfam/qualifier exists and the cell value "barvalue" is equal to newrow, then return "true".
B. Check if rowkey/colfam/qualifier exists and the cell value "barvalue" is NOT equal to newrow, then return "true".
C. Check if rowkey/colfam/qualifier has the cell value "barvalue". If so, put the values in newrow and return "false".
D. Check if rowkey/colfam/qualifier has the cell value "barvalue". If so, put the values in newrow and return "true".
Question 4
What is the advantage of using the bulk load API over individual Puts for bulk insert operations?
A. Writes bypass the HLog/MemStore, reducing load on the RegionServer.
B. Users doing bulk Writes may disable writing to the WAL, which results in possible data loss.
C. HFiles created by the bulk load API are guaranteed to be co-located with the RegionServer hosting the region.
D. HFiles written out via the bulk load API are more space efficient than those written out of RegionServers.
Question 5
You have a "WebLog" table in HBase. The Row Keys are the IP Addresses. You want to retrieve all entries that have an IP Address of 75.67.12.146. The shell command you would use is:
A. get 'WebLog', '75.67.12.146'
B. scan 'WebLog', '75.67.12.146'
C. get 'WebLog', {FILTER => '75.67.12.146'}
D. scan 'WebLog', {COLFAM => 'IP', FILTER => '75.67.12.146'}
Answers
Question 1: B
Question 2: D
Question 3: D
Question 4: A
Question 5: A
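For Question 2, the point of answer D is that HTable.get(List<Get>) batches the point lookups into one client call (grouped by region) rather than issuing a few hundred individual RPCs or scanning a billion-row table. A minimal sketch, with a hypothetical table name and rowkeys:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class BatchGetSketch {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "bigtable");     // hypothetical table

        // Build one Get per known rowkey; the keys are non-sequential, so a
        // Scan with a start/stop row would read far too much data.
        List<Get> gets = new ArrayList<Get>();
        for (String rowkey : new String[] {"row-17", "row-90210", "row-4711"}) {
            gets.add(new Get(Bytes.toBytes(rowkey)));
        }

        // One batched call; the client groups the Gets by region and sends
        // them to the relevant RegionServers.
        Result[] results = table.get(gets);
        for (Result r : results) {
            System.out.println(Bytes.toString(r.getRow()));
        }

        table.close();
    }
}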