Updated: 6 April 2013
Cloudera Certified Administrator for Apache Hadoop (CCAH)
To earn the CCAH certification, candidates must pass an exam that tests their fluency with the concepts and skills required in the following areas:
If you are interested in the Developer exam, you should read this other post instead:
http://jugnu-life.blogspot.in/2012/03/cloudera-certified-developer-for-apache.html
Details of the Admin exam, along with resources to prepare from, are below.
Test Name: Cloudera Certified Administrator for Apache Hadoop CDH4 (CCA-410)
Number of Questions: 60
Time Limit: 90 minutes
Passing Score: 70%
Languages: English, Japanese
English Release Date: November 1, 2012
Japanese Release Date: December 1, 2012
Price: USD $295, AUD285, EUR225, GBP185, JPY25,500
1. HDFS (38%)
Objectives
- Describe the function of all Hadoop Daemons
- Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing.
- Identify current features of computing systems that motivate a system like Apache Hadoop.
- Classify the major goals of HDFS design
- Given a scenario, identify an appropriate use case for HDFS Federation
- Identify the components and daemons of an HDFS HA-Quorum cluster
- Analyze the role of HDFS security (Kerberos)
- Describe file read and write paths
Section Study Resources
- Hadoop: The Definitive Guide, 3rd edition: Chapter 3
- Hadoop Operations: Chapter 2
- Hadoop in Practice: Appendix C: HDFS Dissected
- CDH4 High Availability Guide
- CDH4 HA with Quorum-based storage docs
- Apache HDFS High Availability Using the Quorum Journal Manager docs
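To make the HA-Quorum objectives above concrete, here is a minimal hdfs-site.xml sketch of Quorum-based HA. The nameservice ID and hostnames are invented for illustration; consult the CDH4 High Availability Guide listed above for the full set of required properties (failover proxy provider, fencing, RPC addresses, etc.):

```xml
<!-- hdfs-site.xml sketch: the nameservice "mycluster" and all hostnames
     are hypothetical; this is not a complete HA configuration -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <!-- Two NameNodes, one active and one standby -->
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <!-- Shared edits directory backed by a quorum of JournalNodes -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
```

The key idea the exam objective targets: the active NameNode writes edits to a majority of JournalNodes, and the standby tails the same quorum, which removes the shared-NFS single point of failure of earlier HA designs.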
2. MapReduce (10%)
Objectives
- Understand how to deploy MapReduce v1 (MRv1)
- Understand how to deploy MapReduce v2 (MRv2 / YARN)
- Understand basic design strategy for MapReduce v2 (MRv2)
Section Study Resources
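No resources are listed for this section, but the MRv2 deployment objective can be sketched with the two configuration files that switch a cluster from MRv1 to YARN. The ResourceManager hostname is hypothetical; see the CDH4 documentation for the full property list:

```xml
<!-- mapred-site.xml: select the MRv2/YARN execution framework -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml: hypothetical ResourceManager address, plus the shuffle
     auxiliary service MapReduce jobs need on every NodeManager.
     Note: the value was renamed to "mapreduce_shuffle" in later
     Hadoop 2.x releases; CDH4-era docs use "mapreduce.shuffle". -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rm1.example.com:8032</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
```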
3. Hadoop Cluster Planning (12%)
Objectives
- Identify the principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster.
- Analyze the choices in selecting an OS
- Understand kernel tuning and disk swapping
- Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario
- Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, disk I/O
- Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster
- Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario
Section Study Resources
- Hadoop Operations: Chapter 4
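The kernel-tuning and disk-swapping objective above is often illustrated with settings like the following. The values are common community recommendations, not requirements, and the device and mount point are hypothetical; verify against your own distribution:

```
# /etc/sysctl.conf fragment: discourage the kernel from swapping
# daemon heap out to disk (swapping a DataNode or TaskTracker heap
# causes severe latency spikes)
vm.swappiness = 1

# /etc/fstab fragment: mount data disks with noatime so reads do not
# trigger an extra metadata write (device and mount point hypothetical)
/dev/sdb1  /data/1  ext4  defaults,noatime  0  0
```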
4. Hadoop Cluster Installation and Administration (17%)
Objectives
- Given a scenario, identify how the cluster will handle disk and machine failures.
- Analyze a logging configuration and logging configuration file format.
- Understand the basics of Hadoop metrics and cluster health monitoring.
- Identify the function and purpose of available tools for cluster monitoring.
- Identify the function and purpose of available tools for managing the Apache Hadoop file system.
Section Study Resources
- Hadoop Operations, Chapter 5
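For the logging-configuration objective, this is the kind of log4j.properties fragment the exam asks you to analyze. The appender settings shown are typical Hadoop defaults, included here as a hedged illustration of the file format:

```properties
# log4j.properties sketch: route daemon logs through a rolling file appender
hadoop.root.logger=INFO,RFA
log4j.rootLogger=${hadoop.root.logger}

# Rolling file appender: caps each log file and keeps a fixed number of backups
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

Being able to read off the logger level, appender class, and rotation policy from a file like this covers both the section 4 and section 6 logging objectives.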
5. Resource Management (6%)
Objectives
- Understand the overall design goals of each of the Hadoop schedulers.
- Understand the role of HDFS quotas.
- Given a scenario, determine how the FIFO Scheduler allocates cluster resources.
- Given a scenario, determine how the Fair Scheduler allocates cluster resources.
- Given a scenario, determine how the Capacity Scheduler allocates cluster resources.
Section Study Resources
- A slide deck from Matei Zaharia, developer of the Fair Scheduler
- Hadoop Operations, Chapter 7
- Capacity Scheduler Apache docs (note: we don't control apache.org links and, as of 11 February 2013, they have been experiencing downtime; you may get a 404 error.)
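As a sketch of how the MRv1 Fair Scheduler's allocation file expresses per-pool guarantees and weights (the pool names and numbers are invented; see the Apache Fair Scheduler documentation for the full format):

```xml
<!-- fair-scheduler allocation file sketch; pools and values are hypothetical -->
<allocations>
  <pool name="production">
    <!-- Guaranteed minimum task slots before sharing kicks in -->
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <!-- Receives twice the share of excess capacity vs. weight 1.0 pools -->
    <weight>2.0</weight>
  </pool>
  <pool name="adhoc">
    <weight>1.0</weight>
  </pool>
</allocations>
```

For the scenario questions, the point to internalize is that minimums are satisfied first and any remaining capacity is divided in proportion to pool weights.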
6. Monitoring and Logging (12%)
Objectives
- Understand the functions and features of Hadoop’s metric collection abilities
- Analyze the NameNode and JobTracker Web UIs
- Interpret a log4j configuration
- Understand how to monitor the Hadoop Daemons
- Identify and monitor CPU usage on master nodes
- Describe how to monitor swap and memory allocation on all nodes
- Identify how to view and manage Hadoop’s log files
- Interpret a log file
Section Study Resources
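No resources are listed for this section; the metrics-collection objective can be sketched with a hadoop-metrics2.properties fragment that pushes NameNode metrics to Ganglia. The gmond address is hypothetical:

```properties
# hadoop-metrics2.properties sketch: emit metrics to a Ganglia 3.1 gmond
# (server address is hypothetical)
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
# Collection interval in seconds
*.sink.ganglia.period=10
namenode.sink.ganglia.servers=gmond01.example.com:8649
```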
7. The Hadoop Ecosystem (5%)
Objectives
- Understand Ecosystem projects and what you need to do to deploy them on a cluster.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapters 11, 12, 14, 15
- Hadoop in Practice: Chapters 10, 11
- Hadoop in Action: Chapters 10, 11
- Apache Hive docs
- Apache Pig docs
- Introduction to Pig Video
- Apache Sqoop docs site
- Aaron Kimball on Sqoop at Hadoop World 2012
- Cloudera Manager Online Training Video Series
- Each project in the Hadoop ecosystem has at least one book devoted to it. The exam does not require deep knowledge of programming in Hive, Pig, Sqoop, Cloudera Manager, Flume, etc., but rather an understanding of how those projects contribute to an overall big data ecosystem.
Thank you! This was very useful. I am planning to take this exam in the third week of August.
Hi,
Can someone tell me whether having this certification would help my career? I have no experience in Hadoop, but I am attracted to big data and really want to pursue a career in it.
Can someone help me, please?
I would suggest you start by exploring Hadoop and the problems it solves, and only think about certification if it interests you.
Right now, head over to apache.org and start reading its mailing lists and documentation. Grab a book on Hadoop and start turning its pages.
You will enjoy it for sure, and when you enjoy what you do, your career will automatically take care of itself.