Updated: 6 April 2013
Cloudera Certified Administrator for Apache Hadoop (CCAH)
To earn the CCAH certification, candidates must pass an exam that tests their fluency with the concepts and skills required in the following areas:
If you are interested in the Developer exam, you should read this other post instead:
http://jugnu-life.blogspot.in/2012/03/cloudera-certified-developer-for-apache.html
Details of the Admin exam, along with resources to prepare from, are below.
Test Name: Cloudera Certified Administrator for Apache Hadoop CDH4 (CCA-410)
Number of Questions: 60
Time Limit: 90 minutes
Passing Score: 70%
Languages: English, Japanese
English Release Date: November 1, 2012
Japanese Release Date: December 1, 2012
Price: USD $295, AUD285, EUR225, GBP185, JPY25,500
1. HDFS (38%)
Objectives
- Describe the function of all Hadoop Daemons
- Describe the normal operation of an Apache Hadoop cluster, both in data storage and in data processing.
- Identify current features of computing systems that motivate a system like Apache Hadoop.
- Classify the major goals of HDFS design
- Given a scenario, identify an appropriate use case for HDFS Federation
- Identify the components and daemons of an HDFS HA-Quorum cluster
- Analyze the role of HDFS security (Kerberos)
- Describe file read and write paths
Section Study Resources
- Hadoop: The Definitive Guide, 3rd edition: Chapter 3
- Hadoop Operations: Chapter 2
- Hadoop in Practice: Appendix C: HDFS Dissected
- CDH4 High Availability Guide
- CDH4 HA with Quorum-based storage docs
- Apache HDFS High Availability Using the Quorum Journal Manager docs
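To make the HA-Quorum objectives above concrete, here is a minimal hdfs-site.xml sketch of Quorum-based HA. The nameservice ID and hostnames are invented for illustration; consult the CDH4 High Availability Guide listed above for the full set of required properties (failover proxy provider, fencing, RPC addresses, etc.):

```xml
<!-- hdfs-site.xml sketch: the nameservice "mycluster" and all hostnames
     are hypothetical; this is not a complete HA configuration -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <!-- Two NameNodes, one active and one standby -->
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <!-- Shared edits directory backed by a quorum of JournalNodes -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1:8485;jn2:8485;jn3:8485/mycluster</value>
</property>
```

The key idea the exam objective targets: the active NameNode writes edits to a majority of JournalNodes, and the standby tails the same quorum, which removes the shared-NFS single point of failure of earlier HA designs.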
2. MapReduce (10%)
Objectives
- Understand how to deploy MapReduce v1 (MRv1)
- Understand how to deploy MapReduce v2 (MRv2 / YARN)
- Understand basic design strategy for MapReduce v2 (MRv2)
Section Study Resources
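No resources are listed for this section, but the MRv2 deployment objective can be sketched with the two configuration files that switch a cluster from MRv1 to YARN. The ResourceManager hostname is hypothetical; see the CDH4 documentation for the full property list:

```xml
<!-- mapred-site.xml: select the MRv2/YARN execution framework -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml: hypothetical ResourceManager address, plus the shuffle
     auxiliary service MapReduce jobs need on every NodeManager.
     Note: the value was renamed to "mapreduce_shuffle" in later
     Hadoop 2.x releases; CDH4-era docs use "mapreduce.shuffle". -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>rm1.example.com:8032</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce.shuffle</value>
</property>
```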
3. Hadoop Cluster Planning (12%)
Objectives
- Identify the principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster.
- Analyze the choices in selecting an OS
- Understand kernel tuning and disk swapping
- Given a scenario and workload pattern, identify a hardware configuration appropriate to the scenario
- Cluster sizing: given a scenario and frequency of execution, identify the specifics for the workload, including CPU, memory, storage, disk I/O
- Disk Sizing and Configuration, including JBOD versus RAID, SANs, virtualization, and disk sizing requirements in a cluster
- Network Topologies: understand network usage in Hadoop (for both HDFS and MapReduce) and propose or identify key network design components for a given scenario
Section Study Resources
- Hadoop Operations: Chapter 4
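The kernel-tuning and disk-swapping objective above is often illustrated with settings like the following. The values are common community recommendations, not requirements, and the device and mount point are hypothetical; verify against your own distribution:

```
# /etc/sysctl.conf fragment: discourage the kernel from swapping
# daemon heap out to disk (swapping a DataNode or TaskTracker heap
# causes severe latency spikes)
vm.swappiness = 1

# /etc/fstab fragment: mount data disks with noatime so reads do not
# trigger an extra metadata write (device and mount point hypothetical)
/dev/sdb1  /data/1  ext4  defaults,noatime  0  0
```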
4. Hadoop Cluster Installation and Administration (17%)
Objectives
- Given a scenario, identify how the cluster will handle disk and machine failures.
- Analyze a logging configuration and logging configuration file format.
- Understand the basics of Hadoop metrics and cluster health monitoring.
- Identify the function and purpose of available tools for cluster monitoring.
- Identify the function and purpose of available tools for managing the Apache Hadoop file system.
Section Study Resources
- Hadoop Operations, Chapter 5
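For the logging-configuration objective, this is the kind of log4j.properties fragment the exam asks you to analyze. The appender settings shown are typical Hadoop defaults, included here as a hedged illustration of the file format:

```properties
# log4j.properties sketch: route daemon logs through a rolling file appender
hadoop.root.logger=INFO,RFA
log4j.rootLogger=${hadoop.root.logger}

# Rolling file appender: caps each log file and keeps a fixed number of backups
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=20
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

Being able to read off the logger level, appender class, and rotation policy from a file like this covers both the section 4 and section 6 logging objectives.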
5. Resource Management (6%)
Objectives
- Understand the overall design goals of each of the Hadoop schedulers.
- Understand the role of HDFS quotas.
- Given a scenario, determine how the FIFO Scheduler allocates cluster resources.
- Given a scenario, determine how the Fair Scheduler allocates cluster resources.
- Given a scenario, determine how the Capacity Scheduler allocates cluster resources.
Section Study Resources
- A slide deck from Matei Zaharia, developer of the Fair Scheduler
- Hadoop Operations, Chapter 7
- Capacity Scheduler Apache docs (note: we don't control apache.org links and, as of 11 February 2013, they have been experiencing downtime; you may get a 404 error.)
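As a sketch of how the MRv1 Fair Scheduler's allocation file expresses per-pool guarantees and weights (the pool names and numbers are invented; see the Apache Fair Scheduler documentation for the full format):

```xml
<!-- fair-scheduler allocation file sketch; pools and values are hypothetical -->
<allocations>
  <pool name="production">
    <!-- Guaranteed minimum task slots before sharing kicks in -->
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <!-- Receives twice the share of excess capacity vs. weight 1.0 pools -->
    <weight>2.0</weight>
  </pool>
  <pool name="adhoc">
    <weight>1.0</weight>
  </pool>
</allocations>
```

For the scenario questions, the point to internalize is that minimums are satisfied first and any remaining capacity is divided in proportion to pool weights.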
6. Monitoring and Logging (12%)
Objectives
- Understand the functions and features of Hadoop’s metric collection abilities
- Analyze the NameNode and JobTracker Web UIs
- Interpret a log4j configuration
- Understand how to monitor the Hadoop Daemons
- Identify and monitor CPU usage on master nodes
- Describe how to monitor swap and memory allocation on all nodes
- Identify how to view and manage Hadoop’s log files
- Interpret a log file
Section Study Resources
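No resources are listed for this section; the metrics-collection objective can be sketched with a hadoop-metrics2.properties fragment that pushes NameNode metrics to Ganglia. The gmond address is hypothetical:

```properties
# hadoop-metrics2.properties sketch: emit metrics to a Ganglia 3.1 gmond
# (server address is hypothetical)
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
# Collection interval in seconds
*.sink.ganglia.period=10
namenode.sink.ganglia.servers=gmond01.example.com:8649
```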
7. The Hadoop Ecosystem (5%)
Objectives
- Understand Ecosystem projects and what you need to do to deploy them on a cluster.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapters 11, 12, 14, 15
- Hadoop in Practice: Chapters 10, 11
- Hadoop in Action: Chapters 10, 11
- Apache Hive docs
- Apache Pig docs
- Introduction to Pig Video
- Apache Sqoop docs site
- Aaron Kimball on Sqoop at Hadoop World 2012
- Cloudera Manager Online Training Video Series
- Each project in the Hadoop ecosystem has at least one book devoted to it. The exam does not require deep knowledge of programming in Hive, Pig, Sqoop, Cloudera Manager, Flume, etc., but rather an understanding of how those projects contribute to an overall big data ecosystem.
Thank you! This was very useful. I am planning to take this exam in the third week of August.
Hi,
Can someone tell me whether having this certification would help my career? I have no experience in Hadoop, but I am attracted to big data and really want to pursue a career in it.
Can someone help me, please?
I would suggest you start by exploring Hadoop and the problems it solves, and only think about certification if it interests you.
Right now, head over to apache.org and start reading its mailing lists and documentation. Grab a book on Hadoop and start turning its pages.
You will enjoy it for sure, and when you enjoy what you do, your career will automatically take care of itself.