Cloudera Certified Developer for Apache Hadoop (CCDH): Exam Syllabus, Topics and Contents

Cloudera Certified Developer for Apache Hadoop (CCDH)

Update: 6 April 2013

Cloudera has added exam preparation resources to its website. See the following link for the latest information:

http://university.cloudera.com/certification/prep/ccdh.html

http://jugnu-life.blogspot.in/2012/05/cloudera-hadoop-certification-now.html

Syllabus and exam contents

http://university.cloudera.com/certification.html

To earn a CCDH certification, candidates must pass an exam designed to test a candidate’s fluency with the concepts and skills required in the following areas:

If you are interested in the Administrator exam, read my other post:

http://jugnu-life.blogspot.in/2012/03/cloudera-certified-administrator-for.html

 

The Developer exam syllabus and study resources are listed below.

1. Core Hadoop Concepts (CCD-410: 25% | CCD-470: 33%)

Objectives
  • Recognize and identify Apache Hadoop daemons and how they function both in data storage and processing under both CDH3 and CDH4.
  • Understand how Apache Hadoop exploits data locality, including rack placement policy.
  • Given a big data scenario, determine the challenges to large-scale computational models and how distributed systems attempt to overcome various challenges posed by the scenario.
  • Identify the role and use of both MapReduce v1 (MRv1) and MapReduce v2 (MRv2 / YARN) daemons.
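Rack placement is easier to reason about when written out step by step. The sketch below simulates the default three-replica placement described in Hadoop: The Definitive Guide: the first replica goes on the writer's node, the second on a randomly chosen node in a different rack, and the third on a different node in the same rack as the second. It is a plain-Java illustration, not Hadoop's own placement code. The two-rack topology and the node names dn1 to dn6 are hypothetical, and real HDFS also takes node load and free space into account.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Simplified simulation of HDFS's default placement for three replicas.
// The topology and node names are hypothetical. Real HDFS also weighs node load and
// free space, and this sketch assumes every rack has at least two nodes.
class RackPlacementSketch {

    static List<String> placeReplicas(Map<String, List<String>> racks,
                                      String writerNode, Random rnd) {
        String writerRack = rackOf(racks, writerNode);
        List<String> replicas = new ArrayList<>();

        // Replica 1: the writer's own node, so the first copy is written locally.
        replicas.add(writerNode);

        // Replica 2: a random node on a different rack, so a single rack failure
        // cannot destroy every copy.
        List<String> otherRacks = new ArrayList<>(racks.keySet());
        otherRacks.remove(writerRack);
        String remoteRack = otherRacks.get(rnd.nextInt(otherRacks.size()));
        List<String> remoteNodes = racks.get(remoteRack);
        String second = remoteNodes.get(rnd.nextInt(remoteNodes.size()));
        replicas.add(second);

        // Replica 3: a different node on the same rack as replica 2, which limits
        // cross-rack write traffic.
        List<String> sameRackCandidates = new ArrayList<>(remoteNodes);
        sameRackCandidates.remove(second);
        replicas.add(sameRackCandidates.get(rnd.nextInt(sameRackCandidates.size())));
        return replicas;
    }

    // Looks up the rack that a node belongs to.
    static String rackOf(Map<String, List<String>> racks, String node) {
        for (Map.Entry<String, List<String>> e : racks.entrySet()) {
            if (e.getValue().contains(node)) return e.getKey();
        }
        throw new IllegalArgumentException("unknown node: " + node);
    }

    public static void main(String[] args) {
        Map<String, List<String>> racks = Map.of(
                "/rack1", List.of("dn1", "dn2", "dn3"),
                "/rack2", List.of("dn4", "dn5", "dn6"));
        // The first element is always dn1. The other two are distinct nodes on /rack2.
        System.out.println(placeReplicas(racks, "dn1", new Random(42)));
    }
}
```

Two copies share a rack and one copy stays with the writer. This placement balances write bandwidth against tolerance of a whole-rack failure.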
Section Study Resources

 

2. Storing Files in Hadoop (7%)

Objectives
  • Analyze the benefits and challenges of the HDFS architecture.
  • Analyze how HDFS implements file sizes, block sizes, and block abstraction.
  • Understand default replication values and storage requirements for replication.
  • Determine how HDFS stores, reads, and writes files.
  • Given a sample architecture, determine how HDFS handles hardware failure.
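Block counts and replicated storage come down to simple arithmetic. A file occupies ceil(fileSize / blockSize) blocks. HDFS does not pad the final block to full size, so raw disk use is roughly fileSize × replication, ignoring checksum metadata. The sketch below assumes a 128 MB block size for illustration only. The actual default depends on the Hadoop version and on dfs.blocksize (dfs.block.size in older releases). The default replication factor is 3.

```java
// Back-of-the-envelope HDFS sizing. The 128 MB block size in main() is an assumed example value.
class HdfsSizing {
    static final long MB = 1024L * 1024L;

    // Number of HDFS blocks for a file. Ceiling division, so a partial last block still counts.
    static long blockCount(long fileBytes, long blockBytes) {
        if (fileBytes == 0) return 0;
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw disk consumed across the cluster. The last block only uses its actual length.
    static long rawStorageBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long file = 1000 * MB;  // a 1000 MB file
        long block = 128 * MB;  // assumed block size
        int replication = 3;    // HDFS default replication factor
        long blocks = blockCount(file, block);
        System.out.println(blocks + " blocks, "
                + blocks * replication + " block replicas, "
                + rawStorageBytes(file, replication) / MB + " MB raw");
    }
}
```

Running it prints `8 blocks, 24 block replicas, 3000 MB raw`. The eighth block holds only 104 MB, which is why raw usage is 3000 MB rather than 24 × 128 MB.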
Section Study Resources
  • Hadoop: The Definitive Guide, 3rd edition: Chapter 3
  • Hadoop Operations: Chapter 2
  • Hadoop in Practice: Appendix C: HDFS Dissected

3. Job Configuration and Submission (7%)

Objectives
  • Construct proper job configuration parameters.
  • Identify the correct procedures for MapReduce job submission.
  • Understand how to use the various commands involved in job submission.
Section Study Resources
  • Hadoop: The Definitive Guide, 3rd Edition: Chapter 5

4. Job Execution Environment (10%)

Objectives
  • Given a MapReduce job, determine the lifecycle of a Mapper and the lifecycle of a Reducer.
  • Understand the key fault tolerance principles at work in a MapReduce job.
  • Identify the role of Apache Hadoop Classes, Interfaces, and Methods.
  • Understand how speculative execution exploits differences in machine configurations and capabilities in a parallel environment and how and when it runs.
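In the new (org.apache.hadoop.mapreduce) API, a Mapper's run() method calls setup() once, then map() once per input record, then cleanup() once. A Reducer follows the same pattern, with reduce() called once per key. The plain-Java sketch below imitates that call order so it runs without Hadoop on the classpath. SketchMapper, LoggingMapper and their method signatures are hypothetical stand-ins, not the real Hadoop classes, which receive a Context object instead of a log list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mimics the call order of the new-API Mapper.run():
// setup() once, map() once per record, cleanup() once.
abstract class SketchMapper<K, V> {
    protected void setup(List<String> log) { log.add("setup"); }
    protected abstract void map(K key, V value, List<String> log);
    protected void cleanup(List<String> log) { log.add("cleanup"); }

    // Drives the lifecycle over an in-memory list of records.
    final void run(List<K> keys, List<V> values, List<String> log) {
        setup(log);
        for (int i = 0; i < keys.size(); i++) {
            map(keys.get(i), values.get(i), log);
        }
        cleanup(log);
    }
}

// Records each map() call so the lifecycle order is visible.
class LoggingMapper extends SketchMapper<Long, String> {
    @Override
    protected void map(Long offset, String line, List<String> log) {
        log.add("map(" + offset + ")");
    }
}

class MapperLifecycleDemo {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        new LoggingMapper().run(List.of(0L, 12L), List.of("hello world", "bye"), log);
        System.out.println(log);  // [setup, map(0), map(12), cleanup]
    }
}
```

Per-task resources such as lookup tables or connections therefore belong in setup(), not in map(), because map() runs once per record.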
Section Study Resources
  • Hadoop in Action: Chapter 3
  • Hadoop: The Definitive Guide, 3rd Edition: Chapter 6

5. Input and Output (6%)

Objectives
  • Given a sample job, analyze and determine the correct InputFormat and OutputFormat to select based on job requirements.
  • Understand the role of the RecordReader, and of sequence files and compression.
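With TextInputFormat, the RecordReader gives each Mapper one record per line. The key is the byte offset of the line's start within the file (a LongWritable), and the value is the line's contents without the line terminator (a Text). The sketch below reproduces that key/value pairing in plain Java for a single in-memory input. It assumes UTF-8 text with "\n" line endings and does not model split boundaries, "\r\n" handling or compression.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simulates the (byte offset, line) records that TextInputFormat produces.
// Assumes UTF-8 input with '\n' terminators. This is a simplification, not Hadoop's LineRecordReader.
class TextRecordsSketch {
    static List<Map.Entry<Long, String>> records(String fileContents) {
        List<Map.Entry<Long, String>> out = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < fileContents.length()) {
            int nl = fileContents.indexOf('\n', start);
            int end = (nl == -1) ? fileContents.length() : nl;
            String line = fileContents.substring(start, end);
            out.add(Map.entry(offset, line));
            // Offsets count bytes, not characters, so multi-byte UTF-8 characters matter.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, String> r : records("hello world\nbye\n")) {
            System.out.println(r.getKey() + "\t" + r.getValue());
        }
    }
}
```

For the input in main() this yields two records, (0, "hello world") and (12, "bye"). The second key is 12 because the first line is 11 bytes plus one newline byte.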
Section Study Resources
  • Hadoop: The Definitive Guide, 3rd Edition: Chapter 7
  • Hadoop in Action: Chapter 3
  • Hadoop in Practice: Chapter 3

6. Job Lifecycle (18%)

Objectives
  • Analyze the order of operations in a MapReduce job.
  • Analyze how data moves through a job.
  • Understand how partitioners and combiners function, and recognize appropriate use cases for each.
  • Recognize the processes and role of the sort and shuffle process.
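Hadoop's default HashPartitioner sends a key to reduce task (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so every occurrence of a key reaches the same reducer. A combiner runs on map-side output before the shuffle. The framework may apply it zero, one or several times, so it must not change the final result. Summing word counts is therefore a safe use case, while computing an average directly in the combiner is not. The sketch below applies both ideas to word-count map output. It uses java.lang.String keys in place of Hadoop's Text, and Text.hashCode() is computed differently, so the partition numbers may not match a real job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of map-side combining followed by hash partitioning for word count.
// String keys stand in for Hadoop's Text.
class PartitionCombineSketch {

    // Same formula as the default HashPartitioner: clear the sign bit, then take the modulus.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Word-count combiner: sums the 1s emitted by one map task.
    // Addition is associative and commutative, so the reducers' totals are unchanged
    // however many times this runs.
    static Map<String, Integer> combine(List<String> mapOutputKeys) {
        Map<String, Integer> combined = new TreeMap<>();
        for (String word : mapOutputKeys) {
            combined.merge(word, 1, Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<String> emitted = List.of("the", "cat", "the", "hat", "the");
        Map<String, Integer> combined = combine(emitted);
        int reducers = 2;
        List<List<String>> byPartition = new ArrayList<>();
        for (int i = 0; i < reducers; i++) byPartition.add(new ArrayList<>());
        for (Map.Entry<String, Integer> e : combined.entrySet()) {
            byPartition.get(partition(e.getKey(), reducers)).add(e.getKey() + "=" + e.getValue());
        }
        // 5 records were emitted, but only 3 would cross the network after combining.
        System.out.println(emitted.size() + " emitted, " + combined.size() + " after combine");
        System.out.println(byPartition);
    }
}
```

Masking with Integer.MAX_VALUE keeps the partition number non-negative even for keys whose hashCode() is negative.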
Section Study Resources
  • Hadoop: The Definitive Guide, 3rd Edition: Chapter 6
  • Hadoop in Practice: Techniques in section 6.4
  • Two blog posts from Philippe Adjiman’s Hadoop Tutorial Series

7. Data processing (6%)

Objectives
  • Analyze and determine the relationship of input keys to output keys in terms of both type and number, the sorting of keys, and the sorting of values.
  • Given sample input data, identify the number, type, and value of emitted keys and values from the Mappers as well as the emitted data from each Reducer and the number and contents of the output file(s).
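For a word-count job, questions like these can be answered by tracing the data by hand. Each Mapper emits one (word, 1) pair per word, and the framework sorts map output by key and groups the values for each key. Each Reducer then emits one pair per distinct key it receives. The plain-Java trace below assumes a single reducer, which means one output file (conventionally part-r-00000 with the new API). A TreeMap stands in for the sort-and-group step. In a real job, the values for a key arrive in no guaranteed order unless a secondary sort is configured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hand trace of word count with one reducer. TreeMap plays the role of shuffle-and-sort.
class WordCountTrace {

    // Map phase: one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> mapPhase(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle/sort: group values by key. Keys come out in sorted order.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: one output record per distinct key.
    static TreeMap<String, Integer> reducePhase(TreeMap<String, List<Integer>> grouped) {
        TreeMap<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) sum += v;
            out.put(e.getKey(), sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the cat sat", "the cat ran");
        List<Map.Entry<String, Integer>> pairs = mapPhase(input);
        TreeMap<String, Integer> result = reducePhase(shuffle(pairs));
        System.out.println(pairs.size() + " map output records");  // 6 map output records
        System.out.println(result);  // {cat=2, ran=1, sat=1, the=2}
    }
}
```

Two input lines produce six map output records but only four reducer output records, one per distinct word, written in sorted key order.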
Section Study Resources
  • Hadoop: The Definitive Guide, 3rd Edition: Chapter 7 on Input Formats and Output Formats
  • Hadoop in Practice: Chapter 3

8. Key and Value Types (6%)

Objectives
  • Given a scenario, analyze and determine which of Hadoop’s data types for keys and values are appropriate for the job.
  • Understand common key and value types in the MapReduce framework and the interfaces they implement.
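In Hadoop, value types implement the Writable interface (write(DataOutput) and readFields(DataInput)). Key types implement WritableComparable, which adds compareTo() so the framework can sort keys during the shuffle. To keep the example runnable without Hadoop jars, the sketch below declares a local interface, SketchWritable, with the same two method signatures, plus a hypothetical composite key, YearTempKey. In a real job the class would implement org.apache.hadoop.io.WritableComparable instead. The key also overrides equals() and hashCode(), since the default HashPartitioner relies on hashCode().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same method signatures as org.apache.hadoop.io.Writable.
interface SketchWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical composite key (year, temperature). A real job would implement WritableComparable.
class YearTempKey implements SketchWritable, Comparable<YearTempKey> {
    int year;
    int temp;

    YearTempKey() { }  // Hadoop instantiates Writables reflectively, so a no-arg constructor is needed.
    YearTempKey(int year, int temp) { this.year = year; this.temp = temp; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(year);
        out.writeInt(temp);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        year = in.readInt();
        temp = in.readInt();
    }

    // Sort by year ascending, then temperature descending (a typical secondary-sort layout).
    @Override
    public int compareTo(YearTempKey o) {
        int c = Integer.compare(year, o.year);
        return c != 0 ? c : Integer.compare(o.temp, temp);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof YearTempKey && compareTo((YearTempKey) o) == 0;
    }

    @Override
    public int hashCode() { return 31 * year + temp; }

    // Serialize then deserialize, much as a key is written out and read back during the shuffle.
    static YearTempKey roundTrip(YearTempKey k) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            k.write(out);
            out.flush();
            YearTempKey copy = new YearTempKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

readFields() must read fields in exactly the order write() wrote them, because the serialized form carries no field names.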
Section Study Resources
  • Hadoop: The Definitive Guide, 3rd Edition: Chapter 4
  • Hadoop in Practice: Chapter 3

9. Common Algorithms and Design Patterns (7%)

Objectives
  • Evaluate whether an algorithm is well-suited for expression in MapReduce.
  • Understand the implementation, limitations, and strategies for joining datasets in MapReduce.
  • Analyze the role of DistributedCache and Counters.
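A common pattern for joining a large dataset with a small one is the map-side (replicated) join. The small table is shipped to every node (for example via the DistributedCache) and loaded into an in-memory hash map in the Mapper's setup(). Each large-side record is then joined as it streams through map(), so no shuffle is needed for the join. The plain-Java sketch below shows only the lookup logic. The comma-separated record layouts are hypothetical, the join is an inner join, and the unmatched field stands in for a Hadoop Counter that tallies dropped records. A reduce-side join is the usual alternative when neither side fits in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a replicated (map-side) inner join. Record layouts are hypothetical:
//   small side: "customerId,customerName"   large side: "orderId,customerId,amount"
class ReplicatedJoinSketch {
    private final Map<String, String> customersById = new HashMap<>();
    long unmatched = 0;  // stands in for a Hadoop Counter tracking dropped records

    // Plays the role of Mapper.setup(): load the small table into memory once per task.
    void setup(List<String> smallTableLines) {
        for (String line : smallTableLines) {
            String[] f = line.split(",", 2);
            customersById.put(f[0], f[1]);
        }
    }

    // Plays the role of map(): join one large-side record against the in-memory table.
    List<String> map(String orderLine) {
        List<String> out = new ArrayList<>();
        String[] f = orderLine.split(",");
        String name = customersById.get(f[1]);
        if (name == null) {
            unmatched++;  // inner join: no output for orders without a matching customer
        } else {
            out.add(f[0] + "," + name + "," + f[2]);
        }
        return out;
    }

    public static void main(String[] args) {
        ReplicatedJoinSketch join = new ReplicatedJoinSketch();
        join.setup(List.of("c1,Alice", "c2,Bob"));
        for (String order : List.of("o1,c1,9.99", "o2,c3,5.00", "o3,c2,1.50")) {
            join.map(order).forEach(System.out::println);
        }
        System.out.println("unmatched=" + join.unmatched);
    }
}
```

The limitation is that the whole small table must fit in each task's memory. That is the main reason this strategy is restricted to joins where one side is small.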
Section Study Resources
  • Hadoop: The Definitive Guide, 3rd Edition: Chapter 8
  • Hadoop in Practice: Chapters 4, 5 and 7
  • MapReduce Algorithms tutorial video. Note: uses the old API.
  • Hadoop in Action: Chapter 5.2

10. The Hadoop Ecosystem (8%)

Objectives
  • Analyze a workflow scenario and determine how and when to leverage ecosystem projects, including Apache Hive, Apache Pig, Sqoop and Oozie.
  • Understand how Hadoop Streaming might apply to a job workflow.
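Hadoop Streaming can run any executable as the Mapper or Reducer. The framework writes input records to the program's standard input as lines and reads tab-separated key/value lines back from its standard output. By default, everything before the first tab is the key. Streaming programs are usually written in scripting languages. To stay in one language, this sketch writes a word-count streaming mapper in Java, and the class name StreamingWordCountMapper is illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Word-count mapper in Streaming style: read lines on stdin, emit "word<TAB>1" on stdout.
class StreamingWordCountMapper {

    // Reads text lines and emits one "word<TAB>1" line per whitespace-separated token.
    static void mapStream(Reader in, Writer out) {
        try (BufferedReader reader = new BufferedReader(in)) {
            PrintWriter writer = new PrintWriter(out);
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        writer.print(word + "\t1\n");
                    }
                }
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Under Streaming, input records arrive on stdin and stdout becomes the map output.
        mapStream(new InputStreamReader(System.in, StandardCharsets.UTF_8),
                  new OutputStreamWriter(System.out, StandardCharsets.UTF_8));
    }
}
```

Each emitted line becomes one intermediate key/value pair. A streaming reducer written the same way would receive its input lines already sorted by key.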
Section Study Resources