Cloudera Certified Developer for Apache Hadoop (CCDH)
Update: 6 April 2013
Cloudera has added exam learning resources on its website; please read these links for the latest details.
http://university.cloudera.com/certification/prep/ccdh.html
http://jugnu-life.blogspot.in/2012/05/cloudera-hadoop-certification-now.html
Syllabus and exam contents:
http://university.cloudera.com/certification.html
To earn the CCDH certification, candidates must pass an exam designed to test their fluency with the concepts and skills required in the areas listed below.
If you are interested in the Administrator exam instead, read this other post:
http://jugnu-life.blogspot.in/2012/03/cloudera-certified-administrator-for.html
The exam syllabus for the Developer certification and the corresponding study resources are listed below.
1. Core Hadoop Concepts (CCD-410: 25% | CCD-470: 33%)
Objectives
- Recognize and identify Apache Hadoop daemons and how they function in both data storage and processing under both CDH3 and CDH4.
- Understand how Apache Hadoop exploits data locality, including rack placement policy.
- Given a big data scenario, determine the challenges to large-scale computational models and how distributed systems attempt to overcome various challenges posed by the scenario.
- Identify the role and use of both MapReduce v1 (MRv1) and MapReduce v2 (MRv2 / YARN) daemons.
Section Study Resources
- Hadoop File System Shell Guide
- Apache YARN docs
- CDH4 YARN deployment docs
- CDH4 update including MapReduce v2 (MRv2)
Cloudera offers a good section on YARN in the following video:
What’s New in CDH4? A Guide for Previous Attendees of Cloudera Administrator Training for Apache Hadoop
2. Storing Files in Hadoop (7%)
Objectives
- Analyze the benefits and challenges of the HDFS architecture
- Analyze how HDFS implements file sizes, block sizes, and block abstraction.
- Understand default replication values and storage requirements for replication.
- Determine how HDFS stores, reads, and writes files.
- Given a sample architecture, determine how HDFS handles hardware failure.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 3
- Hadoop Operations: Chapter 2
- Hadoop in Practice: Appendix C: HDFS Dissected
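To make the storage objectives above concrete, here is a minimal sketch of writing and reading a file through the HDFS Java API; the class name, path, and file contents are made up, and the cluster settings are assumed to come from the usual core-site.xml/hdfs-site.xml on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();          // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);              // handle to the default file system (HDFS)

    Path file = new Path("/user/example/notes.txt");   // hypothetical path

    // Write: the client streams data to a pipeline of DataNodes, one block at a time
    FSDataOutputStream out = fs.create(file, true);
    out.writeUTF("hello hdfs");
    out.close();

    // Read: the client asks the NameNode for block locations, then reads from the DataNodes
    FSDataInputStream in = fs.open(file);
    System.out.println(in.readUTF());
    in.close();

    // Replication (dfs.replication, normally 3) and block size are per-file properties
    FileStatus status = fs.getFileStatus(file);
    System.out.println("replication=" + status.getReplication()
        + " blockSize=" + status.getBlockSize());
  }
}

The equivalent shell commands (hadoop fs -put, -cat, -setrep) are covered in the File System Shell Guide listed above.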
3. Job Configuration and Submission (7%)
Objectives
- Construct proper job configuration parameters
- Identify the correct procedures for MapReduce job submission.
- Understand how to use various commands in job submission (a minimal driver sketch follows this section's resources).
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 5
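As one hedged illustration of job configuration and submission (not an official exam sample), below is a minimal driver using the new org.apache.hadoop.mapreduce API; WordCountDriver is a made-up class name, and WordCountMapper / WordCountReducer are hypothetical classes sketched under sections 4 and 7 below.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // Job wraps the configuration plus everything needed to submit the job
    Job job = new Job(getConf(), "word count");   // Job.getInstance(getConf()) on newer releases
    job.setJarByClass(WordCountDriver.class);     // tells Hadoop which jar to ship to the cluster

    job.setMapperClass(WordCountMapper.class);    // hypothetical Mapper and Reducer classes
    job.setReducerClass(WordCountReducer.class);

    job.setOutputKeyClass(Text.class);            // types emitted by the reducer
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));   // must not already exist

    return job.waitForCompletion(true) ? 0 : 1;   // submit and block until the job finishes
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner parses generic options such as -D mapred.reduce.tasks=2 before calling run()
    System.exit(ToolRunner.run(new WordCountDriver(), args));
  }
}

A typical submission would then look something like hadoop jar wordcount.jar WordCountDriver -D mapred.reduce.tasks=2 input output, where the jar name and paths are made up.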
4. Job Execution Environment (10%)
Objectives
- Given a MapReduce job, determine the lifecycle of a Mapper and the lifecycle of a Reducer.
- Understand the key fault tolerance principles at work in a MapReduce job.
- Identify the role of Apache Hadoop Classes, Interfaces, and Methods.
- Understand how speculative execution exploits differences in machine configurations and capabilities in a parallel environment and how and when it runs.
Section Study Resources
- Hadoop in Action: Chapter 3
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 6
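For the Mapper lifecycle objective, here is a minimal word-count Mapper in the new API; setup() and cleanup() are overridden only to mark the lifecycle hooks, and the class pairs with the hypothetical driver sketched under section 3.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void setup(Context context) {
    // Called once per task attempt, before any records are processed
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Called once per input record; a failed or unusually slow task attempt may be
    // retried or speculatively re-executed on another node
    for (String token : value.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        word.set(token);
        context.write(word, ONE);
      }
    }
  }

  @Override
  protected void cleanup(Context context) {
    // Called once per task attempt, after the last record
  }
}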
5. Input and Output (6%)
Objectives
- Given a sample job, analyze and determine the correct InputFormat and OutputFormat to select based on job requirements.
- Understand the role of the RecordReader, and of sequence files and compression.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 7
- Hadoop in Action: Chapter 3
- Hadoop in Practice: Chapter 3
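As one hedged example of matching formats to a job, the sketch below turns tab-separated text into a compressed sequence file: KeyValueTextInputFormat's RecordReader splits each input line at the first tab, and SequenceFileOutputFormat writes a splittable binary container of key/value pairs. The class name, codec choice, and zero-reducer setup are illustrative assumptions, not the only valid answer.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class TextToSequenceFile {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "text to sequence file");
    job.setJarByClass(TextToSequenceFile.class);

    // The InputFormat decides how input is split and how its RecordReader turns bytes
    // into (key, value) records; KeyValueTextInputFormat splits each line at the first tab
    job.setInputFormatClass(KeyValueTextInputFormat.class);

    // No mapper or reducer is set: the default identity Mapper runs, and with zero
    // reducers the map output becomes the job output
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // SequenceFileOutputFormat writes a splittable binary container; block compression
    // suits it well (swap in SnappyCodec where the native libraries are available)
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setCompressOutput(job, true);
    SequenceFileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}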
6. Job Lifecycle (18%)
Objectives
- Analyze the order of operations in a MapReduce job.
- Analyze how data moves through a job.
- Understand how partitioners and combiners function, and recognize appropriate use cases for each (a small partitioner sketch follows this section's resources).
- Recognize the role of the sort and shuffle process.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 6
- Hadoop in Practice: Techniques in section 6.4
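To ground the partitioner and combiner objectives, here is a small, hypothetical custom Partitioner; the default HashPartitioner does the same kind of modulo arithmetic on key.hashCode().

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Every key with the same partition number is shuffled to the same reducer.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    String word = key.toString();
    char first = word.isEmpty() ? '_' : Character.toLowerCase(word.charAt(0));
    return (first & Integer.MAX_VALUE) % numPartitions;
  }
}

In the driver you would add job.setPartitionerClass(FirstLetterPartitioner.class); and, because summing is commutative and associative, job.setCombinerClass(WordCountReducer.class); lets the reducer sketched under section 7 double as a combiner, pre-aggregating map output during the sort-and-spill phase before the shuffle.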
7. Data Processing (6%)
Objectives
- Analyze and determine the relationship of input keys to output keys in terms of both type and number, the sorting of keys, and the sorting of values.
- Given sample input data, identify the number, type, and value of emitted keys and values from the Mappers as well as the emitted data from each Reducer and the number and contents of the output file(s).
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 7 on Input Formats and Output Formats
- Hadoop in Practice: Chapter 3
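Continuing the hypothetical word-count example, the Reducer below illustrates the key/value relationships these objectives describe: its input types must match the Mapper's output types, reduce() is called once per distinct key with that key's values grouped together, and each reducer task writes one output file (part-r-00000, part-r-00001, and so on).

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  private final IntWritable total = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();            // values arrive grouped by key; keys arrive sorted
    }
    total.set(sum);
    context.write(key, total);       // one output record per distinct input key in this job
  }
}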
8. Key and Value Types (6%)
Objectives
- Given a scenario, analyze and determine which of Hadoop’s data types for keys and values are appropriate for the job.
- Understand common key and value types in the MapReduce framework and the interfaces they implement.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 4
- Hadoop in Practice: Chapter 3
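Here is a sketch of a custom key type built around a made-up (year, temperature) pair: keys must implement WritableComparable so Hadoop can serialize them and sort them during the shuffle, while plain values only need Writable. Built-in types such as Text, IntWritable, LongWritable, DoubleWritable and NullWritable cover most jobs.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class YearTemperaturePair implements WritableComparable<YearTemperaturePair> {

  private int year;
  private int temperature;

  public YearTemperaturePair() { }                         // Hadoop needs a no-arg constructor

  public void set(int year, int temperature) {
    this.year = year;
    this.temperature = temperature;
  }

  @Override
  public void write(DataOutput out) throws IOException {   // how the key is serialized
    out.writeInt(year);
    out.writeInt(temperature);
  }

  @Override
  public void readFields(DataInput in) throws IOException { // how the key is deserialized
    year = in.readInt();
    temperature = in.readInt();
  }

  @Override
  public int compareTo(YearTemperaturePair other) {        // defines the sort order of keys
    if (year != other.year) {
      return year < other.year ? -1 : 1;
    }
    if (temperature != other.temperature) {
      return temperature < other.temperature ? -1 : 1;
    }
    return 0;
  }

  @Override
  public int hashCode() {                                  // used by the default HashPartitioner
    return 31 * year + temperature;
  }

  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof YearTemperaturePair)) {
      return false;
    }
    YearTemperaturePair other = (YearTemperaturePair) obj;
    return year == other.year && temperature == other.temperature;
  }
}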
9. Common Algorithms and Design Patterns (7%)
Objectives
- Evaluate whether an algorithm is well-suited for expression in MapReduce.
- Understand implementation and limitations and strategies for joining datasets in MapReduce.
- Analyze the role of DistributedCache and Counters.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 8
- Hadoop in Practice: Chapters 4, 5, and 7
- MapReduce Algorithms tutorial video. Note: uses the old API.
- Hadoop in Action: Chapter 5.2
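To ground the join, DistributedCache, and Counters objectives in one sketch, below is a replicated (map-side) join under assumed, made-up file layouts: a small lookup file shipped to every task via the DistributedCache is loaded into memory in setup(), the large dataset streams through map(), and a counter tracks unmatched records.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReplicatedJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

  public enum JoinCounter { UNMATCHED_RECORDS }            // counters are aggregated across all tasks

  private final Map<String, String> cityByUserId = new HashMap<String, String>();
  private final Text outKey = new Text();
  private final Text outValue = new Text();

  @Override
  protected void setup(Context context) throws IOException {
    // The lookup file is registered in the driver, e.g.:
    //   DistributedCache.addCacheFile(new Path("/lookup/users.txt").toUri(), job.getConfiguration());
    Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
    String line;
    while ((line = reader.readLine()) != null) {
      String[] fields = line.split("\t");                  // hypothetical layout: userId <TAB> city
      cityByUserId.put(fields[0], fields[1]);
    }
    reader.close();
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = value.toString().split("\t");        // hypothetical layout: userId <TAB> purchase
    String city = cityByUserId.get(fields[0]);
    if (city == null) {
      context.getCounter(JoinCounter.UNMATCHED_RECORDS).increment(1);
      return;                                              // no match in the small dataset
    }
    outKey.set(fields[0]);
    outValue.set(city + "\t" + fields[1]);
    context.write(outKey, outValue);
  }
}

A reduce-side join, by contrast, tags records from both datasets with the join key and lets the shuffle bring them together; it handles inputs of any size but costs a full sort and shuffle.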
10. The Hadoop Ecosystem (8%)
Objectives
- Analyze a workflow scenario and determine how and when to leverage ecosystem projects, including Apache Hive, Apache Pig, Sqoop, and Oozie.
- Understand how Hadoop Streaming might apply to a job workflow.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapters 11, 12, 14, 15
- Hadoop in Practice: Chapters 10, 11
- Hadoop in Action: Chapters 10, 11
- Introduction to Apache Pig video tutorial
- Introduction to Apache Hive video tutorial
- Apache Hive docs
- Apache Pig docs
- Introduction to Pig Video
- Apache Sqoop docs
- Aaron Kimball on Sqoop at Hadoop World 2012
- Cloudera Manager Online Training Video Series
- Each project in the Hadoop ecosystem has at least one book devoted to it. The exam scope does not require deep knowledge of programming in Hive, Pig, Sqoop, Cloudera Manager, Flume, etc.; rather, it requires understanding how those projects contribute to an overall big data ecosystem.
Hey, I would like to join in. I tried to find your email ID but was unable to.
If you are still into Hadoop, please connect with me at
sameersurjikar(at)gmail.com. I am just getting started. It will be of great help to have a person to share experiences with :)
Hi Sameer
Great to hear from you.
I have taken the Hadoop Definitive Guide as my reference for preparation.
My target is to write the exam in May.
How are you planning to prepare?
I am emailing you my contact details.
Hi JJ,
Great blog, I would say. I have been working as a senior project manager with 17 years of IT experience. Currently I am at an e-commerce company and spend a lot of time managing project issues, risk, and cost, but emerging technology has always been an interesting area for me. I followed GFS and BigTable for a while about two years ago but could not take it further. Recently, with the tremendous opportunity in the big data analysis space, there is Hadoop along with other projects such as HBase, Hive, Pig, Sqoop, Cassandra and NoSQL (the count is endless), with Karmasphere and Cloudera in the IDE space for these technologies.
Like you, I would also like to take the Hadoop Developer training and certification (Cloudera certification); please let me know more about this.
I have been following these books and resources:
1. Hadoop: The Definitive Guide, 2nd edition
3. Hadoop: A Quick Introduction (IBM site)
4. Hadoop in Action by Chuck Lam (sample chapter 10, Programming in Pig)
5. Articles on big data analysis by Ravi Kalakota (search on Google; he is an excellent writer on predictive analysis and other forms of big data analysis)
6. Storm, the big data analysis tool from Twitter
7. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data
Please share the syllabus for the Cloudera exam at sanjaybsl@yahoo.com.
Hello Sanjay
Thanks for your comment.
With your industry experience, I would love to learn a lot of things from you.
With reference to Hadoop, you have already covered all the ground for preparation by reading those books.
I will also go through the links you mentioned.
A few days back I talked with the Cloudera people; they said that just before the start of the VUE exams they would release a few more details about the exam.
Otherwise, the syllabus I already mentioned would be enough:
Hadoop Computing Environment
Hadoop Distributed File System
MapReduce
Hadoop API
Hadoop Ecosystem
Regards
Jagat Singh
Cleared the exam on 27th April. One of the toughest exams I have ever faced.
Hello BNM
Congratulations :)
Any tips on how to prepare for the exam?
In what way would you say it was the toughest?
Thanks
Hi,
Congratulations.
Can you tell me how to prepare for the exam?
Thanks
Arjun
Hi guys,
I am planning to go for it too, but I am curious to know how it will help me move forward in my career and how much weight this certification carries w.r.t. job opportunities in today's market. BTW, I don't have any domain experience in file systems or big data analytics, but I am interested in learning more and pursuing it further.
I am planning to take the Hadoop exam. Could someone kindly provide inputs?
I took the exam and am sure that I answered most of the questions correctly. Still, I could not clear the exam. Kindly let me know any tips on how to prepare for the exam.
Hello Roopa ji,
No worries, better luck next time. You can check the Cloudera website to see whether they offer the next attempt free or not.
Okay, coming to how to prepare.
What have you read already? I would recommend reading the Definitive Guide at least two times. The options are confusing and very similar, so you might end up choosing the wrong one if you are not thorough with the concepts. Follow the topics given in the syllabus and plan your schedule accordingly.
And don't worry about the failure, it's very common :)
Good luck
Hi, I have read the Hadoop Definitive Guide. I am pretty confident about my answers and am sure they are correct. I even verified them after the exam. But I am not sure on what basis they are evaluating.
Hello Rohan,
I will share with you soon the topics to study from each section.
When are you planning to write the exam?
Thanks,
Hi JJ,
I have a great interest in learning and building things with Hadoop. So kindly suggest which books/material I should follow. Please share the syllabus and recommended links/material at aleem.btech@gmail.com.
I just want to have a complete, in-depth understanding of Hadoop; your help in this regard will be highly appreciated.
Thanks,
Mohammed Aleem
Hi JJ,
Right now I am working as a Linux admin at an MNC.
Now I want to learn about big data and Hadoop.
Can you mail me the study materials and syllabus?
My email ID is erankitkhanduri@gmail.com.
Thanks,
Ankit Khanduri
Hello Ankit,
Being an admin, you are already halfway toward Hadoop administration.
I would suggest you grab the Hadoop Definitive Guide and start reading it. Don't worry about the certification as of now; you can clear that easily.
Hi JJ
Thanks for your blog.
I have one query. I would like to complete a Hadoop certification.
Just to give you the background of my knowledge: I have worked on Apache Hadoop 1.0.4, Sqoop, Oozie, and Hive. I see lots of certifications in the market from Cloudera, Hortonworks, and Big Data University. Can you help me decide which one would be relevant in my case? Thanks in advance.
---
Somi
Hi Somi,
A few questions I would ask you:
Is your company sponsoring your certification fees?
Yes
Then go for any, it doesn't matter.
No
Go for Hortonworks
Last but not least, it's knowledge that matters the most rather than certification. So just work hard and try to learn, and don't worry about the exam; it will be a cakewalk when you take it.
Both Hortonworks and Cloudera have equally good reputations.