Cloudera Certified Developer for Apache Hadoop (CCDH)
Update: 6 April 2013
Cloudera has added exam learning resources on its website; please read these links for the latest details.
http://university.cloudera.com/certification/prep/ccdh.html
http://jugnu-life.blogspot.in/2012/05/cloudera-hadoop-certification-now.html
Syllabus and exam contents:
http://university.cloudera.com/certification.html
To earn the CCDH certification, candidates must pass an exam designed to test their fluency with the concepts and skills required in the areas listed below.
If you are interested in the Administrator exam instead, read this other post:
http://jugnu-life.blogspot.in/2012/03/cloudera-certified-administrator-for.html
The exam syllabus for the Developer certification and the corresponding study resources are listed below.
1. Core Hadoop Concepts (CCD-410: 25% | CCD-470: 33%)
Objectives
- Recognize and identify Apache Hadoop daemons and how they function in both data storage and processing under both CDH3 and CDH4.
- Understand how Apache Hadoop exploits data locality, including rack placement policy.
- Given a big data scenario, determine the challenges to large-scale computational models and how distributed systems attempt to overcome various challenges posed by the scenario.
- Identify the role and use of both MapReduce v1 (MRv1) and MapReduce v2 (MRv2 / YARN) daemons.
Section Study Resources
- Hadoop File System Shell Guide
- Apache YARN docs
- CDH4 YARN deployment docs
- CDH4 update including MapReduce v2 (MRv2)
Cloudera offers a good section on YARN in the following video:
What’s New in CDH4? A Guide for Previous Attendees of Cloudera Administrator Training for Apache Hadoop
2. Storing Files in Hadoop (7%)
Objectives
- Analyze the benefits and challenges of the HDFS architecture
- Analyze how HDFS implements file sizes, block sizes, and block abstraction.
- Understand default replication values and storage requirements for replication.
- Determine how HDFS stores, reads, and writes files.
- Given a sample architecture, determine how HDFS handles hardware failure.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 3
- Hadoop Operations: Chapter 2
- Hadoop in Practice: Appendix C: HDFS Dissected
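To make the storage objectives above concrete, here is a minimal sketch of writing and reading a file through the HDFS Java API; the class name, path, and file contents are made up, and the cluster settings are assumed to come from the usual core-site.xml/hdfs-site.xml on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();          // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);              // handle to the default file system (HDFS)

    Path file = new Path("/user/example/notes.txt");   // hypothetical path

    // Write: the client streams data to a pipeline of DataNodes, one block at a time
    FSDataOutputStream out = fs.create(file, true);
    out.writeUTF("hello hdfs");
    out.close();

    // Read: the client asks the NameNode for block locations, then reads from the DataNodes
    FSDataInputStream in = fs.open(file);
    System.out.println(in.readUTF());
    in.close();

    // Replication (dfs.replication, normally 3) and block size are per-file properties
    FileStatus status = fs.getFileStatus(file);
    System.out.println("replication=" + status.getReplication()
        + " blockSize=" + status.getBlockSize());
  }
}

The equivalent shell commands (hadoop fs -put, -cat, -setrep) are covered in the File System Shell Guide listed above.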
3. Job Configuration and Submission (7%)
Objectives
- Construct proper job configuration parameters
- Identify the correct procedures for MapReduce job submission.
- Understand how to use various commands in job submission (a minimal driver sketch follows this section's resources).
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 5
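As one hedged illustration of job configuration and submission (not an official exam sample), below is a minimal driver using the new org.apache.hadoop.mapreduce API; WordCountDriver is a made-up class name, and WordCountMapper / WordCountReducer are hypothetical classes sketched under sections 4 and 7 below.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // Job wraps the configuration plus everything needed to submit the job
    Job job = new Job(getConf(), "word count");   // Job.getInstance(getConf()) on newer releases
    job.setJarByClass(WordCountDriver.class);     // tells Hadoop which jar to ship to the cluster

    job.setMapperClass(WordCountMapper.class);    // hypothetical Mapper and Reducer classes
    job.setReducerClass(WordCountReducer.class);

    job.setOutputKeyClass(Text.class);            // types emitted by the reducer
    job.setOutputValueClass(IntWritable.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));   // must not already exist

    return job.waitForCompletion(true) ? 0 : 1;   // submit and block until the job finishes
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner parses generic options such as -D mapred.reduce.tasks=2 before calling run()
    System.exit(ToolRunner.run(new WordCountDriver(), args));
  }
}

A typical submission would then look something like hadoop jar wordcount.jar WordCountDriver -D mapred.reduce.tasks=2 input output, where the jar name and paths are made up.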
4. Job Execution Environment (10%)
Objectives
- Given a MapReduce job, determine the lifecycle of a Mapper and the lifecycle of a Reducer.
- Understand the key fault tolerance principles at work in a MapReduce job.
- Identify the role of Apache Hadoop Classes, Interfaces, and Methods.
- Understand how speculative execution exploits differences in machine configurations and capabilities in a parallel environment and how and when it runs.
Section Study Resources
- Hadoop in Action: Chapter 3
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 6
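For the Mapper lifecycle objective, here is a minimal word-count Mapper in the new API; setup() and cleanup() are overridden only to mark the lifecycle hooks, and the class pairs with the hypothetical driver sketched under section 3.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void setup(Context context) {
    // Called once per task attempt, before any records are processed
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Called once per input record; a failed or unusually slow task attempt may be
    // retried or speculatively re-executed on another node
    for (String token : value.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        word.set(token);
        context.write(word, ONE);
      }
    }
  }

  @Override
  protected void cleanup(Context context) {
    // Called once per task attempt, after the last record
  }
}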
5. Input and Output (6%)
Objectives
- Given a sample job, analyze and determine the correct InputFormat and OutputFormat to select based on job requirements.
- Understand the role of the RecordReader, and of sequence files and compression.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 7
- Hadoop in Action: Chapter 3
- Hadoop in Practice: Chapter 3
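As one hedged example of matching formats to a job, the sketch below turns tab-separated text into a compressed sequence file: KeyValueTextInputFormat's RecordReader splits each input line at the first tab, and SequenceFileOutputFormat writes a splittable binary container of key/value pairs. The class name, codec choice, and zero-reducer setup are illustrative assumptions, not the only valid answer.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class TextToSequenceFile {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "text to sequence file");
    job.setJarByClass(TextToSequenceFile.class);

    // The InputFormat decides how input is split and how its RecordReader turns bytes
    // into (key, value) records; KeyValueTextInputFormat splits each line at the first tab
    job.setInputFormatClass(KeyValueTextInputFormat.class);

    // No mapper or reducer is set: the default identity Mapper runs, and with zero
    // reducers the map output becomes the job output
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    // SequenceFileOutputFormat writes a splittable binary container; block compression
    // suits it well (swap in SnappyCodec where the native libraries are available)
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setCompressOutput(job, true);
    SequenceFileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}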
6. Job Lifecycle (18%)
Objectives
- Analyze the order of operations in a MapReduce job.
- Analyze how data moves through a job.
- Understand how partitioners and combiners function, and recognize appropriate use cases for each (a small partitioner sketch follows this section's resources).
- Recognize the role of the sort and shuffle process.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 6
- Hadoop in Practice: Techniques in section 6.4
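To ground the partitioner and combiner objectives, here is a small, hypothetical custom Partitioner; the default HashPartitioner does the same kind of modulo arithmetic on key.hashCode().

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Every key with the same partition number is shuffled to the same reducer.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    String word = key.toString();
    char first = word.isEmpty() ? '_' : Character.toLowerCase(word.charAt(0));
    return (first & Integer.MAX_VALUE) % numPartitions;
  }
}

In the driver you would add job.setPartitionerClass(FirstLetterPartitioner.class); and, because summing is commutative and associative, job.setCombinerClass(WordCountReducer.class); lets the reducer sketched under section 7 double as a combiner, pre-aggregating map output during the sort-and-spill phase before the shuffle.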
7. Data Processing (6%)
Objectives
- Analyze and determine the relationship of input keys to output keys in terms of both type and number, the sorting of keys, and the sorting of values.
- Given sample input data, identify the number, type, and value of emitted keys and values from the Mappers as well as the emitted data from each Reducer and the number and contents of the output file(s).
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 7 on Input Formats and Output Formats
- Hadoop in Practice: Chapter 3
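Continuing the hypothetical word-count example, the Reducer below illustrates the key/value relationships these objectives describe: its input types must match the Mapper's output types, reduce() is called once per distinct key with that key's values grouped together, and each reducer task writes one output file (part-r-00000, part-r-00001, and so on).

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

  private final IntWritable total = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();            // values arrive grouped by key; keys arrive sorted
    }
    total.set(sum);
    context.write(key, total);       // one output record per distinct input key in this job
  }
}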
8. Key and Value Types (6%)
Objectives
- Given a scenario, analyze and determine which of Hadoop’s data types for keys and values are appropriate for the job.
- Understand common key and value types in the MapReduce framework and the interfaces they implement.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 4
- Hadoop in Practice: Chapter 3
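Here is a sketch of a custom key type built around a made-up (year, temperature) pair: keys must implement WritableComparable so Hadoop can serialize them and sort them during the shuffle, while plain values only need Writable. Built-in types such as Text, IntWritable, LongWritable, DoubleWritable and NullWritable cover most jobs.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class YearTemperaturePair implements WritableComparable<YearTemperaturePair> {

  private int year;
  private int temperature;

  public YearTemperaturePair() { }                         // Hadoop needs a no-arg constructor

  public void set(int year, int temperature) {
    this.year = year;
    this.temperature = temperature;
  }

  @Override
  public void write(DataOutput out) throws IOException {   // how the key is serialized
    out.writeInt(year);
    out.writeInt(temperature);
  }

  @Override
  public void readFields(DataInput in) throws IOException { // how the key is deserialized
    year = in.readInt();
    temperature = in.readInt();
  }

  @Override
  public int compareTo(YearTemperaturePair other) {        // defines the sort order of keys
    if (year != other.year) {
      return year < other.year ? -1 : 1;
    }
    if (temperature != other.temperature) {
      return temperature < other.temperature ? -1 : 1;
    }
    return 0;
  }

  @Override
  public int hashCode() {                                  // used by the default HashPartitioner
    return 31 * year + temperature;
  }

  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof YearTemperaturePair)) {
      return false;
    }
    YearTemperaturePair other = (YearTemperaturePair) obj;
    return year == other.year && temperature == other.temperature;
  }
}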
9. Common Algorithms and Design Patterns (7%)
Objectives
- Evaluate whether an algorithm is well-suited for expression in MapReduce.
- Understand implementation and limitations and strategies for joining datasets in MapReduce.
- Analyze the role of DistributedCache and Counters.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapter 8
- Hadoop in Practice: Chapters 4, 5, and 7
- MapReduce Algorithms tutorial video. Note: uses the old API.
- Hadoop in Action: Chapter 5.2
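To ground the join, DistributedCache, and Counters objectives in one sketch, below is a replicated (map-side) join under assumed, made-up file layouts: a small lookup file shipped to every task via the DistributedCache is loaded into memory in setup(), the large dataset streams through map(), and a counter tracks unmatched records.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReplicatedJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

  public enum JoinCounter { UNMATCHED_RECORDS }            // counters are aggregated across all tasks

  private final Map<String, String> cityByUserId = new HashMap<String, String>();
  private final Text outKey = new Text();
  private final Text outValue = new Text();

  @Override
  protected void setup(Context context) throws IOException {
    // The lookup file is registered in the driver, e.g.:
    //   DistributedCache.addCacheFile(new Path("/lookup/users.txt").toUri(), job.getConfiguration());
    Path[] cached = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    BufferedReader reader = new BufferedReader(new FileReader(cached[0].toString()));
    String line;
    while ((line = reader.readLine()) != null) {
      String[] fields = line.split("\t");                  // hypothetical layout: userId <TAB> city
      cityByUserId.put(fields[0], fields[1]);
    }
    reader.close();
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = value.toString().split("\t");        // hypothetical layout: userId <TAB> purchase
    String city = cityByUserId.get(fields[0]);
    if (city == null) {
      context.getCounter(JoinCounter.UNMATCHED_RECORDS).increment(1);
      return;                                              // no match in the small dataset
    }
    outKey.set(fields[0]);
    outValue.set(city + "\t" + fields[1]);
    context.write(outKey, outValue);
  }
}

A reduce-side join, by contrast, tags records from both datasets with the join key and lets the shuffle bring them together; it handles inputs of any size but costs a full sort and shuffle.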
10. The Hadoop Ecosystem (8%)
Objectives
- Analyze a workflow scenario and determine how and when to leverage ecosystem projects, including Apache Hive, Apache Pig, Sqoop, and Oozie.
- Understand how Hadoop Streaming might apply to a job workflow.
Section Study Resources
- Hadoop: The Definitive Guide, 3rd Edition: Chapters 11, 12, 14, 15
- Hadoop in Practice: Chapters 10, 11
- Hadoop in Action: Chapters 10, 11
- Introduction to Apache Pig video tutorial
- Introduction to Apache Hive video tutorial
- Apache Hive docs
- Apache Pig docs
- Introduction to Pig Video
- Apache Sqoop docs
- Aaron Kimball on Sqoop at Hadoop World 2012
- Cloudera Manager Online Training Video Series
- Each project in the Hadoop ecosystem has at least one book devoted to it. The exam scope does not require deep knowledge of programming in Hive, Pig, Sqoop, Cloudera Manager, Flume, etc.; rather, it requires understanding how those projects contribute to an overall big data ecosystem.
Hey, I would like to join in. I tried to find your email ID but was unable to.
If you are still into Hadoop, please connect with me at
sameersurjikar(at)gmail.com. I am just getting started. It will be of great help to have a person to share experiences with :)
Hi Sameer
Great to hear from you.
I have taken the Hadoop Definitive Guide as my reference for preparation.
My target is to write the exam in May.
How are you planning to prepare?
I am emailing you my contact details.
Hi JJ,
Great blog, I would say. I have been working as a senior project manager with 17 years of IT experience. Currently I am at an e-commerce company and spend a lot of time managing project issues, risk, and cost, but emerging technology has always been an interesting area for me. I followed GFS and BigTable for a while about two years ago but could not take it further. Recently, with the tremendous opportunity in the big data analysis space, there is Hadoop along with other projects such as HBase, Hive, Pig, Sqoop, Cassandra and NoSQL (the count is endless), with Karmasphere and Cloudera in the IDE space for these technologies.
Like you, I would also like to take the Hadoop Developer training and certification (Cloudera certification); please let me know more about this.
I have been following these books and resources:
1. Hadoop: The Definitive Guide, 2nd edition
3. Hadoop: A Quick Introduction (IBM site)
4. Hadoop in Action by Chuck Lam (sample chapter 10, Programming in Pig)
5. Articles on big data analysis by Ravi Kalakota (search on Google; he is an excellent writer on predictive analysis and other forms of big data analysis)
6. Storm, the big data analysis tool from Twitter
7. Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data
Please share the syllabus for the Cloudera exam at sanjaybsl@yahoo.com.
Hello Sanjay
Thanks for your comment.
With your industry experience, I would love to learn a lot of things from you.
With reference to Hadoop, you have already covered all the ground for preparation by reading those books.
I will also go through the links you mentioned.
A few days back I talked with the Cloudera people; they said that just before the start of the VUE exams they would release a few more details about the exam.
Otherwise, the syllabus I already mentioned would be enough:
Hadoop Computing Environment
Hadoop Distributed File System
MapReduce
Hadoop API
Hadoop Ecosystem
Regards
Jagat Singh
Cleared the exam on 27th April. One of the toughest exams I have ever faced.
Hello BNM
Congratulations :)
Any tips on how to prepare for the exam?
In what way would you say it was the toughest?
Thanks
Hi,
Congratulations.
Can you tell me how to prepare for the exam?
Thanks
Arjun
Hi guys,
I am planning to go for it too, but I am curious to know how it will help me move forward in my career and how much weight this certification carries w.r.t. job opportunities in today's market. BTW, I don't have any domain experience in file systems or big data analytics, but I am interested in learning more and pursuing it further.
I am planning to take the Hadoop exam. Could someone kindly provide inputs?
I took the exam and am sure that I answered most of the questions correctly. Still, I could not clear the exam. Kindly let me know any tips on how to prepare for the exam.
Hello Roopa ji,
No worries, better luck next time. You can check the Cloudera website to see whether they offer the next attempt free or not.
Okay, coming to how to prepare.
What have you read already? I would recommend reading the Definitive Guide at least two times. The options are confusing and very similar, so you might end up choosing the wrong one if you are not thorough with the concepts. Follow the topics given in the syllabus and plan your schedule accordingly.
And don't worry about the failure, it's very common :)
Good luck
Hi, I have read the Hadoop Definitive Guide. I am pretty confident about my answers and am sure they are correct. I even verified them after the exam. But I am not sure on what basis they are evaluating.
Hello Rohan,
I will share with you soon the topics to study from each section.
When are you planning to write the exam?
Thanks,
Hi JJ,
I have a great interest in learning and building things with Hadoop. So kindly suggest which books/material I should follow. Please share the syllabus and recommended links/material at aleem.btech@gmail.com.
I just want to have a complete, in-depth understanding of Hadoop; your help in this regard will be highly appreciated.
Thanks,
Mohammed Aleem
Hi JJ,
Right now I am working as a Linux admin at an MNC.
Now I want to learn about big data and Hadoop.
Can you mail me the study materials and syllabus?
My email ID is erankitkhanduri@gmail.com.
Thanks,
Ankit Khanduri
Hello Ankit,
Being an admin, you are already halfway toward Hadoop administration.
I would suggest you grab the Hadoop Definitive Guide and start reading it. Don't worry about the certification as of now; you can clear that easily.
Hi JJ
Thanks for your blog.
I have one query. I would like to complete a Hadoop certification.
Just to give you the background of my knowledge: I have worked on Apache Hadoop 1.0.4, Sqoop, Oozie, and Hive. I see lots of certifications in the market from Cloudera, Hortonworks, and Big Data University. Can you help me decide which one would be relevant in my case? Thanks in advance.
---
Somi
Hi Somi,
A few questions I would ask you:
Is your company sponsoring your certification fees?
Yes
Then go for any, it doesn't matter.
No
Go for Hortonworks
Last but not least, it's knowledge that matters the most rather than certification. So just work hard and try to learn, and don't worry about the exam; it will be a cakewalk when you take it.
Both Hortonworks and Cloudera have equally good reputations.