Oozie data load assurance

When Oozie workflows import data into a cluster, load assurance means verifying that every workflow run actually brings data in, and does so on schedule.

For file-based loads

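After each workflow run, use the Oozie EL function

long fs:dirSize(String path)

to check whether the workflow has imported anything. It returns the total size in bytes of the files directly under the path (it does not recurse into subdirectories), or -1 if the path does not exist or is not a directory.

If nothing was imported, send an alert with the Email action and log the event.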

For Sqoop-based loads

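Sqoop actions do not support Hadoop counters yet (as of Oozie 3.2), so counters cannot be used to verify the load.

See OOZIE-1012, "Sqoop jobs are unable to utilize Hadoop Counters".

Instead, use the following to find out whether anything new has been created:

Get the list of all files in the Sqoop import directory. Look for files whose timestamp is later than the start of the Oozie job, and then check their size. Note that HDFS does not record a creation time, so the modification time is the timestamp to compare.

The corresponding methods are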

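// fileSys is an org.apache.hadoop.fs.FileSystem, file1 an org.apache.hadoop.fs.Path
FileStatus stat = fileSys.getFileStatus(file1);
long mtime1 = stat.getModificationTime(); // FileStatus has no getCreationTime()
long size1 = stat.getLen();               // file size in bytes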
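
To make the check concrete, here is a minimal sketch of the "anything new since the job started?" logic in plain Java. It is only an illustration: FileInfo, LoadCheck and newBytesSince are hypothetical names, and FileInfo is a stand-in for the values you would read from Hadoop's FileStatus (getModificationTime() and getLen()) for each file in the import directory. The timestamps and sizes in main are made-up sample values.

```java
import java.util.Arrays;
import java.util.List;

public class LoadCheck {

    /** Hypothetical stand-in for the FileStatus fields this check needs. */
    static final class FileInfo {
        final String path;
        final long modificationTime; // epoch millis, like FileStatus.getModificationTime()
        final long length;           // bytes, like FileStatus.getLen()

        FileInfo(String path, long modificationTime, long length) {
            this.path = path;
            this.modificationTime = modificationTime;
            this.length = length;
        }
    }

    /** Sums the sizes of all files modified after the job start time. */
    static long newBytesSince(List<FileInfo> files, long jobStartMillis) {
        long total = 0L;
        for (FileInfo f : files) {
            if (f.modificationTime > jobStartMillis) {
                total += f.length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long jobStart = 1_000L; // sample value; in practice the Oozie job start time
        List<FileInfo> files = Arrays.asList(
                new FileInfo("part-m-00000", 900L, 50L),   // older than this run
                new FileInfo("part-m-00001", 1_200L, 30L), // written by this run
                new FileInfo("_SUCCESS", 1_300L, 0L));     // marker file, zero bytes

        long newBytes = newBytesSince(files, jobStart);
        System.out.println(newBytes > 0
                ? "load OK: " + newBytes + " new bytes"
                : "no new data imported");
    }
}
```

A result of zero means the run produced no new data (or only empty files), which is the case where you would raise the alert.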

You can also count the number of records imported and compare it with the count at the source, obtained with sqoop eval.

If the Sqoop job imports files into a new directory on every run, a simple check such as fs:dirSize is enough.

Finally, log the results to a central database to track daily loads :)

------

Was it helpful? Do you know a better way?

Share your thoughts in the comments below. Thanks for reading.
