When Oozie workflows import data into a cluster, load assurance means verifying that each workflow really is importing data on every scheduled run.
For File based loads
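After each workflow run, use the Oozie EL function
long fs:dirSize(String path)
to check whether the workflow imported anything. It returns the total size in bytes of the files directly under the path (it does not recurse into subdirectories), or -1 if the path does not exist or is not a directory.
If nothing was imported, send an alert with the Email action and log the failed load.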
For Sqoop based loads
Sqoop-based loads don't support counters yet (as of Oozie 3.2).
See OOZIE-1012, "Sqoop jobs are unable to utilize Hadoop Counters".
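Instead, use the following to find out whether anything new has been created.
Get the list of all files in the Sqoop import directory, pick the files whose timestamp is later than the start of the Oozie job, and then check their size.
HDFS does not record a creation time for files, so use the modification time. The corresponding methods are
FileStatus stat = fileSys.getFileStatus(file1);
long mtime1 = stat.getModificationTime(); // epoch milliseconds
long size1 = stat.getLen(); // size in bytes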
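The check described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class SqoopLoadCheck and its FileInfo holder are hypothetical names, not part of Hadoop or Oozie. FileInfo stands in for the two FileStatus values the check needs, so the logic can be run without a cluster. In a real job you would probably fill the list from FileSystem.listStatus(importDir), copying getModificationTime() and getLen() from each FileStatus.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decides whether a Sqoop run wrote any new data. */
final class SqoopLoadCheck {

    /** Minimal stand-in for the FileStatus fields used by the check. */
    static final class FileInfo {
        final String path;
        final long modificationTime; // epoch millis, as FileStatus.getModificationTime() reports
        final long length;           // bytes, as FileStatus.getLen() reports

        FileInfo(String path, long modificationTime, long length) {
            this.path = path;
            this.modificationTime = modificationTime;
            this.length = length;
        }
    }

    /** Returns the files modified at or after the job start time. */
    static List<FileInfo> newFiles(List<FileInfo> files, long jobStartMillis) {
        List<FileInfo> result = new ArrayList<>();
        for (FileInfo f : files) {
            if (f.modificationTime >= jobStartMillis) {
                result.add(f);
            }
        }
        return result;
    }

    /** Sums the size in bytes of the files modified at or after the job start time. */
    static long newBytes(List<FileInfo> files, long jobStartMillis) {
        long total = 0L;
        for (FileInfo f : newFiles(files, jobStartMillis)) {
            total += f.length;
        }
        return total;
    }
}
```

A result of 0 from newBytes means the run produced no new data (or only empty marker files), which is the case where you would send the alert email.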
You can also count the number of records imported and match it against the source using sqoop eval.
If the Sqoop job imports files into a new directory every time, a simple check such as dirSize is enough.
It also helps to log the results to a central database so that daily loads can be tracked :)
Was this helpful? Do you know a better way?
Share your thoughts in the comments below. Thanks for reading.