Ask your administrator for a few basics about the cluster configuration.
A sample conversation could be:
How many nodes does the cluster have, and what is their configuration?
The answer could be:
Each node has 120 GB of RAM. Of that, about 80 GB is available to our jobs.
Each datanode has 14 CPU cores, and we have 5 datanodes right now. The maximum we can ask for processing from each datanode is 8 cores,
leaving the rest for other processes such as the OS, Hadoop and monitoring services.
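To get a feel for what these numbers mean, here is a minimal Python sketch that estimates how many tasks of a given size fit on one datanode at the same time. It uses the 80 GB and 8 core figures from the answer above. The function name is made up for illustration, and the sketch assumes the scheduler counts both memory and cores, which depends on how your cluster is configured.

```python
# Per-node limits quoted above: 80 GB of RAM and 8 cores are available to jobs.
NODE_MEMORY_MB = 80 * 1024
NODE_VCORES = 8


def containers_per_node(task_memory_mb: int, task_vcores: int = 1) -> int:
    """Estimate how many tasks of the given size fit on one datanode at once.

    Hypothetical helper: assumes the scheduler accounts for memory AND cores.
    """
    if task_memory_mb <= 0 or task_vcores <= 0:
        raise ValueError("task size must be positive")
    by_memory = NODE_MEMORY_MB // task_memory_mb
    by_cores = NODE_VCORES // task_vcores
    # Whichever resource runs out first limits the node.
    return min(by_memory, by_cores)
```

For example, tasks asking for 20 GB and 1 core each are limited by memory, while tasks asking for 2 GB and 1 core each are limited by cores.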
When you run any job in the MapReduce world, each task gets a minimum of 2 GB of RAM, and the maximum any task can ask for is 80 GB (see the capacity above).
Since we are running big jobs for one-off loads, tell the system to give your job more RAM. (RAM is allocated in increments of 1024 MB.)
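A small Python sketch of what the 1024 MB increment and the 2 GB minimum mean for the number you ask for. The constants come from the text above; the function name is hypothetical, and the 80 GB upper limit is deliberately not checked here.

```python
MIN_TASK_MB = 2 * 1024   # minimum a task receives, per the text above
INCREMENT_MB = 1024      # allocation step, per the text above


def normalize_memory_request(requested_mb: int) -> int:
    """Round a memory request up to the next 1024 MB step, with a 2 GB floor.

    Hypothetical helper for planning only; the cluster does its own rounding.
    """
    # Ceiling division without floats: -(-a // b) rounds a / b upwards.
    steps = -(-requested_mb // INCREMENT_MB)
    return max(MIN_TASK_MB, steps * INCREMENT_MB)
```

So asking for 5000 MB is effectively asking for 5120 MB, and asking for anything below 2048 MB still gets you 2048 MB.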
Besides RAM, you can also ask for the number of CPU cores you want.
The maximum number of cores a given node can provide for processing is 8.
This can be controlled via the following parameters
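As an illustration, the sketch below uses the standard MapReduce property names for task memory and cores and renders them as Hive SET statements. The property names are the stock Hadoop ones; whether they are exactly the ones meant here, and the values used, are assumptions, so check them against your cluster defaults. Both function names are made up for this example.

```python
def mapreduce_resource_settings(map_mb, reduce_mb, map_vcores=1, reduce_vcores=1):
    """Collect per-task memory and core requests under the stock property names."""
    return {
        "mapreduce.map.memory.mb": map_mb,
        "mapreduce.reduce.memory.mb": reduce_mb,
        "mapreduce.map.cpu.vcores": map_vcores,
        "mapreduce.reduce.cpu.vcores": reduce_vcores,
    }


def as_hive_set_statements(settings):
    """Render the settings as SET lines to put at the top of a Hive script."""
    return [f"SET {name}={value};" for name, value in settings.items()]


if __name__ == "__main__":
    # Example values only: 4 GB map tasks, 8 GB reduce tasks, 2 cores per reducer.
    for line in as_hive_set_statements(
        mapreduce_resource_settings(4096, 8192, reduce_vcores=2)
    ):
        print(line)
```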
Typically, if your job fails while inserting into Hive, check whether you need to tune a memory parameter. Hive insert jobs are reduce jobs.
Since you are inserting a large amount of data in one go, you may run into memory overruns.
Always check the logs; they always state why the job is failing.
Sqoop jobs spawn only map tasks.
So if a Sqoop job is not making progress, that usually indicates a memory issue on our side.
In that case, add the following parameter to the job
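Since Sqoop runs only map tasks, the parameter is most likely the map task memory. The Python sketch below builds a Sqoop import command line with mapreduce.map.memory.mb set to 5120; that this is the exact parameter intended is an assumption. Generic Hadoop options such as -D have to come right after the tool name and before the tool-specific arguments. The function name and the connection details in the example are placeholders.

```python
def sqoop_import_args(tool_args, map_memory_mb=5120):
    """Build a Sqoop import command with a larger map task memory request.

    Hypothetical helper: the -D option is placed directly after the tool name,
    because generic Hadoop options must precede tool-specific arguments.
    """
    return [
        "sqoop", "import",
        "-D", f"mapreduce.map.memory.mb={map_memory_mb}",
        *tool_args,
    ]


if __name__ == "__main__":
    # Placeholder connection string, table and target directory.
    command = sqoop_import_args([
        "--connect", "jdbc:mysql://db.example.com/sales",
        "--table", "orders",
        "--target-dir", "/tmp/orders",
    ])
    print(" ".join(command))
```

If the tasks still run out of memory, the Java heap size for map tasks may also need raising; it is commonly kept somewhat below the container size.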
Tune the value 5120 above to suit your needs.
Where to see logs and job status
You can see the status of your job and its logs in the Resource Manager.
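Besides the web page, the Resource Manager also exposes the same information over a REST API. Here is a minimal Python sketch that reads the state and diagnostics of one application. It assumes the Resource Manager web port is the default 8088 and that you can reach it without authentication; the host name and the helper names are placeholders.

```python
import json
from urllib.request import Request, urlopen


def app_status_url(rm_host, app_id, port=8088):
    """Build the Resource Manager REST URL for one application."""
    return f"http://{rm_host}:{port}/ws/v1/cluster/apps/{app_id}"


def summarize_app(payload):
    """Pick the fields that matter when a job fails out of the REST response."""
    app = payload["app"]
    return {
        "state": app.get("state"),
        "finalStatus": app.get("finalStatus"),
        "diagnostics": app.get("diagnostics", ""),
    }


def fetch_app_status(rm_host, app_id, port=8088):
    """Call the Resource Manager and return the summarized status."""
    request = Request(
        app_status_url(rm_host, app_id, port),
        headers={"Accept": "application/json"},  # ask for JSON explicitly
    )
    with urlopen(request, timeout=10) as response:
        return summarize_app(json.load(response))
```

Call fetch_app_status with your Resource Manager host and the application id shown when the job was submitted; the diagnostics field often already says why a job failed.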
You can also log in to Ambari to see what default value has been set for a given property.
Ask your administrator for a username and password with read-only access.
Find out what the current default values are.
What if my job is not even accepted by the cluster? :)
You are asking for resources the cluster does not have, meaning the request exceeds the cluster's maximum limit. Check what your job is really asking for and what the cluster can provide.
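A quick way to check this before submitting is to compare what you ask for against the limits. The Python sketch below uses the 80 GB and 8 core figures quoted earlier; replace them with what your administrator tells you. The function name is hypothetical.

```python
MAX_TASK_MB = 80 * 1024   # maximum memory a task can ask for, per the text
MAX_TASK_VCORES = 8       # maximum cores a node can provide, per the text


def rejection_reasons(memory_mb, vcores):
    """Return the reasons a request would exceed the limits (empty if fine)."""
    reasons = []
    if memory_mb > MAX_TASK_MB:
        reasons.append(f"memory {memory_mb} MB exceeds the {MAX_TASK_MB} MB limit")
    if vcores > MAX_TASK_VCORES:
        reasons.append(f"{vcores} cores exceeds the {MAX_TASK_VCORES} core limit")
    return reasons
```

An empty list means the request is within the limits; anything else is what to fix in the job settings.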
Why is my job being killed?
If your job exceeds the resource limit it originally requested from the Resource Manager (RM), YARN will kill it.
You will see something like the following in the logs
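When the logs are long, a small script can pull out the relevant lines. The Python sketch below looks for phrases commonly reported when YARN kills a container for using too much memory. The exact wording is an assumption and differs between Hadoop versions, so adjust the patterns to what your logs actually say. The names are made up for illustration.

```python
import re

# Assumed phrases; the wording varies between Hadoop versions.
KILL_PATTERNS = [
    re.compile(r"running beyond (physical|virtual) memory limits", re.IGNORECASE),
    re.compile(r"killing container", re.IGNORECASE),
]


def find_kill_lines(log_text):
    """Return the log lines that look like a container kill message."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in KILL_PATTERNS)
    ]
```

If this returns any lines, the job most likely needs a higher memory setting than the one it requested.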
Remember, Google is your friend
The moment you see that your job has failed, take the error from the logs above and search for it on Google to find which parameters people have suggested changing in the job.