Why is my Hive or Sqoop job failing?


First, find out a few basics about the cluster configuration from your administrator.

A sample conversation could go like this:

How many nodes does the cluster have, and what is their configuration?

The answer could be:

Each node has 120 GB RAM. Out of that, the memory we can ask for our jobs is about 80 GB.
Each datanode has 14 CPU cores (we have 5 datanodes right now); the maximum we can ask for processing from each datanode is 8 cores.

The rest is left for other processes such as the OS, Hadoop daemons, and monitoring services.

When you run any job in the MapReduce world, each task gets a minimum of 2 GB RAM, and the maximum any task can ask for is 80 GB (see the capacity above).

Given that we are running big jobs for one-off loads, tell the system to give your job more RAM (RAM is requested in increments of 1024 MB).

Besides RAM, you can also ask for the number of CPU cores you want.

The maximum number of cores a given node can provide for processing is 8.

These settings are controlled via the following parameters.

Hive jobs

mapreduce.map.memory.mb
mapreduce.reduce.memory.mb
mapreduce.map.java.opts
mapreduce.reduce.java.opts
mapreduce.map.cpu.vcores
mapreduce.reduce.cpu.vcores

Typically, if your job fails while inserting via a Hive query, check whether you need to tune a memory parameter. Hive insert jobs are reduce jobs.

Since you are inserting a large amount of data in one go, you can run into memory overruns.
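For example, if an insert keeps dying in the reduce phase, you can raise the reducer container and heap for just that session before running the query. The values and table names below are only an illustration (and assume Hive is running on the MapReduce engine); keep the -Xmx heap at roughly 80% of the container size:

SET mapreduce.reduce.memory.mb=8192;
SET mapreduce.reduce.java.opts=-Xmx6554m;
-- table names here are made up for the example
INSERT INTO TABLE sales_daily SELECT * FROM sales_staging;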

Always check the logs; the reason the job is failing is always mentioned there.

Sqoop jobs

Sqoop jobs spawn only map tasks.

So if a Sqoop job is not moving along, the usual indicator is:

A memory issue on our side.

So just add the following parameters to the command:

-Dmapreduce.map.memory.mb=5120 -Dmapreduce.map.speculative=false

Tune the 5120 value above based on your need.
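Putting it together, a full Sqoop command could look like the sketch below. The connection string, credentials, table, and target directory are placeholders; the part that matters is that the -D generic options go right after the tool name (import), before the -- options. I have also added a map-side -Xmx of roughly 80% of the container, which is a common rule of thumb:

sqoop import \
  -Dmapreduce.map.memory.mb=5120 \
  -Dmapreduce.map.java.opts=-Xmx4096m \
  -Dmapreduce.map.speculative=false \
  --connect jdbc:oracle:thin:@dbhost:1521/ORCL \
  --username etl_user -P \
  --table ORDERS \
  --target-dir /data/raw/orders \
  --num-mappers 4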


Where to see logs and job status

You can see the status of your job, and its logs, in the Resource Manager UI:

http://resource:8088/cluster
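If you prefer the command line over the UI, roughly the same information is available through the yarn client (the application ID below is a placeholder; yarn logs needs log aggregation enabled or the job to have finished):

# list jobs and their state
yarn application -list -appStates ALL
# pull the aggregated logs of one job
yarn logs -applicationId application_1500000000000_0001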

You can also log in to Ambari to see what default value has been set for a given property:

http://ambari:8080

Ask your administrator for a username and password with read-only access.

Find out what the current default values are, for example:

mapreduce.map.java.opts=-Xmx5012m
mapreduce.reduce.java.opts=-Xmx6144m
mapreduce.map.memory.mb=4096
mapreduce.reduce.memory.mb=8192
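If you do not have Ambari access, a quick sanity check is to ask Hive itself; SET with just the property name should echo the value the session is actually using (output format may vary slightly by version):

hive -e "SET mapreduce.map.memory.mb; SET mapreduce.reduce.memory.mb;"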

What if my job is not even accepted by the cluster? :)

You are asking for resources the cluster does not have, i.e. your request crosses the cluster's maximum limit. Check what your job is really asking for versus what the cluster can provide.
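That ceiling is set on the cluster side by YARN: if a single container request is bigger than yarn.scheduler.maximum-allocation-mb, or asks for more than yarn.scheduler.maximum-allocation-vcores, YARN rejects it. The values below are only an illustration matching the 80 GB / 8 cores figures above:

yarn.scheduler.maximum-allocation-mb=81920
yarn.scheduler.maximum-allocation-vcores=8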

Why is my job being killed?

If your job crosses the resource limit it originally asked for from the Resource Manager, YARN will kill it.

You will see something like the below in the logs:

Killing container....
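The full message usually looks roughly like this (the sizes and IDs below are made up); the "running beyond physical memory limits" part tells you it is the container memory you need to raise:

Container [pid=12345,containerID=container_1500000000000_0001_01_000002] is running beyond physical memory limits. Current usage: 4.2 GB of 4 GB physical memory used; 5.1 GB of 8.4 GB virtual memory used. Killing container.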

Remember Google is your friend

The moment you see your job has failed, take the error from the logs above and search for it on Google; try to find which parameters people have suggested changing in the job.

Resources

https://altiscale.zendesk.com/hc/en-us/articles/200801519-Configuring-Memory-for-Mappers-and-Reducers-in-Hadoop-2



