Nagios Support Forum

Posted: **Tue Oct 04, 2016 8:32 pm**

Dear Team,

We are seeing a high-load problem on our Hadoop Ubuntu machines. One of the Hadoop services (Impala) pushes the load on the Linux machine to around 50 while it processes large amounts of data, and the load returns to normal once the activity completes. The problem is that during these periods the NRPE agent is still executing the scripts that monitor the Linux services. Those scripts run the Linux "ps" command, which gets stuck and adds further load, and the machine eventually becomes unresponsive. We then have to reboot the machine to bring it back to a normal state. Please suggest how we can mitigate this issue.

Regards,
Mohan

Posted: **Tue Oct 04, 2016 9:28 pm**

You may need to look at using a different monitoring method. SNMP might be an option.

https://assets.nagios.com/downloads/nag ... g_SNMP.pdf

The Linux SNMP wizard should already exist in XI.

Another option is to configure your hadoop processes with a lower priority so that other things like NRPE are able to function correctly.
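
As a rough illustration of the lower-priority suggestion, the sketch below raises the nice value of a process so that the scheduler favours other work such as the NRPE daemon. The helper name `lower_priority` and the default nice value of 10 are assumptions for illustration only. The PIDs of the Impala daemons would have to be looked up first, changing another user's process needs sufficient privileges, and niceness only affects CPU scheduling, not disk I/O.

```python
import os


def lower_priority(pid: int, niceness: int = 10) -> int:
    """Make sure `pid` runs with a nice value of at least `niceness`.

    Hypothetical helper: it only ever raises the nice value (lowers the
    priority), because going the other way normally requires root.
    Returns the nice value in effect afterwards.
    """
    current = os.getpriority(os.PRIO_PROCESS, pid)
    target = max(current, niceness)
    if target != current:
        os.setpriority(os.PRIO_PROCESS, pid, target)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

The same effect is available from the shell with the standard `renice` utility, or by starting the service under `nice`.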

Posted: **Wed Oct 05, 2016 2:40 pm**

So with SNMP monitoring, Nagios will not use the "ps" command to check the services?

Posted: **Wed Oct 05, 2016 3:05 pm**

No, SNMP-based checking will not run ps and parse the output. However, it is entirely possible that the SNMP daemon itself on the remote machine uses ps internally, but it is not possible for us to tell whether this is the case.

Please give SNMP a shot and let us know if the load issue is still present. Thanks!

Posted: **Mon Oct 10, 2016 9:04 pm**

Why does the NRPE agent run the ps command, and what does it do with the output?

Posted: **Mon Oct 10, 2016 9:22 pm**

NRPE stands for "Nagios Remote Plugin Executor".

It allows you to execute plugins to check "stuff". The plugin does whatever it's supposed to and then returns the output and exit code back to NRPE and NRPE sends that back to Nagios.
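
To make the plugin contract concrete, here is a minimal sketch of a process-count plugin. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention. The function names, the default thresholds, and the choice to count PIDs under /proc instead of running ps are assumptions for illustration, not a description of how the stock check_procs plugin works.

```python
import os
from typing import Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def count_processes(proc_root: str = "/proc") -> int:
    """Count the numeric directories under /proc, one per process (Linux only)."""
    return sum(1 for name in os.listdir(proc_root) if name.isdigit())


def evaluate(count: int, warn: int, crit: int) -> Tuple[int, str]:
    """Map a process count to (exit code, one-line status text with perfdata)."""
    if count >= crit:
        code = CRITICAL
    elif count >= warn:
        code = WARNING
    else:
        code = OK
    text = f"PROCS {LABELS[code]}: {count} processes | procs={count};{warn};{crit}"
    return code, text


def run_check(warn: int = 400, crit: int = 600) -> Tuple[int, str]:
    """Run the check; a failure to read /proc is reported as UNKNOWN."""
    try:
        return evaluate(count_processes(), warn, crit)
    except OSError as exc:
        return UNKNOWN, f"PROCS UNKNOWN: {exc}"
```

A wrapper script would print the text from `run_check()` and call `sys.exit()` with the code; NRPE relays both back to Nagios.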

Whatever plugin you are using to monitor the services uses the ps command.

You will need to show us your service definition for the plugin that is causing your issue. Go into CCM, find the service, click the disk icon and paste the text here.

Posted: **Tue Oct 11, 2016 2:50 am**

Please find attached the service configuration file for the machine swodc01hdfs05, where we frequently see the high load issue.

Posted: **Tue Oct 11, 2016 1:16 pm**

I can see the following commands referenced in your config: check_disk, check_cpu_stats, check_load, check_mem, check_init_service, check_open_files, check_procs, and check_users. Can you show us how they are defined on the client (remote machine)?

You will find their definitions in either the "/usr/local/nagios/etc/nrpe/common.cfg" or the "/usr/local/nagios/etc/nrpe.cfg" file.

Posted: **Tue Oct 11, 2016 9:26 pm**

Please find attached the requested file.

Posted: **Wed Oct 12, 2016 12:35 pm**

How often does this wedge occur? It seems unlikely that ps itself is the culprit, since it simply reads and prints data from the kernel. However, ps could have run into a process in an uninterruptible sleep state and never exited. Usually this comes from disk I/O, e.g. NFS or something similar where an fsync can't complete properly.
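
One way to look for processes in uninterruptible sleep is to read the state letter from /proc/<pid>/stat, where "D" marks that state. The sketch below is Linux-specific, and the function names are hypothetical. It avoids running ps, but that alone is no guarantee that it cannot block on a badly wedged system.

```python
import os
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, command name, state letter).

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the first "(" and the last ")".
    """
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def find_uninterruptible(proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Return (pid, command) for every process currently in state D."""
    stuck = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry unreadable; skip it
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Processes that show up in this list repeatedly during a high-load window are the ones worth investigating.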

What do the system logs look like after a reboot? Can you disable the NRPE checks and see if the hang still occurs? If you can't disable them all at once, bisecting the checks would at least narrow it down.

Can you run top -bcn1 during such a high load? Also what happens if you send a hung ps process a SIGUSR1?
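
Since the box may be hard to reach by the time someone notices, the capture can be scripted. This is a minimal sketch: the helper names and the 15-second default timeout are assumptions. The timeout is best effort only, because a child stuck in uninterruptible sleep cannot be killed, and the final wait can then block as well.

```python
import os
import subprocess
from typing import Optional, Sequence


def load_is_high(threshold: float) -> bool:
    """True when the 1-minute load average is at or above `threshold`."""
    return os.getloadavg()[0] >= threshold


def capture(command: Sequence[str], timeout: float = 15.0) -> Optional[str]:
    """Run `command` and return its stdout, or None if it cannot be run
    or does not finish within `timeout` seconds."""
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout
```

Run from cron every minute or so, something like `if load_is_high(20): print(capture(["top", "-bcn1"]))` would log the snapshot requested above; the threshold of 20 is an arbitrary example value.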

High Load Issue In Hadoop Ubuntu Machines.
