Abnormal load on Nagios server

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
narayanamoorthys
Posts: 10
Joined: Tue Dec 17, 2013 2:40 am

Abnormal load on Nagios server

Post by narayanamoorthys »

The Nagios server suddenly experienced an abnormal load and returned to normal within 3 minutes. The logs show the following messages:

Jun 19 17:33:04 reg-nagios nagios: Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1403220778.perfdata.host' timed out after 5 seconds
Jun 19 17:33:10 reg-nagios nagios: Warning: Service performance data file processing command '/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1403220784.perfdata.service' timed out after 5 seconds
Jun 19 17:33:20 reg-nagios nagios: Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1403220794.perfdata.host' timed out after 5 seconds
Jun 19 17:33:26 reg-nagios nagios: Warning: Service performance data file processing command '/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1403220800.perfdata.service' timed out after 5 seconds
Jun 19 17:33:26 reg-nagios nagios: wproc: Core Worker 31012: job 6508 (pid=13539) timed out. Killing it
Jun 19 17:33:26 reg-nagios nagios: wproc: CHECK job 6508 from worker Core Worker 31012 timed out after 30.02s
Jun 19 17:33:26 reg-nagios nagios: wproc: host=reg-glb18.viterra.com; service=(null);
Jun 19 17:33:26 reg-nagios nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Jun 19 17:33:26 reg-nagios nagios: Warning: Check of host 'reg-glb18.viterra.com' timed out after 30.02 seconds
Jun 19 17:33:26 reg-nagios nagios: HOST ALERT: reg-glb18.viterra.com;DOWN;SOFT;1;(Host check timed out after 30.02 seconds)
Jun 19 17:33:26 reg-nagios nagios: wproc: Core Worker 31012: tv.tv_sec is currently 1403220804
Jun 19 17:33:26 reg-nagios nagios: wproc: Core Worker 31012: Failed to reap child with pid 13539. Next attempt @ 1403220809.534442
Jun 19 17:33:27 reg-nagios nagios: wproc: Core Worker 31039: job 6508 (pid=13564) timed out. Killing it
Jun 19 17:33:27 reg-nagios nagios: wproc: CHECK job 6508 from worker Core Worker 31039 timed out after 30.01s
Jun 19 17:33:27 reg-nagios nagios: wproc: host=reg-dut-01.viterra.com; service=(null);
Jun 19 17:33:27 reg-nagios nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;

Nothing looks wrong with the disk either. What might be the cause?

Also, the monitoring engine queue showed 1000+ events scheduled at the same time. Previously, they were evenly distributed.
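To see how the two kinds of timeout in the log excerpt above relate over a longer window, the syslog lines can be tallied. This is a minimal sketch, not an official Nagios tool: the function name `count_nagios_timeouts` is hypothetical, and the log path in the usage comment is an assumption that varies by distribution. It counts matching log lines only.

```shell
#!/usr/bin/env bash
# Tally Nagios timeout warnings from syslog-style lines read on stdin.
# count_nagios_timeouts is a hypothetical helper name, not part of Nagios.
count_nagios_timeouts() {
    awk '
        # Perfdata file processing commands (the /bin/mv lines) that timed out
        /performance data file processing command/ && /timed out/ { perf++ }
        # Host or service checks that timed out
        /Warning: Check of / && /timed out/ { checks++ }
        END { printf "perfdata_timeouts=%d check_timeouts=%d\n", perf, checks }
    '
}

# Usage (the log path is an assumption; adjust for your system):
#   count_nagios_timeouts < /var/log/messages
```

If perfdata timeouts dominate, slow writes to the spool directory are worth a look. If check timeouts dominate, the checks themselves or overall CPU pressure are more likely suspects.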
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Abnormal load on Nagios server

Post by lmiltchev »

"The Nagios server suddenly experienced an abnormal load and returned to normal within 3 minutes."
Did this happen only once, or does it keep happening at regular intervals? Which version of Nagios XI are you currently using? Did you make any changes to the system before experiencing this issue? Are you using Mod Gearman?
narayanamoorthys
Posts: 10
Joined: Tue Dec 17, 2013 2:40 am

Re: Abnormal load on Nagios server

Post by narayanamoorthys »

It has happened only once so far, and no changes were made.

Version: 2014R1.0

We are not using Mod Gearman.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Abnormal load on Nagios server

Post by slansing »

Are you running additional modules?

cat /usr/local/nagios/etc/nagios.cfg | grep 'broker'
"Also, the monitoring engine queue showed 1000+ events scheduled at the same time. Previously, they were evenly distributed."
Are you saying the above only happened one time? Did it resolve itself?
narayanamoorthys
Posts: 10
Joined: Tue Dec 17, 2013 2:40 am

Re: Abnormal load on Nagios server

Post by narayanamoorthys »

Please find the output below:

[root@reg-nagios libexec]# cat /usr/local/nagios/etc/nagios.cfg | grep 'broker'
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
event_broker_options=-1

The monitoring engine queue now shows 420+ checks scheduled at a single point, with only a few checks distributed around it.
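The clustering described above can also be checked from the command line by bucketing the `next_check=` timestamps in the Nagios status file by minute. This is a rough sketch under assumptions: the function name `next_check_histogram` is hypothetical, and the status file path in the usage comment is the usual default, which may differ on your install (see `status_file` in nagios.cfg).

```shell
#!/usr/bin/env bash
# Print "<count> <minute-start-epoch>" for the busiest scheduling minutes.
# $1 = status file to read, $2 = number of buckets to show (default 10).
# next_check_histogram is a hypothetical helper name, not part of Nagios.
next_check_histogram() {
    awk -F= '
        # Keep only next_check lines with a real timestamp (0 means unscheduled)
        /^[ \t]*next_check=/ && $2 > 0 { n[int($2 / 60) * 60]++ }
        END { for (b in n) print n[b], b }
    ' "$1" | sort -k1,1nr -k2,2n | head -n "${2:-10}"
}

# Usage (path is an assumed default; adjust for your install):
#   next_check_histogram /usr/local/nagios/var/status.dat 5
```

A single bucket holding several hundred checks while the rest hold only a few would match what the monitoring engine queue graph shows.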
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Abnormal load on Nagios server

Post by tmcdonald »

Are you running a lot of ESX, WMI, or check_ifoperstatus checks? Those can be quite CPU- and memory-intensive, and having many run at once (or get stuck) can cause this behavior. How many hosts/services are you checking overall? Are they all on a 5-minute timer, or are there some that run more often?
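One way to spot many plugin processes running at once (or stuck) is to count running processes by command name during a load spike. This is only a sketch: `count_by_name` is a hypothetical helper, and it assumes one process name per line on stdin, as produced by `ps -eo comm=`.

```shell
#!/usr/bin/env bash
# Count identical process names read from stdin and print the most common.
# $1 = how many names to show (default 10). Output: "<count> <name>".
# count_by_name is a hypothetical helper name.
count_by_name() {
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }' | head -n "${1:-10}"
}

# Usage during a load spike:
#   ps -eo comm= | count_by_name 5
```

A large count for a single check plugin would suggest that those checks are piling up.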
Former Nagios employee
narayanamoorthys
Posts: 10
Joined: Tue Dec 17, 2013 2:40 am

Re: Abnormal load on Nagios server

Post by narayanamoorthys »

We don't have any ESX checks. We monitor around 150 Unix servers, and they all use the default check interval (5 min).
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Abnormal load on Nagios server

Post by tmcdonald »

Can I get a copy of your profile? In the XI web interface, go to Admin -> System Profile and click the blue "Download Profile" button. Then PM that profile.zip file to me.