The Nagios server suddenly experienced abnormal load and returned to normal within 3 minutes. The logs show the following messages:
Jun 19 17:33:04 reg-nagios nagios: Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1403220778.perfdata.host' timed out after 5 seconds
Jun 19 17:33:10 reg-nagios nagios: Warning: Service performance data file processing command '/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1403220784.perfdata.service' timed out after 5 seconds
Jun 19 17:33:20 reg-nagios nagios: Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1403220794.perfdata.host' timed out after 5 seconds
Jun 19 17:33:26 reg-nagios nagios: Warning: Service performance data file processing command '/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1403220800.perfdata.service' timed out after 5 seconds
Jun 19 17:33:26 reg-nagios nagios: wproc: Core Worker 31012: job 6508 (pid=13539) timed out. Killing it
Jun 19 17:33:26 reg-nagios nagios: wproc: CHECK job 6508 from worker Core Worker 31012 timed out after 30.02s
Jun 19 17:33:26 reg-nagios nagios: wproc: host=reg-glb18.viterra.com; service=(null);
Jun 19 17:33:26 reg-nagios nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Jun 19 17:33:26 reg-nagios nagios: Warning: Check of host 'reg-glb18.viterra.com' timed out after 30.02 seconds
Jun 19 17:33:26 reg-nagios nagios: HOST ALERT: reg-glb18.viterra.com;DOWN;SOFT;1;(Host check timed out after 30.02 seconds)
Jun 19 17:33:26 reg-nagios nagios: wproc: Core Worker 31012: tv.tv_sec is currently 1403220804
Jun 19 17:33:26 reg-nagios nagios: wproc: Core Worker 31012: Failed to reap child with pid 13539. Next attempt @ 1403220809.534442
Jun 19 17:33:27 reg-nagios nagios: wproc: Core Worker 31039: job 6508 (pid=13564) timed out. Killing it
Jun 19 17:33:27 reg-nagios nagios: wproc: CHECK job 6508 from worker Core Worker 31039 timed out after 30.01s
Jun 19 17:33:27 reg-nagios nagios: wproc: host=reg-dut-01.viterra.com; service=(null);
Jun 19 17:33:27 reg-nagios nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Nothing looks wrong with the disk either, so I am wondering what the cause might be.
Also, the monitoring engine queue showed 1000+ events scheduled at the same time. Before this, they were evenly distributed.
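To see how tightly the timeouts cluster, the warnings in the excerpt above can be tallied per minute. This is a minimal sketch, assuming the classic syslog prefix shown in the log lines (month, day, HH:MM:SS, host). The name count_timeouts_per_minute is a hypothetical helper, and the log path in the usage comment is an assumption that varies by distribution.

```shell
# Tally nagios "timed out" messages per minute from syslog-style lines on stdin.
# Assumes lines start with "Mon DD HH:MM:SS host nagios: ..." as in the excerpt.
count_timeouts_per_minute() {
    awk '/ nagios: / && /timed out/ {
             # Key is "Mon DD HH:MM" (seconds dropped from the time field).
             key = $1 " " $2 " " substr($3, 1, 5)
             if (!(key in n)) order[++k] = key   # remember first-seen order
             n[key]++
         }
         END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }'
}

# Usage (log path is an assumption; adjust for your system):
#   count_timeouts_per_minute < /var/log/messages
```

A single busy minute surrounded by quiet ones would point to a one-off stall rather than sustained overload.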
Abnormal load on Nagios server
-
narayanamoorthys
- Posts: 10
- Joined: Tue Dec 17, 2013 2:40 am
Re: Abnormal load on Nagios server
Did this happen only once, or does it keep happening at regular intervals? Which Nagios XI version are you currently using? Did you make any changes to the system before experiencing this issue? Are you using Mod Gearman?
-
narayanamoorthys
- Posts: 10
- Joined: Tue Dec 17, 2013 2:40 am
Re: Abnormal load on Nagios server
It has happened only once so far, and no changes were made.
Version: 2024R1.0
We are not using Mod Gearman
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Abnormal load on Nagios server
Are you running additional modules?
cat /usr/local/nagios/etc/nagios.cfg | grep 'broker'
Are you saying the above only happened one time? Did it resolve itself?
-
narayanamoorthys
- Posts: 10
- Joined: Tue Dec 17, 2013 2:40 am
Re: Abnormal load on Nagios server
Please find the output below:
[root@reg-nagios libexec]# cat /usr/local/nagios/etc/nagios.cfg | grep 'broker'
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
event_broker_options=-1
The monitoring queue now shows 420+ checks scheduled at a single point, with only a few checks distributed elsewhere.
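One way to quantify that bunching outside the web interface is to bucket the scheduled checks by minute. This is a rough sketch, assuming the status file lives at /usr/local/nagios/var/status.dat (check status_file in nagios.cfg to confirm) and that host and service entries carry next_check=<epoch seconds> lines. The name checks_per_minute is a hypothetical helper.

```shell
# Count scheduled checks per minute from a status.dat-style file on stdin.
# Prints "<minute start as epoch seconds> <number of checks>", oldest first.
checks_per_minute() {
    awk -F= '$1 ~ /^[ \t]*next_check$/ && $2 > 0 {
                 minute = $2 - ($2 % 60)   # round down to the start of the minute
                 n[minute]++
             }
             END { for (m in n) print m, n[m] }' | sort -n
}

# Usage (path is an assumption; see status_file in nagios.cfg):
#   checks_per_minute < /usr/local/nagios/var/status.dat | sort -k2,2nr | head
```

If one minute holds hundreds of checks while its neighbours hold only a handful, the schedule has collapsed onto a single point rather than being spread across the check interval.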
Re: Abnormal load on Nagios server
Are you running a lot of ESX, WMI, or check_ifoperstatus checks? Those can be quite CPU and memory intensive and having many run at once (or get stuck) can cause this behavior. How many hosts/services are you checking overall? Are they all on a 5-minute timer or are there some that run more often?
Former Nagios employee
-
narayanamoorthys
- Posts: 10
- Joined: Tue Dec 17, 2013 2:40 am
Re: Abnormal load on Nagios server
We don't have any ESX checks. We monitor around 150 Unix servers, and they all use the default check interval (5 min).
Re: Abnormal load on Nagios server
Can I get a copy of your profile? In the XI web interface, go to Admin -> System Profile and click the blue "Download Profile" button. Then PM that profile.zip file to me.
Former Nagios employee