Service check scheduling issue
Posted: Wed Mar 06, 2013 3:00 pm
Hi,
I manage a distributed Nagios Core infrastructure with a central node that only receives passive checks, sends notifications and handles the NDOUtils data persistence tool.
All my servers so far (central and distributed) run Ubuntu 10.04 LTS and Nagios Core 3.3.1.
On the latest distributed instance to be deployed, I took a big leap:
I began using a virtualized Ubuntu Server 12.04 (hosted on a Windows Server R2 with VMware Player) and Nagios Core 3.4.1, and I've also deployed this distributed instance with MySQL server and the NDOUtils persistence engine.
The fact that so much has changed compared to my usual environment is making this diagnosis hard.
The issue is that although I specify normal_check_interval individually for each service check (ranging from 5 to 60 minutes), some services stay unchecked for long periods and show old timestamps in the 'Last Check Time' field of the web interface.
The VM shows no signs of CPU starvation or memory shortage, and only about 1/4 of the approximately 700 service checks on this instance keep fresh 'Last Check Time' values.
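To put numbers on "unchecked for long periods", the stale services can be listed straight from the Nagios status file. The following is only a minimal Python sketch under stated assumptions: it assumes the Nagios 3.x status.dat layout with `servicestatus { ... }` blocks that carry `last_check`, `check_interval` and `active_checks_enabled`, and the default `interval_length` of 60 seconds. The function names, the sample data and the `slack` factor are hypothetical and chosen for illustration.

```python
def parse_blocks(text, block_type="servicestatus"):
    """Return one dict of key=value pairs per `block_type { ... }` block."""
    blocks, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line == block_type + " {":
                current = {}
        elif line == "}":
            blocks.append(current)
            current = None
        elif "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def stale_services(blocks, now, interval_length=60, slack=2.0):
    """List (host, service, age_seconds) for actively checked services whose
    last check is older than `slack` times their configured interval."""
    stale = []
    for block in blocks:
        if block.get("active_checks_enabled") != "1":
            continue  # passive-only services are not scheduled locally
        interval_s = float(block.get("check_interval", 0)) * interval_length
        if interval_s <= 0:
            continue
        age = now - int(block.get("last_check", 0))
        if age > slack * interval_s:
            stale.append((block.get("host_name"),
                          block.get("service_description"), age))
    return sorted(stale, key=lambda item: -item[2])


# Hypothetical sample data in the assumed status.dat layout.
SAMPLE = """\
servicestatus {
	host_name=web01
	service_description=HTTP
	check_interval=5.000000
	last_check=1000
	active_checks_enabled=1
	}
servicestatus {
	host_name=web01
	service_description=Disk
	check_interval=60.000000
	last_check=1000
	active_checks_enabled=1
	}
"""

if __name__ == "__main__":
    # 20 minutes after both last checks: only the 5-minute service is overdue.
    for host, service, age in stale_services(parse_blocks(SAMPLE), now=2200):
        print(host, service, age)
```

Against a real instance, the text would come from the file named by status_file in nagios.cfg and `now` from `time.time()`. Comparing `next_check` for the stale entries would then show whether those checks are scheduled far in the future or not being run at all.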
I've read in some other forums/topics this issue could be related to
- NDOUtils - I tried disabling it, with no success
- HW clock on the server - I don't believe an @hourly cron job for NTP sync would cause it
I also tried restarting the Nagios service with use_retained_scheduling_info set to 0, with no success.
I'm losing hair and head over this. In case it helps, below are the scheduling directives from my nagios.cfg on this misbehaving remote monitor:
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
use_retained_scheduling_info=1
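The four directives above are not the only ones in nagios.cfg that influence check scheduling, so when comparing the working 3.3.1 instances with the 3.4.1 one it may help to dump all of them at once. This is a rough Python sketch: the directive names in the tuple are standard Nagios Core 3 main-config options, but the selection, the function name and the sample text are assumptions for illustration.

```python
# Main-config directives that affect when and how service checks run.
SCHEDULING_KEYS = (
    "execute_service_checks",
    "interval_length",
    "sleep_time",
    "max_concurrent_checks",
    "service_inter_check_delay_method",
    "service_interleave_factor",
    "service_check_timeout",
    "check_result_reaper_frequency",
    "max_check_result_reaper_time",
    "auto_reschedule_checks",
    "auto_rescheduling_interval",
    "auto_rescheduling_window",
    "use_retained_scheduling_info",
)


def scheduling_directives(cfg_text):
    """Return {directive: value} for scheduling-related lines in nagios.cfg text."""
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        if key in SCHEDULING_KEYS:
            found[key] = value.strip()
    return found


# Sample text: the four directives from this post plus unrelated lines.
SAMPLE_CFG = """\
# scheduling section
log_file=/var/log/nagios.log
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
use_retained_scheduling_info=1
"""

if __name__ == "__main__":
    for key, value in scheduling_directives(SAMPLE_CFG).items():
        print(f"{key}={value}")
```

Running it over the nagios.cfg of a healthy instance and of the misbehaving one, then diffing the two outputs, would show whether a scheduling-related setting differs between them.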
If I don't find a solution I will go back to Ubuntu 10.04 with Nagios 3.3.1 hosted on Windows 7 with VMware Player (seems stupid, but it works like a charm).
Thank you for your time, patience and hopefully... knowledge