Service check scheduling issue
- sebastiaopburnay
- Posts: 105
- Joined: Sun Oct 31, 2010 1:40 pm
- Location: Lisbon, Portugal
Hi,
I manage a distributed Nagios Core infrastructure consisting of a central node that only receives passive checks, sends notifications and handles the NDOUtils data persistence tool.
All my servers so far (central and distributed) run Ubuntu 10.04 LTS and Nagios Core 3.3.1.
On the latest distributed instance to be deployed, I took a big leap:
I began using a virtualized Ubuntu Server 12.04 (hosted on Windows Server R2 with VMware Player) and Nagios Core 3.4.1, and I also deployed this distributed instance with a MySQL server and the NDOUtils persistence engine.
The fact that so much has changed compared to my usual environment is making this diagnosis hard.
The issue is that, although I specify normal_check_interval individually for each service check (ranging from 5 to 60 minutes), some services stay unchecked for long periods and show old values in the 'Last Check Time' field of the web interface.
The VM shows no signs of CPU starvation or memory shortage, and only about 1/4 of the roughly 700 service checks on this instance keep fresh 'Last Check Time' values.
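To put numbers on how stale the 'Last Check Time' values are, one option is to read Nagios' status.dat (its path is set by the status_file directive in nagios.cfg) and compare each service's last_check with its check_interval. This is a minimal Python sketch, assuming the Nagios 3 status.dat layout (servicestatus { ... } blocks of key=value lines, last_check in epoch seconds, check_interval in units of interval_length, assumed to be the default 60 seconds). The function names, the 2x tolerance and the sample host are illustrative only.

```python
import time


def parse_status_blocks(text, block_type="servicestatus"):
    """Yield one dict per '<block_type> {' ... '}' block of status.dat text."""
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line == block_type + " {":
                current = {}
        elif line == "}":
            yield current
            current = None
        elif "=" in line:
            key, _, value = line.partition("=")
            current[key] = value


def stale_services(text, now=None, interval_length=60, tolerance=2.0):
    """List (host, service, minutes_late) for active services whose last
    check is older than tolerance * check_interval.

    check_interval is stored in units of interval_length seconds (assumed
    default of 60). Never-checked services (PENDING) are skipped here.
    """
    now = time.time() if now is None else now
    late = []
    for svc in parse_status_blocks(text):
        if svc.get("active_checks_enabled") != "1" or svc.get("has_been_checked") != "1":
            continue
        interval_s = float(svc.get("check_interval", "0")) * interval_length
        age_s = now - int(svc.get("last_check", "0"))
        if interval_s > 0 and age_s > tolerance * interval_s:
            late.append((svc.get("host_name"), svc.get("service_description"),
                         round((age_s - interval_s) / 60)))
    return late


# Hypothetical sample block; real input would be the contents of status.dat.
sample = """servicestatus {
    host_name=web01
    service_description=HTTP
    check_interval=5.000000
    active_checks_enabled=1
    has_been_checked=1
    last_check=1000
    }
"""
print(stale_services(sample, now=4600))  # prints [('web01', 'HTTP', 55)]
```

Running it against the live file every few minutes would show whether the same services keep falling behind or whether the set changes over time.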
I've read in other forums/topics that this issue could be related to:
- NDOUtils: tried disabling it, with no success
- The hardware clock on the server: I don't believe an @hourly cron job doing an NTP sync would cause it
I also tried restarting the Nagios service with use_retained_scheduling_info set to zero, with no success.
I'm losing my hair and my head over this. In case it helps, below are the scheduling directives from nagios.cfg on this misbehaving remote monitor:
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
use_retained_scheduling_info=1
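With auto_reschedule_checks=0, Nagios does not use auto_rescheduling_interval or auto_rescheduling_window, so those two values have no effect on this instance. A quick way to confirm what a given nagios.cfg actually enables is to parse it. This Python sketch assumes plain key=value lines with '#' comments; the function names are hypothetical, and the fallbacks 30 and 180 are assumed defaults copied from the values shown above.

```python
def parse_main_config(text):
    """Parse nagios.cfg-style 'key=value' lines into a dict.

    Blank lines and lines starting with '#' are skipped; only the first
    '=' separates the key from the value.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def effective_rescheduling(settings):
    """Report which auto-rescheduling values actually apply.

    auto_rescheduling_interval and auto_rescheduling_window are only used
    when auto_reschedule_checks is enabled. Fallbacks 30/180 are assumed.
    """
    enabled = settings.get("auto_reschedule_checks", "0") == "1"
    if not enabled:
        return {"auto_reschedule_checks": False, "interval_s": None, "window_s": None}
    return {
        "auto_reschedule_checks": True,
        "interval_s": int(settings.get("auto_rescheduling_interval", "30")),
        "window_s": int(settings.get("auto_rescheduling_window", "180")),
    }


sample = """
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
use_retained_scheduling_info=1
"""
cfg = parse_main_config(sample)
print(effective_rescheduling(cfg))
# prints {'auto_reschedule_checks': False, 'interval_s': None, 'window_s': None}
print(cfg["use_retained_scheduling_info"])  # prints 1
```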
If I don't find a solution, I will go back to Ubuntu 10.04 with Nagios 3.3.1 hosted on Windows 7 with VMware Player (it seems stupid, but it works like a charm).
Thank you for your time, patience and, hopefully... knowledge.
Re: service check scheduling issue
What method are you using for the passive checks? If it is NSCA, what version is running on the core server and on the client systems?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
- sebastiaopburnay
Re: service check scheduling issue
I'm using the NSCA daemon/server version 2.7.2 on the central server.
The remote instance uses a submit_service/host shell script that calls send_nsca whenever the check state is HARD (the send_nsca client is also version 2.7.2).
But the problem here isn't the passive checks; those are sent smoothly from the remote instance to the central server.
My problem is the active checks, which the remote server does not execute on time. They simply do not respect the normal_check_interval values and are sometimes more than 24h/48h late.
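For reference, send_nsca reads passive results from standard input, one record per line, with tab-separated fields: host name, service description, return code and plugin output for a service result (host results omit the service field). The actual submit script is shell and is not shown in the thread, so this is only a Python sketch of the forwarding rule described above (forward HARD states only); nsca_line is a hypothetical helper name.

```python
def nsca_line(host, service, return_code, output, state_type):
    """Build one send_nsca input record, or return None for non-HARD states.

    Service records are host<TAB>service<TAB>return_code<TAB>output;
    host records (service=None) omit the service field. Hypothetical helper
    mirroring the 'forward only HARD states' rule described in the thread.
    """
    if state_type != "HARD":
        return None
    # A tab or newline inside the plugin output would break the record.
    clean = output.replace("\t", " ").replace("\n", " ")
    if service is None:
        fields = [host, str(return_code), clean]
    else:
        fields = [host, service, str(return_code), clean]
    return "\t".join(fields) + "\n"


print(repr(nsca_line("web01", "HTTP", 2, "CRITICAL - socket timeout", "HARD")))
# prints 'web01\tHTTP\t2\tCRITICAL - socket timeout\n'
print(nsca_line("web01", "HTTP", 2, "CRITICAL - socket timeout", "SOFT"))  # prints None
```

The resulting text would then be piped into something like `send_nsca -H <central_host> -c <path/to/send_nsca.cfg>`, where both values are placeholders. Since the passive path works here, this mainly documents which side of the pipeline is not at fault.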
Re: service check scheduling issue
Curious. So the remote shell script is not firing off anywhere near the right time? Are the delays consistent?
Former Nagios employee
- sebastiaopburnay
Re: service check scheduling issue
abrist wrote: So the remote shell script is not firing off anywhere near the right time?
That's correct... unless it is being executed and somehow not processed or taken into account by Nagios.
Those scripts (mostly the check_nrpe and check_nt plugins) return valid, well-formed results when run from the CLI.
abrist wrote: Are the delays consistent?
I'm not sure what you mean by that, but I'm attaching to this reply a view of the scheduling queue, which shows the Nagios process setting 'Next Check Time' values to points in time before the current time (which makes no sense at all).
Another «fun» fact is that host checks are being correctly scheduled and executed.
The last time I restarted Nagios, I forced it not to retain status (forcing the process to check all hosts/services). As a result, only 148 of the 704 service checks have been executed; the other 556 appear as «PENDING».
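The symptoms above (next check times in the past, most services still PENDING after a restart) can be counted directly from status.dat instead of from screenshots of the scheduling queue. This Python sketch assumes the Nagios 3 status.dat layout, including the has_been_checked, next_check and should_be_scheduled fields of servicestatus blocks; the 300-second grace period and the function names are arbitrary illustrative choices.

```python
import time
from collections import Counter


def parse_status_blocks(text, block_type="servicestatus"):
    """Yield one dict per '<block_type> {' ... '}' block of status.dat text."""
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if current is None:
            if line == block_type + " {":
                current = {}
        elif line == "}":
            yield current
            current = None
        elif "=" in line:
            key, _, value = line.partition("=")
            current[key] = value


def schedule_summary(text, now=None, grace_s=300):
    """Count active services that are PENDING, overdue, or not scheduled.

    'overdue' means next_check lies more than grace_s seconds in the past;
    'not_scheduled' relies on should_be_scheduled (assumed present, as in
    Nagios 3 status.dat). A service can be counted in several buckets.
    """
    now = time.time() if now is None else now
    counts = Counter()
    for svc in parse_status_blocks(text):
        if svc.get("active_checks_enabled") != "1":
            continue
        counts["total"] += 1
        if svc.get("has_been_checked") == "0":
            counts["pending"] += 1
        if svc.get("should_be_scheduled") == "0":
            counts["not_scheduled"] += 1
        next_check = int(svc.get("next_check", "0"))
        if next_check and now - next_check > grace_s:
            counts["overdue"] += 1
    return dict(counts)


# Usage, with the path taken from status_file in nagios.cfg (placeholder):
# with open("/path/to/status.dat") as fh:
#     print(schedule_summary(fh.read()))
```

If "overdue" roughly tracks "pending", the services are in the queue but never get executed; a large "not_scheduled" count would instead suggest they never entered the queue.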
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: service check scheduling issue
Out of curiosity, what are the following set to?
Code:
check_for_orphaned_services
check_for_orphaned_hosts
use_timezone
Does the machine have the correct timezone?
- sebastiaopburnay
Re: service check scheduling issue
scottwilkerson wrote: Out of curiosity, what are the following set to? check_for_orphaned_services, check_for_orphaned_hosts, use_timezone
I have the following configs:
Code:
check_for_orphaned_services=1
check_for_orphaned_hosts=1
#use_timezone=US/Mountain ; commented out
#use_timezone=Australia/Brisbane ; commented out
scottwilkerson wrote: Does the machine have the correct timezone?
The remote network has an NTP server, and I use an @hourly cron job to sync the VM's clock every hour.
I'm using Lisbon/London/Greenwich time.
Re: service check scheduling issue
What are your timezone settings in /etc/php.ini? Have you configured /etc/localtime correctly for the desired timezone?
EDIT: This may not matter, though, since host checks are running on schedule. This is very strange.
Former Nagios employee
- sebastiaopburnay
Re: service check scheduling issue
abrist wrote: What are your timezone settings in /etc/php.ini?
Well, this is embarrassing for a sysadmin, but I have none of those configs:
Code:
root@SERVER:/# cat /etc/php.ini
cat: /etc/php.ini: No such file or directory
abrist wrote: Have you configured /etc/localtime correctly for the desired timezone?
I never touched /etc/localtime; maybe it inherited its values during the OS install. When I ran cat on that file, all I got was binary rubbish...
abrist wrote: This is very strange.
Indeed it is.
Out of despair, I'm temporarily abandoning this 'Ubuntu 12.04 x64' with 'Nagios 3.4.1' architecture and going back to the good old 10.04 x86 with Nagios 3.3.1.
But if you are willing, I'm up for brainstorming a solution to this intriguing 'bug'/misconfiguration.
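The binary output from /etc/localtime is expected: it is a compiled zoneinfo file that starts with the magic bytes TZif, not a text file. On Debian/Ubuntu the zone name is normally also kept as plain text in /etc/timezone. This is a small Python check, assuming those two paths; describe_localtime is a hypothetical helper, not a system tool.

```python
from pathlib import Path


def describe_localtime(localtime="/etc/localtime", timezone_file="/etc/timezone"):
    """Summarise the system timezone files.

    is_tzif: True if localtime starts with the zoneinfo magic bytes b"TZif"
    (None if the file is missing); zone_name: contents of the plain-text
    timezone file, if present; symlink_target: where localtime points, if
    it is a symlink rather than a copied file.
    """
    lt = Path(localtime)
    tz = Path(timezone_file)
    is_tzif = None
    if lt.exists():
        with lt.open("rb") as fh:
            is_tzif = fh.read(4) == b"TZif"
    zone_name = tz.read_text().strip() if tz.is_file() else None
    target = str(lt.resolve()) if lt.is_symlink() else None
    return {"is_tzif": is_tzif, "zone_name": zone_name, "symlink_target": target}


if __name__ == "__main__":
    print(describe_localtime())
```

For a Lisbon setup one would expect is_tzif to be True and zone_name to read Europe/Lisbon; anything else points at the timezone configuration rather than Nagios.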
Re: service check scheduling issue
It could be in one of these locations:
Code:
/etc/php/php.ini
/etc/php5/php.ini
/usr/bin/php5/bin/php.ini
Or you could just "find" it:
Code:
find / -name php.ini
You should be able to set your localtime in Ubuntu with the utility "tzconfig": http://manpages.ubuntu.com/manpages/dap ... fig.8.html
Let's make sure these few things are set up correctly, then we will move on to the really strange stuff.
So host checks run on time, while service checks are delayed by an inconsistent amount. Weird.
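Once the file is located, the setting asked about is PHP's date.timezone directive. This is a Python sketch that checks the candidate paths listed above and extracts that directive; it assumes the usual php.ini syntax (key = value, lines starting with ';' are comments), and the function names are illustrative only. The candidate list may not match every layout, which is why the find command is the more reliable option.

```python
from pathlib import Path

# Candidate paths copied from the list above; `find / -name php.ini` is the
# more reliable way to discover the real ones on a given system.
CANDIDATES = ("/etc/php/php.ini", "/etc/php5/php.ini", "/usr/bin/php5/bin/php.ini")


def read_date_timezone(ini_text):
    """Return the date.timezone value from php.ini text, or None if unset.

    Lines starting with ';' are php.ini comments and are ignored.
    """
    for raw in ini_text.splitlines():
        line = raw.strip()
        if line.startswith(";") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "date.timezone":
            return value.strip().strip("\"'") or None
    return None


def php_timezones(paths=CANDIDATES):
    """Map each existing php.ini among paths to its date.timezone value."""
    return {p: read_date_timezone(Path(p).read_text(errors="replace"))
            for p in paths if Path(p).is_file()}


if __name__ == "__main__":
    print(php_timezones())
```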
Former Nagios employee