Service check scheduling issue

sebastiaopburnay
Posts: 105
Joined: Sun Oct 31, 2010 1:40 pm
Location: Lisbon, Portugal

Post by sebastiaopburnay »

Hi,

I manage a distributed Nagios Core infrastructure whose central node only receives passive checks, sends notifications and handles the NDOUtils data persistence tool.

All my servers so far (central and distributed) are running Ubuntu 10.04 LTS and Nagios Core 3.3.1.

On the latest distributed instance to be deployed, I took a big leap:

I began using a virtualized Ubuntu Server 12.04 (hosted on a Windows Server R2 with VMware Player) and Nagios Core 3.4.1, and I've also deployed this distributed instance with MySQL server and the NDOUtils persistence engine.

The fact that so much has changed compared to my usual environment is making this diagnosis hard.

The issue is that although I specify normal_check_interval individually for each service check (ranging from 5 to 60 minutes), some services go unchecked for long periods and show stale timestamps in the 'Last Check Time' column of the web interface.

The VM shows no signs of CPU starvation or memory shortage, and about 1/4 of the approximately 700 service checks on this instance keep fresh 'Last Check Time' values.

I've read in other forums/topics that this issue could be related to:
- NDOUtils: I tried disabling it, with no success
- The hardware clock on the server: I don't believe an @hourly cron job for NTP sync would cause it

I also tried restarting the Nagios service with use_retained_scheduling_info set to zero, with no success.

I'm losing hair and head over this. In case it helps, below are the scheduling directives from my nagios.cfg on this misbehaving remote monitor:
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
use_retained_scheduling_info=1
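
To compare these directives between a working monitor and the misbehaving one, something like the following could print them from a given nagios.cfg. This is only a minimal sketch: the function name show_sched_directives is made up for illustration, and the path you pass depends on where your install keeps nagios.cfg.

```shell
# Print the uncommented scheduling directives quoted above
# from a Nagios main configuration file.
# Usage: show_sched_directives /path/to/nagios.cfg
show_sched_directives() {
    cfg="$1"
    grep -E '^[[:space:]]*(auto_reschedule_checks|auto_rescheduling_interval|auto_rescheduling_window|use_retained_scheduling_info)=' "$cfg"
}
```

Running it against the config of each monitor and comparing the output should show quickly whether the two instances differ in any of these settings.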

If I don't find a solution I will go back to Ubuntu 10.04 with Nagios 3.3.1 hosted on Windows 7 with VMware Player (seems stupid, but it works like a charm).

Thank you for your time, patience and, hopefully, knowledge.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Post by abrist »

What method are you using for the passive checks? If it is nsca, what version is running on the core server and on the client systems?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.

Post by sebastiaopburnay »

I'm using NSCA daemon/server version 2.7.2 on the central server.

The remote instance uses a submit_service/host shell script designed to call send_nsca whenever the check state is HARD (the send_nsca client version is also 2.7.2).
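
For readers unfamiliar with this pattern, a submit script of this kind usually formats one tab-separated line per result and pipes it to send_nsca. The sketch below is an illustration under assumptions, not the script used here: the function names, the host name and the file paths in the commented usage are all placeholders.

```shell
# Format one passive service result the way send_nsca reads it on stdin:
# host<TAB>service<TAB>return_code<TAB>plugin_output
format_nsca_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Succeed only for HARD states, so SOFT results are not forwarded.
should_forward() {
    [ "$1" = "HARD" ]
}

# Hypothetical usage (host name and paths are placeholders):
# if should_forward "$state_type"; then
#     format_nsca_result "web01" "HTTP" 0 "HTTP OK" |
#         /usr/local/nagios/bin/send_nsca -H central.example.org \
#             -c /usr/local/nagios/etc/send_nsca.cfg
# fi
```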

But the problem here isn't the passive checks; those are being sent smoothly from the remote server to the central one.

My problem is the active checks, which the remote server is not executing on time. They simply do not respect the normal_check_interval values and are sometimes more than 24 to 48 hours late.
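
One way to see which service checks are late, independent of the web interface, is to read last_check straight from status.dat. The following is a minimal sketch, assuming the Nagios 3 status.dat layout (servicestatus { ... } blocks with one key=value per line); the function name stale_services is made up, and the real file location is whatever status_file points to in nagios.cfg.

```shell
# List services whose last_check is older than max_age seconds.
# Output: host;service;age_in_seconds (or "never" if last_check is 0).
# Usage: stale_services /path/to/status.dat max_age_seconds [now_epoch]
stale_services() {
    status_file="$1"
    max_age="$2"
    now="${3:-$(date +%s)}"
    awk -v now="$now" -v max_age="$max_age" '
        # Start of a service block: reset the collected fields.
        $1 == "servicestatus" && $2 == "{" { in_svc = 1; host = ""; svc = ""; last = 0; next }
        # End of a service block: report it if the last check is too old.
        in_svc && $1 == "}" {
            if (last == 0) printf "%s;%s;never\n", host, svc
            else if (now - last > max_age) printf "%s;%s;%d\n", host, svc, now - last
            in_svc = 0
            next
        }
        # Inside a service block: split each line at the first "=".
        in_svc {
            line = $0
            sub(/^[ \t]+/, "", line)
            eq = index(line, "=")
            if (eq == 0) next
            key = substr(line, 1, eq - 1)
            val = substr(line, eq + 1)
            if (key == "host_name") host = val
            else if (key == "service_description") svc = val
            else if (key == "last_check") last = val + 0
        }
    ' "$status_file"
}
```

For example, `stale_services /path/to/status.dat 3600` would list every service not checked within the last hour, which matches the longest interval configured here.
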
Attachments
Visio.jpg

Post by abrist »

Curious. So the remote shell script is not firing off anywhere near the right time? Are the delays consistent?

Post by sebastiaopburnay »

abrist wrote: So the remote shell script is not firing off anywhere near the right time?
That's correct... unless it is being executed and somehow not processed/taken into account by Nagios.

Those scripts (mostly check_nrpe and check_nt plugins) return positive, well-formed results when run from the CLI.
abrist wrote: Are the delays consistent?
I'm not sure what you mean by that, but I'm attaching to this reply a view of the scheduling queue, showing how the Nagios process is scheduling 'Next Check Times' at points in time earlier than the current time (which makes no sense at all).

Another «fun» fact is that host checks are being correctly scheduled and executed.

The last time I restarted Nagios I forced it not to retain status (forcing the process to check all hosts/services). As a result, only 148 of the 704 service checks have been executed; the other 556 appear as «PENDING».
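
The checked/pending split can also be counted from status.dat instead of the web interface. A minimal sketch, again assuming the Nagios 3 status.dat layout, in which each servicestatus block carries a has_been_checked flag; the function name count_pending is made up for illustration.

```shell
# Count service checks by whether they have ever been executed.
# Usage: count_pending /path/to/status.dat
count_pending() {
    awk '
        $1 == "servicestatus" && $2 == "{" { in_svc = 1; next }
        in_svc && $1 == "}" { in_svc = 0; next }
        # has_been_checked=0 means the service is still PENDING.
        in_svc && $1 == "has_been_checked=0" { pending++ }
        in_svc && $1 == "has_been_checked=1" { checked++ }
        END { printf "checked=%d pending=%d\n", checked, pending }
    ' "$1"
}
```

Running it a few times after a restart would show whether the pending count is shrinking at all or is stuck.
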
Attachments
Stupid_Scheduling.PNG
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Post by scottwilkerson »

Out of curiosity, what are the following set to?

Code:

check_for_orphaned_services
check_for_orphaned_hosts
use_timezone
Does the machine have the correct timezone?
Former Nagios employee

Post by sebastiaopburnay »

scottwilkerson wrote:Out of curiosity, what are the following set to?

Code:

check_for_orphaned_services
check_for_orphaned_hosts
use_timezone
I have the following settings:

Code:

check_for_orphaned_services=1
check_for_orphaned_hosts=1
#use_timezone=US/Mountain             ; commented out
#use_timezone=Australia/Brisbane    ;  commented out
scottwilkerson wrote: does the machine have the correct timezone?
The remote network has an NTP server, and I use an @hourly cron job to sync the VM's clock with it.
I'm using Lisbon/London/Greenwich time.

Post by abrist »

What are your timezone settings in /etc/php.ini? Have you configured /etc/localtime for the desired timezone correctly?
EDIT: Though this may not matter as host checks are running on schedule. This is very strange.

Post by sebastiaopburnay »

abrist wrote: What are your timezone settings in /etc/php.ini?
Well, this is embarrassing for a sysadmin: I have no such file:

Code:

root@SERVER:/# cat /etc/php.ini
cat: /etc/php.ini: No such file or directory
abrist wrote: Have you configured /etc/localtime for the desired timezone correctly?
I never tampered with /etc/localtime; maybe it inherited its values during the OS install. When I ran cat on that file, all I got was binary rubbish...
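
The binary output is expected, since /etc/localtime is a compiled zoneinfo file rather than text. A minimal sketch for checking which zone the system actually resolves, without reading the file directly, follows; the function name clock_summary is made up for illustration.

```shell
# Print the zone abbreviation and UTC offset for the current time.
# With no argument the system default zone is used; with an argument,
# that value is passed to date through the TZ environment variable.
# Usage: clock_summary [zone]
clock_summary() {
    if [ -n "${1:-}" ]; then
        TZ="$1" date '+%Z %z'
    else
        date '+%Z %z'
    fi
}
```

If the VM's zone is set as intended, `clock_summary` and `clock_summary Europe/Lisbon` should print the same abbreviation and offset.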
abrist wrote:This is very strange.
Indeed it is.

Out of despair, I'm temporarily abandoning this 'Ubuntu 12.04 x64' with 'Nagios 3.4.1' architecture and going back to the good old 10.04 x86 with Nagios 3.3.1.

But if you are willing, I'm up for brainstorming a solution to this intriguing 'bug'/misconfiguration.

Post by abrist »

It could be in one of these locations:

Code:

/etc/php/php.ini
/etc/php5/php.ini
/usr/bin/php5/bin/php.ini
Or you could just "find" it:

Code:

find / -name php.ini
You should be able to set your local time in Ubuntu with the utility "tzconfig": http://manpages.ubuntu.com/manpages/dap ... fig.8.html
Let's make sure these few things are set up correctly, then we will move on to the really strange stuff.
So host checks run on time. Service checks are delayed by an inconsistent amount. Weird.
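
Once a php.ini has been located, its timezone setting can be read with a short helper. This is a minimal sketch: the function name php_timezone is made up, and it only assumes the standard date.timezone directive of php.ini.

```shell
# Print the active (uncommented) date.timezone line from a php.ini file.
# Prints nothing and returns non-zero if the directive is not set.
# Usage: php_timezone /path/to/php.ini
php_timezone() {
    grep -E '^[[:space:]]*date\.timezone[[:space:]]*=' "$1"
}

# Hypothetical usage, checking every php.ini that find reports:
# find / -name php.ini 2>/dev/null | while read -r ini; do
#     echo "$ini: $(php_timezone "$ini" || echo 'date.timezone not set')"
# done
```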