
Scheduled events over time piling up on "NOW"

Posted: Thu Mar 13, 2014 9:05 am
by johndoe
It seems my scheduled events over time are somehow piling up on "NOW". About 99% of our checks are passive, hence the high number of them. See the attachment.

How can I improve this? We get results from the machines mostly at 1-minute intervals.

Re: Scheduled events over time piling up on "NOW"

Posted: Thu Mar 13, 2014 10:13 am
by tmcdonald
We'll need a bit more context than that.

How many CPU threads do you have total?
How much memory is installed?
Is ndoutils running?

Also, are you sure about that 99%? Because if that is right, then only 1% of the checks are active and it looks like you have 600 or so active checks. And if 600 is 1%, then 59,400 would be the other 99%. That's a lot of passive checks.
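The arithmetic behind that estimate can be checked with a few lines of shell. This is only a sketch: the 600 figure is the rough reading of active checks mentioned above, and the variable names are made up for illustration.

```shell
# Rough estimate: if the active checks are 1% of all checks,
# the total is active * 100 / 1 and the rest are passive.
active=600        # approximate active check count (from the post above)
active_pct=1      # claimed share of active checks, in percent

total=$(( active * 100 / active_pct ))
passive=$(( total - active ))

echo "total=$total passive=$passive"
```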

Re: Scheduled events over time piling up on "NOW"

Posted: Fri Mar 14, 2014 7:54 am
by johndoe
Hi Tmcdonald,

This machine is a KVM virtual machine with 8 cores and 15 GB of RAM. It uses a ramdisk and, I believe, all the optimizations mentioned in every Nagios document I could find.
Actually, the only active checks we run are against two websites (via ping), about once a minute.
We are currently monitoring 731 services on 37 hosts, mostly checked at 1-minute intervals.
Ndoutils (ndo2db) is running, and everything is green on the Nagios status page.

What other info can I provide you with?

Note: Some time ago, perhaps half a year, I remember messages in the logs saying that active checks would be scheduled, something along the lines of "service hasn't been checked for a while, scheduling active check now". I can't find these in the logs anymore when I search for them. This may be unrelated to the issue, but I thought I would mention it.

Re: Scheduled events over time piling up on "NOW"

Posted: Fri Mar 14, 2014 8:01 am
by scottwilkerson
Is your DB offloaded to a different server? If so, are the times on the servers synced?

Re: Scheduled events over time piling up on "NOW"

Posted: Fri Mar 14, 2014 8:05 am
by johndoe
No, everything runs on this server.

As for the time:

Code: Select all

Date/Time

PHP Timezone: UTC 
PHP Time: Fri, 14 Mar 2014 13:06:07 +0000
System Time: Fri, 14 Mar 2014 13:06:07 +0000

Re: Scheduled events over time piling up on "NOW"

Posted: Fri Mar 14, 2014 11:08 am
by slansing
System and PHP times look good. Also, what is the output of the following?

Code: Select all

ls /usr/local/nagios/var/spool/checkresults/ | wc -l

And could you run this quick test and let us know how the event queue looks afterwards?

Code: Select all

service nagios stop
service ndo2db stop
service nagios start

Then wait about 10 seconds and run:

Code: Select all

service ndo2db start
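
The sequence above could also be wrapped in a small script. This is only a sketch: `DRY_RUN`, `run` and `restart_sequence` are hypothetical names added here so the order can be previewed without touching the services. The commands and the 10-second wait are the ones listed above.

```shell
#!/bin/sh
# Preview mode is on by default; set DRY_RUN=0 (as root) to really restart.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in preview mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Stop both services, start Nagios, wait, then start ndo2db.
restart_sequence() {
    run service nagios stop
    run service ndo2db stop
    run service nagios start
    run sleep 10
    run service ndo2db start
}

restart_sequence
```

Running it as-is only prints the five steps in order.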

Re: Scheduled events over time piling up on "NOW"

Posted: Mon Mar 17, 2014 8:26 am
by johndoe
Actually, I have placed that folder (checkresults) on the ramdisk, as directed by one of the performance tuning tutorials, so the count is as follows:

Code: Select all

[root@XX checkresults]# ls -lha | wc -l
203
[root@XX checkresults]# pwd
/var/nagiosramdisk/spool/checkresults
[root@XX checkresults]# ls -lha | wc -l
93

Note: the count in the folder you asked about is 0, since the check results have been moved to the ramdisk, as mentioned above.
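
One caveat on the numbers above: `ls -lha | wc -l` also counts the `total` line and the `.` and `..` entries, so it overstates the number of files by three. A stricter count might look like the sketch below. It assumes a `find` that supports `-maxdepth` (GNU find does), and `count_checkresults` is a hypothetical helper name.

```shell
# Count only regular files directly inside the given directory.
count_checkresults() {
    find "$1" -maxdepth 1 -type f | wc -l
}

# Ramdisk spool location taken from the output above.
spool_dir="/var/nagiosramdisk/spool/checkresults"
if [ -d "$spool_dir" ]; then
    count_checkresults "$spool_dir"
fi
```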

Strangely enough, the value on the graph for NOW still shows the same (just above the 576 line). It is odd that it doesn't oscillate and just stays put.

As for the starting and stopping sequence you mentioned, the results are as follows:

Code: Select all

[root@XX checkresults]# service nagios stop && service ndo2db stop && service nagios start
Stopping nagios: .done.
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.
Starting nagios: done.
You have mail in /var/spool/mail/root
[root@XX checkresults]# service ndo2db stop
ndo2db was not running... could not stop
[root@XX checkresults]# service ndo2db start^C
[root@XX checkresults]# service nagios start
Starting nagios: done.
[root@XX checkresults]# service ndo2db start
Starting ndo2db: done.
[root@XX checkresults]# ls -lha | wc -l
173
You have mail in /var/spool/mail/root
[root@XX checkresults]# ls -lha | wc -l
15
[root@XX checkresults]# ls -lha | wc -l
19

After this, the value on the graph was still the same.

Re: Scheduled events over time piling up on "NOW"

Posted: Mon Mar 17, 2014 1:29 pm
by slansing
So it looks like you have mostly passive checks. For your active checks (I'd assume host ping checks, etc.), are they actually being scheduled correctly? Are the times displayed as normal, with checks occurring constantly and being scheduled at their specified intervals? Can you run through the aforementioned restart procedure for NDO and Nagios, and then post the output of the following?

Code: Select all

tail -100 /usr/local/nagios/var/nagios.log

Re: Scheduled events over time piling up on "NOW"

Posted: Tue Mar 18, 2014 7:52 am
by johndoe
Slansing,

I previously mentioned a problem that I think might be affecting this: http://support.nagios.com/forum/viewtop ... =6&t=25790

I see a lot of those entries for hosts that are actually transmitting passive checks but that I do not want to configure at this stage. Could this be what is causing the high number on NOW?

The log becomes more or less useless, since it is hard to find anything in it other than those lines.
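
One way to still read the log would be to filter the repeated lines out before tailing it. The sketch below is a guess at how that could be done: `NOISE_PATTERN` is an assumption and has to be replaced with a phrase that really appears in the repeated entries, and `filter_log` is a hypothetical helper name. The log path is the default one used elsewhere in this thread.

```shell
# NOISE_PATTERN is an assumed phrase; adjust it to match the repeated lines.
NOISE_PATTERN="${NOISE_PATTERN:-Passive check result was received}"
LOG_FILE="${LOG_FILE:-/usr/local/nagios/var/nagios.log}"

# $1 = log file, $2 = fixed string; drop lines containing that string,
# then show the last 100 of the remaining lines.
filter_log() {
    grep -v -F -- "$2" "$1" | tail -100
}

if [ -r "$LOG_FILE" ]; then
    filter_log "$LOG_FILE" "$NOISE_PATTERN"
fi
```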

Re: Scheduled events over time piling up on "NOW"

Posted: Tue Mar 18, 2014 4:59 pm
by lmiltchev
Run the following commands and show the output:

Code: Select all

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | head -2
tail -50 /var/log/messages | grep ndo
service nagios status
service ndo2db status

Also, upload the "nagios.cfg" file so that we can review it.