Scheduled events over time piling up on "NOW"

Post by **johndoe** » Thu Mar 13, 2014 9:05 am

It seems my scheduled events over time are somehow piling up on "NOW". We use 99% passive checks, hence the high number of them. See the attachment.

How can I improve this? We mostly get results from machines at 1-minute intervals.

Post by **tmcdonald** » Thu Mar 13, 2014 10:13 am

We'll need a bit more context than that.

How many CPU threads do you have total?
How much memory is installed?
Is ndoutils running?

Also, are you sure about that 99%? Because if that is right, then only 1% of the checks are active and it looks like you have 600 or so active checks. And if 600 is 1%, then 59,400 would be the other 99%. That's a lot of passive checks.
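
The arithmetic behind that estimate can be checked with a short shell sketch. The figure of 600 active checks is only the rough reading from the graph mentioned above, and the variable names are purely illustrative.

```shell
#!/bin/sh
# Rough estimate: if the active checks are 1% of all checks,
# how many passive checks does that imply?
active=600        # approximate number of active checks read off the graph
active_pct=1      # claimed share of active checks, in percent

total=$(( active * 100 / active_pct ))   # 600 is 1% of 60000
passive=$(( total - active ))            # the remaining 99%

echo "total=$total passive=$passive"
```

Changing `active_pct` shows how quickly the implied number of passive checks drops if the real share of active checks is higher than 1%.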

Post by **johndoe** » Fri Mar 14, 2014 7:54 am

Hi Tmcdonald,

I have 8 cores on this machine, which is a KVM virtual machine. It has 15 GB of RAM and uses a ramdisk, and I believe I have applied all the optimizations mentioned in every Nagios document I could find.
Actually, the only active checks we do are on two websites (via ping), about once a minute.
We are currently monitoring 731 services on 37 hosts; these are mostly checked at 1-minute intervals.
Ndoutils (ndo2db) is running, and on the Nagios status page all is green.

What other info can I provide you with?

Note: Some time ago, perhaps half a year ago, I remember messages popping up in the logs saying that active checks would be scheduled, something along the lines of "service hasn't been checked for a while, scheduling active check now". I can't find these in the logs anymore when I search for them now. This may be unrelated to the issue, but I thought I would mention it.

Post by **scottwilkerson** » Fri Mar 14, 2014 8:01 am

Is your DB offloaded to a different server? If so, are the times on the servers synced?

Post by **johndoe** » Fri Mar 14, 2014 8:05 am

No, everything runs on this server.

As for the time:

Code:

Date/Time

PHP Timezone: UTC 
PHP Time: Fri, 14 Mar 2014 13:06:07 +0000
System Time: Fri, 14 Mar 2014 13:06:07 +0000

Post by **slansing** » Fri Mar 14, 2014 11:08 am

System and PHP times look good. Also, what is the output of the following?

Code:

ls /usr/local/nagios/var/spool/checkresults/ | wc -l

And could you run this quick test and let us know how the event queue looks afterwards?

Code:

service nagios stop

service ndo2db stop

service nagios start

Then wait about 10 seconds, and run:

Code:

service ndo2db start
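
The sequence above could be wrapped in a small function so that the order and the pause are always the same. This is only a sketch: the `RUN` and `WAIT_SECS` variables and the function name are made up for illustration, and it assumes the same `service` init scripts used in this thread.

```shell
#!/bin/sh
# Sketch of the restart test: bring Nagios up first, then ndo2db after a pause.
# RUN is a hypothetical prefix: set RUN=echo to print the commands instead of
# running them. WAIT_SECS is likewise illustrative (default 10 seconds).
restart_nagios_then_ndo2db() {
    ${RUN:-} service nagios stop
    ${RUN:-} service ndo2db stop
    ${RUN:-} service nagios start
    ${RUN:-} sleep "${WAIT_SECS:-10}"
    ${RUN:-} service ndo2db start
}
```

Calling `RUN=echo restart_nagios_then_ndo2db` prints the five commands without touching the services; calling the function as root with `RUN` unset actually runs them.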

Post by **johndoe** » Mon Mar 17, 2014 8:26 am

Actually, I have placed that folder (checkresults) on the ramdisk, as directed by one of the performance tuning tutorials, so the count is as follows:

Code:

[root@XX checkresults]# ls -lha | wc -l
203
[root@XX checkresults]# pwd
/var/nagiosramdisk/spool/checkresults
[root@XX checkresults]# ls -lha | wc -l
93

Note: the count in the folder you actually asked about is 0, since the check results were moved to the ramdisk as mentioned above.
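
One caveat when reading these numbers: `ls -lha | wc -l` also counts the `total` line and the `.` and `..` entries, so it reports three more than the number of files. A minimal sketch that counts only regular files follows. The function names are illustrative, it assumes a `find` that supports `-maxdepth` (GNU and BSD do), and it reads the directory from the `check_result_path` setting in `nagios.cfg` instead of hard-coding it.

```shell
#!/bin/sh
# Print the value of check_result_path from a Nagios main config file.
# If the directive appears more than once, the last occurrence is used.
checkresult_dir() {
    sed -n 's/^check_result_path=//p' "$1" | tail -n 1
}

# Count regular files directly inside a directory
# (no "total" line, no "." or ".." entries).
count_files() {
    find "$1" -maxdepth 1 -type f | wc -l | tr -d ' '
}
```

For example, `count_files "$(checkresult_dir /usr/local/nagios/etc/nagios.cfg)"` would count the pending result files wherever the spool directory has been relocated.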

Strangely enough, the value on the graph at NOW still shows the same (roughly above the 576 line). It is strange that it doesn't oscillate and just stays still.

As for the starting and stopping sequence you mentioned, the results are as follows:

Code:

[root@XX checkresults]# service nagios stop && service ndo2db stop && service nagios start
Stopping nagios: .done.
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.
Starting nagios: done.
You have mail in /var/spool/mail/root
[root@XX checkresults]# service ndo2db stop
ndo2db was not running... could not stop
[root@XX checkresults]# service ndo2db start^C
[root@XX checkresults]# service nagios start
Starting nagios: done.
[root@XX checkresults]# service ndo2db start
Starting ndo2db: done.
[root@XX checkresults]# ls -lha | wc -l
173
You have mail in /var/spool/mail/root
[root@XX checkresults]# ls -lha | wc -l
15
[root@XX checkresults]# ls -lha | wc -l
19

After this, the value on the graph stayed the same.

Post by **slansing** » Mon Mar 17, 2014 1:29 pm

So it looks like you have mostly passive checks. For your active checks (I'd assume host ping checks, etc.), are they actually being scheduled correctly? Are the times displayed as normal, with checks constantly occurring and being scheduled at their specified intervals? Can you run through the aforementioned restart procedure for NDO and Nagios, and then post the output of the following?

Code:

tail -100 /usr/local/nagios/var/nagios.log

Post by **johndoe** » Tue Mar 18, 2014 7:52 am

Slansing,

I have previously mentioned a problem which I think might be affecting this: http://support.nagios.com/forum/viewtop ... =6&t=25790

I see a lot of those entries for hosts that are actually transmitting passive checks but that I do not want to configure at this stage. Can this be what is causing the high number on NOW?

The log becomes more or less useless, since it is hard to find anything in it other than those lines.
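
When one kind of message floods the log, a quick summary can show what dominates it and make the remaining lines visible again. The sketch below assumes the usual Nagios log layout in which each line starts with a `[timestamp]` prefix. The function names are illustrative, and the pattern passed to the second function is a placeholder to be replaced with text taken from the entries in question.

```shell
#!/bin/sh
# Strip the leading "[epoch] " prefix and count identical messages,
# most frequent first.
summarize_log() {
    sed 's/^\[[0-9]*\] //' "$1" | sort | uniq -c | sort -rn
}

# Show the log without lines containing a fixed string (the noisy entries).
# grep exits non-zero when nothing is left to print; that is not treated
# as an error here.
hide_lines() {
    grep -v -F -- "$1" "$2" || true
}
```

For example, `summarize_log /usr/local/nagios/var/nagios.log | head` lists the most frequent messages, and `hide_lines 'PLACEHOLDER TEXT' /usr/local/nagios/var/nagios.log | tail -100` gives the last 100 lines that are not part of the flood.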

Post by **lmiltchev** » Tue Mar 18, 2014 4:59 pm

Run the following commands and show the output:

Code:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | head -2
tail -50 /var/log/messages | grep ndo
service nagios status
service ndo2db status

Also, upload the "nagios.cfg" file, so that we can review it.

Nagios Support Forum
