Scheduled events over time piling up on "NOW"
It seems my scheduled events over time are somehow piling up on "NOW". We use 99% passive checks, hence the high number of them. Check the attachment.
How can I improve this? We mostly get results from the machines at 1-minute intervals.
Nagios XI 2012R2.8c Running on Ubuntu 12.04 Using 99% passive checks for monitoring
Monitoring nearly 800 Passive services spread through roughly 40 machines
Running on an 8 core, KVM virtualized VM, with 15 GB of RAM and using RAMDisk
Re: Scheduled events over time piling up on "NOW"
We'll need a bit more context than that.
How many CPU threads do you have total?
How much memory is installed?
Is ndoutils running?
Also, are you sure about that 99%? Because if that is right, then only 1% of the checks are active and it looks like you have 600 or so active checks. And if 600 is 1%, then 59,400 would be the other 99%. That's a lot of passive checks.
Former Nagios employee
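The arithmetic behind that estimate can be checked directly. This is a small sketch in shell that uses only the figures from the post (600 active checks, taken to be 1% of all checks); `passive_from_active` is a hypothetical helper name, and the division is integer division:

```shell
# passive_from_active ACTIVE ACTIVE_PERCENT
# Print the number of passive checks implied when ACTIVE checks
# make up ACTIVE_PERCENT percent of all checks (integer arithmetic).
passive_from_active() {
    active=$1
    active_pct=$2
    total=$(( active * 100 / active_pct ))
    echo $(( total - active ))
}

passive_from_active 600 1   # prints 59400
```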
Re: Scheduled events over time piling up on "NOW"
Hi Tmcdonald,
I have 8 cores on this machine, which is a KVM virtualized VM. It has 15 GB of RAM, uses a RAM disk, and, I believe, has all the optimizations mentioned in the Nagios documents I could find.
Actually, the only active checks we run are for two websites (via ping), about once a minute.
We are currently monitoring 731 services on 37 hosts, mostly checked at 1-minute intervals.
Ndoutils (ndo2db) is running, and everything is green on the Nagios status page.
What other info can I provide you with?
Note: Some time ago, perhaps half a year, I remember entries appearing in the logs saying that active checks would be scheduled, something along the lines of "service hasn't been checked for a while, scheduling active check now". I can't find these in the logs anymore when I search for them. This may be unrelated to the issue, but I thought I would mention it.
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Scheduled events over time piling up on "NOW"
Is your DB offloaded to a different server? If so, are the times on the servers synced?
Re: Scheduled events over time piling up on "NOW"
No, everything runs on this server.
As for the time:
Code:
Date/Time
PHP Timezone: UTC
PHP Time: Fri, 14 Mar 2014 13:06:07 +0000
System Time: Fri, 14 Mar 2014 13:06:07 +0000
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Scheduled events over time piling up on "NOW"
System and PHP times look good. Also, what is the output of the following:
Code:
ls /usr/local/nagios/var/spool/checkresults/ | wc -l
And could you run this quick test and let us know how the event queue looks afterwards?
Code:
service nagios stop
service ndo2db stop
service nagios start
Then wait about 10 seconds, and run:
Code:
service ndo2db start
Re: Scheduled events over time piling up on "NOW"
Actually, I have placed that folder (checkresults) on the RAM disk, as directed by one of the performance tuning tutorials, so the count is as follows:
Code:
[root@XX checkresults]# ls -lha | wc -l
203
[root@XX checkresults]# pwd
/var/nagiosramdisk/spool/checkresults
[root@XX checkresults]# ls -lha | wc -l
93
Note: the count in the actual folder you requested is 0, since the files are written to the RAM disk as mentioned above.
Strangely enough, the value on the graph still shows the same on NOW (roughly above the 576 line). It is odd that it doesn't oscillate and just stays still.
As for the stop and start sequence you mentioned, the results are as follows:
Code:
[root@XX checkresults]# service nagios stop && service ndo2db stop && service nagios start
Stopping nagios: .done.
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.
Starting nagios: done.
You have mail in /var/spool/mail/root
[root@XX checkresults]# service ndo2db stop
ndo2db was not running... could not stop
[root@XX checkresults]# service ndo2db start^C
[root@XX checkresults]# service nagios start
Starting nagios: done.
[root@XX checkresults]# service ndo2db start
Starting ndo2db: done.
[root@XX checkresults]# ls -lha | wc -l
173
You have mail in /var/spool/mail/root
[root@XX checkresults]# ls -lha | wc -l
15
[root@XX checkresults]# ls -lha | wc -l
19
After this, the value on the graph was still the same.
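One caveat when comparing these counts: `ls -lha | wc -l` also counts the `total` line and the `.` and `..` entries, so each figure above is three higher than the number of entries actually in the directory (203 means 200 entries). Below is a minimal sketch of a count that includes only regular files, assuming a `find` that supports `-maxdepth` (GNU find does) and the RAM disk path shown above as the default; `count_check_results` is a hypothetical helper name:

```shell
# count_check_results [DIR]
# Print the number of regular files directly inside DIR.
# The default DIR is the RAM disk spool path from the post above.
count_check_results() {
    dir=${1:-/var/nagiosramdisk/spool/checkresults}
    # -maxdepth 1 keeps find from descending into subdirectories;
    # tr strips the padding some wc implementations add.
    find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
}

# Usage: count_check_results /var/nagiosramdisk/spool/checkresults
```

Running it a few times, several seconds apart, shows whether the spool is draining or growing.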
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Scheduled events over time piling up on "NOW"
So it looks like you have mostly passive checks. For your active checks (host ping checks, I'd assume), are they actually being scheduled correctly? Are the times displayed as normal, with checks occurring constantly and scheduled at their specified intervals? Can you run through the restart procedure for NDO and Nagios described above, and then post the output of the following?
Code:
tail -100 /usr/local/nagios/var/nagios.log
Re: Scheduled events over time piling up on "NOW"
Slansing,
I have previously mentioned a problem which I think might be affecting this: http://support.nagios.com/forum/viewtop ... =6&t=25790
I see a lot of those entries for hosts that are actually transmitting passive checks but that I do not want to configure at this stage. Could this be what is causing the high number on NOW?
The log becomes more or less useless, since it is hard to find anything in it other than those lines.
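Until the source of those entries is dealt with, the log can at least be made readable by filtering them out when viewing it. This is a minimal sketch, assuming the noisy lines share a fixed substring. The default pattern below is an assumption about their wording and is not taken from this thread, so adjust it to match the actual entries; `filtered_log_tail` is a hypothetical helper name:

```shell
# filtered_log_tail LOG_FILE [PATTERN] [LINES]
# Print the last LINES lines of LOG_FILE that do not contain PATTERN.
# PATTERN is matched as a fixed string, not a regular expression.
filtered_log_tail() {
    log_file=$1
    # Assumed default; replace with the text of the entries you want hidden.
    pattern=${2:-Passive check result was received for}
    lines=${3:-100}
    # Filter first, then tail, so the output still has up to LINES useful lines.
    grep -v -F -- "$pattern" "$log_file" | tail -n "$lines"
}

# Usage: filtered_log_tail /usr/local/nagios/var/nagios.log
```

This only changes what is displayed; the entries are still written to the log.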
Re: Scheduled events over time piling up on "NOW"
Run the following commands and show the output:
Code:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | head -2
tail -50 /var/log/messages | grep ndo
service nagios status
service ndo2db status
Also, upload the "nagios.cfg" file so that we can review it.
Be sure to check out our Knowledgebase for helpful articles and solutions!