Re: Scheduled events over time piling up on "NOW"
Posted: Wed Mar 19, 2014 7:45 am
by johndoe
Code:
[root@XXX ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | head -2
Nagios Core 3.5.0
[root@XXX ~]# tail -50 /var/log/messages | grep ndo
[root@XXX ~]# service nagios status
nagios (pid 2584) is running...
[root@XXX ~]# service ndo2db status
ndo2db (pid 2618) is running...
File sent via PM to avoid disclosing any sensitive info. I checked and there seemed to be none, but you can never be too careful. Feel free to suggest any other improvements to the actual configs.
Re: Scheduled events over time piling up on "NOW"
Posted: Wed Mar 19, 2014 4:24 pm
by lmiltchev
I didn't see anything weird in the nagios.cfg file, besides the fact that the sections were out of order. Anyway, go to:
Admin->System Profile->Download Profile
save and PM me the "profile.zip" file.
Re: Scheduled events over time piling up on "NOW"
Posted: Thu Mar 20, 2014 1:15 pm
by lmiltchev
It seems like you have a crashed table in the nagiosql database.
140320 13:20:01 [ERROR] /usr/libexec/mysqld: Table './nagiosql/tbl_logbook' is marked as crashed and last (automatic?) repair failed
Run the following commands:
Code:
cd /usr/local/nagiosxi/scripts
./repairmysql.sh nagios
./repairmysql.sh nagiosql
service nagios stop
killall nagios
service ndo2db stop
service ndo2db start
service nagios start
Check if nagios service is running:
Re: Scheduled events over time piling up on "NOW"
Posted: Tue Mar 25, 2014 7:25 am
by johndoe
Did that, same problem...
Re: Scheduled events over time piling up on "NOW"
Posted: Tue Mar 25, 2014 4:59 pm
by scottwilkerson
johndoe,
I revisited the code that makes up the "Monitoring Engine Event Queue", and what you are seeing is likely just a result of how the data is queried. You have lots of passive checks, almost all of the hosts/services report in very frequently, and the "next_check" time is likely further off in the future than the time a real check result will actually come in...
This is gonna be ugly, but this is the SQL used to pull the data, which is then massaged into the XML that populates the graph:
Code:
SELECT COUNT(*) AS total_events, next_check, NOW() AS time_now,
       TIMESTAMPDIFF(SECOND,NOW(),next_check) AS seconds_from_now,
       (TIMESTAMPDIFF(SECOND,NOW(),next_check) DIV 10) AS bucket
FROM nagios_hoststatus
WHERE TRUE
  AND (TIMESTAMPDIFF(SECOND,NOW(),next_check) < 300)
  AND instance_id = '1'
  AND UNIX_TIMESTAMP(next_check) != 0
GROUP BY instance_id, bucket
UNION
SELECT COUNT(*) AS total_events, next_check, NOW() AS time_now,
       TIMESTAMPDIFF(SECOND,NOW(),next_check) AS seconds_from_now,
       (TIMESTAMPDIFF(SECOND,NOW(),next_check) DIV 10) AS bucket
FROM nagios_servicestatus
WHERE TRUE
  AND (TIMESTAMPDIFF(SECOND,NOW(),next_check) < 300)
  AND instance_id = '1'
  AND UNIX_TIMESTAMP(next_check) != 0
GROUP BY instance_id, bucket
ORDER BY bucket ASC LIMIT 10000
Long story short, I think this is just a product of your environment. You might be able to massage the graph by reducing the check_interval on your checks (even though you don't do active checks).
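To make the bucketing in the query easier to follow, here is a rough Python sketch of just the arithmetic: the "< 300" filter and the "DIV 10" bucket. It is not Nagios XI code. The function names and sample offsets are made up for illustration, host and service rows are merged into one list rather than UNIONed, and the `UNIX_TIMESTAMP(next_check) != 0` filter is not modeled. It relies on MySQL's DIV discarding the fractional part (truncating toward zero), which is why math.trunc is used instead of Python's floor division.

```python
import math
from collections import Counter

BUCKET_SECONDS = 10   # the "DIV 10" in the query
WINDOW_SECONDS = 300  # the "< 300" filter in the query


def bucket_for(seconds_from_now):
    """Mirror `TIMESTAMPDIFF(...) DIV 10`; DIV truncates toward zero."""
    return math.trunc(seconds_from_now / BUCKET_SECONDS)


def event_queue(offsets):
    """Count events per bucket for offsets inside the 300 second window.

    Like the SQL, there is no lower bound, so overdue checks
    (negative offsets) are counted as well.
    """
    counts = Counter(bucket_for(s) for s in offsets if s < WINDOW_SECONDS)
    return dict(sorted(counts.items()))


# Hypothetical offsets (seconds until next_check), invented for illustration.
sample_offsets = [-7, -3, 0, 4, 9, 12, 250, 299, 300, 900]
print(event_queue(sample_offsets))
# prints {0: 5, 1: 1, 25: 1, 29: 1}
```

In this sketch, bucket 0 covers everything from 9 seconds overdue to 9 seconds ahead, so slightly overdue checks and checks due right away land in the same bucket. Checks whose next_check is 300 seconds or more away are not counted at all. How the graph itself draws negative buckets is not shown in this thread, so that part is left out.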