Constant Orphaned checks
Posted: Fri May 13, 2011 9:51 am
by tairline
Since upgrading to the latest and greatest (2011R1.2) I have seen continuous orphaned checks, and when I tried to restart I got several SIGTERMs from the ndo module. Even a system reboot didn't help ...
What could be the case here?
Re: Constant Orphaned checks
Posted: Fri May 13, 2011 9:56 am
by mguthrie
See if this solution helps, and if not, we'll go from there.
http://support.nagios.com/wiki/index.ph ... g_Orphaned
Re: Constant Orphaned checks
Posted: Fri May 13, 2011 11:02 am
by tairline
That's a no go there, I'm afraid.
Re: Constant Orphaned checks
Posted: Fri May 13, 2011 11:08 am
by mguthrie
Can you run the following and post the output?
Code:
rm -f /usr/local/nagiosxi/var/dbmaint.lock
/usr/local/nagiosxi/cron/dbmaint.php
Re: Constant Orphaned checks
Posted: Mon May 16, 2011 2:25 am
by tairline
[root@xxxxxx]# rm -f /usr/local/nagiosxi/var/dbmaint.lock
[root@xxxxxx]# /usr/local/nagiosxi/cron/dbmaint.php
CREATING: /usr/local/nagiosxi/var/dbmaint.lock
CLEANING ndoutils TABLE 'commenthistory'...
SQL: DELETE FROM nagios_commenthistory WHERE entry_time < FROM_UNIXTIME(1273994626)
CLEANING ndoutils TABLE 'processevents'...
SQL: DELETE FROM nagios_processevents WHERE event_time < FROM_UNIXTIME(1273994626)
CLEANING ndoutils TABLE 'externalcommands'...
SQL: DELETE FROM nagios_externalcommands WHERE entry_time < FROM_UNIXTIME(1305357826)
CLEANING ndoutils TABLE 'logentries'...
SQL: DELETE FROM nagios_logentries WHERE logentry_time < FROM_UNIXTIME(1273994626)
CLEANING ndoutils TABLE 'notifications'...
SQL: DELETE FROM nagios_notifications WHERE start_time < FROM_UNIXTIME(1297754626)
CLEANING ndoutils TABLE 'contactnotifications'...
SQL: DELETE FROM nagios_contactnotifications WHERE start_time < FROM_UNIXTIME(1297754626)
CLEANING ndoutils TABLE 'contactnotificationmethods'...
SQL: DELETE FROM nagios_contactnotificationmethods WHERE start_time < FROM_UNIXTIME(1297754626)
CLEANING ndoutils TABLE 'statehistory'...
SQL: DELETE FROM nagios_statehistory WHERE state_time < FROM_UNIXTIME(1273994626)
CLEANING ndoutils TABLE 'timedevents'...
SQL: DELETE FROM nagios_timedevents WHERE event_time < FROM_UNIXTIME(1305530326)
CLEANING ndoutils TABLE 'systemcommands'...
SQL: DELETE FROM nagios_systemcommands WHERE start_time < FROM_UNIXTIME(1305530326)
CLEANING ndoutils TABLE 'servicechecks'...
SQL: DELETE FROM nagios_servicechecks WHERE start_time < FROM_UNIXTIME(1305530326)
CLEANING ndoutils TABLE 'hostchecks'...
SQL: DELETE FROM nagios_hostchecks WHERE start_time < FROM_UNIXTIME(1305530326)
CLEANING ndoutils TABLE 'eventhandlers'...
SQL: DELETE FROM nagios_eventhandlers WHERE start_time < FROM_UNIXTIME(1305530326)
LASTOPT: 1305528001
INTERVAL: 60
NOW: 1305530626
OPTTIME: 1305531601
CLEANING nagiosxi TABLE 'commands'...
SQL: DELETE FROM xi_commands WHERE processing_time < 1305501826::abstime::timestamp without time zone
CLEANING nagiosxi TABLE 'events'...
SQL: DELETE FROM xi_events WHERE processing_time < 1305501826::abstime::timestamp without time zone
SQL1: SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL
SQL2: DELETE FROM xi_meta WHERE meta_id IN (SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL)
CLEANING nagiosql TABLE 'logbook'...
SQL: DELETE FROM tbl_logbook WHERE time < FROM_UNIXTIME(1305501826)
Repair Complete: FAILED TO REMOVE LOCK FILE
Hmm, since I ran this I still get orphaned checks. I also notice that the External Command indicator under Process Info is occasionally red instead of green.
Some settings for Nagios Core:
[root@xxxxx/]# egrep 'status_update|reaper|orphan' /usr/local/nagios/etc/nagios.cfg
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_result_reaper_frequency=10
max_check_result_reaper_time=30
status_update_interval=10
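One way to see whether the orphan warnings cluster around particular times is to count them per hour in the Nagios log. This is only a sketch under assumptions: the function name `orphans_per_hour` is made up for illustration, it assumes each log line starts with an `[epoch]` timestamp and that orphan warnings contain the word "orphaned", and the log path in the usage comment is the usual default rather than something confirmed in this thread.

```shell
# Hypothetical helper: count log lines mentioning "orphaned" per hour.
# Assumes each line starts with "[<epoch seconds>] ".
orphans_per_hour() {
    awk -F'[][]' '
        /orphaned/ {
            bucket = $2 - ($2 % 3600)   # start of the hour, in epoch seconds
            count[bucket]++
        }
        END {
            for (b in count) print b, count[b]
        }
    ' "$1" | sort -n
}

# Usage (path is an assumption, adjust to your install):
# orphans_per_hour /usr/local/nagios/var/nagios.log
```

Each output line is an hour start (epoch seconds) followed by the number of matching lines in that hour, so a spike right after a restart or a batch of incoming results should stand out.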
Re: Constant Orphaned checks
Posted: Mon May 16, 2011 8:28 am
by tairline
Another update. I don't know what I did, but all of a sudden the orphaned checks seem to have disappeared... Your guess is as good as mine ...
Re: Constant Orphaned checks
Posted: Mon May 16, 2011 9:14 am
by mguthrie
Are you using a lot of passive checks? I once had a user whose checks were all synced to a 5-minute cron job on all of his machines, so all of his results were coming in within a few seconds of each other, and some were getting dropped. Is there any chance you had a large number of passive results all come in at once?
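If results from many machines do land in the same few seconds, one common workaround is to give each host a fixed delay before it reports. The following is a minimal sketch, not something from this thread: `host_offset` is a hypothetical helper, the 240-second window is an arbitrary choice, and the check script path in the usage comment is a placeholder.

```shell
# Hypothetical helper: derive a stable per-host delay in [0, max) seconds
# from the host name, so hosts on the same cron schedule report at
# different moments instead of all at once.
host_offset() {
    name=$1
    max=$2
    # cksum prints "<checksum> <byte count>"; keep only the checksum.
    sum=$(printf '%s' "$name" | cksum | cut -d ' ' -f 1)
    echo $(( sum % max ))
}

# Usage in a cron-driven wrapper (script path is a placeholder):
# sleep "$(host_offset "$(hostname)" 240)"
# /path/to/submit_passive_check.sh
```

Deriving the delay from the host name rather than a random number keeps each host's reporting time the same from run to run, which makes late or missing results easier to spot.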
Re: Constant Orphaned checks
Posted: Tue May 17, 2011 7:14 am
by tairline
What we do use are a number of SNMP traps that are routed through SNMPtt. So it's a host that sends a heartbeat through SNMP, which is handled by the trap translator, and the result is submitted through the event handler. We have quite a lot of issues with this, by the way. For instance, when altering the Nagios config and then restarting, the SNMPtraptt daemon hangs quite often. So yes, this could be a cause of the strange behaviour we saw.
Re: Constant Orphaned checks
Posted: Tue May 17, 2011 9:50 am
by mguthrie
You can try tweaking the settings below in the nagios.cfg file. This will allow Nagios to process a larger batch of results more quickly. You could also try adding the Nagiostats wizard to your localhost checks to keep an eye on passive results arriving in large batches and see if there's a relationship.
Code:
check_result_reaper_frequency=5
max_check_result_reaper_time=15
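The two settings above can be edited by hand, but a small helper that keeps a backup makes it easier to roll back. This is a rough sketch only: `set_cfg` is a hypothetical function, and the paths in the usage comment are the default locations (the config path matches the egrep earlier in this thread, the binary path is an assumption).

```shell
# Hypothetical helper: set key=value in a config file.
# Keeps the first pre-change copy as <file>.bak.
set_cfg() {
    cfg=$1
    key=$2
    value=$3
    # Only back up once, so repeated calls do not overwrite the original copy.
    [ -e "$cfg.bak" ] || cp -p "$cfg" "$cfg.bak" || return 1
    if grep -q "^${key}=" "$cfg"; then
        # Rewrite the existing line; write back with cat so the file
        # keeps its owner and permissions.
        sed "s|^${key}=.*|${key}=${value}|" "$cfg" > "$cfg.new" &&
            cat "$cfg.new" > "$cfg" &&
            rm -f "$cfg.new"
    else
        # Key not present yet: append it.
        printf '%s=%s\n' "$key" "$value" >> "$cfg"
    fi
}

# Usage (default paths, adjust to your install):
# set_cfg /usr/local/nagios/etc/nagios.cfg check_result_reaper_frequency 5
# set_cfg /usr/local/nagios/etc/nagios.cfg max_check_result_reaper_time 15
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
```

Running the `nagios -v` verification before restarting catches a typo in the config before it takes monitoring down.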
Nagiostats Wizard
http://exchange.nagios.org/directory/Ad ... rd/details