Nagios Support Forum

Posted: **Fri May 13, 2011 9:51 am**

Since upgrading to the latest and greatest (2011R1.2) I have seen continuous orphaned checks, and when I tried to restart I got several SIGTERMs of the ndo module. Even a system reboot didn't help ...

What could be the cause here?

Posted: **Fri May 13, 2011 9:56 am**

See if this solution helps, and if not, we'll go from there.
http://support.nagios.com/wiki/index.ph ... g_Orphaned

Posted: **Fri May 13, 2011 11:02 am**

That's a no-go, I'm afraid.

Posted: **Fri May 13, 2011 11:08 am**

Can you run the following and post the output?

rm -f /usr/local/nagiosxi/var/dbmaint.lock
/usr/local/nagiosxi/cron/dbmaint.php

Posted: **Mon May 16, 2011 2:25 am**

[root@xxxxxx]# rm -f /usr/local/nagiosxi/var/dbmaint.lock
[root@xxxxxx]# /usr/local/nagiosxi/cron/dbmaint.php
CREATING: /usr/local/nagiosxi/var/dbmaint.lock
CLEANING ndoutils TABLE 'commenthistory'...
SQL: DELETE FROM nagios_commenthistory WHERE entry_time < FROM_UNIXTIME(1273994626)
CLEANING ndoutils TABLE 'processevents'...
SQL: DELETE FROM nagios_processevents WHERE event_time < FROM_UNIXTIME(1273994626)
CLEANING ndoutils TABLE 'externalcommands'...
SQL: DELETE FROM nagios_externalcommands WHERE entry_time < FROM_UNIXTIME(1305357826)
CLEANING ndoutils TABLE 'logentries'...
SQL: DELETE FROM nagios_logentries WHERE logentry_time < FROM_UNIXTIME(1273994626)
CLEANING ndoutils TABLE 'notifications'...
SQL: DELETE FROM nagios_notifications WHERE start_time < FROM_UNIXTIME(1297754626)
CLEANING ndoutils TABLE 'contactnotifications'...
SQL: DELETE FROM nagios_contactnotifications WHERE start_time < FROM_UNIXTIME(1297754626)
CLEANING ndoutils TABLE 'contactnotificationmethods'...
SQL: DELETE FROM nagios_contactnotificationmethods WHERE start_time < FROM_UNIXTIME(1297754626)
CLEANING ndoutils TABLE 'statehistory'...
SQL: DELETE FROM nagios_statehistory WHERE state_time < FROM_UNIXTIME(1273994626)
CLEANING ndoutils TABLE 'timedevents'...
SQL: DELETE FROM nagios_timedevents WHERE event_time < FROM_UNIXTIME(1305530326)
CLEANING ndoutils TABLE 'systemcommands'...
SQL: DELETE FROM nagios_systemcommands WHERE start_time < FROM_UNIXTIME(1305530326)
CLEANING ndoutils TABLE 'servicechecks'...
SQL: DELETE FROM nagios_servicechecks WHERE start_time < FROM_UNIXTIME(1305530326)
CLEANING ndoutils TABLE 'hostchecks'...
SQL: DELETE FROM nagios_hostchecks WHERE start_time < FROM_UNIXTIME(1305530326)
CLEANING ndoutils TABLE 'eventhandlers'...
SQL: DELETE FROM nagios_eventhandlers WHERE start_time < FROM_UNIXTIME(1305530326)
LASTOPT: 1305528001
INTERVAL: 60
NOW: 1305530626
OPTTIME: 1305531601
CLEANING nagiosxi TABLE 'commands'...
SQL: DELETE FROM xi_commands WHERE processing_time < 1305501826::abstime::timestamp without time zone
CLEANING nagiosxi TABLE 'events'...
SQL: DELETE FROM xi_events WHERE processing_time < 1305501826::abstime::timestamp without time zone
SQL1: SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL
SQL2: DELETE FROM xi_meta WHERE meta_id IN (SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL)
CLEANING nagiosql TABLE 'logbook'...
SQL: DELETE FROM tbl_logbook WHERE time < FROM_UNIXTIME(1305501826)
Repair Complete: FAILED TO REMOVE LOCK FILE

Ermm, since I ran this I still get orphaned checks. I also notice that the External Command indicator under Process Info is occasionally red instead of green.

Some settings for Nagios Core:

[root@xxxxx/]# egrep 'status_update|reaper|orphan' /usr/local/nagios/etc/nagios.cfg
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_result_reaper_frequency=10
max_check_result_reaper_time=30
status_update_interval=10
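
As a reading aid for the dbmaint.php output above: each FROM_UNIXTIME() cutoff can be subtracted from the NOW value the script printed to see how far back that table is trimmed. The sketch below does only that arithmetic; treating the differences as retention windows is an inference from the output, not something the script states, and the helper name cutoff_age is made up for this example.

```shell
#!/bin/sh
# Age of each dbmaint.php cutoff relative to the NOW value it printed.
now=1305530626    # the "NOW:" line from the output above

cutoff_age() {
    # $1 = cutoff value passed to FROM_UNIXTIME(); prints its age in seconds
    echo $(( now - $1 ))
}

echo "statehistory:     $(( $(cutoff_age 1273994626) / 86400 )) days"
echo "notifications:    $(( $(cutoff_age 1297754626) / 86400 )) days"
echo "externalcommands: $(( $(cutoff_age 1305357826) / 86400 )) days"
echo "servicechecks:    $(( $(cutoff_age 1305530326) / 60 )) minutes"
echo "xi_events:        $(( $(cutoff_age 1305501826) / 3600 )) hours"
```

With the values from this run it prints 365 days, 90 days, 2 days, 5 minutes and 8 hours, so the cleanup itself looks like it ran with sensible cutoffs.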

Posted: **Mon May 16, 2011 8:28 am**

Another update. I don't know what I did, but all of a sudden the orphaned checks seem to have disappeared... Your guess is as good as mine ...

Posted: **Mon May 16, 2011 9:14 am**

Are you using a lot of passive checks? I once had a user whose checks were all synced to a 5-minute cron job on all of his machines, so all of his results came in within a few seconds of each other, and some were getting dropped. Is there any chance you had a large number of passive results all come in at once?
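
One way to look for that kind of burst is to count passive results per second in the Nagios log. This is a minimal sketch, assuming log_passive_checks=1 is set in nagios.cfg so that "PASSIVE SERVICE CHECK" and "PASSIVE HOST CHECK" lines are written to the log; the function name passive_bursts and the threshold are made up for this example.

```shell
#!/bin/sh
# Print every epoch second in which at least MIN passive check results
# were logged, together with the number of results in that second.
# Expects log lines of the form:
#   [1305530626] PASSIVE SERVICE CHECK: host;service;0;output
passive_bursts() {
    # $1 = log file, $2 = minimum results per second to report (default 10)
    awk -v min="${2:-10}" '
        /^\[[0-9]+\] PASSIVE (HOST|SERVICE) CHECK:/ {
            ts = substr($1, 2, length($1) - 2)   # strip the [ ] around the timestamp
            count[ts]++
        }
        END {
            for (ts in count)
                if (count[ts] >= min)
                    print ts, count[ts]
        }
    ' "$1" | sort -n
}

# Hypothetical usage; the path is the usual source-install location and
# may differ on your system:
# passive_bursts /usr/local/nagios/var/nagios.log 20
```

If the same few seconds keep showing up with high counts every five minutes, the results are probably arriving in synchronized batches.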

Posted: **Tue May 17, 2011 7:14 am**

What we do use is a number of SNMP traps that are routed through SNMPtt. So it's a host that sends a heartbeat through SNMP, which is handled by the trap translator, and the result is submitted through the event handler. We have quite a lot of issues with this, by the way. For instance, when altering the Nagios config and then restarting, the SNMPtt daemon hangs quite often. So yes, this could be a cause of the strange behaviour we saw.

Posted: **Tue May 17, 2011 9:50 am**

You can try tweaking the settings below in the nagios.cfg file. With these values Nagios reaps check results every 5 seconds instead of every 10, so large batches of results are picked up sooner. You could also try adding the nagiostats wizard to your localhost checks to keep an eye on passive results coming in in large batches, to see if there's a relationship.

check_result_reaper_frequency=5
max_check_result_reaper_time=15

Nagiostats Wizard
http://exchange.nagios.org/directory/Ad ... rd/details
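
For anyone applying the two reaper settings above by hand, here is a small sketch of one way to change them in place. The helper name set_directive is made up for this example, and the paths in the commented usage assume a default source install, so adjust them to your system.

```shell
#!/bin/sh
# Rewrite a "name=value" directive in a Nagios config file.
# The file as it was before the most recent call is kept as FILE.bak.
set_directive() {
    # $1 = config file, $2 = directive name, $3 = new value
    sed -i.bak "s/^$2=.*/$2=$3/" "$1"
}

# Hypothetical usage (paths assume a default source install):
# cfg=/usr/local/nagios/etc/nagios.cfg
# set_directive "$cfg" check_result_reaper_frequency 5   # seconds between reaper runs
# set_directive "$cfg" max_check_result_reaper_time 15   # max seconds one reaper run may take
# /usr/local/nagios/bin/nagios -v "$cfg"                 # verify the config before restarting
```

The function only changes directives that already exist in the file; it does not append missing ones.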

Constant Orphaned checks
