Constant Orphaned checks

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
tairline
Posts: 25
Joined: Mon Sep 20, 2010 2:33 am

Constant Orphaned checks

Post by tairline »

Since upgrading to the latest and greatest (2011R1.2), I have seen continuous orphaned checks, and when I tried to restart I got several SIGTERMs from the ndo module. Even a system reboot didn't help ...

What could be the cause here?
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Constant Orphaned checks

Post by mguthrie »

See if this solution helps, and if not, we'll go from there.
http://support.nagios.com/wiki/index.ph ... g_Orphaned
tairline
Posts: 25
Joined: Mon Sep 20, 2010 2:33 am

Re: Constant Orphaned checks

Post by tairline »

That's a no-go, I'm afraid.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Constant Orphaned checks

Post by mguthrie »

Can you run the following and post the output?

rm -f /usr/local/nagiosxi/var/dbmaint.lock
/usr/local/nagiosxi/cron/dbmaint.php
tairline
Posts: 25
Joined: Mon Sep 20, 2010 2:33 am

Re: Constant Orphaned checks

Post by tairline »

[root@xxxxxx]# rm -f /usr/local/nagiosxi/var/dbmaint.lock
[root@xxxxxx]# /usr/local/nagiosxi/cron/dbmaint.php
CREATING: /usr/local/nagiosxi/var/dbmaint.lock
CLEANING ndoutils TABLE 'commenthistory'...
SQL: DELETE FROM nagios_commenthistory WHERE entry_time < FROM_UNIXTIME(1273994626)
CLEANING ndoutils TABLE 'processevents'...
SQL: DELETE FROM nagios_processevents WHERE event_time < FROM_UNIXTIME(1273994626)
CLEANING ndoutils TABLE 'externalcommands'...
SQL: DELETE FROM nagios_externalcommands WHERE entry_time < FROM_UNIXTIME(1305357826)
CLEANING ndoutils TABLE 'logentries'...
SQL: DELETE FROM nagios_logentries WHERE logentry_time < FROM_UNIXTIME(1273994626)
CLEANING ndoutils TABLE 'notifications'...
SQL: DELETE FROM nagios_notifications WHERE start_time < FROM_UNIXTIME(1297754626)
CLEANING ndoutils TABLE 'contactnotifications'...
SQL: DELETE FROM nagios_contactnotifications WHERE start_time < FROM_UNIXTIME(1297754626)
CLEANING ndoutils TABLE 'contactnotificationmethods'...
SQL: DELETE FROM nagios_contactnotificationmethods WHERE start_time < FROM_UNIXTIME(1297754626)
CLEANING ndoutils TABLE 'statehistory'...
SQL: DELETE FROM nagios_statehistory WHERE state_time < FROM_UNIXTIME(1273994626)
CLEANING ndoutils TABLE 'timedevents'...
SQL: DELETE FROM nagios_timedevents WHERE event_time < FROM_UNIXTIME(1305530326)
CLEANING ndoutils TABLE 'systemcommands'...
SQL: DELETE FROM nagios_systemcommands WHERE start_time < FROM_UNIXTIME(1305530326)
CLEANING ndoutils TABLE 'servicechecks'...
SQL: DELETE FROM nagios_servicechecks WHERE start_time < FROM_UNIXTIME(1305530326)
CLEANING ndoutils TABLE 'hostchecks'...
SQL: DELETE FROM nagios_hostchecks WHERE start_time < FROM_UNIXTIME(1305530326)
CLEANING ndoutils TABLE 'eventhandlers'...
SQL: DELETE FROM nagios_eventhandlers WHERE start_time < FROM_UNIXTIME(1305530326)
LASTOPT: 1305528001
INTERVAL: 60
NOW: 1305530626
OPTTIME: 1305531601
CLEANING nagiosxi TABLE 'commands'...
SQL: DELETE FROM xi_commands WHERE processing_time < 1305501826::abstime::timestamp without time zone
CLEANING nagiosxi TABLE 'events'...
SQL: DELETE FROM xi_events WHERE processing_time < 1305501826::abstime::timestamp without time zone
SQL1: SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL
SQL2: DELETE FROM xi_meta WHERE meta_id IN (SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL)
CLEANING nagiosql TABLE 'logbook'...
SQL: DELETE FROM tbl_logbook WHERE time < FROM_UNIXTIME(1305501826)
Repair Complete: FAILED TO REMOVE LOCK FILE

Ermm, since I ran this I still see orphaned checks. I have also noticed that External Command under Process Info is occasionally red instead of green.

Some settings for Nagios Core:

[root@xxxxx/]# egrep 'status_update|reaper|orphan' /usr/local/nagios/etc/nagios.cfg
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_result_reaper_frequency=10
max_check_result_reaper_time=30
status_update_interval=10
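
To put a number on "still orphaned checks", a rough sketch like the one below can count the orphan warnings in the Nagios log. It assumes the warning lines contain the phrase "looks like it was orphaned" and that the log lives at /usr/local/nagios/var/nagios.log; both are assumptions to verify on your own install, and `count_orphans` is just a hypothetical helper name.

```shell
#!/bin/sh
# Count orphaned-check warnings in a Nagios log file.
# Assumption: each warning line contains "looks like it was orphaned".
count_orphans() {
    # $1: path to the log file to scan
    grep -c 'looks like it was orphaned' "$1"
}

# Example call (the path is an assumption for a default install):
# count_orphans /usr/local/nagios/var/nagios.log
```

Running it before and after a config change gives a simple way to tell whether the change made any difference.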
tairline
Posts: 25
Joined: Mon Sep 20, 2010 2:33 am

Re: Constant Orphaned checks

Post by tairline »

Another update. I don't know what I did, but all of a sudden the orphaned checks seem to have disappeared... Your guess is as good as mine ...
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Constant Orphaned checks

Post by mguthrie »

Are you using a lot of passive checks? I once had a user whose checks were all synced to a 5-minute cron job on all of his machines. All of his results were coming in within a few seconds of each other, and some were getting dropped. Is there any chance you had a large number of passive results all come in at once?
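
If synchronized cron jobs do turn out to be the cause, one common workaround is to add a random delay before each machine submits its result, so the results spread out instead of arriving in the same few seconds. This is only a sketch: `jitter_delay` and the submit script name are hypothetical, and the 120-second window is an arbitrary example value.

```shell
#!/bin/sh
# Print a random delay in whole seconds in the range [0, max).
# Reads two random bytes from /dev/urandom so that machines started
# in the same second do not all pick the same delay.
jitter_delay() {
    # $1: upper bound in seconds (exclusive), must be >= 1
    n=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
    echo $(( n % $1 ))
}

# In the cron-driven wrapper, wait before submitting the passive result:
# sleep "$(jitter_delay 120)"
# ./submit_passive_result.sh   # hypothetical submit script
```

The window should stay well below the check's freshness threshold so that a delayed result is not treated as stale.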
tairline
Posts: 25
Joined: Mon Sep 20, 2010 2:33 am

Re: Constant Orphaned checks

Post by tairline »

What we do use is a number of SNMP traps that get routed through SNMPtt. So it's a host that sends a heartbeat through SNMP, which is handled by the trap translator, and the result is submitted through the event handler. We have quite a lot of issues with this, by the way. For instance, when we alter the Nagios config and then restart, the SNMPtraptt daemon hangs quite often. So yes, this could be a cause of the strange behaviour we saw.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Constant Orphaned checks

Post by mguthrie »

You can try tweaking the settings below in the nagios.cfg file. This will allow Nagios to process a larger batch of results more quickly. You could also try adding the nagiostats wizard to your localhost checks to keep an eye on passive results arriving in large batches and see if there's a relationship.

check_result_reaper_frequency=5
max_check_result_reaper_time=15
Nagiostats Wizard
http://exchange.nagios.org/directory/Ad ... rd/details
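
As a quick manual check alongside the wizard, something like the sketch below can show whether results are piling up faster than the reaper processes them. It assumes that each finished result in the check result spool directory is marked by a matching ".ok" file, and that the directory is /usr/local/nagios/var/spool/checkresults (see check_result_path in nagios.cfg); treat both as assumptions. `pending_results` is a hypothetical helper name.

```shell
#!/bin/sh
# Count check results waiting to be reaped in a spool directory.
# Assumption: each completed result is marked by a "*.ok" file.
pending_results() {
    # $1: spool directory to inspect (top level only)
    find "$1" -maxdepth 1 -type f -name '*.ok' | wc -l | tr -d ' '
}

# Example call (the path is an assumption, check check_result_path):
# pending_results /usr/local/nagios/var/spool/checkresults
```

A count that keeps growing between runs would suggest the reaper is falling behind, which is the situation the two settings above are meant to help with.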