host check orphaned

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: host check orphaned

Post by jdalrymple »

gearmand is the key bit when modifying the neb.conf file.

Check and see if it's running. If it is, then something definitely went haywire in the config. Undo whatever it was you did, then restart everything again. Let's get our green checkmarks back before trying any fancy new stuff :)
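A quick way to do that check from the shell might look like the sketch below. It assumes `pgrep` is available and that the daemons show up under the process names `gearmand` and `nagios`; the helper name `is_running` is made up for this example.

```shell
#!/bin/bash
# Return 0 if a process with exactly this name is running, non-zero otherwise.
is_running() {
    pgrep -x "$1" > /dev/null
}

# Process names are assumptions; adjust them if your install names them differently.
for proc in gearmand nagios; do
    if is_running "$proc"; then
        echo "$proc: running"
    else
        echo "$proc: NOT running"
    fi
done
```

If gearmand shows as not running, start it before restarting Nagios, since the NEB module has nothing to submit jobs to without it.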
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: host check orphaned

Post by CFT6Server »

OK. Got XI back up and running again. Not sure why, but it took a few tries before the Nagios service would stick.
I've reverted most of the settings and modifications. Here are the changes that are in place; so far the hosts are fine. I suspect that gearman_top was not updating fast enough or not showing the correct number of workers in use, or perhaps it was just a timeout issue.

1. nagios.cfg - increased the host and service check timeouts from 30 to 60 seconds.
2. mod_gearman_worker.conf - on all worker nodes, changed job_timeout from 60 to 120 seconds.
3. mod_gearman_worker.conf - on all worker nodes, increased min-worker to 100. (I don't think it needs that many, but from observing the logs, each worker does go through quite a bit of checks, especially on the main XI box.)
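To confirm that the three changes above actually landed on each node, something like the following sketch could be used. The file paths and the helper name `cfg_value` are assumptions for illustration; `host_check_timeout` and `service_check_timeout` are the nagios.cfg options behind the check timeouts. The helper only matches lines written as `key=value` with no spaces around the `=`.

```shell
#!/bin/bash
# Print the last value assigned to KEY in a key=value config FILE.
# Commented-out lines are skipped because their first field starts with '#'.
cfg_value() {
    awk -F= -v key="$2" '$1 == key { val = $2 } END { if (val != "") print val }' "$1"
}

# Both paths are assumptions; adjust them to your installation.
NAGIOS_CFG=/usr/local/nagios/etc/nagios.cfg
WORKER_CFG=/etc/mod_gearman/mod_gearman_worker.conf

if [ -r "$NAGIOS_CFG" ]; then
    for key in host_check_timeout service_check_timeout; do
        echo "$key=$(cfg_value "$NAGIOS_CFG" "$key")"
    done
fi

if [ -r "$WORKER_CFG" ]; then
    for key in job_timeout min-worker; do
        echo "$key=$(cfg_value "$WORKER_CFG" "$key")"
    done
fi
```

Running it on every worker node makes it easy to spot a box that still has the old values.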

Noticed a section in the NEB configuration that explains why these messages come up:
# The Mod-Gearman NEB module will submit a fake result for orphaned host
# checks with a message saying there is no worker running for this
# queue. Use this option to get better reporting results, otherwise your
# hosts will keep their last state as long as there is no worker
# running.
# Default: yes
orphan_host_checks=yes
So these were fake results submitted as if there were "no workers", which means increasing the workers and timeouts should help with the issue we are having.
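One way to see whether a queue really has no workers is to look at the gearmand status listing, where each line holds the queue name, total jobs, running jobs, and registered workers, separated by tabs and terminated by a line containing a single dot. The sketch below filters that listing for queues with jobs waiting and zero workers. The function name `queues_without_workers` and the sample data are made up for illustration, and the live command assumes the `gearadmin` tool is installed.

```shell
#!/bin/bash
# Read gearmand status lines (queue<TAB>total<TAB>running<TAB>workers, ending with ".")
# on stdin and print each queue that has jobs queued but no workers registered.
queues_without_workers() {
    awk -F'\t' '$1 != "." && $2 > 0 && $4 == 0 { print $1 }'
}

# Live usage, assuming gearadmin is installed and gearmand runs on this host:
#   gearadmin --status | queues_without_workers

# Made-up sample input for illustration only; prints "host".
printf 'host\t3\t0\t0\nservice\t0\t0\t12\n.\n' | queues_without_workers
```

An empty result while orphaned messages still appear would point toward timeouts rather than missing workers.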
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: host check orphaned

Post by WillemDH »

Hello,

Sorry to jump in. I did not read the whole thread, but wanted to mention that we are also seeing orphaned hosts about once every two weeks. This started when we offloaded our MRTG checks to Mod-Gearman. The symptoms always seemed to start with multiple Nagios instances running.
To detect the issue fast, I made a small script that checks if multiple nagios instances are running:

Code:

#!/bin/bash

# Script name:  check_nagios_instances.sh
# Version:      v0.6.2
# Author:       Willem D'Haese
# Created on:   02/06/2015
# Purpose:      Bash nagios plugin to alert if multiple instances of nagios are running in order to prevent orphaned checks
# Recent history:
#       02/06/2015 => Creation date
# Copyright:
#       This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published
#       by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed
#       in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
#       PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public
#       License along with this program.  If not, see <http://www.gnu.org/licenses/>.

# Count Nagios processes whose parent is init (PID 1), i.e. top-level daemons only
NumNagiosProc=$(/usr/bin/pgrep -P 1 nagios | wc -l)

if (( NumNagiosProc < 1 )); then
        echo "CRITICAL: No Nagios processes found."
        exit 2
elif (( NumNagiosProc > 1 )); then
        echo "CRITICAL: $NumNagiosProc Nagios processes found. Please execute 'killall -9 nagios && service nagios restart'"
        exit 2
else
        echo "OK: One Nagios process found."
        exit 0
fi
I hope it somehow helps you. If not, please ignore me.

Grtz

Willem
Nagios XI 5.8.1
https://outsideit.net
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: host check orphaned

Post by jdalrymple »

WillemDH wrote: please ignore me.
Never - thanks for the help WillemDH.

Can you confirm or deny similar behavior, or even a similar configuration, CFT6Server?
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: host check orphaned

Post by CFT6Server »

So far, the fixes I mentioned have worked and did not require any additional scripting. Even though gearman_top was showing that workers were available, I think the results were timing out on the way back, or the checks just took too long and Nagios decided they had failed. Increasing the workers and timeouts seems to have stabilized things, and I am no longer getting host errors with the orphaned message.

Thanks WillemDH for the script and suggestion, but for us, monitoring the Nagios instance and restarting the services isn't really a viable fix, since the gearman boxes will not have Nagios instances running on them.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: host check orphaned

Post by WillemDH »

(The script is meant to run on the Nagios server.) Good luck in solving your issue CFT!
Nagios XI 5.8.1
https://outsideit.net
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: host check orphaned

Post by tgriep »

CFT6Server,
Let us know if you need any more help with this issue or if the thread can be closed.
Be sure to check out our Knowledgebase for helpful articles and solutions!
Locked