Strange state change issue

sspaise
Posts: 14
Joined: Wed Mar 25, 2015 11:28 pm

Strange state change issue

Post by sspaise »

Hi everyone,

I have a Nagios server using a mix of gearman and NSCA checks. I'm seeing a strange problem and am wondering if anyone has an idea as to why. Here's the issue:

1. A service goes into a WARNING or CRITICAL state.
2. A host check result is received, OK (gearman checks only).
3. Ten seconds after the service failure, the host is reported CRITICAL (DOWN), although the host never actually goes down.
4. Fifty seconds after the host goes critical, it checks in and is reported OK again.

Log (gearman passive check):
[1427343690] PASSIVE SERVICE CHECK: vm1-testvm;Service-Asterisk;2;NOK - Asterisk Service Down!!
[1427343690] SERVICE ALERT: vm1-testvm;Service-Asterisk;CRITICAL;HARD;1;NOK - Asterisk Service Down!!
[1427343690] PASSIVE HOST CHECK: vm1-testvm;0;OK
[1427343700] HOST ALERT: vm1-testvm;DOWN;HARD;1;CRITICAL: Host not reported in - probably down
[1427343750] PASSIVE HOST CHECK: vm1-testvm;0;OK
[1427343750] HOST ALERT: vm1-testvm;UP;HARD;1;OK

Log (NSCA check):
[1427344850] SERVICE ALERT: vm2-testvm;Memory;WARNING;HARD;1;WARNING: There have been no recent passive updates!
[1427344860] HOST ALERT: vm2-testvm;DOWN;HARD;1;CRITICAL: Host not reported in - probably down
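
As a sanity check on the timings described in the numbered list, here is a minimal Python sketch that parses the two log excerpts above and prints the gap between consecutive alerts for each host. The regular expression and the helper names (`parse`, `alert_gaps`) are illustrative assumptions that only cover the line shapes shown in this thread; this is not a general Nagios log parser.

```python
import re

# Log lines copied verbatim from the two excerpts above.
LOG = """\
[1427343690] PASSIVE SERVICE CHECK: vm1-testvm;Service-Asterisk;2;NOK - Asterisk Service Down!!
[1427343690] SERVICE ALERT: vm1-testvm;Service-Asterisk;CRITICAL;HARD;1;NOK - Asterisk Service Down!!
[1427343690] PASSIVE HOST CHECK: vm1-testvm;0;OK
[1427343700] HOST ALERT: vm1-testvm;DOWN;HARD;1;CRITICAL: Host not reported in - probably down
[1427343750] PASSIVE HOST CHECK: vm1-testvm;0;OK
[1427343750] HOST ALERT: vm1-testvm;UP;HARD;1;OK
[1427344850] SERVICE ALERT: vm2-testvm;Memory;WARNING;HARD;1;WARNING: There have been no recent passive updates!
[1427344860] HOST ALERT: vm2-testvm;DOWN;HARD;1;CRITICAL: Host not reported in - probably down
"""

# [timestamp] EVENT TYPE: host;rest-of-line
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): ([^;]+);(.*)$")


def parse(log):
    """Return (timestamp, event_type, host, rest) for every line that matches."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m:
            events.append((int(m.group(1)), m.group(2), m.group(3), m.group(4)))
    return events


def alert_gaps(events):
    """Per host, seconds between consecutive SERVICE ALERT / HOST ALERT entries."""
    last_seen = {}
    gaps = []
    for ts, kind, host, _rest in events:
        if kind not in ("SERVICE ALERT", "HOST ALERT"):
            continue
        if host in last_seen:
            prev_ts, prev_kind = last_seen[host]
            gaps.append((host, prev_kind, kind, ts - prev_ts))
        last_seen[host] = (ts, kind)
    return gaps


for host, prev_kind, kind, delta in alert_gaps(parse(LOG)):
    print(f"{host}: {prev_kind} -> {kind} after {delta}s")
```

On both hosts the DOWN alert follows the service alert by the same fixed delay, which suggests something triggered by the service state change rather than a real outage.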


This is happening for all hosts and is becoming a pain, since we get 4 emails per host every time a service changes state. Any clues would be greatly appreciated.

Regards,
sspaise
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: Strange state change issue

Post by lmiltchev »

Do you have freshness enabled? Can you show us the "vm1-testvm" and "vm2-testvm" (and any other related) configs?
sspaise
Posts: 14
Joined: Wed Mar 25, 2015 11:28 pm

Re: Strange state change issue

Post by sspaise »

Hi lmiltchev,

We do indeed have freshness checking enabled, both for the host and some of the services. Below I have included the configuration for "vm1-testvm" and its related configuration:

================
passive-host-invade
================

define host{
    name                    passive-host-invade
    use                     passive-host
    freshness_threshold     600
    contact_groups          +invade
    register                0
}

=====================
passive-service-invade.cfg
=====================

define service{
    name                    passive-service-invade
    use                     passive-service
    contact_groups          +invade
    register                0
}


I've uploaded "vm1-testvm.cfg" and "nagios.cfg" to this thread. If I can provide anything further, just let me know!
Attachments
nagios.cfg (43.74 KiB)
vm1-testvm.cfg (8.46 KiB)
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Strange state change issue

Post by tmcdonald »

I am curious to see what you have in your passive-host and passive-service templates. Can you post them?
Former Nagios employee
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Strange state change issue

Post by tgriep »

Edit your nagios.cfg file and change

check_host_freshness=0

to

check_host_freshness=1

Then restart Nagios.
sspaise
Posts: 14
Joined: Wed Mar 25, 2015 11:28 pm

Re: Strange state change issue

Post by sspaise »

Hi all,

@tmcdonald, passive host and service templates were posted in my reply to lmiltchev =)

@tgriep, thank you, it looks like this has resolved the issue. Going to do some further testing just to be sure!
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Strange state change issue

Post by tmcdonald »

sspaise wrote: @tmcdonald, passive host and service templates were posted in my reply to lmiltchev =)

*cough* I knew that :) Was just testing you.

I'll go ahead and close this up now.

Edit: Unlocking per OP request.
sspaise
Posts: 14
Joined: Wed Mar 25, 2015 11:28 pm

Re: Strange state change issue

Post by sspaise »

Hi All,

@tmcdonald, thanks for the unlock sir!

Unfortunately, after further testing it became apparent that changing the "check_host_freshness" value in the Nagios config made no difference.

I was reading some of the documentation the other day and came across "on-demand checks", which got me thinking.

Currently a service check goes into a failed state, and 10 seconds later the host is reported DOWN. Based on what I read about on-demand checks, I believe this is caused by Nagios' own logic: when a service on a host goes into a failed state, Nagios forces a host check to see whether the host has gone down. That active check fails and reports the host as DOWN.

Does anyone know much about on-demand checks? Can the host check be disabled when a service changes state? (We only want to use passive checks, no active ones.)
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Strange state change issue

Post by ssax »

The performance of on-demand host checks can be significantly improved by implementing the use of cached checks, which allow Nagios to forgo executing a host check if it determines a relatively recent check result will do instead.
Have you looked into cached checks? http://nagios.sourceforge.net/docs/nagi ... hecks.html
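
To illustrate the idea behind cached checks, here is a small Python sketch of the decision described in the quote: reuse the last host check result if it is recent enough, otherwise run a fresh check. The function name `can_use_cached_result` and the horizon values are assumptions for illustration only. Nagios implements this internally, and the window is configured in nagios.cfg through the `cached_host_check_horizon` option. Whether the passive OK result in this thread would actually be reused by the on-demand check is not confirmed by anything posted here.

```python
def can_use_cached_result(last_check_time, now, horizon_seconds):
    """Return True if the last host check result is recent enough to reuse.

    In this sketch a horizon of 0 (or less) means caching is disabled.
    """
    if horizon_seconds <= 0:
        return False
    age = now - last_check_time
    return 0 <= age <= horizon_seconds


# Timestamps taken from the gearman log earlier in the thread.
last_passive_ok = 1427343690   # PASSIVE HOST CHECK: vm1-testvm;0;OK
host_down_alert = 1427343700   # HOST ALERT: vm1-testvm;DOWN

for horizon in (5, 15):  # example horizons in seconds (assumed values)
    reuse = can_use_cached_result(last_passive_ok, host_down_alert, horizon)
    print(f"horizon={horizon}s -> reuse cached result: {reuse}")
```

Under these assumptions, the OK result received 10 seconds before the DOWN alert falls inside a 15-second horizon but outside a 5-second one, so the horizon value would decide whether a new on-demand check is executed.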