
Service Inconsistent state

Posted: Fri Jun 23, 2017 4:17 am
by amprantino
Hello,

these are the states appearing for the host

Service definition:

Code:

define service{
        use                             generic-service
        host_name                       SQL-Server
        service_description             Disk-D-RM
        servicegroups                   XXXXXXXX
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           30
        retry_check_interval            1
        flap_detection_enabled          0
        contact_groups                  sys-admins,db-admins
        notification_interval           240
        notification_period             24x7
        notification_options            c,r
        check_command                   check_snmp_storage!XXXXXXXXXXX!160!150!"D:"!-T bl -G
        }
[Screenshots: Snap3.png, Snap4.png]
On the Services page, the service is shown as flapping.
On the service's own detail page, it is not shown as flapping.

How is this possible?
Any ideas why notification wasn't sent?

Re: Service Inconsistent state

Posted: Fri Jun 23, 2017 12:08 pm
by dwhitfield
I'm not sure what you mean about the notification. You mean you didn't get a critical notification?

As for flapping, you have flapping detection turned off. You have a couple of options.

1) Turn flapping detection on, and force enough checks for it to stop flapping. Once it's not flapping, turn flapping detection off.
2) Stop nagios, delete your retention.dat, and restart nagios. You lose a lot of information this way, so it doesn't seem like the best option to me.

Please let us know if those do not work for you.
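
For option 1, forced checks can be queued from the web UI, or through Nagios's external command file. Below is a minimal Python sketch of the latter, using the SCHEDULE_FORCED_SVC_CHECK external command. The command file path is an assumption (a common default for source installs); check the command_file setting in your nagios.cfg. The function names are hypothetical.

```python
import time


def forced_check_command(host, service, when=None):
    """Format a SCHEDULE_FORCED_SVC_CHECK external command line.

    Format: [timestamp] SCHEDULE_FORCED_SVC_CHECK;host;service;check_time
    """
    ts = int(time.time()) if when is None else int(when)
    return f"[{ts}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{ts}\n"


def submit(command_line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    # ASSUMPTION: this path is only a typical default; use the value of
    # command_file from your own nagios.cfg.
    with open(command_file, "w") as fh:
        fh.write(command_line)
```

Calling submit(forced_check_command("SQL-Server", "Disk-D-RM")) a few times, spaced apart, would queue the extra checks. How many checks are needed before the flapping state clears depends on your flap thresholds.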

Re: Service Inconsistent state

Posted: Fri Jun 23, 2017 2:47 pm
by amprantino
Yes, I never received a critical notification.

Code:

        flap_detection_enabled          0
flap_detection_enabled *: This directive is used to determine whether or not flap detection is enabled for this host. More information on flap detection can be found here. Values: 0 = disable host flap detection, 1 = enable host flap detection.

1) Although "flap_detection_enabled = 0" the service is detected as flapping!!! Why ? It should never enter this state!

2) If service is flapping, and "flap_detection_enabled = 0" is configured afterwards, the service isn't allowed to exit the flapping state?


Obviously retention.dat cannot be deleted

Re: Service Inconsistent state

Posted: Fri Jun 23, 2017 3:46 pm
by dwhitfield
amprantino wrote:Although "flap_detection_enabled = 0" the service is detected as flapping!!! Why ? It should never enter this state!
If it is flapping before detection is disabled, it will keep that state.
amprantino wrote: 2) If service is flapping, and "flap_detection_enabled = 0" is configured afterwards, the service isn't allowed to exit the flapping state?
That's correct. Once disabled, it is unable to detect exiting the flapping state.


Even with flap_detection_enabled set to 0, if the service was already in a flapping state before the change, notifications may still be suppressed.

Also, what's the output of ps -aef | grep nagios.cfg?

Please post your objects.cache, status.dat, and retention.dat or PM them if there are security concerns. If you PM, please make sure you update the thread so it comes back up on the support dashboard.

Re: Service Inconsistent state

Posted: Fri Jun 23, 2017 4:04 pm
by amprantino
I have disabled notification during flapping.

Thank you

Re: Service Inconsistent state

Posted: Fri Jun 23, 2017 4:35 pm
by dwhitfield
I edited my last post after discussing the issue with another tech. You may not have seen my edits.

What's the output of ps -aef | grep nagios.cfg?

Please post your objects.cache, status.dat, and retention.dat or PM them if there are security concerns. If you PM, please make sure you update the thread so it comes back up on the support dashboard.

Additionally, what is the current status of the situation? A couple of us were not sure if your last post was saying the issue was resolved or not.

Re: Service Inconsistent state

Posted: Fri Jun 23, 2017 4:43 pm
by amprantino
An explanation could be that the service was flapping before someone disabled flap detection.
So the state was flapping; I didn't get a notification because I have disabled service notification during flapping.

I will send you the .dat files tomorrow in a PM.
The current state is critical + flapping. (I will send you a state update when I send you the .dat files.)

Re: Service Inconsistent state

Posted: Fri Jun 23, 2017 4:51 pm
by dwhitfield
amprantino wrote:An explanation could be that the service was flapping before someone disabled flap detection.
Yes, that was what I meant to suggest in my first post. Apologies for the confusion.

I still think your quickest resolution is to turn flapping detection on, and force enough checks for it to stop flapping. Once it's not flapping, turn flapping detection off (assuming you don't want it).

Re: Service Inconsistent state

Posted: Fri Jun 23, 2017 4:52 pm
by amprantino
Is there a way to find all flapping services that have flap detection off? (= all services stuck in the flapping state)

Re: Service Inconsistent state

Posted: Fri Jun 23, 2017 4:59 pm
by dwhitfield
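You can just use grep -RE "flap_detection_enabled[[:space:]]+0" in your cfg directory (in object definitions the directive and its value are separated by whitespace, not "="). Then match those results against the flapping states in your status.dat.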
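
The matching step can also be done against status.dat alone. The following is a minimal Python sketch, assuming each servicestatus block in status.dat carries flap_detection_enabled= and is_flapping= fields alongside host_name= and service_description=. Verify this against your own file first. The function name is hypothetical.

```python
def stuck_flapping_services(status_text):
    """Return (host, service) pairs that are flapping with flap detection off."""
    stuck = []
    fields = None  # dict while inside a servicestatus block, else None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("servicestatus") and line.endswith("{"):
            fields = {}
        elif line == "}" and fields is not None:
            if (fields.get("is_flapping") == "1"
                    and fields.get("flap_detection_enabled") == "0"):
                stuck.append((fields.get("host_name"),
                              fields.get("service_description")))
            fields = None
        elif fields is not None:
            # Lines inside a block look like key=value
            key, sep, value = line.partition("=")
            if sep:
                fields[key] = value
    return stuck
```

To use it, read your status.dat into a string (its location is set by status_file in nagios.cfg) and pass that string to stuck_flapping_services.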

One thing you could do that was not mentioned: rather than deleting retention.dat, you could just edit it to remove the flapping state. ***Make sure nagios is off when you do this though.***
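
A rough Python sketch of that edit, assuming the flapping state is stored as an is_flapping=1 line inside the matching "service {" block of retention.dat (check your own file before relying on this). The function name is hypothetical. It works on the file's text and returns the modified text, so you can back up the original and compare the two before writing anything back.

```python
def clear_flapping(retention_text, host, service):
    """Set is_flapping=0 in the service block matching host and service."""
    out, block = [], None  # block buffers the lines of a "service {" block
    for line in retention_text.splitlines(keepends=True):
        stripped = line.strip()
        if block is None:
            if stripped.endswith("{") and stripped[:-1].strip() == "service":
                block = [line]
            else:
                out.append(line)
            continue
        block.append(line)
        if stripped == "}":
            names = {entry.strip() for entry in block}
            if (f"host_name={host}" in names
                    and f"service_description={service}" in names):
                # Only touch the exact is_flapping=1 line, keeping indentation
                block = [
                    entry.replace("is_flapping=1", "is_flapping=0")
                    if entry.strip() == "is_flapping=1" else entry
                    for entry in block
                ]
            out.extend(block)
            block = None
    if block is not None:  # unterminated block: pass it through unchanged
        out.extend(block)
    return "".join(out)
```

As noted above, only do this while nagios is stopped, and keep a copy of the original retention.dat.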