Service Inconsistent state
Posted: Fri Jun 23, 2017 4:17 am
by amprantino
Hello,
these are the states appearing for the host
Service definition:
define service{
    use                     generic-service
    host_name               SQL-Server
    service_description     Disk-D-RM
    servicegroups           XXXXXXXX
    is_volatile             0
    check_period            24x7
    max_check_attempts      3
    normal_check_interval   30
    retry_check_interval    1
    flap_detection_enabled  0
    contact_groups          sys-admins,db-admins
    notification_interval   240
    notification_period     24x7
    notification_options    c,r
    check_command           check_snmp_storage!XXXXXXXXXXX!160!150!"D:"!-T bl -G
}
On the Services page, the service is shown as flapping.
On the service's own detail page, it is not shown as flapping.
How is this possible?
Any ideas why a notification wasn't sent?
Re: Service Inconsistent state
Posted: Fri Jun 23, 2017 12:08 pm
by dwhitfield
I'm not sure what you mean about the notification. You mean you didn't get a critical notification?
As for flapping, you have flapping detection turned off. You have a couple of options.
1) Turn flapping detection on, and force enough checks for it to stop flapping. Once it's not flapping, turn flapping detection off.
2) Stop Nagios, delete your retention.dat, and restart Nagios. You lose a lot of information this way, so it doesn't seem like the best option to me.
Please let us know if those do not work for you.
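For option 1, one way to force checks without clicking through the web UI is the external command file. The following is a minimal sketch, not something from the posts above: it assumes external commands are enabled (check_external_commands=1) and that the command file lives at the path shown, which is only a common default. ENABLE_SVC_FLAP_DETECTION and SCHEDULE_FORCED_SVC_CHECK are standard Nagios external commands; the helper names are made up for illustration.

```python
import time

# Assumed default location; check the command_file setting in your nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def external_command(name, *args, now=None):
    """Format one external command line: [timestamp] NAME;arg1;arg2"""
    ts = int(time.time()) if now is None else int(now)
    return "[{}] {}\n".format(ts, ";".join([name, *map(str, args)]))


def flap_reset_commands(host, service, now=None):
    """Build commands that enable flap detection and force an immediate check."""
    ts = int(time.time()) if now is None else int(now)
    return [
        external_command("ENABLE_SVC_FLAP_DETECTION", host, service, now=ts),
        external_command("SCHEDULE_FORCED_SVC_CHECK", host, service, ts, now=ts),
    ]


def submit(lines, command_file=COMMAND_FILE):
    """Write the command lines to the Nagios command pipe."""
    with open(command_file, "w") as pipe:
        pipe.writelines(lines)


# Example (run as a user allowed to write to the command file):
#   submit(flap_reset_commands("SQL-Server", "Disk-D-RM"))
```

You would probably need to repeat the forced check several times before the state history settles enough for Nagios to clear the flapping flag.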
Re: Service Inconsistent state
Posted: Fri Jun 23, 2017 2:47 pm
by amprantino
Yes, I never received a critical notification.
flap_detection_enabled *: This directive is used to determine whether or not flap detection is enabled for this host. More information on flap detection can be found here. Values: 0 = disable host flap detection, 1 = enable host flap detection.
1) Although "flap_detection_enabled = 0" the service is detected as flapping!!! Why ? It should never enter this state!
2) If service is flapping, and "flap_detection_enabled = 0" is configured afterwards, the service isn't allowed to exit the flapping state?
Obviously retention.dat cannot be deleted
Re: Service Inconsistent state
Posted: Fri Jun 23, 2017 3:46 pm
by dwhitfield
amprantino wrote:Although "flap_detection_enabled = 0" the service is detected as flapping!!! Why ? It should never enter this state!
If it is flapping before detection is disabled, it will keep that state.
amprantino wrote:
2) If service is flapping, and "flap_detection_enabled = 0" is configured afterwards, the service isn't allowed to exit the flapping state?
That's correct. Once disabled, it is unable to detect exiting the flapping state.
Even if flap_detection_enabled = 0, if it is in a flapping state before the change, notifications may still be suppressed.
Also, what's the output of the following?
ps -aef | grep nagios.cfg
Please post your objects.cache, status.dat, and retention.dat or PM them if there are security concerns. If you PM, please make sure you update the thread so it comes back up on the support dashboard.
Re: Service Inconsistent state
Posted: Fri Jun 23, 2017 4:04 pm
by amprantino
I have disabled notifications during flapping.
Thank you
Re: Service Inconsistent state
Posted: Fri Jun 23, 2017 4:35 pm
by dwhitfield
I edited my last post after discussing the issue with another tech. You may not have seen my edits.
What's the output of ps -aef | grep nagios.cfg?
Please post your objects.cache, status.dat, and retention.dat or PM them if there are security concerns. If you PM, please make sure you update the thread so it comes back up on the support dashboard.
Additionally, what is the current status of the situation? A couple of us were not sure if your last post was saying the issue was resolved or not.
Re: Service Inconsistent state
Posted: Fri Jun 23, 2017 4:43 pm
by amprantino
An explanation could be that the service was flapping before someone disabled flap detection.
So the state was flapping; I didn't get a notification because I have disabled service notifications during flapping.
I will send you the .dat files tomorrow in a PM.
The current state is critical + flapping. (I will send you a state update when I send the .dat files.)
Re: Service Inconsistent state
Posted: Fri Jun 23, 2017 4:51 pm
by dwhitfield
amprantino wrote:An explanation could be that the service was flapping before someone disabled flap detection.
Yes, that was what I meant to suggest in my first post. Apologies for the confusion.
I still think your quickest resolution is to turn flapping detection on, and force enough checks for it to stop flapping. Once it's not flapping, turn flapping detection off (assuming you don't want it).
Re: Service Inconsistent state
Posted: Fri Jun 23, 2017 4:52 pm
by amprantino
Is there a way to find all flapping services that have flap detection off (i.e. all services stuck in the flapping state)?
Re: Service Inconsistent state
Posted: Fri Jun 23, 2017 4:59 pm
by dwhitfield
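You can grep your cfg directory for services with flap detection disabled, and then match those against the flapping states in your status.dat. Note that object definitions separate the directive from its value with whitespace rather than "=", so a pattern such as grep -RE "flap_detection_enabled[[:space:]]+0" is what will match.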
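If you would rather skip the cfg grep, status.dat itself records both values per service, so the match can be done in one pass. Here is a rough sketch, assuming the usual servicestatus { key=value ... } block layout with is_flapping and flap_detection_enabled fields; the function name is hypothetical, and the path in the usage comment is only a common default.

```python
import re

# One servicestatus block: from the opening line to the line holding the closing brace.
BLOCK_RE = re.compile(r"^\s*servicestatus\s*\{(.*?)^\s*\}", re.S | re.M)


def stuck_flapping_services(status_text):
    """Return (host, service) pairs with is_flapping=1 and flap_detection_enabled=0."""
    stuck = []
    for match in BLOCK_RE.finditer(status_text):
        fields = {}
        for line in match.group(1).splitlines():
            key, sep, value = line.strip().partition("=")
            if sep:
                fields[key] = value
        if (fields.get("is_flapping") == "1"
                and fields.get("flap_detection_enabled") == "0"):
            stuck.append((fields.get("host_name"),
                          fields.get("service_description")))
    return stuck


# Example (path is an assumption; see status_file in nagios.cfg):
#   with open("/usr/local/nagios/var/status.dat") as f:
#       for host, service in stuck_flapping_services(f.read()):
#           print(host, service)
```

Reading status.dat is safe while Nagios is running, since this only reads the file.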
One thing you could do that was not mentioned: rather than deleting retention.dat, you could just edit it to remove the flapping state. ***Make sure Nagios is off when you do this though.***
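As an illustration of that edit, the sketch below flips is_flapping from 1 to 0 for a single service block. It assumes retention.dat uses service { key=value ... } blocks containing an is_flapping field; the function name is hypothetical, and whether any other retained fields also need adjusting is not covered here. Work on a backup copy, and only while Nagios is stopped.

```python
import re

# One retained service block; "service" must be followed by "{" so that
# other block types such as servicecomment are not matched.
SERVICE_BLOCK_RE = re.compile(r"^service\s*\{.*?^\s*\}", re.S | re.M)


def clear_flapping(retention_text, host, service):
    """Return retention_text with is_flapping set to 0 for one service."""
    def fix(match):
        block = match.group(0)
        lines = [line.strip() for line in block.splitlines()]
        if ("host_name=" + host in lines
                and "service_description=" + service in lines):
            block = re.sub(r"^([ \t]*is_flapping=)1[ \t]*$", r"\g<1>0",
                           block, flags=re.M)
        return block
    return SERVICE_BLOCK_RE.sub(fix, retention_text)


# Example (Nagios stopped; path is an assumption, see state_retention_file):
#   path = "/usr/local/nagios/var/retention.dat"
#   with open(path) as f:
#       text = f.read()
#   with open(path, "w") as f:
#       f.write(clear_flapping(text, "SQL-Server", "Disk-D-RM"))
```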