Inconsistent Nagios Report
-
fran.pastor
- Posts: 24
- Joined: Tue Nov 22, 2011 3:17 am
Inconsistent Nagios Report
Hello, I've just seen some data that I think are wrong. See attached screenshots.
Last night we had a brief network problem that caused the service check to fail, but it recovered immediately on the next check. The curious thing is that when you generate a report (trend or availability, for example), this service check is shown as down for 13 or 15 hours. Why?
Last edited by fran.pastor on Thu Feb 21, 2013 10:40 am, edited 1 time in total.
Re: Inconsistent Nagios Report
I wonder if this host was flapping for 13 hours ....
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
-
fran.pastor
- Posts: 24
- Joined: Tue Nov 22, 2011 3:17 am
Re: Inconsistent Nagios Report
No. If you look at the "instantanea2.png" screenshot, which is the "Alert History" for that service, you will see that the service has no other events. There were no state changes during the remaining hours.
It is strange.
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Inconsistent Nagios Report
I just want to verify that you do have flap detection enabled, correct?
-
fran.pastor
- Posts: 24
- Joined: Tue Nov 22, 2011 3:17 am
Re: Inconsistent Nagios Report
slansing wrote: I just want to verify that you do have flapping detection enabled correct?
Yes slansing, flap detection is configured correctly. If you look at the events in the "Alert History" screenshot, you will see that if the service had flapped, it would show up there.
Last edited by fran.pastor on Thu Feb 21, 2013 11:57 am, edited 1 time in total.
Re: Inconsistent Nagios Report
Could you post the host configuration and the main nagios.cfg file in code wrap?
Former Nagios employee
-
fran.pastor
- Posts: 24
- Joined: Tue Nov 22, 2011 3:17 am
Re: Inconsistent Nagios Report
I have thought about it and I see no logical explanation. Where can I post a bug?
Last edited by fran.pastor on Thu Feb 21, 2013 11:56 am, edited 1 time in total.
Re: Inconsistent Nagios Report
You are welcome to post a bug report to http://tracker.nagios.org but I am not convinced it is a bug yet. Posting the host config and the main nagios config will allow us to check over your configuration to help verify if it is indeed a bug. You are welcome to obfuscate any sensitive information from those files.
Former Nagios employee
-
fran.pastor
- Posts: 24
- Joined: Tue Nov 22, 2011 3:17 am
Re: Inconsistent Nagios Report
abrist wrote: You are welcome to post a bug report to http://tracker.nagios.org but I am not convinced it is a bug yet. Posting the host config and the main nagios config will allow us to check over your configuration to help verify if it is indeed a bug. You are welcome to obfuscate any sensitive information from those files.
Thanks for the support, abrist.
What I find suspicious is this: in the Trend Report the service recovers at 00:00, yet all day the service was checked every 5 minutes and every result was OK, apart from a single CRITICAL at about 00:20.
This is the config result from objects.cache:
define service {
host_name Watchmouse
service_description Check Hotelopia
check_period 24x7
check_command check_watchmouse!Check Hotelopia!
contact_groups datacenter-administrators-tic
notification_period 24x7
initial_state o
check_interval 300.000000
retry_interval 300.000000
max_check_attempts 3
is_volatile 0
parallelize_check 1
active_checks_enabled 1
passive_checks_enabled 1
obsess_over_service 1
event_handler_enabled 1
low_flap_threshold 0.000000
high_flap_threshold 0.000000
flap_detection_enabled 1
flap_detection_options o,w,u,c
freshness_threshold 0
check_freshness 0
notification_options u,w,c,r,s
notifications_enabled 1
notification_interval 0.000000
first_notification_delay 0.000000
stalking_options n
process_perf_data 1
failure_prediction_enabled 1
retain_status_information 1
retain_nonstatus_information 1
}
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Inconsistent Nagios Report
One issue I noticed right away was this:
check_interval 300.000000
retry_interval 300.000000
You have your check_interval and retry_interval set to 300, and Nagios interprets these numbers as minutes. Setting each of them to 5, for example, would mean the host is checked at a 5 minute interval, and if the state changes it will be checked again every 5 minutes, three times, before generating an alert.
In this fashion it is entirely possible that it detected the state change, but never checked again until 300 minutes later, and it would have had to do this three times before finding that the host was back up and switching to an OK state.
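As a rough sketch of the change suggested above: the fragment below assumes that interval_length in nagios.cfg is still at its default of 60 seconds. The main config has not been posted in this thread, so that value is an assumption, and only the directives relevant to timing are shown.

```
# nagios.cfg (assumed default, not confirmed in this thread)
interval_length=60

# Service definition: intervals are counted in units of interval_length
define service {
    host_name             Watchmouse
    service_description   Check Hotelopia
    check_interval        5     ; 5 x 60 s = 5 minutes between regular checks
    retry_interval        5     ; 5 minutes between rechecks after a state change
    max_check_attempts    3
}
```

By comparison, 300 units of 60 seconds each is 18,000 seconds, or 5 hours, between checks. If interval_length has instead been set to 1, then 300 already means 5 minutes and the cause would have to lie elsewhere.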