[Nagios-devel] Confusion over current_state and last_hard_state in neb status


I'm capturing host status and service status callbacks in a neb module,
and I'm not really clear about the logic of how current_state and
last_hard_state get set. Hopefully somebody else is. Below are some table
snippets to show what I'm seeing.

The columns are, in order:
- the unique id of this service/host check,
- when this state started,
- the number of seconds the states remained unchanged (null for the row holding the current values),
- the soft_state (the current_state value),
- the last_hard_state,
- the current_attempt,
- the plugin_output.

Note that the current_attempt value gets updated in place when the states
don't change, rather than a new row being inserted with the same states but
a different current_attempt, as you might otherwise expect. Also, the
plugin_output value is the one captured at the start of the state, not
after the most recent attempt.
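To make the row layout concrete, here is a minimal C sketch of the collapsing rule just described: a new row only when current_state or last_hard_state changes, otherwise current_attempt is overwritten in place and the original plugin_output is kept. The names `status_row`, `status_table` and `record_status`, the fixed-size table, and the use of -1 for the null duration are all hypothetical; they are not part of the neb API, and a real module would keep one such history per host or service.

```c
#include <stdio.h>

#define MAX_ROWS 16

/* Hypothetical row layout mirroring the columns listed above. */
typedef struct {
    int  id;              /* unique id of the host/service check */
    long start;           /* when this state started (epoch seconds) */
    long duration;        /* seconds unchanged; -1 stands in for NULL */
    int  current_state;   /* the soft_state column */
    int  last_hard_state;
    int  current_attempt;
    char output[128];     /* plugin_output at the start of the state */
} status_row;

/* Hypothetical history for a single check. */
typedef struct {
    status_row rows[MAX_ROWS];
    int count;
} status_table;

/* Record one status callback for the check tracked by t. */
static void record_status(status_table *t, int id, long now,
                          int current_state, int last_hard_state,
                          int current_attempt, const char *output)
{
    if (t->count > 0) {
        status_row *cur = &t->rows[t->count - 1];
        if (cur->current_state == current_state &&
            cur->last_hard_state == last_hard_state) {
            /* Same states: overwrite the attempt, keep the original output. */
            cur->current_attempt = current_attempt;
            return;
        }
    }
    if (t->count == MAX_ROWS)
        return;                     /* sketch only: ignore once full */
    if (t->count > 0) {
        /* States changed: close the open row with its final duration. */
        status_row *prev = &t->rows[t->count - 1];
        prev->duration = now - prev->start;
    }
    status_row *row = &t->rows[t->count++];
    row->id = id;
    row->start = now;
    row->duration = -1;             /* still open: the null in the tables */
    row->current_state = current_state;
    row->last_hard_state = last_hard_state;
    row->current_attempt = current_attempt;
    snprintf(row->output, sizeof row->output, "%s", output);
}
```

For example, feeding in the three callbacks behind check 113 (OK, WARNING 29346 seconds later, OK again 86 seconds after that) produces three rows with durations 29346, 86 and -1 (the open row).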

Clear as mud? Cool, here we go.....




484 | 2004-11-10 15:50:45-08 | | 0 | 0 | 1 | PING OK - Packet loss = 0%, RTA = 88.40 ms

This is pretty obvious and straightforward. A ping check succeeded on its
first try, and so the current_state is 0. It hasn't had any failures,
either, so the last_hard_state is also 0.


113 | 2004-11-10 15:06:59-08 | 29346 | 0 | 0 | 1 | PING OK - Packet loss = 0%, RTA = 32.21 ms
113 | 2004-11-10 23:16:05-08 | 86 | 1 | 0 | 1 | PING WARNING - Packet loss = 0%, RTA = 250.70 ms
113 | 2004-11-10 23:17:31-08 | | 0 | 0 | 1 | PING OK - Packet loss = 0%, RTA = 144.14 ms

Another simple example to establish the basics. We have a ping check that
works fine for a long time, then blips with a warning, but 86 seconds
and one more try later it returns to an OK state. We know there was only 1
try that resulted in a soft error state, because otherwise current_attempt
would have been greater than 1 on that middle row.


141 | 2004-11-10 15:32:44-08 | 54141 | 0 | 0 | 1 | PING OK - Packet loss = 0%, RTA = 51.94 ms
141 | 2004-11-11 06:35:05-08 | 59 | 1 | 0 | 1 | PING WARNING - Packet loss = 0%, RTA = 334.69 ms
141 | 2004-11-11 06:36:04-08 | 10801 | 0 | 0 | 1 | PING OK - Packet loss = 0%, RTA = 115.25 ms
141 | 2004-11-11 09:36:05-08 | 123 | 1 | 0 | 2 | PING WARNING - Packet loss = 0%, RTA = 280.24 ms
141 | 2004-11-11 09:38:08-08 | | 0 | 0 | 1 | PING OK - Packet loss = 0%, RTA = 51.70 ms

Here's a ping that starts off working, blips with a warning, returns to a
working state, then warns on two consecutive attempts, and returns to a
working state again before max_attempts=3 is reached. Nothing special here.
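For comparison, here is a simplified model of the soft/hard logic that the rows so far are consistent with, written as a plain C function. It is a hypothetical sketch, not code from the Nagios source; `check_state` and `apply_result` are made-up names, and max_attempts is just a field of the model.

```c
/* Hypothetical, simplified soft/hard state model; NOT the Nagios code. */
typedef struct {
    int current_state;    /* 0 = OK, 1 = WARNING, 2 = CRITICAL */
    int last_hard_state;
    int current_attempt;
    int max_attempts;
} check_state;

/* Apply one check result to the model. */
static void apply_result(check_state *s, int result)
{
    if (result == 0) {
        /* Any OK result ends the retry sequence and counts as recovered. */
        s->current_attempt = 1;
        s->last_hard_state = 0;
    } else if (s->current_state == 0) {
        /* First failure after OK: soft problem on attempt 1. */
        s->current_attempt = 1;
    } else if (s->current_attempt < s->max_attempts) {
        /* Still failing: count another attempt, capped at max_attempts. */
        s->current_attempt++;
    }
    s->current_state = result;
    if (result != 0 && s->current_attempt >= s->max_attempts)
        s->last_hard_state = result;   /* the problem has become hard */
}
```

Replaying the check 141 sequence with max_attempts = 3 (WARNING, OK, WARNING, WARNING, OK) ends with current_attempt back at 1 and last_hard_state still 0, never reaching a hard state. Note that in this model an OK result resets current_attempt to 1 straight away, so it never produces an OK row that still carries a higher attempt count.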

Enough of the groundwork. Here's where the confusion starts:


5655 | 2004-11-10 15:03:30-08 | 58563 | 0 | 0 | 1 | HTTP ok: HTTP/1.1 200 Channel Listing - 0.041 second response time
5655 | 2004-11-11 07:19:33-08 | 553 | 2 | 0 | 5 | Socket timeout after 30 seconds
5655 | 2004-11-11 07:28:46-08 | 8322 | 2 | 2 | 5 | Socket timeout after 30 seconds
5655 | 2004-11-11 09:47:28-08 | 0 | 0 | 2 | 5 | HTTP ok: HTTP/1.1 200 Channel Listing - 1.150 second response time
5655 | 2004-11-11 09:47:28-08 | | 0 | 0 | 1 | HTTP ok: HTTP/1.1 200 Channel Listing - 1.150 second response time

We start off with an http check in a good state. Then it enters a critical
state (2) and stays in that soft error state for 5 attempts. At that
point, it enters a hard critical state and last_hard_state also gets set
to 2. It's still having problems, though, so current_state also stays
at 2. Then, at 9:47, it recovers, but the first OK row still reports
current_attempt=5 and last_hard_state=2, lasts 0 seconds, and is
immediately replaced by a row with current_attempt=1, as if 5 checks had
been done in 0 seconds. That's my first point of confusion. I would have
thought that if soft_state was OK (0), then regardless of last_hard_state,
there would be no more attempts and the service would recover. This might
be a bug in Nagios, where it's sending the neb callback the wrong
current_attempt number.
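One way to find every occurrence of this pattern in a capture is to scan for rows that report an OK current_state while current_attempt is still above 1. A minimal C sketch, assuming the captured rows have been loaded into an array of a hypothetical `captured_row` type (only the three state columns are modelled; `find_ok_with_retries` is likewise a made-up name):

```c
#include <stddef.h>

/* Hypothetical in-memory copy of the state columns of one captured row. */
typedef struct {
    int current_state;
    int last_hard_state;
    int current_attempt;
} captured_row;

/* Count rows that report OK while current_attempt is still above 1.
 * Indices of matching rows are written to out, up to out_len of them. */
static size_t find_ok_with_retries(const captured_row *rows, size_t n,
                                   size_t *out, size_t out_len)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (rows[i].current_state == 0 && rows[i].current_attempt > 1) {
            if (found < out_len)
                out[found] = i;
            found++;
        }
    }
    return found;
}
```

Run over the five rows of check 5655, it flags only the fourth row (index 3): the 0-second OK row with last_hard_state=2 and current_attempt=5.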


134 | 2004-11-10 15:06:59-08 | 51015 | 0 | 0 | 1 | PING OK - Packet loss = 0%, RTA = 65.07 ms
134 | 2004-11-11 05:17:14-08 | 132 | 1 | 0 | 2 | PING WARNING - Packet loss = 0%, RTA = 200.57 ms
134 | 2004-11-11 05:19:26-08 | 3615 | 1 | 1 | 3 | PING WARNING - Packet loss = 0%, RTA = 273.67 ms
134 | 2004-11-11 06:19:41-08 | | 0 | 0 | 1

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]