[Nagios-devel] Some hard state changes missing in NDOUtils




Hi!

We've been doing some work to validate the data in NDOUtils and found
a bug in Nagios and a missing state change entry. This happens when a
service is in a failed state and changes to a different state at the
same time that the host is considered down (or unreachable).

DETAIL

These are the servicecheck results in the database:

mysql> select start_time,state,state_type,output,current_check_attempt,max_check_attempts
from nagios_servicechecks where service_object_id=445 and
start_time between '2007-11-05 13:40:00' and '2007-11-05 14:00:00';
+---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
| start_time          | state | state_type | output                                              | current_check_attempt | max_check_attempts |
+---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
| 2007-11-05 13:41:18 |     1 |          1 | DISK WARNING - free space: / 1938 MB (10% inode=-): |                     3 |                  3 |
| 2007-11-05 13:46:18 |     1 |          1 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     3 |                  3 |
| 2007-11-05 13:51:18 |     2 |          1 | CHECK_NRPE: Socket timeout after 10 seconds.        |                     1 |                  3 |
| 2007-11-05 13:56:18 |     1 |          0 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     1 |                  3 |
| 2007-11-05 13:57:18 |     1 |          0 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     2 |                  3 |
| 2007-11-05 13:58:39 |     0 |          1 | DISK OK - free space: / 2639 MB (14% inode=-):      |                     1 |                  3 |
+---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+

6 rows in set (0.02 sec)


Note that current_check_attempt is 1/3 for the CRITICAL result at
13:51:18. Since the service was already in a hard WARNING state (3/3),
this should be 3/3. As a side effect, the subsequent WARNING at
13:56:18 is treated as a soft state when it should remain hard.
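
The anomaly described above can be spotted mechanically. The following is a minimal sketch, not part of Nagios or NDOUtils: it takes the rows from the servicechecks output and flags any result that moves from a hard non-OK state to a different non-OK state while its attempt counter is below the maximum. The rule encodes only the expectation stated in this post, and the function name is hypothetical.

```python
# Rows copied from the nagios_servicechecks output in this post:
# (start_time, state, state_type, current_check_attempt, max_check_attempts)
CHECKS = [
    ("2007-11-05 13:41:18", 1, 1, 3, 3),
    ("2007-11-05 13:46:18", 1, 1, 3, 3),
    ("2007-11-05 13:51:18", 2, 1, 1, 3),
    ("2007-11-05 13:56:18", 1, 0, 1, 3),
    ("2007-11-05 13:57:18", 1, 0, 2, 3),
    ("2007-11-05 13:58:39", 0, 1, 1, 3),
]

OK = 0    # state value for OK
HARD = 1  # state_type value for a hard state


def suspicious_attempt_resets(rows):
    """Return start_times of results whose attempt counter looks reset.

    Assumption (taken from the post, not from Nagios source): a change
    from one hard non-OK state to another non-OK state should keep the
    attempt counter at max_check_attempts.
    """
    flagged = []
    for prev, cur in zip(rows, rows[1:]):
        _, prev_state, prev_type, _, _ = prev
        start_time, state, _, attempt, max_attempts = cur
        came_from_hard_problem = prev_type == HARD and prev_state != OK
        changed_problem_state = state != OK and state != prev_state
        if came_from_hard_problem and changed_problem_state and attempt < max_attempts:
            flagged.append(start_time)
    return flagged


print(suspicious_attempt_resets(CHECKS))
```

On the rows above this flags the CRITICAL result at 13:51:18 and the WARNING at 13:56:18, the two results the post calls out.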


Looking at the state history table, we get:

mysql> select state_time,state,state_type,output,current_check_attempt,max_check_attempts
from nagios_statehistory where object_id=445 and state_time
between '2007-11-05 11:50:00' and '2007-11-05 14:00:00';
+---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
| state_time          | state | state_type | output                                              | current_check_attempt | max_check_attempts |
+---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
| 2007-11-05 11:51:05 |     1 |          1 | DISK WARNING - free space: / 1902 MB (10% inode=-): |                     3 |                  3 |
| 2007-11-05 13:56:39 |     1 |          0 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     1 |                  3 |
| 2007-11-05 13:57:19 |     1 |          0 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     2 |                  3 |
| 2007-11-05 13:58:41 |     0 |          1 | DISK OK - free space: / 2639 MB (14% inode=-):      |                     3 |                  3 |
+---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
4 rows in set (0.00 sec)


Note that the state change from WARNING to CRITICAL at 13:51:18 is
missing from this table.
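
The comparison between the two tables can be automated along these lines. This is only a sketch under stated assumptions: the rows are copied by hand from the two query results in this post, the function name is hypothetical, and the 60-second tolerance between a check's start_time and its state history entry is a guess based on the small offsets visible in the data, not a documented NDOUtils guarantee.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (start_time, state) from the nagios_servicechecks result
CHECKS = [
    ("2007-11-05 13:41:18", 1),
    ("2007-11-05 13:46:18", 1),
    ("2007-11-05 13:51:18", 2),
    ("2007-11-05 13:56:18", 1),
    ("2007-11-05 13:57:18", 1),
    ("2007-11-05 13:58:39", 0),
]

# (state_time, state) from the nagios_statehistory result
HISTORY = [
    ("2007-11-05 11:51:05", 1),
    ("2007-11-05 13:56:39", 1),
    ("2007-11-05 13:57:19", 1),
    ("2007-11-05 13:58:41", 0),
]


def missing_state_changes(checks, history, tolerance=timedelta(seconds=60)):
    """Return start_times of state changes with no matching history row.

    A check counts as a state change when its state differs from the
    previous check. It counts as recorded when a history row with the
    same state exists within `tolerance` after the check started
    (the tolerance is an assumption made for this sketch).
    """
    parsed_history = [(datetime.strptime(t, FMT), s) for t, s in history]
    missing = []
    for (_, prev_state), (start, state) in zip(checks, checks[1:]):
        if state == prev_state:
            continue  # no state change, nothing expected in the history
        started = datetime.strptime(start, FMT)
        recorded = any(
            s == state and timedelta(0) <= t - started <= tolerance
            for t, s in parsed_history
        )
        if not recorded:
            missing.append(start)
    return missing


print(missing_state_changes(CHECKS, HISTORY))
```

With the rows above, only the change to CRITICAL at 13:51:18 is reported as missing; the later changes at 13:56:18 and 13:58:39 each have a history row a few seconds after the check.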

These are the relevant lines from nagios.log (the first just to show
that there were no interesting entries before 13:52:07):

Mon Nov 5 13:50:57 2007 SERVICE ALERT: unrelatedhost;TCP

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]