[Nagios-devel] Some hard state changes missing in NDOUtils
Hi!
We've been doing some work to validate the data in NDOUtils and found
a bug in Nagios and a missing state change entry. This happens when a
service is in a failed state and changes to a different state at the
same time that the host is considered down (or unreachable).
DETAIL
These are the servicecheck results in the database:
mysql> select start_time,state,state_type,output,current_check_attempt,max_check_attempts
       from nagios_servicechecks where service_object_id=445
       and start_time between '2007-11-05 13:40:00' and '2007-11-05 14:00:00';
+---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
| start_time          | state | state_type | output                                              | current_check_attempt | max_check_attempts |
+---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
| 2007-11-05 13:41:18 |     1 |          1 | DISK WARNING - free space: / 1938 MB (10% inode=-): |                     3 |                  3 |
| 2007-11-05 13:46:18 |     1 |          1 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     3 |                  3 |
| 2007-11-05 13:51:18 |     2 |          1 | CHECK_NRPE: Socket timeout after 10 seconds.        |                     1 |                  3 |
| 2007-11-05 13:56:18 |     1 |          0 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     1 |                  3 |
| 2007-11-05 13:57:18 |     1 |          0 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     2 |                  3 |
| 2007-11-05 13:58:39 |     0 |          1 | DISK OK - free space: / 2639 MB (14% inode=-):      |                     1 |                  3 |
+---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
6 rows in set (0.02 sec)
Note that the current_check_attempt is 1/3 for the CRITICAL event at
13:51:18. This should be 3/3. A side effect of this is that the
subsequent warning at 13:56:18 is now considered a soft state when it
should remain as hard.
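
For anyone wanting to scan their own data for the same symptom, here is a minimal Python sketch of the consistency check described above. It is only an illustration: the rows are copied from the nagios_servicechecks result (output column omitted), and the rule "a non-OK hard state should have current_check_attempt equal to max_check_attempts" is my reading of the expected behaviour, not something taken from the Nagios source. The names CheckRow and suspicious_hard_states are hypothetical.

```python
from typing import List, NamedTuple


class CheckRow(NamedTuple):
    start_time: str
    state: int                  # 0=OK, 1=WARNING, 2=CRITICAL
    state_type: int             # 0=SOFT, 1=HARD
    current_check_attempt: int
    max_check_attempts: int


# Rows copied from the nagios_servicechecks query result (output column omitted).
CHECKS = [
    CheckRow("2007-11-05 13:41:18", 1, 1, 3, 3),
    CheckRow("2007-11-05 13:46:18", 1, 1, 3, 3),
    CheckRow("2007-11-05 13:51:18", 2, 1, 1, 3),
    CheckRow("2007-11-05 13:56:18", 1, 0, 1, 3),
    CheckRow("2007-11-05 13:57:18", 1, 0, 2, 3),
    CheckRow("2007-11-05 13:58:39", 0, 1, 1, 3),
]


def suspicious_hard_states(rows: List[CheckRow]) -> List[CheckRow]:
    """Return non-OK HARD rows whose attempt counter is below the maximum.

    Assumption: a HARD problem state is only expected once the attempt
    counter has reached max_check_attempts. OK rows are skipped because a
    recovery is recorded with a low attempt number in the data above.
    """
    return [
        r for r in rows
        if r.state != 0
        and r.state_type == 1
        and r.current_check_attempt < r.max_check_attempts
    ]


for row in suspicious_hard_states(CHECKS):
    print(f"{row.start_time}: hard state {row.state} at attempt "
          f"{row.current_check_attempt}/{row.max_check_attempts}")
```

On the rows above this flags only the CRITICAL result at 13:51:18.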
Looking at the state history table, we get:
mysql> select state_time,state,state_type,output,current_check_attempt,max_check_attempts
       from nagios_statehistory where object_id=445
       and state_time between '2007-11-05 11:50:00' and '2007-11-05 14:00:00';
+---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
| state_time          | state | state_type | output                                              | current_check_attempt | max_check_attempts |
+---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
| 2007-11-05 11:51:05 |     1 |          1 | DISK WARNING - free space: / 1902 MB (10% inode=-): |                     3 |                  3 |
| 2007-11-05 13:56:39 |     1 |          0 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     1 |                  3 |
| 2007-11-05 13:57:19 |     1 |          0 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     2 |                  3 |
| 2007-11-05 13:58:41 |     0 |          1 | DISK OK - free space: / 2639 MB (14% inode=-):      |                     3 |                  3 |
+---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
4 rows in set (0.00 sec)
Note that the state change from WARNING to CRITICAL at 13:51:18 is
missing from the state history.
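
The comparison between the two tables can be sketched in Python as follows. This is a rough illustration rather than the validation code we actually ran: the (time, state) pairs are copied from the two query results, the function name missing_state_changes is hypothetical, and the 60-second matching window is an arbitrary assumption chosen because the state history timestamps trail the check start times by a few seconds.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (start_time, state) pairs copied from the nagios_servicechecks result.
CHECKS = [
    ("2007-11-05 13:41:18", 1),
    ("2007-11-05 13:46:18", 1),
    ("2007-11-05 13:51:18", 2),
    ("2007-11-05 13:56:18", 1),
    ("2007-11-05 13:57:18", 1),
    ("2007-11-05 13:58:39", 0),
]

# (state_time, state) pairs copied from the nagios_statehistory result.
HISTORY = [
    ("2007-11-05 11:51:05", 1),
    ("2007-11-05 13:56:39", 1),
    ("2007-11-05 13:57:19", 1),
    ("2007-11-05 13:58:41", 0),
]


def missing_state_changes(checks, history, window=timedelta(seconds=60)):
    """Return (time, old_state, new_state) for check results whose state
    differs from the previous check but has no state history entry with
    the new state within `window` after the check's start time.

    The window size is an assumption, not a value from Nagios or NDOUtils.
    """
    hist = [(datetime.strptime(t, FMT), s) for t, s in history]
    missing = []
    prev_state = None
    for t_str, state in checks:
        t = datetime.strptime(t_str, FMT)
        if prev_state is not None and state != prev_state:
            recorded = any(
                s == state and timedelta(0) <= ht - t <= window
                for ht, s in hist
            )
            if not recorded:
                missing.append((t_str, prev_state, state))
        prev_state = state
    return missing


for when, old, new in missing_state_changes(CHECKS, HISTORY):
    print(f"{when}: change {old} -> {new} has no state history entry")
```

With the data above, the only change reported as missing is the one at 13:51:18 (state 1 to state 2).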
These are the relevant lines from nagios.log (the first just to show
that there were no interesting entries before 13:52:07):
Mon Nov 5 13:50:57 2007 SERVICE ALERT: unrelatedhost;TCP
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]