Ton Voon wrote:
> Hi!
>
> We've been doing some work to validate the data in NDOUtils and found a
> bug in Nagios and a missing state change entry. This happens when a
> service is in a failed state and changes to a different state at the
> same time that the host is considered down (or unreachable).
>
> DETAIL
>
> These are the servicecheck results in the database:
>
> mysql> select
> start_time,state,state_type,output,current_check_attempt,max_check_attempts
> from nagios_servicechecks where service_object_id=445 and start_time
> between '2007-11-05 13:40:00' and '2007-11-05 14:00:00';
> +---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
> | start_time          | state | state_type | output                                              | current_check_attempt | max_check_attempts |
> +---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
> | 2007-11-05 13:41:18 |     1 |          1 | DISK WARNING - free space: / 1938 MB (10% inode=-): |                     3 |                  3 |
> | 2007-11-05 13:46:18 |     1 |          1 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     3 |                  3 |
> | 2007-11-05 13:51:18 |     2 |          1 | CHECK_NRPE: Socket timeout after 10 seconds.        |                     1 |                  3 |
> | 2007-11-05 13:56:18 |     1 |          0 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     1 |                  3 |
> | 2007-11-05 13:57:18 |     1 |          0 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     2 |                  3 |
> | 2007-11-05 13:58:39 |     0 |          1 | DISK OK - free space: / 2639 MB (14% inode=-):      |                     1 |                  3 |
> +---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
>
>
> 6 rows in set (0.02 sec)
>
>
> Note that current_check_attempt is 1/3 for the CRITICAL result at
> 13:51:18; it should be 3/3. As a side effect, the subsequent WARNING
> at 13:56:18 is treated as a soft state when it should remain hard.
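
The expectation described above can be sketched as a small rule. This is only a simplified model of what the poster expects, not the Nagios source: the function name, constants, and the handling of soft retries are assumptions made for illustration, and the host-down case is deliberately left out.

```python
OK = 0            # service state codes as stored by NDOUtils: 0=OK, 1=WARNING, 2=CRITICAL
SOFT, HARD = 0, 1 # state_type values as shown in the tables


def expected_problem_result(prev_state, prev_type, prev_attempt,
                            new_state, max_attempts):
    """Return the (state_type, current_check_attempt) expected for a
    non-OK check result, under the simplified model described above.

    Hypothetical helper for illustration only.
    """
    if new_state == OK:
        raise ValueError("this sketch only models non-OK results")

    # A service already in a hard problem state that moves to a different
    # problem state is expected to stay hard, with attempts at the maximum.
    if prev_state != OK and prev_type == HARD:
        return HARD, max_attempts

    # Otherwise count retries: the first failure is attempt 1, later ones
    # increment until max_attempts is reached, at which point it goes hard.
    attempt = 1 if prev_state == OK else min(prev_attempt + 1, max_attempts)
    return (HARD if attempt >= max_attempts else SOFT), attempt


# WARNING (hard, 3/3) followed by CRITICAL at 13:51:18
print(expected_problem_result(1, HARD, 3, 2, 3))
# CRITICAL (hard, 3/3) followed by WARNING at 13:56:18
print(expected_problem_result(2, HARD, 3, 1, 3))
```

Under this model both results would be hard at 3/3, whereas the servicechecks table records 1/3 for the CRITICAL result and a soft state for the WARNING that follows.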
>
>
> Looking at the state history table, we get:
>
> mysql> select
> state_time,state,state_type,output,current_check_attempt,max_check_attempts
> from nagios_statehistory where object_id=445 and state_time between
> '2007-11-05 11:50:00' and '2007-11-05 14:00:00';
> +---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
> | state_time          | state | state_type | output                                              | current_check_attempt | max_check_attempts |
> +---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
> | 2007-11-05 11:51:05 |     1 |          1 | DISK WARNING - free space: / 1902 MB (10% inode=-): |                     3 |                  3 |
> | 2007-11-05 13:56:39 |     1 |          0 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     1 |                  3 |
> | 2007-11-05 13:57:19 |     1 |          0 | DISK WARNING - free space: / 1939 MB (10% inode=-): |                     2 |                  3 |
> | 2007-11-05 13:58:41 |     0 |          1 | DISK OK - free space: / 2639 MB (14% inode=-):      |                     3 |                  3 |
> +---------------------+-------+------------+-----------------------------------------------------+-----------------------+--------------------+
>
> 4 rows in set (0.00 sec)
>
>
> Note that the state change from WARNING to CRITICAL at 13:51:18 is
> missing from this table.
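
The comparison between the two result sets can also be done mechanically. The following is a minimal sketch, assuming the rows have already been fetched into Python lists; the function name and the 60-second matching tolerance are assumptions chosen for illustration, and the data is copied from the two tables above.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (start_time, state) rows from the nagios_servicechecks query
checks = [
    ("2007-11-05 13:41:18", 1),
    ("2007-11-05 13:46:18", 1),
    ("2007-11-05 13:51:18", 2),
    ("2007-11-05 13:56:18", 1),
    ("2007-11-05 13:57:18", 1),
    ("2007-11-05 13:58:39", 0),
]

# (state_time, state) rows from the nagios_statehistory query
history = [
    ("2007-11-05 11:51:05", 1),
    ("2007-11-05 13:56:39", 1),
    ("2007-11-05 13:57:19", 1),
    ("2007-11-05 13:58:41", 0),
]


def missing_state_changes(checks, history, tolerance=timedelta(seconds=60)):
    """List check results whose state differs from the previous check but
    that have no state history row with the same state shortly afterwards."""
    parsed_history = [(datetime.strptime(t, FMT), s) for t, s in history]
    missing = []
    previous_state = None
    for start, state in checks:
        started = datetime.strptime(start, FMT)
        changed = previous_state is not None and state != previous_state
        recorded = any(
            s == state and started <= t <= started + tolerance
            for t, s in parsed_history
        )
        if changed and not recorded:
            missing.append((start, state))
        previous_state = state
    return missing


print(missing_state_changes(checks, history))
# → [('2007-11-05 13:51:18', 2)]
```

The tolerance is needed because state_time lags the check's start_time by a few seconds in this data (for example 13:56:18 versus 13:56:39).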
>
> These are the relevant lines from nagios.log (the first just to show
> that there were no interesting entries before 13:52:07):
>
> Mon Nov 5 13:50:57 2007 SERVICE ALERT:
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]