Another question: Is this a "Hard State Change"?
cornea wrote:
[Wed Jan 16 14:15:28 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;1;(Host Check Timed Out)
[Wed Jan 16 14:16:03 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ram;OK;SOFT;2;Processor:60% : 60% : : OK
[Wed Jan 16 14:16:13 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;2;(Host Check Timed Out)
[Wed Jan 16 14:17:20 2013] HOST ALERT: ASNAY0S0004;UP;SOFT;3;OK - 10.196.255.9: rta 16.178ms, lost 0%
[Wed Jan 16 14:24:45 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;WARNING;SOFT;1;WARNING - 10.196.255.9: rta 16.093ms, lost 66%
[Wed Jan 16 14:25:46 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;OK;SOFT;2;OK - 10.196.255.9: rta 16.134ms, lost 0%
[Wed Jan 16 14:30:46 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;CRITICAL;SOFT;1;CRITICAL - 10.196.255.9: rta nan, lost 100%
[Wed Jan 16 14:31:25 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_cpu;UNKNOWN;SOFT;1;ERROR: Description table : No response from remote host "10.196.255.9".
[Wed Jan 16 14:31:45 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;OK;SOFT;2;OK - 10.196.255.9: rta 16.348ms, lost 0%
[Wed Jan 16 14:32:20 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_cpu;OK;SOFT;2;CPU : 12 11 11 : OK
[Wed Jan 16 14:56:22 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ram;UNKNOWN;SOFT;1;ERROR: Description table : No response from remote host "10.196.255.9".
[Wed Jan 16 14:56:33 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;1;(Host Check Timed Out)
[Wed Jan 16 14:56:52 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;CRITICAL;HARD;1;CRITICAL - 10.196.255.9: rta nan, lost 100%
[Wed Jan 16 14:57:17 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ram;OK;SOFT;2;Processor:60% : 60% : : OK
[Wed Jan 16 14:57:32 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;2;(Host Check Timed Out)
Please note the red line (the 14:56:52 s_switch_cisco_ping entry, CRITICAL;HARD;1). The state is HARD, but no notification was sent. Why?
About Service Check Scheduling
Re: About Service Check Scheduling
[Tue Dec 18 16:24:48 2012] SERVICE ALERT: hub02;Smtp_587_Conns;OK;HARD;3;Number of connections on port 587 : 25
[Tue Dec 18 16:29:38 2012] HOST ALERT: hub02;DOWN;SOFT;1;PING WARNING - rtt min/avg/max/mdev = 10.675/10.711/10.747/0.036 ms
[Tue Dec 18 16:29:48 2012] SERVICE ALERT: hub02;Smtp_587_Conns;CRITICAL;HARD;1;(Service Check Timed Out)
[Tue Dec 18 16:30:13 2012] HOST ALERT: hub02;UP;SOFT;2;PING OK - rtt min/avg/max/mdev = 4.256/7.666/10.557/2.599 ms
[Tue Dec 18 16:33:43 2012] HOST ALERT: hub02;DOWN;SOFT;1;PING WARNING - rtt min/avg/max/mdev = 5.009/6.375/7.741/1.366 ms
[Tue Dec 18 16:33:53 2012] HOST ALERT: hub02;UP;SOFT;2;PING OK - rtt min/avg/max/mdev = 6.710/9.729/13.529/2.839 ms
[Tue Dec 18 16:34:38 2012] HOST ALERT: hub02;DOWN;SOFT;1;PING WARNING - rtt min/avg/max/mdev = 6.180/7.219/8.258/1.039 ms
[Tue Dec 18 16:34:48 2012] HOST ALERT: hub02;UP;SOFT;2;PING OK - rtt min/avg/max/mdev = 4.078/6.425/9.023/2.028 ms
[Tue Dec 18 16:34:48 2012] SERVICE ALERT: hub02;Smtp_587_Conns;CRITICAL;SOFT;1;(Service Check Timed Out)
[Tue Dec 18 16:35:43 2012] HOST ALERT: hub02;DOWN;SOFT;1;PING WARNING - rtt min/avg/max/mdev = 6.022/7.827/9.633/1.807 ms
[Tue Dec 18 16:35:48 2012] HOST ALERT: hub02;UP;SOFT;2;PING OK - rtt min/avg/max/mdev = 5.324/8.108/10.798/2.236 ms
[Tue Dec 18 16:35:48 2012] SERVICE ALERT: hub02;Smtp_587_Conns;CRITICAL;SOFT;2;(Service Check Timed Out)
[Tue Dec 18 16:36:48 2012] SERVICE ALERT: hub02;Smtp_587_Conns;CRITICAL;HARD;3;(Service Check Timed Out)
[Tue Dec 18 16:36:48 2012] SERVICE NOTIFICATION: admin;hub02;Smtp_587_Conns;CRITICAL;service-notify-by-sendEmail;(Service Check Timed Out)
[Tue Dec 18 16:41:50 2012] SERVICE ALERT: hub02;Smtp_587_Conns;CRITICAL;HARD;3;(Service Check Timed Out)
I got some logs from another server.
The state was HARD and the status was CRITICAL, but I did not get a notification.
Re: About Service Check Scheduling
I found some lines in the source code "check.c", but I am not familiar with programming.
The red line confuses me. Can someone explain it?
Why does it put the service into a hard state but not send a notification?
if(route_result != HOST_UP) {
    log_debug_info(DEBUGL_CHECKS, 2, "Host is not UP, so we mark state changes if appropriate\n");

    /* "fake" a hard state change for the service - well, its not really fake, but it didn't get caught earlier... */
    if(temp_service->last_hard_state != temp_service->current_state)
        hard_state_change = TRUE;

    /* update last state change times */
    if(state_change == TRUE || hard_state_change == TRUE)
        temp_service->last_state_change = temp_service->last_check;
    if(hard_state_change == TRUE) {
        temp_service->last_hard_state_change = temp_service->last_check;
        temp_service->state_type = HARD_STATE;
        temp_service->last_hard_state = temp_service->current_state;
    }

    /* put service into a hard state without attempting check retries and don't send out notifications about it */
    temp_service->host_problem_at_last_check = TRUE;

    /* Below removed 08/04/2010 EG - http://tracker.nagios.org/view.php?id=128 */
    /*
    temp_service->state_type=HARD_STATE;
    temp_service->last_hard_state=temp_service->current_state;
    temp_service->current_attempt=1;
    */
}
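For readers who, like the poster, do not program: the excerpt above can be read as a simple rule. The following is only a minimal sketch of that rule under stated assumptions, not the Nagios implementation. The names classify_problem, struct svc_outcome and the SKETCH_ enum values are hypothetical, max_attempts stands in for the service's max_check_attempts, and real notification logic depends on many more settings (notification options, time periods, downtime, flapping).

```c
#include <stdbool.h>

/* Hypothetical, simplified types for illustration; these are NOT the Nagios structs. */
enum sketch_host_state { SKETCH_HOST_UP, SKETCH_HOST_DOWN, SKETCH_HOST_UNREACHABLE };
enum sketch_state_type { SKETCH_SOFT, SKETCH_HARD };

struct svc_outcome {
    enum sketch_state_type type; /* SOFT or HARD state after this check result */
    bool may_notify;             /* true if a problem notification becomes eligible */
};

/* Classify one non-OK service check result (assumed model, see lead-in). */
struct svc_outcome classify_problem(enum sketch_host_state host,
                                    int attempt, int max_attempts)
{
    struct svc_outcome out = { SKETCH_SOFT, false };

    if (host != SKETCH_HOST_UP) {
        /* Host problem: the service goes HARD right away, retries are
           skipped, and no service notification is sent about it. */
        out.type = SKETCH_HARD;
        return out;
    }

    if (attempt >= max_attempts) {
        /* Host is fine and all retries are used up: normal HARD problem. */
        out.type = SKETCH_HARD;
        out.may_notify = true;
    }
    return out;
}
```

This reading is consistent with the logs above: both CRITICAL;HARD;1 entries (14:56:52 on ASNAY0S0004 and 16:29:48 on hub02) come seconds after a HOST ALERT ... DOWN;SOFT;1 line and have no notification, while the CRITICAL;HARD;3 entry at 16:36:48 follows a host UP line and is followed by a SERVICE NOTIFICATION.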
slansing
Re: About Service Check Scheduling
So going all the way back to the beginning, it looks like you are using passive checks, correct? It is possible, as was said, that the service was flapping, meaning it was changing states rapidly. It could also be that it hit a CRITICAL SOFT state rather than the HARD state that would trigger a notification. How many times have you seen this happen, or did it only happen once?
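One way to answer the "how many times" question is to scan the log for SERVICE ALERT lines and read the state type field directly. The following is a rough sketch, assuming the line layout seen in the logs quoted in this thread ([timestamp] SERVICE ALERT: host;service;state;type;attempt;output). The struct, the function names and the buffer sizes are made up for this example and are not part of Nagios.

```c
#include <stdio.h>
#include <string.h>

/* Fields of one SERVICE ALERT log line (buffer sizes are arbitrary for this sketch). */
struct service_alert {
    char host[64];
    char service[64];
    char state[16];      /* OK, WARNING, CRITICAL, UNKNOWN */
    char state_type[8];  /* SOFT or HARD */
    int  attempt;
};

/* Returns 1 if `line` is a SERVICE ALERT entry and all five fields parsed, else 0. */
int parse_service_alert(const char *line, struct service_alert *out)
{
    /* "[%*[^]]]" skips the bracketed timestamp; each "%N[^;]" reads one
       semicolon-separated field with a width limit that fits its buffer. */
    int n = sscanf(line,
                   "[%*[^]]] SERVICE ALERT: %63[^;];%63[^;];%15[^;];%7[^;];%d;",
                   out->host, out->service, out->state,
                   out->state_type, &out->attempt);
    return n == 5;
}

/* Returns 1 for a HARD, non-OK alert, the kind one would expect a notification for. */
int is_hard_problem(const struct service_alert *a)
{
    return strcmp(a->state_type, "HARD") == 0 && strcmp(a->state, "OK") != 0;
}
```

Feeding each log line through parse_service_alert and counting the ones where is_hard_problem is true gives the number of HARD problem alerts. Comparing that with the number of SERVICE NOTIFICATION lines for the same service shows how often a notification was actually missing.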