About Service Check Scheduling

cornea
Posts: 13
Joined: Thu Sep 20, 2012 1:28 am

Re: About Service Check Scheduling

Post by cornea »

cornea wrote:
[Wed Jan 16 14:15:28 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;1;(Host Check Timed Out)
[Wed Jan 16 14:16:03 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ram;OK;SOFT;2;Processor:60% : 60% : : OK
[Wed Jan 16 14:16:13 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;2;(Host Check Timed Out)
[Wed Jan 16 14:17:20 2013] HOST ALERT: ASNAY0S0004;UP;SOFT;3;OK - 10.196.255.9: rta 16.178ms, lost 0%
[Wed Jan 16 14:24:45 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;WARNING;SOFT;1;WARNING - 10.196.255.9: rta 16.093ms, lost 66%
[Wed Jan 16 14:25:46 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;OK;SOFT;2;OK - 10.196.255.9: rta 16.134ms, lost 0%
[Wed Jan 16 14:30:46 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;CRITICAL;SOFT;1;CRITICAL - 10.196.255.9: rta nan, lost 100%
[Wed Jan 16 14:31:25 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_cpu;UNKNOWN;SOFT;1;ERROR: Description table : No response from remote host "10.196.255.9".
[Wed Jan 16 14:31:45 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;OK;SOFT;2;OK - 10.196.255.9: rta 16.348ms, lost 0%
[Wed Jan 16 14:32:20 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_cpu;OK;SOFT;2;CPU : 12 11 11 : OK
[Wed Jan 16 14:56:22 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ram;UNKNOWN;SOFT;1;ERROR: Description table : No response from remote host "10.196.255.9".
[Wed Jan 16 14:56:33 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;1;(Host Check Timed Out)
[Wed Jan 16 14:56:52 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;CRITICAL;HARD;1;CRITICAL - 10.196.255.9: rta nan, lost 100%
[Wed Jan 16 14:57:17 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ram;OK;SOFT;2;Processor:60% : 60% : : OK
[Wed Jan 16 14:57:32 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;2;(Host Check Timed Out)


Please notice the red line (the 14:56:52 entry, where s_switch_cisco_ping is logged as CRITICAL;HARD;1). The state is HARD, but no notification was sent. Why?
Another question: Is this a "Hard State Change"?
cornea
Posts: 13
Joined: Thu Sep 20, 2012 1:28 am

Re: About Service Check Scheduling

Post by cornea »

[Tue Dec 18 16:24:48 2012] SERVICE ALERT: hub02;Smtp_587_Conns;OK;HARD;3;Number of connections on port 587 : 25
[Tue Dec 18 16:29:38 2012] HOST ALERT: hub02;DOWN;SOFT;1;PING WARNING - rtt min/avg/max/mdev = 10.675/10.711/10.747/0.036 ms
[Tue Dec 18 16:29:48 2012] SERVICE ALERT: hub02;Smtp_587_Conns;CRITICAL;HARD;1;(Service Check Timed Out)
[Tue Dec 18 16:30:13 2012] HOST ALERT: hub02;UP;SOFT;2;PING OK - rtt min/avg/max/mdev = 4.256/7.666/10.557/2.599 ms
[Tue Dec 18 16:33:43 2012] HOST ALERT: hub02;DOWN;SOFT;1;PING WARNING - rtt min/avg/max/mdev = 5.009/6.375/7.741/1.366 ms
[Tue Dec 18 16:33:53 2012] HOST ALERT: hub02;UP;SOFT;2;PING OK - rtt min/avg/max/mdev = 6.710/9.729/13.529/2.839 ms
[Tue Dec 18 16:34:38 2012] HOST ALERT: hub02;DOWN;SOFT;1;PING WARNING - rtt min/avg/max/mdev = 6.180/7.219/8.258/1.039 ms
[Tue Dec 18 16:34:48 2012] HOST ALERT: hub02;UP;SOFT;2;PING OK - rtt min/avg/max/mdev = 4.078/6.425/9.023/2.028 ms
[Tue Dec 18 16:34:48 2012] SERVICE ALERT: hub02;Smtp_587_Conns;CRITICAL;SOFT;1;(Service Check Timed Out)
[Tue Dec 18 16:35:43 2012] HOST ALERT: hub02;DOWN;SOFT;1;PING WARNING - rtt min/avg/max/mdev = 6.022/7.827/9.633/1.807 ms
[Tue Dec 18 16:35:48 2012] HOST ALERT: hub02;UP;SOFT;2;PING OK - rtt min/avg/max/mdev = 5.324/8.108/10.798/2.236 ms
[Tue Dec 18 16:35:48 2012] SERVICE ALERT: hub02;Smtp_587_Conns;CRITICAL;SOFT;2;(Service Check Timed Out)
[Tue Dec 18 16:36:48 2012] SERVICE ALERT: hub02;Smtp_587_Conns;CRITICAL;HARD;3;(Service Check Timed Out)
[Tue Dec 18 16:36:48 2012] SERVICE NOTIFICATION: admin;hub02;Smtp_587_Conns;CRITICAL;service-notify-by-sendEmail;(Service Check Timed Out)
[Tue Dec 18 16:41:50 2012] SERVICE ALERT: hub02;Smtp_587_Conns;CRITICAL;HARD;3;(Service Check Timed Out)

Here are some logs from another server.
At 16:29:48 the state was HARD and the status was CRITICAL, but I did not get a notification.
cornea
Posts: 13
Joined: Thu Sep 20, 2012 1:28 am

Re: About Service Check Scheduling

Post by cornea »

I found some lines in the source code "check.c", but I am not familiar with programming.
I am confused by the line highlighted in red. Can someone explain it?
Why does it put the service into a hard state but not send a notification?


if(route_result != HOST_UP) {

    log_debug_info(DEBUGL_CHECKS, 2, "Host is not UP, so we mark state changes if appropriate\n");

    /* "fake" a hard state change for the service - well, its not really fake, but it didn't get caught earlier... */
    if(temp_service->last_hard_state != temp_service->current_state)
        hard_state_change = TRUE;

    /* update last state change times */
    if(state_change == TRUE || hard_state_change == TRUE)
        temp_service->last_state_change = temp_service->last_check;
    if(hard_state_change == TRUE) {
        temp_service->last_hard_state_change = temp_service->last_check;
        temp_service->state_type = HARD_STATE;
        temp_service->last_hard_state = temp_service->current_state;
    }

    /* put service into a hard state without attempting check retries and don't send out notifications about it */
    temp_service->host_problem_at_last_check = TRUE;
    /* Below removed 08/04/2010 EG - http://tracker.nagios.org/view.php?id=128 */
    /*
    temp_service->state_type=HARD_STATE;
    temp_service->last_hard_state=temp_service->current_state;
    temp_service->current_attempt=1;
    */
}
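
To make the quoted branch easier to follow, here is a minimal standalone sketch of what it appears to do. This is not the Nagios source: the toy_service struct, the apply_host_down_branch function and the constant values are hypothetical stand-ins that only mirror the field names in the excerpt above. It suggests why a log line can read CRITICAL;HARD;1 with no notification: when the host is not UP, the service is moved straight to a hard state, the attempt counter is left alone, and this branch itself sends nothing.

```c
#include <stdbool.h>
#include <time.h>

/* Illustrative stand-ins for the Nagios constants (assumed values). */
enum { SOFT_STATE = 0, HARD_STATE = 1 };
enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2 };

/* Hypothetical, cut-down service record using the field names from the excerpt. */
typedef struct {
    int current_state;
    int last_hard_state;
    int state_type;
    int current_attempt;
    time_t last_check;
    time_t last_state_change;
    time_t last_hard_state_change;
    bool host_problem_at_last_check;
} toy_service;

/*
 * Mirrors the quoted "host is not UP" branch.
 * Returns true if a hard state change was recorded.
 * Note that nothing here sends a notification and current_attempt is untouched.
 */
bool apply_host_down_branch(toy_service *svc, bool state_change)
{
    bool hard_state_change = false;

    /* The service state differs from its last hard state: treat it as a hard change. */
    if (svc->last_hard_state != svc->current_state)
        hard_state_change = true;

    /* Update the state change timestamps. */
    if (state_change || hard_state_change)
        svc->last_state_change = svc->last_check;

    if (hard_state_change) {
        svc->last_hard_state_change = svc->last_check;
        svc->state_type = HARD_STATE;
        svc->last_hard_state = svc->current_state;
    }

    /* Remember that the host had a problem when this service was last checked. */
    svc->host_problem_at_last_check = true;

    return hard_state_change;
}
```

Calling apply_host_down_branch on a service that was last OK (hard) and has just returned CRITICAL on attempt 1 leaves it as CRITICAL, HARD, attempt 1, which matches the shape of the log entries in question.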
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: About Service Check Scheduling

Post by slansing »

So, going all the way back to the beginning, it looks like you are using passive checks, correct? It is possible, as mentioned earlier, that the service was flapping, meaning it was rapidly changing states. It could be that it hit a CRITICAL SOFT state rather than a HARD state, and only a HARD state would trigger a notification. How many times have you seen this happen, or did it only happen once?
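
As a rough illustration of the SOFT versus HARD distinction mentioned above, here is a small sketch of the usual retry rule for a problem result while the host is UP. The function names and the max_check_attempts value of 3 are assumptions for illustration only (3 is a guess that fits the hub02 log, where the notification follows CRITICAL;HARD;3); this is not code from Nagios.

```c
#include <stdbool.h>

enum { SOFT_STATE = 0, HARD_STATE = 1 };

/*
 * For a non-OK result with the host UP: the problem stays SOFT while
 * retries remain, and becomes HARD once the attempt counter reaches
 * max_check_attempts.
 */
int problem_state_type(int current_attempt, int max_check_attempts)
{
    return (current_attempt >= max_check_attempts) ? HARD_STATE : SOFT_STATE;
}

/*
 * A problem notification is only a candidate once the state is HARD.
 * Other filters (flapping, downtime, notification options) may still suppress it.
 */
bool is_notification_candidate(int current_attempt, int max_check_attempts)
{
    return problem_state_type(current_attempt, max_check_attempts) == HARD_STATE;
}
```

With max_check_attempts assumed to be 3, attempts 1 and 2 stay SOFT and attempt 3 becomes HARD, which is the pattern seen at 16:34:48, 16:35:48 and 16:36:48 in the hub02 log.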