About Service Check Scheduling

cornea
Posts: 13
Joined: Thu Sep 20, 2012 1:28 am

About Service Check Scheduling

Post by cornea »

The retry check interval is 1m, but the second check only ran after 18m. Why?
Has anyone else run into this problem?
Attachments
无标题.png
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: About Service Check Scheduling

Post by abrist »

The "alert summary" report will only show the history of alerts and state changes. So this particular service took 18 minutes to move from a soft critical to a soft OK. Was the service check flapping?
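To compare this against the log rather than the report, the gaps between logged state changes for one service can be computed directly. The following is a minimal sketch, assuming log lines in the human-readable form posted later in this thread (a raw nagios.log typically carries epoch timestamps, which would need a different parse). `alert_gaps` is a hypothetical helper name, not part of Nagios. It measures the time between logged alerts only; checks that returned the same state as before do not appear as alerts, so these gaps are not check intervals.

```python
from datetime import datetime

PREFIX = "SERVICE ALERT: "

def alert_gaps(log_lines, service):
    """Return the seconds between consecutive SERVICE ALERT entries for one service."""
    times = []
    for line in log_lines:
        # Assumed line shape (taken from the thread):
        # [Wed Jan 16 14:24:45 2013] SERVICE ALERT: host;service;state;type;attempt;output
        stamp, _, rest = line.partition("] ")
        if not rest.startswith(PREFIX):
            continue  # skip HOST ALERT and any other entry types
        fields = rest[len(PREFIX):].split(";", 5)
        if len(fields) >= 2 and fields[1] == service:
            # %a and %b assume English day and month names (C locale).
            times.append(datetime.strptime(stamp.lstrip("["), "%a %b %d %H:%M:%S %Y"))
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

sample = [
    "[Wed Jan 16 14:24:45 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;WARNING;SOFT;1;WARNING - 10.196.255.9: rta 16.093ms, lost 66%",
    "[Wed Jan 16 14:25:46 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;OK;SOFT;2;OK - 10.196.255.9: rta 16.134ms, lost 0%",
    "[Wed Jan 16 14:30:46 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;CRITICAL;SOFT;1;CRITICAL - 10.196.255.9: rta nan, lost 100%",
]

print(alert_gaps(sample, "s_switch_cisco_ping"))  # prints [61, 300]
```

In this sample the 61-second gap matches a 1m retry, while the 300-second gap is simply the time until the next state change was logged.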
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
cornea
Posts: 13
Joined: Thu Sep 20, 2012 1:28 am

Re: About Service Check Scheduling

Post by cornea »

abrist wrote:The "alert summary" report will only show the history of alerts and state changes. So this particular service took 18 minutes to move from a soft critical to a soft OK. Was the service check flapping?
I didn't find any notification about flapping.
cornea
Posts: 13
Joined: Thu Sep 20, 2012 1:28 am

Re: About Service Check Scheduling

Post by cornea »

cornea wrote:The retry check interval is 1m, but the second check only ran after 18m. Why?
Has anyone else run into this problem?
Further information: in this case the second check ran after about 4 hours.
Attachments
image001.jpg
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: About Service Check Scheduling

Post by abrist »

Post the output of:

cat /usr/local/nagios/var/nagios.log | grep [hostname of box] 
cornea
Posts: 13
Joined: Thu Sep 20, 2012 1:28 am

Re: About Service Check Scheduling

Post by cornea »

abrist wrote:Post the output of:

cat /usr/local/nagios/var/nagios.log | grep [hostname of box] 
I checked the log. It shows the same thing as the graph.

But I found that some "nagios" processes have start times from before I restarted them. Is it possible that some processes are still using the old configuration while the others use the new one?
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: About Service Check Scheduling

Post by slansing »

Can you post the output that abrist suggested so that we can see all of the most recent logged information?
cornea
Posts: 13
Joined: Thu Sep 20, 2012 1:28 am

Re: About Service Check Scheduling

Post by cornea »

[Wed Jan 16 14:15:28 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;1;(Host Check Timed Out)
[Wed Jan 16 14:16:03 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ram;OK;SOFT;2;Processor:60% : 60% : : OK
[Wed Jan 16 14:16:13 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;2;(Host Check Timed Out)
[Wed Jan 16 14:17:20 2013] HOST ALERT: ASNAY0S0004;UP;SOFT;3;OK - 10.196.255.9: rta 16.178ms, lost 0%
[Wed Jan 16 14:24:45 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;WARNING;SOFT;1;WARNING - 10.196.255.9: rta 16.093ms, lost 66%
[Wed Jan 16 14:25:46 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;OK;SOFT;2;OK - 10.196.255.9: rta 16.134ms, lost 0%
[Wed Jan 16 14:30:46 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;CRITICAL;SOFT;1;CRITICAL - 10.196.255.9: rta nan, lost 100%
[Wed Jan 16 14:31:25 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_cpu;UNKNOWN;SOFT;1;ERROR: Description table : No response from remote host "10.196.255.9".
[Wed Jan 16 14:31:45 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;OK;SOFT;2;OK - 10.196.255.9: rta 16.348ms, lost 0%
[Wed Jan 16 14:32:20 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_cpu;OK;SOFT;2;CPU : 12 11 11 : OK
[Wed Jan 16 14:56:22 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ram;UNKNOWN;SOFT;1;ERROR: Description table : No response from remote host "10.196.255.9".
[Wed Jan 16 14:56:33 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;1;(Host Check Timed Out)
[Wed Jan 16 14:56:52 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;CRITICAL;HARD;1;CRITICAL - 10.196.255.9: rta nan, lost 100%
[Wed Jan 16 14:57:17 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ram;OK;SOFT;2;Processor:60% : 60% : : OK
[Wed Jan 16 14:57:32 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;2;(Host Check Timed Out)


Please note the line marked in red (the 14:56:52 s_switch_cisco_ping entry). The state is HARD, but no notification was sent. Why?
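Since the red highlighting does not survive in plain text, the HARD entries can also be picked out of the log programmatically. This is a rough sketch, assuming the semicolon-separated alert format shown in the log above; `Alert`, `parse_alert` and `hard_alerts` are hypothetical names introduced for illustration, not Nagios APIs.

```python
from typing import NamedTuple, Optional

class Alert(NamedTuple):
    timestamp: str
    kind: str               # "HOST" or "SERVICE"
    host: str
    service: Optional[str]  # None for host alerts
    state: str
    state_type: str         # "SOFT" or "HARD"
    attempt: int
    output: str

def parse_alert(line):
    """Parse one HOST ALERT or SERVICE ALERT line; return None for anything else."""
    stamp, sep, rest = line.partition("] ")
    if not sep:
        return None
    stamp = stamp.lstrip("[")
    if rest.startswith("SERVICE ALERT: "):
        parts = rest[len("SERVICE ALERT: "):].split(";", 5)
        if len(parts) != 6:
            return None
        host, service, state, state_type, attempt, output = parts
        return Alert(stamp, "SERVICE", host, service, state, state_type, int(attempt), output)
    if rest.startswith("HOST ALERT: "):
        parts = rest[len("HOST ALERT: "):].split(";", 4)
        if len(parts) != 5:
            return None
        host, state, state_type, attempt, output = parts
        return Alert(stamp, "HOST", host, None, state, state_type, int(attempt), output)
    return None

def hard_alerts(lines):
    """Return only the alerts whose state type is HARD."""
    parsed = (parse_alert(line) for line in lines)
    return [a for a in parsed if a is not None and a.state_type == "HARD"]

log_sample = [
    "[Wed Jan 16 14:56:33 2013] HOST ALERT: ASNAY0S0004;DOWN;SOFT;1;(Host Check Timed Out)",
    "[Wed Jan 16 14:56:52 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ping;CRITICAL;HARD;1;CRITICAL - 10.196.255.9: rta nan, lost 100%",
    "[Wed Jan 16 14:57:17 2013] SERVICE ALERT: ASNAY0S0004;s_switch_cisco_ram;OK;SOFT;2;Processor:60% : 60% : : OK",
]

for alert in hard_alerts(log_sample):
    print(alert.timestamp, alert.service, alert.state, alert.attempt)
```

Run against the full log above, this would single out the 14:56:52 entry, which was logged 19 seconds after the host's DOWN;SOFT;1 entry at 14:56:33 and carries attempt number 1.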
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: About Service Check Scheduling

Post by abrist »

What notification options have you set on the host? Could you post the contents of the respective host's cfg? Are you receiving email notifications for other hosts/services?
cornea
Posts: 13
Joined: Thu Sep 20, 2012 1:28 am

Re: About Service Check Scheduling

Post by cornea »

abrist wrote:What notification options have you set on the host? Could you post the contents of the respective host's cfg? Are you receiving email notifications for other hosts/services?
Yes. When the host is *really* DOWN, I do receive notifications.
I defined a service template for this check, and many services use that template. Most of the time it works well, but sometimes it behaves abnormally.
Last edited by cornea on Wed Jan 16, 2013 8:01 pm, edited 1 time in total.