[Nagios-devel] Recovery not getting sent during downtime?


Hi folks,

I'm currently using Nagios 2.0b3 (never change a running system ;)) and
ran into the following problem:

Service went critical
SMS and emails got dispatched
found problem, decided to reboot the machine to fix it
scheduled downtime for host
rebooted host
everything went ok again
no SMS/email got dispatched to state the service recovered though!
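
For reference, the downtime was scheduled through an external command. Below is a minimal sketch of how such a command line can be put together, assuming the field order that appears in the log excerpt further down (host, start, end, fixed, trigger id, duration, author, comment). The helper name `schedule_host_downtime` is hypothetical and not part of Nagios; it only formats the line and does not write it to the command file, whose path depends on the installation.

```python
def schedule_host_downtime(host, start, end, author, comment, now,
                           fixed=True, trigger_id=0, duration=7200):
    """Format a SCHEDULE_HOST_DOWNTIME external command line.

    Hypothetical helper: the field order is assumed from the log excerpt.
    All times are Unix timestamps; duration is in seconds.
    """
    fields = [host, start, end, int(fixed), trigger_id, duration,
              author, comment]
    return "[{}] SCHEDULE_HOST_DOWNTIME;{}".format(
        now, ";".join(str(f) for f in fields))


# Values taken from the log excerpt in this post.
line = schedule_host_downtime("NSEXT01", 1153980509, 1153981829,
                              "technik", "Neustart MAr", now=1153980519)
print(line)
```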

I'm unsure whether this problem has already been fixed; I didn't find any
real evidence on Google or in the changelogs. Fixes to the recovery logic
and the notification system itself were documented, but not in much
detail.

Question: is this a bug or a feature? If it is a bug, has it been fixed in
a newer release that I can update to?

It poses a problem for us because admins who are currently offsite don't
get a message that the problem has already been resolved. So we get quite
a few unnecessary phone calls to check on a problem that is already solved.

Here's an excerpt of how it looked in the Nagios log:

[1153954542] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;1;Connection refused
[1153954600] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;2;Connection refused
[1153954660] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;HARD;3;Connection refused
[1153954660] SERVICE NOTIFICATION: RGingter;NSEXT01;NOTES;CRITICAL;notify-by-email;Connection refused
[1153954660] SERVICE NOTIFICATION: MArslan;NSEXT01;NOTES;CRITICAL;notify-by-email;Connection refused
[1153954660] SERVICE NOTIFICATION: IT_Service;NSEXT01;NOTES;CRITICAL;notify-by-email;Connection refused
[1153955260] SERVICE NOTIFICATION: RGingter_SMS;NSEXT01;NOTES;CRITICAL;notify-by-sms;Connection refused
[1153955260] SERVICE NOTIFICATION: MArslan_SMS;NSEXT01;NOTES;CRITICAL;notify-by-sms;Connection refused
...rest of alerts snipped out...
[1153980519] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;NSEXT01;1153980509;1153981829;1;0;7200;technik;Neustart MAr
[1153980519] HOST DOWNTIME ALERT: NSEXT01;STARTED; Host has entered a period of scheduled downtime
[1153980595] HOST ALERT: NSEXT01;DOWN;SOFT;1;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980605] HOST ALERT: NSEXT01;DOWN;SOFT;2;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980615] HOST ALERT: NSEXT01;DOWN;HARD;3;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980615] SERVICE ALERT: NSEXT01;PING;CRITICAL;HARD;1;CRITICAL - 10.150.1.2: rta nan, lost 100%
[1153980687] SERVICE ALERT: NSEXT01;CPU;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[1153980687] SERVICE ALERT: NSEXT01;UPTIME;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[1153980687] SERVICE ALERT: NSEXT01;DISK_C;CRITICAL;HARD;1;CRITICAL - Socket timeout after 10 seconds
[1153980707] HOST ALERT: NSEXT01;UP;HARD;1;OK - 10.150.1.2: rta 1.382ms, lost 0%
[1153980707] SERVICE ALERT: NSEXT01;PING;OK;HARD;1;OK - 10.150.1.2: rta 3.307ms, lost 0%
[1153980767] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;1;Connection refused
[1153980805] SERVICE ALERT: NSEXT01;MEMUSE;CRITICAL;SOFT;1;Connection refused
[1153980805] SERVICE ALERT: NSEXT01;DISK_D;CRITICAL;SOFT;1;Connection refused
[1153980805] SERVICE ALERT: NSEXT01;DISK_E;CRITICAL;SOFT;1;Connection refused
[1153980828] SERVICE ALERT: NSEXT01;NOTES;OK;SOFT;2;TCP OK - 0.070 second response time on port 1352
[1153980976] SERVICE ALERT: NSEXT01;CPU;OK;HARD;1;CPU Load 37% (10 min average)
[1153980976] SERVICE ALERT: NSEXT01;UPTIME;OK;HARD;1;System Uptime - 0 day(s) 0 hour(s) 5 minute(s)
[1153980976] SERVICE ALERT: NSEXT01;DISK_C;OK;HARD;1;C:\ - total: 3.00 Gb - used: 2.05 Gb (68%) - free 0.95 Gb (32%)
[1153981105] SERVICE ALERT: NSEXT01;MEMUSE;OK;SOFT;2;Memory usage: total:1951.26 Mb - used: 434.44 Mb (22%) - free: 1516.82 Mb (78%)
[1153981105] SERVICE ALERT: NSEXT01;DISK_D;OK;SOFT;2;D:\ - total: 5.43 Gb - used: 2.46 Gb (45%) - free 2.97 Gb (55%)
[1153981105] SERVICE ALERT: NSEXT01;DISK_E;OK;SOFT;2;E:\ - total: 67.83 Gb - used: 14.92 Gb (22%) - free 52.91 Gb (78%)
[1153981832] HOST DOWNTIME ALERT: NSEXT01;STOPPED; Host has exited from a period of scheduled downtime
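
To make the timing easier to see, here is a rough sketch that reads a few of the log lines above and checks which NOTES alerts fall inside the scheduled downtime window. The function names and the regular expression are my own assumptions about the log layout, not anything Nagios ships, and the sample is trimmed to the relevant lines. It only compares timestamps; it does not show why no recovery notification was sent.

```python
import re

# A trimmed sample of the log above, with the wrapped lines joined.
LOG = """\
[1153954660] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;HARD;3;Connection refused
[1153980519] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;NSEXT01;1153980509;1153981829;1;0;7200;technik;Neustart MAr
[1153980767] SERVICE ALERT: NSEXT01;NOTES;CRITICAL;SOFT;1;Connection refused
[1153980828] SERVICE ALERT: NSEXT01;NOTES;OK;SOFT;2;TCP OK - 0.070 second response time on port 1352
[1153981832] HOST DOWNTIME ALERT: NSEXT01;STOPPED; Host has exited from a period of scheduled downtime
"""

# Assumed layout: "[timestamp] ENTRY TYPE: semicolon-separated payload"
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def downtime_window(log, host):
    """Return (start, end) from the first SCHEDULE_HOST_DOWNTIME for host."""
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(2) == "EXTERNAL COMMAND":
            parts = m.group(3).split(";")
            if parts[0] == "SCHEDULE_HOST_DOWNTIME" and parts[1] == host:
                return int(parts[2]), int(parts[3])
    return None


def service_alerts(log, host, service):
    """Yield (timestamp, state, state_type) for one service's alerts."""
    for line in log.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(2) == "SERVICE ALERT":
            h, svc, state, state_type = m.group(3).split(";")[:4]
            if (h, svc) == (host, service):
                yield int(m.group(1)), state, state_type


window = downtime_window(LOG, "NSEXT01")
rows = [(ts, state, kind, window[0] <= ts <= window[1])
        for ts, state, kind in service_alerts(LOG, "NSEXT01", "NOTES")]
for row in rows:
    print(row)  # last field: True if the alert is inside the downtime window
```

With these lines, the NOTES recovery at 1153980828 lies inside the window 1153980509 to 1153981829, and it is logged as a SOFT state change.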

Any insight into this would be appreciated.

sincerely
Sascha

--
Sascha Runschke
Netzwerk Management
IT-Services

ABIT AG
Robert-Bos

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]