
Host in Downtime but receiving host notification

Posted: Fri Aug 21, 2015 2:18 pm
by brdr
Hi,

We have XI 2014R2.7.

Today a remote site experienced a network issue. The network devices are in a host group, and I was asked to put the devices in downtime. To do this I did:

- Details -> Hostgroup Summary -> find the host group and click View Hostgroup Commands -> Schedule downtime for all services in the hostgroup. Once the fixed period was set, I ticked the check box 'Schedule Downtime for Hosts Too'.
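
For reference, Nagios Core also accepts downtime requests as external commands. The sketch below builds the two command lines that I assume roughly correspond to the UI steps above (services in the hostgroup, plus the hosts themselves). The hostgroup name, author, and command file path are hypothetical placeholders, not values from this system.

```python
import time


def hostgroup_downtime_commands(hostgroup, start, end, author, comment, now=None):
    """Build external command lines for a fixed downtime on a hostgroup.

    start/end are Unix timestamps. Field order follows the Nagios Core
    external command format:
    hostgroup;start;end;fixed;trigger_id;duration;author;comment
    """
    now = int(time.time()) if now is None else int(now)
    duration = int(end) - int(start)
    # fixed=1 (fixed downtime), trigger_id=0 (not triggered by another downtime)
    args = f"{hostgroup};{int(start)};{int(end)};1;0;{duration};{author};{comment}"
    return [
        f"[{now}] SCHEDULE_HOSTGROUP_HOST_DOWNTIME;{args}",
        f"[{now}] SCHEDULE_HOSTGROUP_SVC_DOWNTIME;{args}",
    ]


if __name__ == "__main__":
    start = int(time.time())
    # "remote-site-network" and "nagiosadmin" are made-up example values.
    for line in hostgroup_downtime_commands(
        "remote-site-network", start, start + 4 * 3600, "nagiosadmin", "Network issue"
    ):
        # To submit, each line would be written to the Nagios command file
        # (often /usr/local/nagios/var/rw/nagios.cmd; path is an assumption).
        print(line)
```

Submitting the host command before the service command should not matter for a fixed downtime, since both carry the same start and end times.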

I went into HOME -> Host/Service Detail and checked the comments for these hosts, which read:
By Nagios Administrator at 2015-08-21 14:47:44
This host has been scheduled for fixed downtime from 08-21-2015 14:42:59 to 08-21-2015 19:00:00. Notifications for the host will not be sent out during that time period.


Do you know why a host recovery notification would be sent out?

Thx

Re: Host in Downtime but receiving host notification

Posted: Mon Aug 24, 2015 12:05 am
by Box293
Was the recovery notification sent after the downtime period ended?

In XI, find the Service and click on it
There are four icons at the top
The second icon is "View Service Notifications"
The third icon is "View Service History"

For both of these icons:
Click the icon
Change the Period to This Week
Click Update
Take a screenshot

Please show us both screenshots.

Re: Host in Downtime but receiving host notification

Posted: Mon Aug 24, 2015 2:23 pm
by brdr
There was no issue with any services, just the host, and no recovery was sent out after the downtime period ended.

You can see from the lines in bold below that a host recovery notification went out during the period of scheduled downtime.

[Fri Aug 21 14:43:28 2015] HOST DOWNTIME ALERT: igm01.londen01;STARTED; Host has entered a period of scheduled downtime
[Fri Aug 21 14:47:45 2015] SERVICE DOWNTIME ALERT: igm01.londen01;eth1 Status;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 21 14:47:45 2015] SERVICE DOWNTIME ALERT: igm01.londen01;eth1 Bandwidth;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 21 14:47:45 2015] SERVICE DOWNTIME ALERT: igm01.londen01;_Swap Usage;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 21 14:47:45 2015] SERVICE DOWNTIME ALERT: igm01.londen01;_Memory Usage;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 21 14:47:45 2015] SERVICE DOWNTIME ALERT: igm01.londen01;_Last Rebooted;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 21 14:47:45 2015] SERVICE DOWNTIME ALERT: igm01.londen01;_Disk Volumes Usage;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 21 14:47:45 2015] SERVICE DOWNTIME ALERT: igm01.londen01;_DNS Lookup;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 21 14:47:45 2015] HOST DOWNTIME ALERT: igm01.londen01;STARTED; Host has entered a period of scheduled downtime
[Fri Aug 21 15:00:34 2015] HOST ALERT: igm01.londen01;UP;HARD;1;OK - 10.48.254.40: rta 76.767ms, lost 0%
[Fri Aug 21 15:00:34 2015] HOST NOTIFICATION: it-netops;igm01.londen01;UP;xi_host_notification_handler;OK - 10.48.254.40: rta 76.767ms, lost 0%
[Fri Aug 21 15:00:34 2015] HOST NOTIFICATION: test;igm01.londen01;UP;xi_host_notification_handler;OK - 10.48.254.40: rta 76.767ms, lost 0%

[Fri Aug 21 15:00:44 2015] SERVICE ALERT: igm01.londen01;_Memory Usage;OK;HARD;3;Memory buffers: 9%used(173MB/1939MB) (<80%) : OK
[Fri Aug 21 15:00:54 2015] SERVICE ALERT: igm01.londen01;_DNS Lookup;OK;HARD;3;DNS OK: 0.243 seconds response time. solarwinds.com returns 74.115.13.20
[Fri Aug 21 15:01:03 2015] SERVICE ALERT: igm01.londen01;_Swap Usage;OK;HARD;3;Swap space: 0%used(0MB/2000MB) (<80%) : OK
[Fri Aug 21 15:01:44 2015] SERVICE ALERT: igm01.londen01;eth1 Status;OK;HARD;5;OK: Interface eth1 (index 12) is up.
[Fri Aug 21 15:10:14 2015] SERVICE ALERT: igm01.londen01;_Disk Volumes Usage;OK;HARD;3;/dev/shm: 0%used(0MB/1MB) /storage: 2%used(3665MB/235817MB) /tmpfs: 5%used(0MB/5MB) /reporting: 3%used(33MB/1027MB) /config: 3%used(33MB/1027MB) /: 47%used(1777MB/3743MB) (<98%) : OK
[Fri Aug 21 15:17:14 2015] HOST ALERT: igm01.londen01;UNREACHABLE;SOFT;1;CRITICAL - 10.48.254.40: rta nan, lost 100%
[Fri Aug 21 15:17:44 2015] HOST ALERT: igm01.londen01;UP;SOFT;2;OK - 10.48.254.40: rta 76.687ms, lost 0%
[Fri Aug 21 19:00:00 2015] SERVICE DOWNTIME ALERT: igm01.londen01;_Swap Usage;STOPPED; Service has exited from a period of scheduled downtime
[Fri Aug 21 19:00:00 2015] SERVICE DOWNTIME ALERT: igm01.londen01;eth1 Bandwidth;STOPPED; Service has exited from a period of scheduled downtime
[Fri Aug 21 19:00:00 2015] SERVICE NOTIFICATION: test;igm01.londen01;eth1 Bandwidth;DOWNTIMEEND (OK);xi_service_notification_handler;OK - Current BW in: 0Mbps Out: .01Mbps
[Fri Aug 21 19:00:00 2015] SERVICE DOWNTIME ALERT: igm01.londen01;eth1 Status;STOPPED; Service has exited from a period of scheduled downtime
[Fri Aug 21 19:00:00 2015] SERVICE DOWNTIME ALERT: igm01.londen01;_Last Rebooted;STOPPED; Service has exited from a period of scheduled downtime
[Fri Aug 21 19:00:00 2015] SERVICE NOTIFICATION: test;igm01.londen01;_Last Rebooted;DOWNTIMEEND (OK);xi_service_notification_handler;OK - device is up since 12d 11h 58m 59s
[Fri Aug 21 19:00:00 2015] SERVICE DOWNTIME ALERT: igm01.londen01;_Memory Usage;STOPPED; Service has exited from a period of scheduled downtime
[Fri Aug 21 19:00:00 2015] SERVICE DOWNTIME ALERT: igm01.londen01;_Disk Volumes Usage;STOPPED; Service has exited from a period of scheduled downtime
[Fri Aug 21 19:00:00 2015] SERVICE DOWNTIME ALERT: igm01.londen01;_DNS Lookup;STOPPED; Service has exited from a period of scheduled downtime
[Fri Aug 21 19:00:00 2015] SERVICE NOTIFICATION: test;igm01.londen01;_DNS Lookup;DOWNTIMEEND (OK);xi_service_notification_handler;DNS OK: 0.088 seconds response time. solarwinds.com returns 74.115.13.20
[Fri Aug 21 23:59:59 2015] HOST DOWNTIME ALERT: igm01.londen01;STARTED;per Tom. Network troubleshooting continues
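
A log excerpt like this can also be checked mechanically. The following is a minimal sketch, assuming the human-readable line format shown above and an English locale for the day and month names; the function name is my own. It tracks the host's downtime depth from the HOST DOWNTIME ALERT lines and reports any HOST NOTIFICATION logged while that depth is above zero.

```python
import re
from datetime import datetime

# Matches "[Fri Aug 21 15:00:34 2015] HOST NOTIFICATION: rest;of;line"
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] (?P<kind>[A-Z ]+): (?P<rest>.*)$")


def notifications_during_downtime(lines, host):
    """Return (time, contact, state) for host notifications logged in downtime."""
    depth = 0  # number of overlapping host downtimes currently active
    flagged = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        kind = m.group("kind")
        fields = m.group("rest").split(";")
        if kind == "HOST DOWNTIME ALERT" and len(fields) > 1 and fields[0] == host:
            if fields[1] == "STARTED":
                depth += 1
            elif fields[1] in ("STOPPED", "CANCELLED"):
                depth = max(0, depth - 1)
        elif kind == "HOST NOTIFICATION" and len(fields) > 2 and fields[1] == host:
            # Field order: contact;host;state;command;output.
            # DOWNTIMESTART/DOWNTIMEEND notices are expected, so skip them.
            if depth > 0 and not fields[2].startswith("DOWNTIME"):
                when = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
                flagged.append((when, fields[0], fields[2]))
    return flagged


if __name__ == "__main__":
    sample = [
        "[Fri Aug 21 14:43:28 2015] HOST DOWNTIME ALERT: igm01.londen01;STARTED; Host has entered a period of scheduled downtime",
        "[Fri Aug 21 15:00:34 2015] HOST ALERT: igm01.londen01;UP;HARD;1;OK - 10.48.254.40: rta 76.767ms, lost 0%",
        "[Fri Aug 21 15:00:34 2015] HOST NOTIFICATION: it-netops;igm01.londen01;UP;xi_host_notification_handler;OK - 10.48.254.40: rta 76.767ms, lost 0%",
    ]
    for when, contact, state in notifications_during_downtime(sample, "igm01.londen01"):
        print(when, contact, state)
```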

Re: Host in Downtime but receiving host notification

Posted: Mon Aug 24, 2015 4:55 pm
by jdalrymple
I can't recreate the problem. Can you on your system?

There is one thing I know: with flexible downtime there was (and continues to be in 4.0.8) a bug where the downtime didn't take effect until after the first notification. This doesn't read like that though, and you indicated it was indeed a fixed downtime. I've tried to recreate it here and I simply can't.

Is it safe to assume that the output we're looking at is `grep igm01.londen01 nagios.log` IN ITS ENTIRETY for that time period, run through a script that converts the timestamps to human-readable form?
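
For anyone following along, nagios.log lines start with a Unix epoch timestamp in square brackets, so the conversion mentioned above could look something like this minimal sketch. The function name and the `utc` switch are my own, and the epoch value in the example is illustrative only, chosen to render as a readable time in UTC.

```python
import re
import sys
import time

# A raw nagios.log line starts with "[<10-digit epoch>] "
EPOCH_RE = re.compile(r"^\[(\d{10})\]")


def humanize(line, utc=False):
    """Replace a leading [epoch] stamp with a human-readable one."""
    def repl(match):
        seconds = int(match.group(1))
        parts = time.gmtime(seconds) if utc else time.localtime(seconds)
        return "[" + time.strftime("%a %b %d %H:%M:%S %Y", parts) + "]"
    return EPOCH_RE.sub(repl, line, count=1)


if __name__ == "__main__":
    # Usage: grep igm01.londen01 nagios.log | python this_script.py
    for raw in sys.stdin:
        sys.stdout.write(humanize(raw))
```

Lines that do not start with an epoch stamp pass through unchanged, so the script is safe to run over already-converted output.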

Re: Host in Downtime but receiving host notification

Posted: Mon Aug 24, 2015 9:32 pm
by Box293
brdr wrote: There was no issue with any services, just the host, and no recovery was sent out after the downtime period ended.

You can see from the lines in bold below that a host recovery notification went out during the period of scheduled downtime.
If I'm understanding this correctly:
  • Downtime started
  • Host went down during downtime
  • Host came back up during downtime
  • Downtime ended
Do you expect a recovery message to be sent AFTER the downtime ended when the host recovered during the downtime period? If that is what you want then Nagios does not work this way.

With your bold lines, did you actually receive these notifications?

Re: Host in Downtime but receiving host notification

Posted: Tue Aug 25, 2015 6:40 am
by brdr
Almost...

Downtime started for all hosts/services in the host group
While in downtime a host recovered (it was down before downtime started)
Host Recovery Notifications were received while in downtime
Downtime ended

I did not expect a recovery message after downtime and did not expect a notification while in downtime.

I can try this again (put a host group (hosts/services) in downtime) this week and see if this is repeatable.

Keep you posted. Thanks.

Re: Host in Downtime but receiving host notification

Posted: Tue Aug 25, 2015 9:03 am
by hsmith
brdr wrote: Almost...

Downtime started for all hosts/services in the host group
While in downtime a host recovered (it was down before downtime started)
Host Recovery Notifications were received while in downtime
Downtime ended

I did not expect a recovery message after downtime and did not expect a notification while in downtime.

I can try this again (put a host group (hosts/services) in downtime) this week and see if this is repeatable.

Keep you posted. Thanks.
Sounds good, just let us know.

Re: Host in Downtime but receiving host notification

Posted: Fri Sep 04, 2015 7:32 am
by brdr
Please close. I think this had to do with timing: the scheduling of the host check and the scheduling of the downtime. If this changes I will open it back up.

Thx