recovery email being sent while host in recurring downtime

Posted: Wed Aug 08, 2018 2:54 pm
by micdud
Hi,
I noticed this after upgrading to 5.5.x.

A host that is in recurring downtime goes down. All good so far: no email notification is sent. Then the machine comes back up while it is still in recurring downtime, and we get a recovery email.
How can I make this behavior stop?

Mike

Re: recovery email being sent while host in recurring downti

Posted: Thu Aug 09, 2018 9:17 am
by cdienger
What version are you on now? 5.5.2 resolved some problems with recurring downtime. Upgrade to this version if the machine isn't already there and let us know if the behavior continues.

Re: recovery email being sent while host in recurring downti

Posted: Thu Aug 09, 2018 9:39 am
by micdud
Upgrading to 5.5.2 did fix a lot of recurring downtime issues, but not this one.
We are getting a recovery email while the machine is still in recurring downtime.
To be clear: the machine goes into recurring (scheduled) downtime. After that we reboot the machine, and there is no notification about it going down. So far so good. But when the machine comes back online while still in recurring/scheduled downtime, we get a ping recovery notification, which should have been suppressed because of the downtime.

Re: recovery email being sent while host in recurring downti

Posted: Thu Aug 09, 2018 12:39 pm
by lmiltchev
Can you show us the actual recovery email notification that you received?

Run the following commands and show the output:

Code:

/usr/local/nagios/bin/nagios -V
grep -i '<hostname>' /usr/local/nagios/var/nagios.log | perl -pe 's/(\d+)/localtime($1)/e'
where you substitute <hostname> with the actual hostname of the "problem" host.

Re: recovery email being sent while host in recurring downti

Posted: Fri Aug 10, 2018 2:45 pm
by micdud
I'm not sure why this host has ping as a service instead of a host check, but either way it shouldn't alert.

output:
grep -i 'win_sql_server' /usr/local/nagios/var/nagios.log | perl -pe 's/(\d+)/localtime($1)/e'
[Fri Aug 10 00:00:00 2018] CURRENT HOST STATE: win_sql_server;UP;HARD;1;OK - 10.226.165.51: rta 122.255ms, lost 0%
[Fri Aug 10 00:00:00 2018] CURRENT SERVICE STATE: win_sql_server;CPU Usage 80/90;OK;HARD;1;CPU Load 0% (5 min average)
[Fri Aug 10 00:00:00 2018] CURRENT SERVICE STATE: win_sql_server;Drive C: Disk Usage 80/95;OK;HARD;1;C:\ - total: 39.66 Gb - used: 25.53 Gb (64%) - free 14.12 Gb (36%)
[Fri Aug 10 00:00:00 2018] CURRENT SERVICE STATE: win_sql_server;Drive E: Disk Usage 90/95;OK;HARD;1;E:\ - total: 40.00 Gb - used: 32.26 Gb (81%) - free 7.74 Gb (19%)
[Fri Aug 10 00:00:00 2018] CURRENT SERVICE STATE: win_sql_server;Memory Usage 90/95;OK;HARD;1;Memory usage: total:17262.92 MB - used: 14687.72 MB (85%) - free: 2575.21 MB (15%)
[Fri Aug 10 00:00:00 2018] CURRENT SERVICE STATE: win_sql_server;NSClient Status;OK;HARD;1;OK: All services are in their appropriate state.
[Fri Aug 10 00:00:00 2018] CURRENT SERVICE STATE: win_sql_server;Ping;OK;HARD;1;OK - 10.226.165.51: rta 122.237ms, lost 0%
[Fri Aug 10 00:00:00 2018] CURRENT SERVICE STATE: win_sql_server;SQL Core Services;OK;HARD;1;sqlserveragent: Started - mssqlserver: Started
[Fri Aug 10 00:00:00 2018] CURRENT SERVICE STATE: win_sql_server;Uptime;OK;HARD;1;System Uptime - 272 day(s) 5 hour(s) 15 minute(s)
[Fri Aug 10 13:14:59 2018] HOST DOWNTIME ALERT: win_sql_server;STARTED; Host has entered a period of scheduled downtime
[Fri Aug 10 13:14:59 2018] SERVICE DOWNTIME ALERT: win_sql_server;CPU Usage 80/90;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 10 13:14:59 2018] SERVICE DOWNTIME ALERT: win_sql_server;Drive C: Disk Usage 80/95;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 10 13:14:59 2018] SERVICE DOWNTIME ALERT: win_sql_server;Drive E: Disk Usage 90/95;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 10 13:14:59 2018] SERVICE DOWNTIME ALERT: win_sql_server;Memory Usage 90/95;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 10 13:14:59 2018] SERVICE DOWNTIME ALERT: win_sql_server;NSClient Status;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 10 13:14:59 2018] SERVICE DOWNTIME ALERT: win_sql_server;Ping;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 10 13:15:00 2018] SERVICE DOWNTIME ALERT: win_sql_server;SQL Core Services;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 10 13:15:00 2018] SERVICE DOWNTIME ALERT: win_sql_server;Uptime;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 10 14:25:30 2018] SERVICE ALERT: win_sql_server;Ping;CRITICAL;SOFT;1;CRITICAL - 10.226.165.51: rta 650.163ms, lost 0%
[Fri Aug 10 14:27:30 2018] SERVICE ALERT: win_sql_server;Ping;CRITICAL;HARD;3;CRITICAL - 10.226.165.51: rta 672.770ms, lost 0%
[Fri Aug 10 14:32:24 2018] SERVICE NOTIFICATION: prod_sql;win_sql_server;Ping;OK;notify-service-by-email;OK - 10.226.165.51: rta 122.226ms, lost 0%
[Fri Aug 10 14:32:24 2018] SERVICE ALERT: win_sql_server;Ping;OK;HARD;1;OK - 10.226.165.51: rta 122.226ms, lost 0%


/usr/local/nagios/bin/nagios -V
Nagios Core 4.4.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 2018-06-25
License: GPL

Website: https://www.nagios.org
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
published by the Free Software Foundation.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Re: recovery email being sent while host in recurring downti

Posted: Fri Aug 10, 2018 3:44 pm
by cdienger
I'm currently working on reproducing this error and would appreciate it if you could PM me a system profile (Admin > System Config > System Profile > Download System Profile).

Re: recovery email being sent while host in recurring downti

Posted: Fri Aug 10, 2018 4:35 pm
by lmiltchev
Initially I thought that we were talking about host notifications, but it seems that you are having issues with service notifications during scheduled downtime.
[Fri Aug 10 13:14:59 2018] SERVICE DOWNTIME ALERT: win_sql_server;Ping;STARTED; Service has entered a period of scheduled downtime
[Fri Aug 10 14:32:24 2018] SERVICE NOTIFICATION: prod_sql;win_sql_server;Ping;OK;notify-service-by-email;OK - 10.226.165.51: rta 122.226ms, lost 0%
Having said that, we haven't been able to recreate the issue in house. We tested both fixed and flexible scheduled downtime, but no recovery notifications were sent during downtime. Was the Ping service in fixed or flexible downtime? It would be nice to know, so that we can do some more digging into this.
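
In case it helps with narrowing this down, here is a rough sketch (not an official Nagios tool) that scans a log for SERVICE NOTIFICATION entries written while the same service is between a SERVICE DOWNTIME ALERT ... STARTED entry and the matching end entry. The function name flag_downtime_notifications and its arguments are made up for illustration. It assumes that the end of a downtime is logged as STOPPED or CANCELLED and that the downtime start/end notices themselves carry a notification type beginning with DOWNTIME; please verify both against your own log before relying on it.

```shell
#!/bin/sh
# Sketch: flag SERVICE NOTIFICATION entries logged while the same service
# is inside a scheduled downtime window. Works on raw nagios.log lines or
# on lines whose timestamp was already converted with the perl one-liner.
# The function name and argument order are illustrative, not part of Nagios.
flag_downtime_notifications() {
    # $1 = host name, $2 = service description, $3 = log file
    awk -v host="$1" -v svc="$2" '
        {
            # Split "[timestamp] TYPE: data" at the first ": ".
            sep = index($0, ": ")
            if (sep == 0) next
            head = substr($0, 1, sep - 1)
            data = substr($0, sep + 2)
            split(data, f, ";")
        }
        # Downtime entries: host;service;STARTED|STOPPED|CANCELLED;comment
        head ~ /SERVICE DOWNTIME ALERT$/ && f[1] == host && f[2] == svc {
            if (f[3] == "STARTED") depth++
            else if (depth > 0) depth--    # assumed: STOPPED or CANCELLED
            next
        }
        # Notification entries: contact;host;service;state;command;output
        head ~ /SERVICE NOTIFICATION$/ && f[2] == host && f[3] == svc {
            # Assumed: downtime start/end notices have a DOWNTIME* state.
            if (depth > 0 && f[4] !~ /^DOWNTIME/) print
        }
    ' "$3"
}
```

It could be run as: flag_downtime_notifications win_sql_server Ping /usr/local/nagios/var/nagios.log. Any line it prints is a notification that went out while the log shows the service in downtime. For the excerpt posted earlier, that should be the 14:32:24 Ping notification.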

Also, to rule this out - can you check to see if you have multiple nagios processes running?

Code:

ps -ef | grep nagios.cfg | grep -v grep
Are recovery notifications during scheduled downtime a common occurrence for you, or is this a one-off?
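
A note on reading the process check: the ps output can list more than one line even with a single daemon, because a child process started by the daemon shows the same command line. As a rough sketch (the function name top_level_nagios_pids is made up for illustration), the following reads ps -ef output and prints only the nagios PIDs whose parent is not itself one of the matched nagios processes. It assumes the usual ps -ef column order, with PID in column 2 and PPID in column 3.

```shell
#!/bin/sh
# Sketch: read `ps -ef` output on stdin and print the PIDs of nagios
# processes (matched on "nagios.cfg") whose parent is not another matched
# process. The function name is illustrative only.
top_level_nagios_pids() {
    awk '
        # Remember PID -> PPID for every matching line, skipping grep itself.
        /nagios\.cfg/ && !/grep/ { ppid[$2] = $3 }
        END {
            for (p in ppid)
                if (!(ppid[p] in ppid)) print p
        }
    '
}
```

It could be used as: ps -ef | top_level_nagios_pids. A single PID printed suggests one daemon plus its children. More than one PID would be worth a closer look.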

Re: recovery email being sent while host in recurring downti

Posted: Tue Aug 14, 2018 10:33 am
by micdud
I'm sorry, I was out yesterday. I'm working on the info you requested.

Re: recovery email being sent while host in recurring downti

Posted: Tue Aug 14, 2018 10:34 am
by lmiltchev
Sure, send us the info whenever you are ready.

Re: recovery email being sent while host in recurring downti

Posted: Tue Aug 14, 2018 12:07 pm
by micdud

Code:

ps -ef | grep nagios.cfg | grep -v grep
nagios   24495     1  2 09:21 ?        00:04:10 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   24590 24495  0 09:21 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
This host sends a recovery email twice a day, every day. We are having an issue with this host losing ping, but that is a different issue.