Re: Improper alerts/notifications

Posted: Tue May 31, 2016 2:46 pm
by rkennedy
Your host hyper-vhost1 has the IP 10.x.x.11 assigned. Is that the new IP or the old one? Can you also post a screenshot of the notification emails you're referring to? Now that we have your objects.cache, it'll be easier to line things up.

Looking at 'carbackup1' I see this notification firing -

Code:

May 31 08:57:17 nagiosxi nagios: HOST NOTIFICATION: nagiosadmin;carbackup1.x.x;DOWN;xi_host_notification_handler;CRITICAL - 10.x.x.66: Host unreachable @ 10.x.x.10. rta nan, lost 100%

But, I don't see a corresponding host in your cache at all. Is this the one you were referring to? I did find 'carbackup14', but it doesn't appear to be the same one.

Re: Improper alerts/notifications

Posted: Wed Jun 01, 2016 7:40 am
by CarlWedu
That is the new IP.

Re: carbackup1, that is correct. It has been completely removed from CCM, applied and it still generates an alert.

Re: Improper alerts/notifications

Posted: Wed Jun 01, 2016 11:43 am
by rkennedy
I wonder if you have multiple nagios processes running. What is the full output of ps -ef on the system?

Re: Improper alerts/notifications

Posted: Wed Jun 01, 2016 1:04 pm
by CarlWedu
ps -ef

Re: Improper alerts/notifications

Posted: Wed Jun 01, 2016 2:15 pm
by rkennedy
It looks like you have multiple nagios processes running, so you may want to kill off one of them and start it up again manually.

Code:

nagios   64556     1  0 May26 ?        00:51:26 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   56637     1  0 Apr22 ?        06:10:42 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

This would explain the 'haunting' of the old alerts. Once you have just the active process running, things should work as expected.
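
A rough sketch of how that check could be scripted, assuming the usual `ps -ef` column layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD) and the binary path seen in this thread. The function name `nagios_roots` is made up for illustration. It prints the PID of every Nagios daemon whose parent is PID 1, so more than one line of output would suggest a stale instance is still around.

```shell
# Hypothetical helper: reads `ps -ef` style lines on stdin and prints the PID
# of each Nagios daemon whose parent is PID 1.
# Assumes the 8-column `ps -ef` layout and the binary path from this thread.
nagios_roots() {
    awk '$3 == 1 && $8 == "/usr/local/nagios/bin/nagios" { print $2 }'
}

# Example using the two lines quoted in the post above.
sample='nagios   64556     1  0 May26 ?        00:51:26 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   56637     1  0 Apr22 ?        06:10:42 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg'

printf '%s\n' "$sample" | nagios_roots
```

On a live system the same function could be fed directly with `ps -ef | nagios_roots`. A single PID in the output is the healthy case.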

Re: Improper alerts/notifications

Posted: Wed Jun 01, 2016 2:37 pm
by CarlWedu
killed all but one of those processes. made a ccm change and applied. more nagios processes started during that so i killed the other old one as well. now have just the self-started new ones:

[root@nagiosxi ~]# ps -ef | grep /usr/local/nagios/etc/nagios.cfg
root 11864 12259 0 15:55 pts/0 00:00:00 grep /usr/local/nagios/etc/nagios.cfg
nagios 53829 1 0 15:46 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 53880 53829 0 15:46 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg


still seeing alerts on the operations screen for devices that have been removed from ccm, like backup1. did notice that the notifications for that host are going to a contact that doesn't exist.

***

will check again in the morning to see if alerts have resolved themselves. thanks!

Re: Improper alerts/notifications

Posted: Wed Jun 01, 2016 4:41 pm
by rkennedy
There will be child processes that started (which are OK), but if a child's PPID isn't equal to the PID of the single main Nagios process, that's when I'd be worried.

If you're still seeing issues in the morning, please post the output of ps -ef once again.
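
One possible way to automate the PPID check described in this post, as a sketch only. It assumes the usual 8-column `ps -ef` layout and the binary path from this thread; the function name `nagios_orphans` is hypothetical. It prints the PID and PPID of any Nagios process whose parent is neither PID 1 nor a Nagios daemon that itself has PPID 1.

```shell
# Hypothetical helper: reads `ps -ef` style lines on stdin and prints
# "PID PPID" for every Nagios process with an unexpected parent.
nagios_orphans() {
    awk '
        $8 == "/usr/local/nagios/bin/nagios" {
            pid[++n] = $2          # remember every Nagios process
            ppid[n] = $3
            if ($3 == 1) root[$2] = 1   # daemons started directly under PID 1
        }
        END {
            for (i = 1; i <= n; i++)
                if (ppid[i] != 1 && !(ppid[i] in root))
                    print pid[i], ppid[i]
        }
    '
}

# Example using the listing posted earlier in the thread.
listing='root 11864 12259 0 15:55 pts/0 00:00:00 grep /usr/local/nagios/etc/nagios.cfg
nagios 53829 1 0 15:46 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 53880 53829 0 15:46 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg'

printf '%s\n' "$listing" | nagios_orphans
```

For that listing the function prints nothing, since 53880 is a child of 53829, which in turn has PPID 1. It does not flag multiple PPID-1 daemons by itself, so it complements rather than replaces a count of the top-level processes.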

Re: Improper alerts/notifications

Posted: Thu Jun 02, 2016 11:21 am
by CarlWedu
last notification for "backup1" was 2016-06-01 15:18:12 and operations screen looks correct so far this morning.


what would cause additional instances of the nagios process to be started like that?

Re: Improper alerts/notifications

Posted: Thu Jun 02, 2016 11:47 am
by rkennedy
To be honest, it's hard to say. It could have been caused by any number of things, most often something that happened while the nagios service was being stopped or started.

I've seen it happen in the past because the old process couldn't be killed for whatever reason, and then a new one spawns.

Re: Improper alerts/notifications

Posted: Fri Jun 03, 2016 8:23 am
by CarlWedu
thank you!!

/resolved