Improper alerts/notifications

rkennedy · Post by **rkennedy** » Tue May 31, 2016 2:46 pm

Looking at your host hyper-vhost1, it has the IP 10.x.x.11 assigned. Is that the new IP or the old IP? Can you also post a screenshot of the notification emails you're referring to? Now that we have your objects.cache, it'll help to line things up.

Looking at 'carbackup1' I see this notification firing -

May 31 08:57:17 nagiosxi nagios: HOST NOTIFICATION: nagiosadmin;carbackup1.x.x;DOWN;xi_host_notification_handler;CRITICAL - 10.x.x.66: Host unreachable @ 10.x.x.10. rta nan, lost 100%

But, I don't see a corresponding host in your cache at all. Is this the one you were referring to? I did find 'carbackup14', but it doesn't appear to be the same one.

CarlWedu · Post by **CarlWedu** » Wed Jun 01, 2016 7:40 am

That is the new IP.

Re: carbackup1, that is correct. It has been completely removed from CCM and the configuration applied, yet it still generates an alert.

rkennedy · Post by **rkennedy** » Wed Jun 01, 2016 11:43 am

I wonder if you have multiple nagios processes running. What is the full output of ps -ef on the system?

CarlWedu · Post by **CarlWedu** » Wed Jun 01, 2016 1:04 pm

ps -ef

rkennedy · Post by **rkennedy** » Wed Jun 01, 2016 2:15 pm

It looks like you have multiple nagios processes running, so you may want to kill off one of them and start it up again manually.

nagios   64556     1  0 May26 ?        00:51:26 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   56637     1  0 Apr22 ?        06:10:42 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

This would explain the 'haunting' of the old alerts. Once you have just the active process running, things should work as expected.
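As a rough way to spot the situation above, the following sketch filters ps -ef output for top-level Nagios daemons. It assumes that a top-level daemon shows PPID 1 and that the binary lives at the path shown in the listing; the function name nagios_masters is made up for illustration.

```shell
# Print the PID of every top-level Nagios daemon found in `ps -ef`-style
# input on stdin. Assumption: a top-level daemon has PPID 1 (column 3) and
# its command (column 8) is the binary path shown in the thread.
nagios_masters() {
    awk '$3 == 1 && $8 == "/usr/local/nagios/bin/nagios" { print $2 }'
}

# Usage: ps -ef | nagios_masters
# More than one PID printed suggests a leftover daemon is still running.
```

Against the two lines above, this would print both 64556 and 56637, which matches the duplicate-process diagnosis.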

CarlWedu · Post by **CarlWedu** » Wed Jun 01, 2016 2:37 pm

Killed all but one of those processes, then made a CCM change and applied it. More nagios processes started during that, so I killed the other old one as well. Now I have just the self-started new ones:

[root@nagiosxi ~]# ps -ef | grep /usr/local/nagios/etc/nagios.cfg
root 11864 12259 0 15:55 pts/0 00:00:00 grep /usr/local/nagios/etc/nagios.cfg
nagios 53829 1 0 15:46 ? 00:00:03 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 53880 53829 0 15:46 ? 00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Still seeing alerts on the operations screen for devices that have been removed from CCM, like backup1. I did notice that the notifications for it are going to a contact that doesn't exist.

***

Will check again in the morning to see if the alerts have resolved themselves. Thanks!

rkennedy · Post by **rkennedy** » Wed Jun 01, 2016 4:41 pm

There will be child processes that the main Nagios process started, and those are OK. I'd only be worried if a nagios process has a PPID that isn't the PID of the single main Nagios process.

If you're still seeing issues in the morning, please post the output of ps -ef once again.
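The PPID check can be sketched as a small filter over ps -ef output. This is only an illustration: the function name nagios_strays is hypothetical, the main PID has to be supplied by hand, and the binary path is assumed to be the one from the earlier listing.

```shell
# Print the PID of every Nagios process in `ps -ef`-style input on stdin
# that is neither the given main process nor one of its direct children.
# $1 is the PID of the main Nagios process (supplied by the caller).
nagios_strays() {
    awk -v master="$1" '
        $8 == "/usr/local/nagios/bin/nagios" && $2 != master && $3 != master { print $2 }
    '
}

# Usage: ps -ef | nagios_strays 53829
# No output means every Nagios process is the main one or a child of it.
```

With the listing posted earlier, using 53829 as the main PID, nothing is printed, since 53880 has 53829 as its parent.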

CarlWedu · Post by **CarlWedu** » Thu Jun 02, 2016 11:21 am

Last notification for "backup1" was 2016-06-01 15:18:12, and the operations screen looks correct so far this morning.

What would cause additional instances of the nagios process to be started like that?

rkennedy · Post by **rkennedy** » Thu Jun 02, 2016 11:47 am

To be honest, it's hard to say. It could have been caused by a number of different things. Most of the time it relates to a point when the nagios service was being stopped or started.

I've seen it happen in the past because the old process couldn't be killed for whatever reason, and then a new one spawns.

CarlWedu · Post by **CarlWedu** » Fri Jun 03, 2016 8:23 am

Thank you!!

/resolved

Nagios Support Forum
