
RESOLVED Why notifications not happening

Posted: Thu Aug 15, 2024 10:56 am
by gregbeyer
I am not getting notifications when I should be. Tailing /usr/local/nagiosxi/var/eventman.log, I see a curious entry: ERROR: Could not find user_id for contact '$'. Full context is below. I am an explicit contact for this service, and I am also in the admin group, which is also a contact. I do have an email address configured in my user profile, so I don't understand where the "$" is coming from instead of my account ID and email address. Notifications are on for Critical and Recovery for this service. When I run the report for all notifications, this service doesn't show up, even though it is going into alert.

        (
            [notification-type] => service
            [contact] => $
            [contactemail] => $
            [type] =>
            [escalated] =>
            [author] =>
            [comments] =>
            [host] => atl1-1-01-002-11
            [hostaddress] => 172.27.144.254
            [hostalias] => atl1-1-01-002-11
            [hostdisplayname] => 172.27.144.254
            [service] => nrpe-rsyslogd
            [hoststate] => UP
            [hoststateid] => 0
            [servicestate] => CRITICAL
            [servicestateid] => 2
            [lastservicestate] => CRITICAL
            [lastservicestateid] => 2
            [servicestatetype] => SOFT
            [currentattempt] => 2
            [maxattempts] => 5
            [serviceeventid] => 997886
            [serviceproblemid] => 491989
            [serviceoutput] => PROCS CRITICAL: 0 processes with command name rsyslogd, UID = 0 (root)
            [longserviceoutput] =>
            [datetime] => Thu Aug 15 11:53:58 EDT 2024
        )

    [logging_enabled] => 1
)
ERROR: Could not find user_id for contact '$'

Re: Why notifications not happening

Posted: Fri Aug 16, 2024 4:02 pm
by danderson
Thanks for reaching out @gregbeyer,

Navigate to CCM -> Commands -> Commands and look up either 'xi_host_notification_handler' or 'xi_service_notification_handler'.

If you navigate to Home -> Incident Management -> Notifications and can find the specific notification, what value appears in the Contact column?

Re: Why notifications not happening

Posted: Tue Aug 20, 2024 11:51 am
by gregbeyer
In CCM / Commands: you didn't say whether I should do anything with them, but I do find both notification handlers.

I just downed a service on a node, and a critical shows up in Home / Inc. Mgt. / Alerts. However, in Home / Inc. Mgt. / Notifications: nothing. In CCM / Service, the contact is me.

A little more info: if I try to send a custom notification, nothing appears in Notifications (should it?), nothing appears in eventman.log (should it?), and I get no email.

I do get notification for hosts down. But not services.

Re: Why notifications not happening

Posted: Tue Aug 20, 2024 5:27 pm
by danderson
I wanted to verify that those commands existed. Sorry for not being explicit.

With the notifications that you get for the hosts down, do they get the odd error you noticed in the eventman.log?

If the notifications aren't showing up in the Home -> Inc. Mgt. -> Notifications, then that means Nagios Core isn't intending to send the notifications and/or there is some error with NDO.

Navigate to the service detail page for the specific service you are trying to diagnose (either go to Home -> Details -> Service Status and click the specific host, or search for the host directly in the search bar) and open the Advanced tab. In the Service Attributes box, does it say that notifications are enabled?
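If you'd rather check from the command line, Nagios Core also records this flag in its status file. The sketch below is a hypothetical helper (the function name is made up) that reads notifications_enabled for one service from status.dat. It assumes the default Core location /usr/local/nagios/var/status.dat on an XI server; pass a different path as the third argument if yours differs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the notifications_enabled value (1 or 0) for one
# service from a Nagios Core status.dat file. Returns non-zero if not found.
# ASSUMPTION: default status file path; override with the third argument.
service_notifications_enabled() {
  local host="$1" svc="$2" status="${3:-/usr/local/nagios/var/status.dat}"
  awk -v h="$host" -v s="$svc" '
    # Start of a service status block: reset the fields we track.
    /^servicestatus[[:space:]]*[{]/ { inblk = 1; hn = ""; sd = ""; ne = ""; next }
    # End of a block: report if it was the service we were asked about.
    inblk && /^[[:space:]]*[}]/ {
      if (hn == h && sd == s) { print ne; found = 1; exit }
      inblk = 0; next
    }
    # Inside a block: split "key=value" on the first "=".
    inblk {
      line = $0; sub(/^[[:space:]]+/, "", line)
      eq = index(line, "=")
      if (eq == 0) next
      key = substr(line, 1, eq - 1); val = substr(line, eq + 1)
      if (key == "host_name") hn = val
      else if (key == "service_description") sd = val
      else if (key == "notifications_enabled") ne = val
    }
    END { exit (found ? 0 : 1) }
  ' "$status"
}
```

For example, service_notifications_enabled atl1-1-01-002-11 nrpe-rsyslogd should print 1 if notifications are on. status.dat reflects Core's current runtime state, so it can also reveal notifications that were disabled by a command rather than in the CCM.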

Re: Why notifications not happening

Posted: Wed Aug 21, 2024 10:53 am
by gregbeyer
No prob, Dan, that's what I figured.

So I just downed a host to create log entries. Part of the entries for the event include my valid email address, and just a few lines further down is the mystery "ERROR: Could not find user_id for contact '$'". An extract of the log is attached. The event shows in Alerts and Notifications, and I got an email, as expected, yeah. It's puzzling why the error appears when, a few lines earlier, it does have a contact, but I'll take what I can get, lol. Or maybe this is a symptom of something I really need to fix. :?

"...error in NDO..." NDO?

Yes, notifications are enabled for the service in question.

Results of downing a service are attached.

nrpe-root-disk_gbeyer-test has notifications enabled, and I am the contact. It shows Critical in the console and shows in Latest Alerts. Nothing in Notifications. No email received. An eventman.log extract is attached.


host-down-notification.txt
service-down-notification.txt

Re: Why notifications not happening

Posted: Wed Aug 21, 2024 11:51 am
by gregbeyer
On a side note, I've been looking at Incident Mgt / "Latest Alerts" a lot as I've put things in and out of alert. I notice that when a host or service recovers, it is removed from Latest Alerts, so it is not a running list. At first I thought I was losing my marbles -- where did that alert go that I know I just saw?? :? Then I realized the behavior. "Latest" isn't accurate either: if I expand Max Items, I've got alerts going back to May. That's minor stuff that is still in alert that I haven't dealt with, so hardly latest.

In conclusion, "Latest Alerts" is a misnomer, should be "Current Alerts". Possibly something for a future release. And a true running list of alerts would be good to have, as well.

Re: Why notifications not happening

Posted: Wed Aug 21, 2024 3:19 pm
by DoubleDoubleA
Hi @gregbeyer,

That's a super interesting distinction on Latest/Current. I made an internal issue #1274 to track it. I don't know that we'll get to it immediately, but it is in the queue.

Aaron

Re: Why notifications not happening

Posted: Wed Aug 21, 2024 4:49 pm
by danderson
I made an internal issue on the latest alerts thing.

NDO is a NEB (Nagios Event Broker) module that allows Nagios Core to communicate with a database.

On the topic of the weird notifications, I was able to investigate further and I may have a solution. There are two types of output in the log file.

There is the output that looks like this:

Code:

*** GLOBAL HANDLER...
Array
(
    [event_id] => XXXXX
    [event_source] => 2
    [event_type] => 1
    [event_time] => 2024-08-21 10:47:19
    [event_meta] => Array
        (
            [handler-type] => host
            [host] => hostname
            [hostaddress] => XXX.XXX.XXX.XXX
            [hoststate] => DOWN
            [hoststateid] => 1
            [lasthoststate] => UP
            [lasthoststateid] => 0
            [hoststatetype] => HARD
            [currentattempt] => 1
            [maxattempts] => 1
            [hosteventid] => XXXXX
            [hostproblemid] => XXXXXX
            [hostoutput] => CRITICAL - Host Unreachable (XXX.XXX.XXX.XXX)
            [longhostoutput] =>
            [hostdowntime] => 0
        )

    [logging_enabled] => 1
)
This is the result of Nagios XI using an event handler to log the event. Event handlers run when a host or service is in a soft problem state, first enters a hard problem state, or recovers from a problem state. You can read more about event handlers here. This output comes from a global event handler that XI uses, called xi_host_event_handler.

Then there is the other type that looks like this:

Code:

*** GLOBAL HANDLER...
Array
(
    [event_id] => 164093
    [event_source] => 2
    [event_type] => 2
    [event_time] => 2024-08-21 10:47:19
    [event_meta] => Array
        (
            [notification-type] => host
            [contact] => XXXXX
            [contactemail] => XXXXX
            [type] => XXXXX
            [escalated] => XXXXX
            [author] => XXXXX
            [comments] => XXXXX
            [host] => hostname
            [hostaddress] => XXX.XXX.XXX.XXX
            [hostalias] => hostalias
            [hostdisplayname] => XXX.XXX.XXX.XXX
            [hoststate] => DOWN
            [hoststateid] => 1
            [lasthoststate] => UP
            [lasthoststateid] => 0
            [hoststatetype] => HARD
            [currentattempt] => 1
            [maxattempts] => 1
            [hosteventid] => XXXXX
            [hostproblemid] => XXXXX
            [hostoutput] => CRITICAL - Host Unreachable (XXX.XXX.XXX.XXX)
            [longhostoutput] =>
            [datetime] => date
        )

    [logging_enabled] => 1
)
This is the result of the notification command executed by Nagios Core when it determines that a notification should be sent. You can read more about how Core handles notifications here. The command that generates output like this is xi_host_notification_handler.
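To tell the two kinds of entries apart in a longer log, you could tally them by the key that opens each event_meta array: [handler-type] for event handler output and [notification-type] for notification output, as in the two excerpts above. This is a hypothetical helper, not part of Nagios XI; it assumes the log is at /usr/local/nagiosxi/var/eventman.log, and it also counts the unresolved-contact error from the original post.

```shell
#!/usr/bin/env bash
# Hypothetical helper: tally entry types in an XI eventman.log.
# ASSUMPTION: default log path; pass another file as the first argument.
summarize_eventman_log() {
  local log="${1:-/usr/local/nagiosxi/var/eventman.log}"
  local handlers notifications contact_errors
  # grep -c prints 0 (and exits 1) when nothing matches; we only need the count.
  handlers=$(grep -cF '[handler-type] =>' "$log")
  notifications=$(grep -cF '[notification-type] =>' "$log")
  contact_errors=$(grep -cF "Could not find user_id for contact '\$'" "$log")
  printf 'event handler entries: %s\n' "$handlers"
  printf 'notification entries: %s\n' "$notifications"
  printf 'unresolved-contact errors: %s\n' "$contact_errors"
}
```

Running it against a copy of the log gives a quick picture of how many notification-style entries are being produced and how many of them hit the contact lookup error.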

That command looks like this:

Code:

/usr/bin/php /usr/local/nagiosxi/scripts/handle_nagioscore_notification.php --notification-type=host --contact="$CONTACTNAME$" --contactemail="$CONTACTEMAIL$" --type=$NOTIFICATIONTYPE$ --escalated="$NOTIFICATIONISESCALATED$" --author="$NOTIFICATIONAUTHOR$" --comments="$NOTIFICATIONCOMMENT$" --host="$HOSTNAME$" --hostaddress="$HOSTADDRESS$" --hostalias="$HOSTALIAS$" --hostdisplayname="$HOSTDISPLAYNAME$" --hoststate=$HOSTSTATE$ --hoststateid=$HOSTSTATEID$ --lasthoststate=$LASTHOSTSTATE$ --lasthoststateid=$LASTHOSTSTATEID$ --hoststatetype=$HOSTSTATETYPE$ --currentattempt=$HOSTATTEMPT$ --maxattempts=$MAXHOSTATTEMPTS$ --hosteventid=$HOSTEVENTID$ --hostproblemid=$HOSTPROBLEMID$ --hostoutput="$HOSTOUTPUT$" --longhostoutput="$LONGHOSTOUTPUT$" --datetime="$LONGDATETIME$"
When Core executes commands, it substitutes values for the names enclosed in "$" symbols, such as $example$. These are called macros. You can find more info on macros here, and you can find a list of macros here.

What seems to be happening is that Nagios Core is not substituting those macros into the command, so when the shell runs the command, $CONTACTEMAIL$ is left as-is and the shell interprets $CONTACTEMAIL as a variable. Since that variable doesn't exist, it expands to nothing and you're left with just the trailing $. You'll notice in the list of macros that the contact macros are not available in event handler commands.
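The shell behavior described above is easy to reproduce. This minimal sketch (the function names are made up for the demo, and the contact values are placeholders) expands the same --contact and --contactemail arguments first with the macros left unsubstituted, then with example values as Core would normally fill them in. It assumes a POSIX-style shell such as bash.

```shell
#!/usr/bin/env bash
# Demo of the failure mode described above. Nothing here is Nagios code.
unset CONTACTNAME CONTACTEMAIL   # ensure these shell variables do not exist

# Core left the macros untouched: the shell sees $CONTACTNAME$, expands the
# unset variable $CONTACTNAME to an empty string, and keeps the final "$"
# as a literal character because it is not followed by a variable name.
unsubstituted_args() {
  printf '%s\n' --contact="$CONTACTNAME$" --contactemail="$CONTACTEMAIL$"
}

# Core substituted the macros before running the command (placeholder values).
substituted_args() {
  printf '%s\n' --contact="jdoe" --contactemail="jdoe@example.com"
}

unsubstituted_args   # prints: --contact=$  then  --contactemail=$
substituted_args     # prints: --contact=jdoe  then  --contactemail=jdoe@example.com
```

The first pair matches the [contact] => $ and [contactemail] => $ lines in the original log, which is what the notification script then fails to map to a user_id.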

I was able to replicate the log entries you have when I added xi_host_notification_handler (which is supposed to be a notification command) as an event handler command.

Navigate to CCM -> Monitoring -> Hosts and select your host, then go to the "Check Settings" tab. You should see an event handler command that you can specify; make sure xi_host_notification_handler is not set as the event handler. If the setting isn't on the specific host, it's probably defined in a template somewhere. Back on the CCM Hosts page, there should be a "Relationships" icon on the host's line, which will give you an idea of where to look for the event handler.
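If the host inherits the setting from a template, it can be quicker to search the generated Core object configuration for any event_handler directive that names one of the notification commands. This is a hypothetical, read-only check; it assumes the object configs are .cfg files under /usr/local/nagios/etc, so adjust the directory for your install. Any fix should still be made in the CCM as described above, since the CCM likely regenerates these files when the configuration is applied.

```shell
#!/usr/bin/env bash
# Hypothetical read-only check: list event_handler directives that point at a
# notification command (xi_host_ or xi_service_notification_handler).
# ASSUMPTION: object configs are *.cfg files under /usr/local/nagios/etc.
find_notification_cmds_as_event_handlers() {
  local cfgdir="${1:-/usr/local/nagios/etc}"
  # The whitespace after "event_handler" keeps event_handler_enabled lines out;
  # the trailing class allows for "!args" or an inline ";" comment.
  grep -rnE --include='*.cfg' \
    '^[[:space:]]*event_handler[[:space:]]+xi_(host|service)_notification_handler([[:space:]!;]|$)' \
    "$cfgdir"
}
```

Each match is printed as file:line:text, so a hit in a template file points straight at the definition to fix. No output (exit status 1) means no object uses a notification command as its event handler.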

Let me know if you need further help.

Re: Why notifications not happening

Posted: Thu Aug 22, 2024 1:21 pm
by gregbeyer
Thanks for the info on the cause of the host event handler errors. I've looked at my hosts, and I do, in fact, have xi_host_notification_handler in effect for them. And host down notifications are being issued reliably. Following your advice, I removed it, so they now have no event handler, and I still get host down notifications. So host down notifications are issued with or without a handler.

The problem I'm having is service notifications. Following the same logic as for xi_host_notification_handler, I removed xi_service_notification_handler from the service in question and made it go critical. Now nothing shows in Latest Alerts (where it did before). Still no notification, and no email.

So it seems that the service event handler is not the source of the missing service notifications.

Re: Why notifications not happening

Posted: Thu Aug 22, 2024 3:12 pm
by danderson
As for the lack of service notifications: from the log you sent, it doesn't look like a notification command was executed, especially since nothing shows up in the Notifications tab. I would double-check that contacts are defined for that service, that notifications are enabled for that service, and that the service's contact has the correct notification handler, in your case xi_service_notification_handler.
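As a command-line cross-check for the last item, the sketch below prints the service notification directives declared directly in one contact's definition. It is hypothetical: it assumes the object configs are .cfg files under /usr/local/nagios/etc, and it does not follow template inheritance (use directives), so a setting inherited from a contact template will not appear.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print service-notification directives declared directly
# in one contact definition. Template inheritance (use ...) is NOT followed.
# ASSUMPTION: object configs are *.cfg files under /usr/local/nagios/etc.
contact_service_notification_settings() {
  local contact="$1" cfgdir="${2:-/usr/local/nagios/etc}"
  find "$cfgdir" -type f -name '*.cfg' -exec cat {} + | awk -v c="$contact" '
    # Start of a contact definition; "contact[[:space:]]*[{]" skips contactgroup.
    /^[[:space:]]*define[[:space:]]+contact[[:space:]]*[{]/ { inblk = 1; name = ""; n = 0; next }
    # End of the definition: print what we collected if the name matches.
    inblk && /^[[:space:]]*[}]/ {
      if (name == c) for (i = 1; i <= n; i++) print lines[i]
      inblk = 0; next
    }
    # Inside the definition: remember the name and the directives of interest.
    inblk && $1 == "contact_name" { name = $2 }
    inblk && ($1 == "service_notifications_enabled" || $1 == "service_notification_options" || $1 == "service_notification_commands") {
      lines[++n] = $1 " " $2
    }
  '
}
```

After any change, Core's built-in configuration check (typically /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg on a default install) reports configuration errors and warnings before anything is restarted.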

If everything still looks correct I'll have to dig into it more.