
Unreliable notifications--3rd post for the same issue

Posted: Fri Dec 04, 2015 9:18 pm
by gormank
Yet again I've found that doing nothing more than going to admin, notification management, selecting a user, updating data and clicking update settings typically stops notifications. Randomly updating makes it work again.
This may seem trivial, but to my management it isn't.
I'd like to get a description of what scripts/executables are called and so on when a notification is triggered, all the way until email is generated and I can see it in maillog.

Re: Unreliable notifications--3rd post for the same issue

Posted: Mon Dec 07, 2015 2:08 pm
by tmcdonald
Assuming you are using the standard xi_[host/service]_notification_handler command for the given contact, here is what happens:

1.) A check is run against a host/service and this check is the Nth time it has returned non-OK, where N is max_check_attempts
2.) The command specified for that contact is looked up, in this case the command is xi_[host/service]_notification_handler - By default, here are the command configurations:

xi_host_notification_handler

Code:

/usr/bin/php /usr/local/nagiosxi/scripts/handle_nagioscore_notification.php --notification-type=host --contact="$CONTACTNAME$" --contactemail="$CONTACTEMAIL$" --type=$NOTIFICATIONTYPE$ --escalated="$NOTIFICATIONISESCALATED$" --author="$NOTIFICATIONAUTHOR$" --comments="$NOTIFICATIONCOMMENT$" --host="$HOSTNAME$" --hostaddress="$HOSTADDRESS$" --hostalias="$HOSTALIAS$" --hostdisplayname="$HOSTDISPLAYNAME$" --hoststate=$HOSTSTATE$ --hoststateid=$HOSTSTATEID$ --lasthoststate=$LASTHOSTSTATE$ --lasthoststateid=$LASTHOSTSTATEID$ --hoststatetype=$HOSTSTATETYPE$ --currentattempt=$HOSTATTEMPT$ --maxattempts=$MAXHOSTATTEMPTS$ --hosteventid=$HOSTEVENTID$ --hostproblemid=$HOSTPROBLEMID$ --hostoutput="$HOSTOUTPUT$" --longhostoutput="$LONGHOSTOUTPUT$" --datetime="$LONGDATETIME$"

xi_service_notification_handler

Code:

/usr/bin/php /usr/local/nagiosxi/scripts/handle_nagioscore_notification.php --notification-type=service --contact="$CONTACTNAME$" --contactemail="$CONTACTEMAIL$" --type=$NOTIFICATIONTYPE$ --escalated="$NOTIFICATIONISESCALATED$" --author="$NOTIFICATIONAUTHOR$" --comments="$NOTIFICATIONCOMMENT$" --host="$HOSTNAME$" --hostaddress="$HOSTADDRESS$" --hostalias="$HOSTALIAS$" --hostdisplayname="$HOSTDISPLAYNAME$" --service="$SERVICEDESC$" --hoststate=$HOSTSTATE$ --hoststateid=$HOSTSTATEID$ --servicestate=$SERVICESTATE$ --servicestateid=$SERVICESTATEID$ --lastservicestate=$LASTSERVICESTATE$ --lastservicestateid=$LASTSERVICESTATEID$ --servicestatetype=$SERVICESTATETYPE$ --currentattempt=$SERVICEATTEMPT$ --maxattempts=$MAXSERVICEATTEMPTS$ --serviceeventid=$SERVICEEVENTID$ --serviceproblemid=$SERVICEPROBLEMID$ --serviceoutput="$SERVICEOUTPUT$" --longserviceoutput="$LONGSERVICEOUTPUT$" --datetime="$LONGDATETIME$"

3.) The necessary command is run. In the case of the standard xi_[host/service]_notification_handler commands, the PHP script is passed a list of arguments built from Nagios macros, which the nagios binary replaces with their actual values before the command runs. $HOSTADDRESS$ will be replaced with the address of the host, $LONGDATETIME$ will have the date, etc.

4.) The /usr/local/nagiosxi/scripts/handle_nagioscore_notification.php script will add the notification to the event queue which is processed by (I believe) the /usr/local/nagiosxi/cron/eventman.php cron job script.
5.) At this point I would need to get developer clarification as to what happens, as some of it is somewhat proprietary. We use PHPMailer to handle our User-based notifications, I know that much.
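
To illustrate step 3, here is a minimal sketch of the kind of substitution that happens before the handler script runs. This is not how the nagios binary implements it; the function name expand_macros and the values (web01, 192.0.2.10, DOWN) are made up for the example.

```shell
#!/bin/sh
# Illustration only: mimics macro expansion in a command line.
# The real substitution is done by the nagios binary; these values are invented.
expand_macros() {
    # $1 = command template containing $MACRO$ placeholders
    printf '%s\n' "$1" | sed \
        -e 's/\$HOSTNAME\$/web01/g' \
        -e 's/\$HOSTADDRESS\$/192.0.2.10/g' \
        -e 's/\$HOSTSTATE\$/DOWN/g'
}

# Single quotes keep the shell from touching the $...$ placeholders.
expand_macros '--host="$HOSTNAME$" --hostaddress="$HOSTADDRESS$" --hoststate=$HOSTSTATE$'
# prints: --host="web01" --hostaddress="192.0.2.10" --hoststate=DOWN
```

The point is that by the time handle_nagioscore_notification.php starts, it only sees the final values, not the macro names.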

It is entirely possible that, since Users do not use sendmail, your notification may not end up in the maillog, so looking there for evidence of notifications might not be accurate. Typically I look under Home -> Notifications for a first indication of whether or not notification commands (xi_[host/service]_notification_handler) are even being triggered in the first place.
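
Besides the Home -> Notifications page, Nagios Core writes a HOST NOTIFICATION or SERVICE NOTIFICATION line to its own log each time a notification command is run for a contact. The helper below is a rough sketch for pulling those lines out from the command line. The function name notifications_for is made up, and the default log path is an assumption; check the log_file setting in your nagios.cfg.

```shell
#!/bin/sh
# Hypothetical helper: list notification log entries for one contact.
#   $1 = contact name (treated as a regular expression by grep)
#   $2 = log file; the default path below is an assumption, verify log_file in nagios.cfg
notifications_for() {
    contact=$1
    logfile=${2:-/usr/local/nagios/var/nagios.log}
    # Core logs these as "HOST NOTIFICATION: contact;host;..." and
    # "SERVICE NOTIFICATION: contact;host;service;..."
    grep -E "(HOST|SERVICE) NOTIFICATION: ${contact};" "$logfile"
}
```

Usage would be something like: notifications_for yourusername. If nothing comes back for a problem you know occurred, the handler command was probably never triggered for that contact, which points at the contact/notification settings rather than at mail delivery.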

Notification macros: https://assets.nagios.com/downloads/nag ... iables.pdf
Users and contacts: https://assets.nagios.com/downloads/nag ... ntacts.pdf
Configuring contacts to use User notifications: https://assets.nagios.com/downloads/nag ... Mailer.pdf

Re: Unreliable notifications--3rd post for the same issue

Posted: Mon Dec 07, 2015 2:22 pm
by ssax
Just so that we are on the same page, are you:

Going to Admin > Notification Management, Select User/Changing Settings, and then clicking the Deploy Preferences button?

OR

Are you going into the user's notification settings page, making changes, and clicking the "Update Settings" button?


If you are doing the first one and not loading a template, the default settings have the notifications-enabled option unchecked, so deploying would push that disabled setting to the user.

I would like to take a look at your DB values when this is occurring for a user. Run this command and then PM me your /tmp/user_output.txt file:
*** Note: Make sure to change yourusername to the username that is experiencing the issue.

Code:

echo "select * from xi_usermeta left join xi_users on xi_users.user_id = xi_usermeta.user_id where xi_users.username = 'yourusername';" | psql nagiosxi nagiosxi > /tmp/user_output.txt

Re: Unreliable notifications--3rd post for the same issue

Posted: Mon Dec 07, 2015 3:09 pm
by gormank
I'm very carefully loading a template in admin, notification mgmt, making a small change and clicking update. For some reason it almost always stops working and has to be tested, tweaked, and saved multiple times until it starts working again.
I have group mail addresses so users aren't changing settings.

Re: Unreliable notifications--3rd post for the same issue

Posted: Mon Dec 07, 2015 5:09 pm
by ssax
Hmm, alright, we'll need that output from the DB (when it's occurring) to see why it's not working.

Also, try running a tail on the eventman and cmdsubsys log files (when this is occurring) and force an email to be sent to see whether any errors show up:

Code:

tail -f /usr/local/nagiosxi/var/eventman.log /usr/local/nagiosxi/var/cmdsubsys.log

Re: Unreliable notifications--3rd post for the same issue

Posted: Tue Dec 08, 2015 12:05 pm
by gormank
Attached are the query outputs but the problem isn't happening right now so they may not be useful.

Re: Unreliable notifications--3rd post for the same issue

Posted: Tue Dec 08, 2015 5:05 pm
by ssax
We will need the SQL output when it's occurring to figure it out; these will help for comparison, though.
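
For the comparison itself, something along these lines might help once a capture from the broken state exists. It is only a sketch: the function name compare_captures and the file names in the usage note are made up. The lines are sorted first because a SELECT without ORDER BY does not guarantee row order.

```shell
#!/bin/sh
# Hypothetical helper: show lines that differ between two query captures.
#   $1 = capture taken while notifications work
#   $2 = capture taken while notifications are failing
# Returns diff's exit status: 0 if identical, 1 if the captures differ.
compare_captures() {
    a=$(mktemp)
    b=$(mktemp)
    # Sort both files so that row order alone does not show up as a difference
    sort "$1" > "$a"
    sort "$2" > "$b"
    diff "$a" "$b"
    status=$?
    rm -f "$a" "$b"
    return $status
}
```

For example: compare_captures /tmp/user_output_working.txt /tmp/user_output_broken.txt. Lines starting with "<" come from the first file and lines starting with ">" from the second.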

Re: Unreliable notifications--3rd post for the same issue

Posted: Wed Dec 09, 2015 4:24 pm
by gormank
This is getting nuts. I have notification delays defined for pretty much everything, which in some cases don't work. I just got an alert via mail on a service that has a 15 minute delay. The alert fired a few seconds before the mail was sent.
I also have recurring downtime defined on about 15 servers, which doesn't work.

So I have 3 supposed features that don't work reliably. I wonder what doesn't work that I don't know about.

Re: Unreliable notifications--3rd post for the same issue

Posted: Wed Dec 09, 2015 4:31 pm
by tmcdonald
I think a ticket might be a better route to tackle this so we can get a system profile and eventually get a remote session scheduled. These features work for plenty of people, it might just be a matter of verifying the setup and making sure nothing else is interfering with the operation.

Please email [email protected] with a descriptive subject, and a link to this thread.

Re: Unreliable notifications--3rd post for the same issue

Posted: Fri Dec 11, 2015 11:15 am
by gormank
Sending email