Notifications going to an *unintended* contact?

Post by **HappMacDonald** » Tue Apr 30, 2019 4:24 pm

Hello, I have a Nagios Core installation that appears to send notifications for certain objects to a contact that I do *not* want notified.

I cannot find any place in the configs that relates that contact to that object.

I know it's that contact, and not just the same email address assigned to a different contact or forwarded, because var/nagios.log names the contact in its "SERVICE NOTIFICATION" lines. That also lets the contents of the notification commands off the hook.
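A minimal sketch of that log check, for anyone following along. It assumes the log lives at /usr/local/nagios/var/nagios.log (pass another path as the second argument) and that each SERVICE NOTIFICATION line uses the usual semicolon-separated layout of contact;host;service;state;command;output. The function name notifications_for_contact is made up for this example.

```shell
# List the host, service, and notification command of every
# SERVICE NOTIFICATION that was logged for one contact.
# Usage: notifications_for_contact CONTACT [NAGIOS_LOG]
# (the default log path is an assumption; adjust for your install)
notifications_for_contact() {
    awk -F';' -v who="$1" '
        /^\[[0-9]+\] SERVICE NOTIFICATION: / {
            # Field 1 is "[timestamp] SERVICE NOTIFICATION: contact";
            # strip the prefix so only the contact name remains.
            contact = $1
            sub(/^\[[0-9]+\] SERVICE NOTIFICATION: /, "", contact)
            if (contact == who)
                print $2 "\t" $3 "\t" $5
        }
    ' "${2:-/usr/local/nagios/var/nagios.log}"
}
```

Running `notifications_for_contact nagiosadmin` prints one tab-separated line (host, service, notification command) per notification logged for that contact.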

So is there any tool that lays bare what logic nagios uses to determine which contacts should be used for any specific notification event?

Kind of like devtools in all modern web browsers lays bare the logic behind why a certain HTML element got the CSS rules that it did?

Also I appreciate that I can post configs and folk could probably help me find what's wrong in this specific circumstance, but I'm hoping for more of a general kind of tool or strategy to troubleshoot this sort of thing, since tomorrow it could happen with a different notification, and the configs are kind of a complicated mess full of sensitive stuff I'd have to pick through and redact first. :/

So any general strategies would be appreciated, thank you folks.

Post by **ssax** » Wed May 01, 2019 11:17 am

Generally I do this:

First I check for duplicate processes (this should never show more than 2):

Code: Select all

ps aux | grep nagios.cfg

Duplicate processes can cause strange behavior. I also check the kernel message queue if they are running NDO2DB (it should list only one nagios entry):

Code: Select all

ipcs -q | grep nagios

Then I search the configs to see what they are attached to:

Code: Select all

grep -i -R 'contact@email.com' /usr/local/nagios/etc/*
grep -i -R 'contactname' /usr/local/nagios/etc/*

More than likely, though, the contact is part of a contact group (or a sub contact group) that is assigned to a service. So I just look at the host/service the notification was sent for and walk all the contacts/contact groups/etc. from that host/service AND from all of the host/service templates attached to it, to find where it's being set. It's a manual process.

I usually open up the /usr/local/nagios/var/objects.cache file and see what its final calculated values are, to see if it's coming from the configs OR if it's coming from somewhere else (bug in system, duplicate process, some other issue).

If you need help, zip up your entire nagios/etc directory and PM us the files, the contact name, and the contact's email address that they are receiving it at.
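A rough sketch of that objects.cache lookup, assuming the file holds one `define service {` block per service with one directive per line (directive name, whitespace, value). The function name service_contacts is hypothetical, and the default path is the one mentioned above.

```shell
# Print the resolved contacts and contact_groups of one service,
# as recorded in objects.cache.
# Usage: service_contacts HOST SERVICE [OBJECTS_CACHE]
service_contacts() {
    awk -v host="$1" -v svc="$2" '
        # A service block starts: reset what we know about it.
        /^define service[ \t]*[{]/ { in_svc = 1; h = s = c = g = ""; next }

        # End of the block: report it if it is the service we want.
        in_svc && /^[ \t]*[}]/ {
            if (h == host && s == svc) {
                print "contacts=" c
                print "contact_groups=" g
            }
            in_svc = 0
            next
        }

        # Inside a service block: the first field is the directive,
        # the rest of the line is its value.
        in_svc {
            val = $0
            sub(/^[ \t]*[^ \t]+[ \t]*/, "", val)
            if ($1 == "host_name")           h = val
            if ($1 == "service_description") s = val
            if ($1 == "contacts")            c = val
            if ($1 == "contact_groups")      g = val
        }
    ' "${3:-/usr/local/nagios/var/objects.cache}"
}
```

If the output shows a contact or group that the source configs never mention for that service, the value was filled in by Nagios while resolving the configuration rather than written by hand.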

Code: Select all

zip -r /tmp/NAGIOSFILES.zip /usr/local/nagios/etc

Post by **HappMacDonald** » Fri May 03, 2019 9:02 pm

ssax wrote: If you need help, zip up your entire nagios/etc directory and PM us the files, the contact name, and the contacts email address that they are receiving it at.

Well, we have a number of obstacles on this route.

1: Forum says that my user account is too new to PM anything to anyone.

2: I did mention in my first question:

Also I appreciate that I can post configs and folk could probably help me find what's wrong in this specific circumstance, but I'm hoping for more of a general kind of tool or strategy to troubleshoot this sort of thing, since tomorrow it could happen with a different notification, and the configs are kind of a complicated mess full of sensitive stuff I'd have to pick through and redact first. :/

Now your mentioning of objects.cache does help streamline things a bit, since that's all relevant configs boiled down into a single file.

I took the liberty of taking a copy of that and redacting over SEVEN HUNDRED sensitive data points, all very carefully so that every token continues to uniquely match every other occurrence in the file allowing one to walk the tree from one confusing place to another as if nothing was missing.

I'd be willing to share that with you, along with all troubleshooting context I have including which test was triggered, which email address AND contact were invoked out of place, and the nagios log lines that correspond to the attempt, if you could just let me know where to send them.

Thank you for your consideration,

- - Jesse Thompson
Webformix, Bend OR

Post by **tgriep** » Mon May 06, 2019 2:47 pm

I just changed your Account status so you should be able to PM either ssax or myself the data.
Try it out and let us know when the data is sent.

Post by **HappMacDonald** » Mon May 13, 2019 2:55 pm

Thank you tgriep, I have PM'ed ssax with the requested config data.

Let me know if it would help to send a copy of the configs to you or anywhere else.

Personally I still vote that there should simply be a better tool available to analyze or log Nagios' logic of how it interprets the configs when making any given decision.

Because hosts have services, both hosts and services pull from potentially a chain of templates, any of those hosts, services, or templates can specify either contacts or contact groups (among other complications like escalation, time of day filtering, alternates, etc) which in turn can each invoke a chain of contact templates, group members, and group member contact templates.

So when you follow this spiderweb of associations by hand and still find zero link it would be nice to follow a log of Nagios tracing the same spiderweb to see if (and if so where) it may have made some kind of error in logic. :S

Post by **tgriep** » Mon May 13, 2019 4:50 pm

ssax did not receive the PM so can you send it to me instead?

The first thing I do is to look at the objects.cache file to check the settings.
Next, I recursively grep the configs to find out where the offending object is defined.

Post by **HappMacDonald** » Mon May 13, 2019 7:09 pm

OK, message re-drafted and sent to both of you.

It did not show in my outbox, so I'll wager that I hit "preview" and then got sufficiently distracted as to think the message got sent. :/

Post by **tgriep** » Tue May 14, 2019 9:06 am

The link to the contact is coming through the contactgroup called admin.
The service has that contactgroup assigned to it.
That contactgroup has the nagiosadmin contact as a member.
So, either remove the contactgroup from the service or remove the nagiosadmin contact from the admin contactgroup, and that should stop the emails.
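That membership link can also be checked from the command line. The sketch below assumes contactgroup blocks in objects.cache list their members on a comma-separated `members` line; groups_with_contact is a made-up helper name, not a Nagios tool.

```shell
# List every contactgroup whose members line names the given contact.
# Usage: groups_with_contact CONTACT [OBJECTS_CACHE]
groups_with_contact() {
    awk -v who="$1" '
        # A contactgroup block starts: forget the previous one.
        /^define contactgroup[ \t]*[{]/ { in_grp = 1; name = members = ""; next }

        # End of the block: print the group name if the contact is a member.
        in_grp && /^[ \t]*[}]/ {
            n = split(members, m, "[ \t]*,[ \t]*")
            for (i = 1; i <= n; i++)
                if (m[i] == who) { print name; break }
            in_grp = 0
            next
        }

        # Inside the block: remember the group name and its members.
        in_grp {
            val = $0
            sub(/^[ \t]*[^ \t]+[ \t]*/, "", val)
            if ($1 == "contactgroup_name") name = val
            if ($1 == "members")           members = val
        }
    ' "${2:-/usr/local/nagios/var/objects.cache}"
}
```

Any group printed here that also appears on the service's contact_groups line is a path by which the contact gets notified.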

Post by **HappMacDonald** » Tue May 14, 2019 2:55 pm

Hmmm, so it is. Thank you for finding that in the objects.cache.

But that moves the confusion to how that contact group gets into the service object. (forum breaks when I try to include a unicode thinking emoji)

ospf service object "use"'s xinetd-check-template.
xinetd-check-template object uses nothing.

Neither object has any contact groups at all. Here is the pre-compiled config for both of those:

define service {
    service_description ospf
    host_name syslog01[redacted-host]
    use xinetd-check-template
    action_url null
    check_interval 2;
    normal_check_interval 2;
    max_check_attempts 1 ; Try every couple of minutes, report immediately on a "failure" as it is unlikely to repeat.

    notification_interval 60 ; Re-notify about service problems every hour
    notification_options w,u,c ; Do not alert on recovery, as for "event" checks like this, recovery is redundant.
    process_perf_data 0 ; Process performance data
}

define service {
    name xinetd-check-template
    register 0; # just a template service
    check_command check_wfx_remote_xinetd!-P [redacted-port]
    action_url [redacted-uri]
    active_checks_enabled 1 ; Active service checks are enabled
    check_freshness 0 ; Default is to NOT check service 'freshness'
    check_period 24x7 ; The service can be checked at any time of the day
    contacts serveralert
    event_handler_enabled 1 ; Service event handler is enabled
    failure_prediction_enabled 1 ; Failure prediction is enabled
    flap_detection_enabled 0 ; Flap detection is enabled
    is_volatile 0 ; The service is not volatile
    max_check_attempts 10;
    normal_check_interval 5;
    notification_interval 60 ; Re-notify about service problems every hour
    notification_options w,c,r ; Send notifications about warning, critical, and recovery events
    notification_period 24x7 ; Notifications can be sent out at any time
    notifications_enabled 1 ; Service notifications are enabled
    obsess_over_service 0 ; We should obsess over this service (if necessary)
    parallelize_check 1 ; Active checks should be parallelized (disabling can lead to performance problems)
    passive_checks_enabled 0 ; Passive service checks are enabled/accepted
    process_perf_data 1 ; Process performance data
    retain_nonstatus_information 1 ; Retain non-status information across program restarts
    retain_status_information 1 ; Retain status information across program restarts
    retry_check_interval 5 ; In minutes until a final hard state is determined
}

Now the syslog01 *host* has contactgroups admin, but that's because we want them to be notified if the entire host goes down. Is the ospf service object inheriting the host's contacts too?

If so, then how can we have a service not always alarm the contacts who need to know when the whole host is down?

And if not then how else could it be getting in?

Post by **tgriep** » Tue May 14, 2019 3:06 pm

One of the time saving features in Nagios is called Object Inheritance.
One aspect of that inheritance is that if you do not have a contact or contactgroup assigned to a service, it will inherit the settings from the host object.
Check out the "Implied Inheritance" section of this link.
https://assets.nagios.com/downloads/nag ... tance.html

Create a contactgroup without any members and add it to the template the service is using; that will stop the service from inheriting the host's contactgroup.
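As a rough illustration of that workaround: the group name no-contacts and its alias below are placeholders, not names from the thread, and only the contact_groups line in the template is new. This is a sketch of the idea rather than a tested config.

```
define contactgroup {
    contactgroup_name no-contacts ; placeholder name, deliberately has no members
    alias Empty group to block contacts inherited from the host
}

define service {
    name xinetd-check-template
    register 0
    contacts serveralert
    contact_groups no-contacts ; new line: gives the service its own contactgroup
    ; remaining template directives stay as they were
}
```

After a change like this, the service's entry in objects.cache should list no-contacts instead of admin under contact_groups, while serveralert still receives the notifications.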

Nagios Support Forum
