Contacts and Alerts

Posted: Mon Dec 11, 2017 4:08 pm
by jandrews-flhi
Hello,

I just received a report that multiple people are getting Nagios alerts they are not supposed to receive. Alerts for these specific systems should only be sent to two specific individuals. After reviewing the settings, everything appears to be configured correctly, yet alerts are being sent to everyone.

The following is an example of the format used in the API calls that are made when systems are spun up on AWS. As you can see, the contact group is set correctly. Yet these systems, and all others, including those deployed with the wizards, send notifications to every contact in the system when an alert is triggered.

# SERVICE CPU STATS
curl -XPOST "http://nagios.flhi.com/nagiosxi/api/v1/config/service?apikey=${API}&pretty=1" -d "host_name=${ID}&\
service_description=CPU Stats&\
check_command=check_nrpe\!check_cpu_stats\!-a '-w 85 -c 95'&\
check_interval=5&\
retry_interval=1&\
max_check_attempts=5&\
check_period=24x7&\
contact_groups=Linux&\
notification_interval=5&\
notification_period=24x7&\
applyconfig=1"
1. Linux distribution and version? CentOS Linux release 7.4.1708
2. 32 or 64-bit? x64
3. VMware image or manual install of XI? Manual
4. Are there special configurations on your system? No.
The following is an excerpt from nagios.log, located at /usr/local/nagios/libexec/var/nagios.log. It shows the alerts generated when an EC2 instance was destroyed. That instance was created with the API calls shown above, so only the Linux contact group should be getting these alerts. Instead, as you can see below, the alerts are being sent to everyone.

[1513023896] SERVICE ALERT: i-044826cb9a126f379;Winbind Daemon;CRITICAL;SOFT;4;(Return code of 255 is out of bounds)
[1513023961] SERVICE ALERT: i-044826cb9a126f379;System Logging Daemon;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023961] SERVICE NOTIFICATION: fjubas;i-044826cb9a126f379;System Logging Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023961] SERVICE NOTIFICATION: fsuarez;i-044826cb9a126f379;System Logging Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023961] SERVICE NOTIFICATION: jfisher;i-044826cb9a126f379;System Logging Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023961] SERVICE NOTIFICATION: jvelez;i-044826cb9a126f379;System Logging Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023961] SERVICE NOTIFICATION: mmoscater;i-044826cb9a126f379;System Logging Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023961] SERVICE NOTIFICATION: nagiosadmin;i-044826cb9a126f379;System Logging Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023961] SERVICE NOTIFICATION: rweatherbee;i-044826cb9a126f379;System Logging Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023961] SERVICE NOTIFICATION: jandrews;i-044826cb9a126f379;System Logging Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023961] SERVICE NOTIFICATION: mcole;i-044826cb9a126f379;System Logging Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023983] SERVICE ALERT: i-044826cb9a126f379;Winbind Daemon;CRITICAL;HARD;5;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023983] SERVICE NOTIFICATION: fjubas;i-044826cb9a126f379;Winbind Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023983] SERVICE NOTIFICATION: fsuarez;i-044826cb9a126f379;Winbind Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023983] SERVICE NOTIFICATION: jfisher;i-044826cb9a126f379;Winbind Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023983] SERVICE NOTIFICATION: jvelez;i-044826cb9a126f379;Winbind Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023983] SERVICE NOTIFICATION: mmoscater;i-044826cb9a126f379;Winbind Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023983] SERVICE NOTIFICATION: nagiosadmin;i-044826cb9a126f379;Winbind Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023983] SERVICE NOTIFICATION: rweatherbee;i-044826cb9a126f379;Winbind Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023983] SERVICE NOTIFICATION: jandrews;i-044826cb9a126f379;Winbind Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
[1513023983] SERVICE NOTIFICATION: mcole;i-044826cb9a126f379;Winbind Daemon;CRITICAL;xi_service_notification_handler;CHECK_NRPE: Socket timeout after 30 seconds.
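To see at a glance which contacts were actually notified for one host, the contact names can be pulled straight out of nagios.log. This is a minimal sketch assuming the standard Nagios Core log line shown above (`[epoch] SERVICE NOTIFICATION: contact;host;service;...`); `notified_contacts` is a hypothetical helper, not part of Nagios XI, and its default path of /usr/local/nagios/var/nagios.log is an assumption, so pass the log path explicitly if yours lives elsewhere.

```shell
# Hypothetical helper: list the distinct contacts that received a
# SERVICE NOTIFICATION for one host, read from a Nagios Core log file.
# Usage: notified_contacts HOST [LOGFILE]
notified_contacts() {
  local host="$1"
  # Assumed default location; override with the second argument.
  local log="${2:-/usr/local/nagios/var/nagios.log}"
  # Fields split on ";": $1 = "[epoch] SERVICE NOTIFICATION: contact", $2 = host.
  awk -F';' -v host="$host" '
    /SERVICE NOTIFICATION: / && $2 == host {
      sub(/.*SERVICE NOTIFICATION: /, "", $1)   # keep only the contact name
      print $1
    }' "$log" | sort -u
}
```

Run against the excerpt above, `notified_contacts i-044826cb9a126f379 /usr/local/nagios/libexec/var/nagios.log` would list each of the nine contacts once, rather than only the two expected members of the Linux group.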

Re: Contacts and Alerts

Posted: Mon Dec 11, 2017 5:17 pm
by dwhitfield
Please attach or PM your objects.cache and your contactgroups.cfg.

Of less importance, but possibly useful, would be your profile. You can download it by going to Admin > System Config > System Profile and clicking the ***Download Profile*** button near the top. If for whatever reason you *cannot* download the profile, please post the output of View System Info (5.3.4+, Show Profile if older) in the thread; that will at least give us some information. The profile gives us access to many of the logs we would otherwise ask for individually. If security is a concern, you can unzip the profile, remove whatever you like, and then zip it up again. We may end up needing something you removed, but we can ask for that specifically.

You can also generate a profile manually using the script at /usr/local/nagiosxi/html/includes/components/profile/getprofile.sh

That should generate a profile in /usr/local/nagiosxi/var/components/ which you can get off the server with an application such as FileZilla.

After you PM the profile, please update this thread. Updating this thread is the only way for it to show back up on our dashboard.

If you get an error that PROFILE BUILD FAILED, please see https://support.nagios.com/kb/article.p ... ategory=44

UPDATE: requested info shared with techs

Re: Contacts and Alerts

Posted: Mon Dec 11, 2017 5:49 pm
by jandrews-flhi
Hello,

PM has been sent with the requested items.

Thank you, Doug.

Re: Contacts and Alerts

Posted: Mon Dec 11, 2017 5:56 pm
by dwhitfield
I found your problem (at least at a high level). In your objects.cache, you have this:

define contactgroup {
	contactgroup_name	Linux
	alias	Linux Administrators
	members	mcole,jandrews,rweatherbee,nagiosadmin,mmoscater,jvelez,jfisher,fsuarez,fjubas
	}
In the CCM, if you go to the Linux contact group, does it have another contact group listed? (This is the second button under "Assign Memberships".)
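A quick way to compare what the CCM shows with what Nagios actually loaded is to print the group's members from objects.cache, one per line. This is a sketch assuming the tab-separated `define contactgroup { ... }` layout shown above; `cache_group_members` is a hypothetical helper name, and the default path is the one used later in this thread.

```shell
# Hypothetical helper: print the members of one contact group as recorded in
# objects.cache (the configuration Nagios actually loaded), one per line.
# Usage: cache_group_members GROUP [CACHEFILE]
cache_group_members() {
  local group="$1"
  local cache="${2:-/usr/local/nagios/var/objects.cache}"
  awk -v group="$group" '
    /^define / { type = $2; ingroup = 0 }      # a new object block starts
    type == "contactgroup" && $1 == "contactgroup_name" && $2 == group { ingroup = 1 }
    ingroup && $1 == "members" { print $2 }    # comma-separated member list
  ' "$cache" | tr ',' '\n'
}
```

Tracking the block type keeps a later `members` line, for example in a hostgroup or servicegroup, from being mistaken for contact group members.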

Re: Contacts and Alerts

Posted: Tue Dec 12, 2017 8:38 am
by jandrews-flhi
Doug,

This was literally the first place I checked. In the CCM it lists only the two of us, myself and mcole.
Screen Shot 2017-12-12 at 8.31.54 AM.png
Screen Shot 2017-12-12 at 8.36.00 AM.png

Re: Contacts and Alerts

Posted: Tue Dec 12, 2017 12:09 pm
by kyang
In the CCM, go to Tools > Config File Management and click Write, then Delete, then Write again, then Verify.

Then let's see whether that removes the extra contacts from your objects.cache.

You can send us a new profile or just the objects.cache file.

Thanks!


Update: Profile received!

Re: Contacts and Alerts

Posted: Wed Dec 13, 2017 9:06 am
by jandrews-flhi
I've sent you the profile.

Re: Contacts and Alerts

Posted: Wed Dec 13, 2017 1:48 pm
by kyang
Can you send us a screenshot of this icon in the contact group?
Capture.PNG
Click the blue i and show us the contact group's relationships.

I am specifically looking to see if you have a contact template assigned to the Linux contact group.

If it is connected to a contact template, please show us which users are in that contact template.
Capture.PNG
Like this: even though I removed some contacts from one of my contact groups, my objects.cache file still showed all contacts, which it should not.

I found that my contact group was assigned to a template, the one for xi_contactgroup_all. (Once I removed that, my objects.cache file showed the correct users.)

Hopefully that is the case for you. Let us know!
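For background: Nagios Core lets group membership be declared from either side, through a contact group's `members` list or through a contact's `contact_groups` directive. If a contact template lists Linux in `contact_groups`, every contact that uses that template joins Linux, which would explain objects.cache showing all nine contacts while the CCM group shows two. The sketch below scans config files for contact definitions or templates that do this. `templates_adding_group` is a hypothetical helper, and the location of the CCM-written files (assumed to be under /usr/local/nagios/etc/) may differ on your system.

```shell
# Hypothetical helper: list contact definitions/templates whose contact_groups
# directive includes GROUP. Any contact that inherits such a template (via "use")
# also becomes a member of GROUP, even if the group's own members list omits it.
# Usage: templates_adding_group GROUP FILE...
templates_adding_group() {
  local group="$1"
  shift
  awk -v group="$group" '
    # Start of a contact (or contact template) block; "define contactgroup" does not match.
    /^[ \t]*define[ \t]+contact[ \t]*[{]/ { incontact = 1; id = ""; hit = 0; next }
    # Templates are identified by "name", real contacts by "contact_name".
    incontact && ($1 == "name" || $1 == "contact_name") { id = $2 }
    incontact && $1 == "contact_groups" {
      line = $0
      sub(/^[ \t]*contact_groups[ \t]+/, "", line)
      sub(/^[+]/, "", line)   # a leading "+" marks additive inheritance
      n = split(line, g, /[ \t]*,[ \t]*/)
      for (i = 1; i <= n; i++) {
        sub(/[ \t]+$/, "", g[i])
        if (g[i] == group) hit = 1
      }
    }
    # End of the block: report it if it referenced GROUP.
    incontact && /^[ \t]*[}]/ {
      if (hit) print FILENAME ": " id
      incontact = 0
    }
  ' "$@"
}
```

For example, `templates_adding_group Linux /usr/local/nagios/etc/*.cfg` would print each file and template or contact name that adds its users to Linux.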

Re: Contacts and Alerts

Posted: Thu Dec 14, 2017 9:17 am
by jandrews-flhi
Per your request.
Screen Shot 2017-12-14 at 9.13.56 AM.png
Screen Shot 2017-12-14 at 9.13.38 AM.png

Re: Contacts and Alerts

Posted: Thu Dec 14, 2017 11:51 am
by kyang
Yes, that one. In that same contact template, remove the Linux contact group.

Apply the configuration, then run this command and see whether it outputs the correct members for that contact group. Your objects.cache file is still showing all contacts in that contact group, which it shouldn't.

Hopefully, this resolves it.

grep -A 3 Linux /usr/local/nagios/var/objects.cache
Please post the output, thanks!
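The plain `grep -A 3 Linux` matches every line containing the string "Linux", including host aliases, service descriptions, and other group names such as Linux_DBA, so the output can get noisy. A slightly narrower variant anchors on the `contactgroup_name` directive. This is a sketch: `show_cached_contactgroup` is a hypothetical name, and it assumes the group block layout shown earlier in the thread (name, alias, members, closing brace).

```shell
# Hypothetical helper: show only the compiled definition of one contact group
# from objects.cache (the contactgroup_name line plus the 3 lines after it).
# Usage: show_cached_contactgroup GROUP [CACHEFILE]
show_cached_contactgroup() {
  local group="$1"
  local cache="${2:-/usr/local/nagios/var/objects.cache}"
  # Anchored so "Linux web server" or "Linux_DBA" does not match "Linux".
  grep -A 3 -E "^[[:space:]]*contactgroup_name[[:space:]]+${group}[[:space:]]*\$" "$cache"
}
```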