Need ability to reply-all to an alert

zoodle
Posts: 7
Joined: Mon Feb 24, 2014 12:56 pm

Need ability to reply-all to an alert

Post by zoodle »

Hi,
I've inherited a mid-sized Nagios implementation and am having some trouble with email notifications. Our on-call tech must email all recipients of the notification before working on the issue. Because Nagios sends a separate notification to each contact, the tech cannot simply reply-all to the email. Further, the tech cannot determine from the email who has been notified.

I have found others with this same issue and I see some people have a workaround where they add a contact to each contactgroup where the email address is something like [email protected],[email protected],[email protected]. This gets passed along to the MTA where it is expanded. This won't work for us because we have many services and contactgroups.

Is there a solution that adds a reply-to header in the email notifications that would expand the email addresses in the contactgroups assigned to a service?

Thank you,
-Terry
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Need ability to reply-all to an alert

Post by abrist »

Most people control these types of groups from their exchange/mail relay server. You can send to multiple recipients with sendmail:
http://stackoverflow.com/questions/1339 ... recipients
That way all the recipients will receive the email from a "reply all".
Or you can set a reply-to env var with mail:
http://stackoverflow.com/questions/5472 ... -unix-mail
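
As a rough illustration of the two ideas above (one message to several recipients, plus a reply-to header), here is a minimal Python sketch using the standard library's `email.message.EmailMessage`. The function name `build_alert` and all addresses are hypothetical, and this is not how the stock Nagios notification commands build their mail; it only shows what the resulting headers would look like.

```python
from email.message import EmailMessage


def build_alert(sender, recipients, subject, body):
    """Build one message addressed to every recipient at once.

    Listing all addresses in To (and again in Reply-To) is what lets
    a "reply all" reach the whole group.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Reply-To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Hypothetical addresses, for illustration only.
alert = build_alert(
    "nagios@example.com",
    ["ops@example.com", "dba@example.com"],
    "PROBLEM: disk on db01 is CRITICAL",
    "Disk usage on /var is above threshold.",
)
print(alert["To"])
print(alert["Reply-To"])
```

A message built this way could then be handed to the local MTA or to `smtplib.SMTP.send_message`; which one fits depends on how the notification command is wired up.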
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
zoodle
Posts: 7
Joined: Mon Feb 24, 2014 12:56 pm

Re: Need ability to reply-all to an alert

Post by zoodle »

I have a feeling I'm misusing Nagios in some way but I can't figure out what I'm doing wrong.

We use Nagios to email out to multiple teams for each alert. A disk alert will go to our server operations people, the manager of the application running on that server and anyone else who has an interest. If the disk is on a database, it will also go to our DBA team. The server ops people usually resolve the issue quickly. When they do, they need a way to send a reply to everyone who got the notification.

In the older Nagios infrastructure I inherited from a former colleague, he "solved" this by creating a separate contact for every alert where the email address is a comma-separated list of all addresses. Sendmail expands the address field and the one Nagios notification is distributed to multiple recipients. Anyone can reply-all to the email. This isn't manageable because it doesn't allow for alerts that go to multiple contactgroups or allow a contact to disable notifications, etc.

This must be a common scenario for users of Nagios but I don't see how others have solved it without resorting to doing what my colleague did. If someone has some insight about how to deal with this I'd very much appreciate it.

Thank you,
-Terry
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Need ability to reply-all to an alert

Post by abrist »

I don't think you are misusing Nagios; you may just be expecting too much from sendmail/mailx/mail. Emails are sent out directly to the addresses on contacts. To get mail group functionality, most people implement it on their mail/relay/exchange server. To do what you want without implementing it at the mail server level, you would need to do what was already done, or create a custom notification for each different notification pool (this is even more cumbersome).

I would like to point out that all the contacts should receive an email when the problem has been resolved and the checked object has recovered to an OK state. Additionally, once a responder acknowledges the issue, alerts will cease until it recovers, which is usually enough for the other contacts to assume that it is being handled. Nagios was not envisioned as a ticketing system, but was designed to integrate into other systems.
millisa
Posts: 69
Joined: Thu Jan 16, 2014 11:13 pm
Location: Austin, TX

Re: Need ability to reply-all to an alert

Post by millisa »

Isn't this what the 'Send custom service notification' and 'Send custom host notification' are for?

I may be missing something, but it sounds like the mail server is being used outside of nagios when everything that's needed is inside.

As Andy said, nagios should be sending out an alert to all the valid people for a service/host when that check goes critical; it also sends out a notification when a host/service is acknowledged in nagios. When the service or host recovers, there is a third notification that goes out saying all is better.
If you aren't getting those 3 notifications, then it sounds like the contact or the service/host definition is set up not to send them?
Check the notification_options for the services and hosts - your hosts generally should have at least d,u,r (down, unreachable, recovery) without an 'n' option (that'd be none); services should have at least c,u,r (critical, unknown, recovery) or w,c,u,r (notify for warnings too) - again, no 'n'.
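
To make that check concrete, here is a small Python sketch that reports which of the recommended flags are missing from a notification_options value. The helper name and the two "required" sets are assumptions taken from the minimums described above; nothing like this ships with nagios.

```python
def missing_notification_options(configured, required):
    """Return the required flags absent from a notification_options value."""
    flags = {flag.strip() for flag in configured.split(",") if flag.strip()}
    if "n" in flags:
        # 'n' means "none": no notifications are sent at all.
        return sorted(required)
    return sorted(set(required) - flags)


# Minimum flags suggested above (assumed here as the baseline to check against).
HOST_REQUIRED = {"d", "u", "r"}     # down, unreachable, recovery
SERVICE_REQUIRED = {"c", "u", "r"}  # critical, unknown, recovery

print(missing_notification_options("d,r", HOST_REQUIRED))
print(missing_notification_options("w,c,u,r", SERVICE_REQUIRED))
```

An empty list means the definition already covers the suggested minimum.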

You can determine who has been notified from within nagios - it's on the lower left part of the nav bar in the Reports section, titled 'Notifications'. But you shouldn't really need to look there; the critical/acknowledged/recovery notices should already be going to who they need to go to if you set the service checks right.

If you need to send an additional bit of info about a specific host or service, you can click on a host or a service, and over on the right side in the Host Commands or Service Commands box (you know, the place where you acknowledge the alerts) there's a link for 'Send custom host/service notification'. That should send out a notice to the valid recipients. Pay attention to the command description: the Forced box sends the notification regardless of time restrictions, and the Broadcast box extends it to escalated contacts.
"This command is used to send a custom notification about the specified host. Useful in emergencies when you need to notify admins of an issue regarding a monitored system or service. Custom notifications normally follow the regular notification logic in Nagios. Selecting the Forced option will force the notification to be sent out, regardless of the time restrictions, whether or not notifications are enabled, etc. Selecting the Broadcast option causes the notification to be sent out to all normal (non-escalated) and escalated contacts. These options allow you to override the normal notification logic if you need to get an important message out."
I can't find the specific reference, but I'm pretty sure that feature showed up around nagios 3.0.
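
The same custom notification can also be submitted without the web UI, through the external command file. The sketch below only formats the `SEND_CUSTOM_SVC_NOTIFICATION` line; the host, service and author values are hypothetical, and the option bits (1 for Broadcast, 2 for Forced) are the ones I understand the external command documentation to describe, so verify them against your version.

```python
import time

# Option bits for SEND_CUSTOM_*_NOTIFICATION (check the docs for your version).
BROADCAST = 1  # also notify escalated contacts
FORCED = 2     # ignore time periods and disabled notifications


def custom_service_notification(host, service, author, comment, options=0, now=None):
    """Format a SEND_CUSTOM_SVC_NOTIFICATION line for the nagios command file."""
    timestamp = int(time.time() if now is None else now)
    return (
        f"[{timestamp}] SEND_CUSTOM_SVC_NOTIFICATION;"
        f"{host};{service};{options};{author};{comment}\n"
    )


# Hypothetical host, service and author names; fixed timestamp for repeatability.
line = custom_service_notification(
    "gibson", "Disk Usage", "oncall",
    "Gibson looks fixed, details in ticket#12345",
    options=FORCED, now=1393476885,
)
print(line, end="")
```

Appending that line to the file named by `command_file` in nagios.cfg is what actually submits it. Fields are separated by semicolons, so the comment text should not contain one.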

Generally, my alerts follow this flow:

Nagios sends a critical alert to all contacts. (nagios mail #1)
I (or whoever gets paged first and is on call) log in to the nagios box, pick out the host or service and acknowledge it with "Ack -A Looks like the gibson is down", which gets mailed out automatically to everyone who got the critical alert. (nagios mail #2)
I open a ticket in my ticketing system to take notes (and track sweet billable time) and jot down the ticket number. (generates some of its own mail)
The gibson is fixed, nagios sees it and recovers, sending out a new notification to everyone who saw that I'd acknowledged it. (nagios mail #3)
Since this was the gibson that went down, I get back on nagios and send out a custom notification that just says "Gibson looks fixed, details in ticket#12345" and everyone who cares about it can hunt it down later. (nagios mail #4 - when needed)

It lets the monitoring system be a monitoring and alert system and lets your ticket system be your ticket system...
Give a quick read to how nagios notifications work - the 'who gets notified' section matters. The fact that you are manually sending out what sounds effectively like an 'acknowledgment' makes me suspect that a key feature of nagios may have been missed somewhere in your organization's history? Either that or I'm not fully comprehending the problem...
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Need ability to reply-all to an alert

Post by abrist »

Brilliant rundown, millisa.
@Zoodle, let us know if you have more questions.
zoodle
Posts: 7
Joined: Mon Feb 24, 2014 12:56 pm

Re: Need ability to reply-all to an alert

Post by zoodle »

Thanks! I really appreciate the replies. I may have more questions, but first I need to re-read the last few replies and understand them better.

Thanks,
-Terry
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Need ability to reply-all to an alert

Post by sreinhardt »

Sounds good, let us know if there is anything else!
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
zoodle
Posts: 7
Joined: Mon Feb 24, 2014 12:56 pm

Re: Need ability to reply-all to an alert

Post by zoodle »

This thread has helped me understand Nagios much better! I haven't yet achieved what I want, but I'm much closer.

Summary: I wanted a way to reply-all to the email from Nagios and reach all the recipients. Nagios spawns a separate process for each contact in the service contactgroup(s), so it's not apparent from the notification who else received it. Millisa suggested making use of the Nagios acknowledge or "send custom notification" features rather than replying via email, because that is what these features are intended for.

These are great suggestions and I'll put them in place right away. I was still hoping I could make it more apparent in the notification email who else was notified. I've added the $NOTIFICATIONRECIPIENTS$ macro to the body of the email in the notify-host-by-email and notify-service-by-email commands. This gives me the contact names but I have to look them up in Nagios if I want to find the email addresses for these contacts. Now, my host notifications show me who else is getting the notification.


Notification Type: PROBLEM

Host: web04e04-1
State: DOWN
Address: web04e04-1
Info: CRITICAL - Host Unreachable (web04e04-1)

Date/Time: Thu Feb 27 04:54:45 GMT 2014
Recipients: webteam,webmgr,e04team,operations
Monitoring Host: nagios-22
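
One possible next step, sketched in Python under stated assumptions: turn the contact names from $NOTIFICATIONRECIPIENTS$ into addresses that could go in a Reply-To header. The `CONTACT_EMAILS` table and its addresses are hypothetical; Nagios does not provide this lookup, so the table would have to be kept in sync with the contact definitions by hand or by some other means.

```python
def recipient_addresses(recipients_macro, contact_emails):
    """Map comma-separated contact names to email addresses.

    Unknown contact names are skipped rather than guessed.
    """
    names = [name.strip() for name in recipients_macro.split(",") if name.strip()]
    return [contact_emails[name] for name in names if name in contact_emails]


# Hypothetical lookup table, for illustration only.
CONTACT_EMAILS = {
    "webteam": "webteam@example.com",
    "webmgr": "webmgr@example.com",
    "operations": "ops@example.com",
}

addresses = recipient_addresses("webteam,webmgr,e04team,operations", CONTACT_EMAILS)
print("Reply-To: " + ", ".join(addresses))
```

Here `e04team` is dropped because it has no entry in the table, which is the kind of gap that would need handling in a real setup.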
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Need ability to reply-all to an alert

Post by slansing »

Excellent, in my time here I've actually never used that macro, how handy!