Use Amazon SNS for alerts?

Posted: Tue Jun 18, 2019 9:04 am
by tanod
Hello,

I would like to send SMS alerts via Amazon SNS for price reasons (it's free below 100 SMS per month, which will be enough for us).

This seems to differ from the classic approach of entering phone numbers into XI: with SNS, XI would send a notification to SNS, which in turn sends the SMS, so the phone numbers are stored on the SNS side.
But I don't know how to proceed.

Does anyone use this service? Can you help me?

Thank you!

Re: Use Amazon SNS for alerts?

Posted: Tue Jun 18, 2019 4:36 pm
by cdienger

Re: Use Amazon SNS for alerts?

Posted: Fri Jun 21, 2019 5:20 am
by tanod
These documents are unfortunately not enough.

I tried to write my own script (I'm not very comfortable with Python):

Code: Select all

#!/usr/bin/env python
import boto3
import sys
if len(sys.argv) < 3:
        print "USAGE : " + sys.argv[0] + " sns_topic_arn message"
        exit()

arn = sys.argv[1]
msg = sys.argv[2]
sns = boto3.client('sns')
response = sns.publish(TopicArn = arn, Message = msg,)
print response
file = open("snslog", "a")
sys.stdout = file
print response
print "\n\n"
file.close()
I also created the required files as the nagios user, according to this.

I execute it with the following command:

Code: Select all

/usr/local/nagios/libexec/send_sns3.py "my_AWS_ARN" "My_Message"
When I execute it manually from the command line, it works: I receive my SMS and a line is added to my log (snslog).

To use it with Nagios, I followed this doc.

The command I added on the CCM is:

Code: Select all

$USER1$/send_sns3.py "my_AWS_ARN" "$NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$"
BUT in this case it does not work! I don't receive anything, and nothing is written to my log file.

Permissions look correct:
ll.png
(files with root user are not used)


Here is what I see in /var/log/messages about 30 seconds later:

Code: Select all

Jun 21 12:16:46 localhost nagios: Warning: Host event handler command '/usr/local/nagios/libexec/send_sns3.py "my_AWS_ARN" " Host Alert: VM Prometheus is DOWN"' timed out after 0.00 seconds
I don't know what to do.

Can you help me?

Thank you!

Re: Use Amazon SNS for alerts?

Posted: Fri Jun 21, 2019 11:53 am
by lmiltchev
"When I execute it manually from the command line, it works: I receive my SMS and a line is added to my log (snslog)."
Are you running the command as root or as the nagios user? Can you try running it as nagios?

Code: Select all

su - nagios
<your command>

Re: Use Amazon SNS for alerts?

Posted: Mon Jun 24, 2019 2:18 am
by tanod
Yes, it works as the nagios user.

Re: Use Amazon SNS for alerts?

Posted: Mon Jun 24, 2019 3:31 pm
by lmiltchev
It's strange that your script works from the command line (even when run as nagios user), but when the event handler kicks in, it fails right away (times out instantaneously).
Jun 21 12:16:46 localhost nagios: Warning: Host event handler command '/usr/local/nagios/libexec/send_sns3.py "my_AWS_ARN" " Host Alert: VM Prometheus is DOWN"' timed out after 0.00 seconds
The message above doesn't give us much info, so try TEMPORARILY enabling debugging by setting the following in the nagios.cfg file:

Code: Select all

debug_file=/usr/local/nagios/var/nagios.debug
debug_level=-1
debug_verbosity=2
Then restart nagios:

Code: Select all

service nagios restart
Test your script again, then post the nagios.debug on the forum.

Re: Use Amazon SNS for alerts?

Posted: Tue Jun 25, 2019 6:44 am
by tanod
The timeout is not really instantaneous.
When I submit a passive check result, I see a line in /var/log/messages, and the timeout comes about 30 seconds later.

There are too many log lines to make sense of: about 4900 lines in roughly 1 minute.

Re: Use Amazon SNS for alerts?

Posted: Tue Jun 25, 2019 11:41 am
by swolf
Instead of printing to debug, you should try using exit codes. Most likely, nagios won't print any of your debugging output anywhere you can see it (it may show up in nagios.log, but that's not helpful in this case).
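
A minimal sketch of the exit-code idea, assuming Python 3 and that boto3 is installed for the nagios user. The names `send_alert`, `publish_with_boto3`, `main`, `LOG_PATH` and the exit code values are hypothetical, and the log location is an assumption, chosen as an absolute path so it does not depend on the directory the script is started from. The publisher is passed in as a function so the error handling can be exercised without AWS access.

```python
import sys
import traceback

# Hypothetical absolute log location; pick a directory the nagios user can write to.
LOG_PATH = "/tmp/send_sns.log"

# Hypothetical exit codes, one per failure mode.
EXIT_OK = 0
EXIT_USAGE = 1
EXIT_PUBLISH_FAILED = 2


def send_alert(argv, publish, log_path=LOG_PATH):
    """Publish argv[2] to the topic in argv[1] and return an exit code."""
    if len(argv) < 3:
        return EXIT_USAGE
    arn, msg = argv[1], argv[2]
    try:
        response = publish(arn, msg)
    except Exception:
        # Keep the full traceback in the log file, since output printed
        # by the script may not be visible to whoever launched it.
        with open(log_path, "a") as log:
            log.write(traceback.format_exc() + "\n")
        return EXIT_PUBLISH_FAILED
    with open(log_path, "a") as log:
        log.write("published: %r\n" % (response,))
    return EXIT_OK


def publish_with_boto3(arn, msg):
    # Imported here so that a failing import is also caught by send_alert.
    import boto3
    return boto3.client("sns").publish(TopicArn=arn, Message=msg)


def main():
    sys.exit(send_alert(sys.argv, publish_with_boto3))
```

Calling `main()` at the bottom of the file turns this into a plugin script. Each failure mode then maps to a distinct exit status, and any exception raised by the publish call (or by importing boto3) ends up in the log file with its traceback.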

I'm not familiar with the boto3 library, but most likely you have an error or exception coming from:

Code: Select all

sns = boto3.client('sns')
response = sns.publish(TopicArn = arn, Message = msg,)
You can see the exact command line by looking in the nagios.debug file for lines like:

Code: Select all

Processed host event handler command line:
You should copy those exact commands into the nagios user's terminal and see if anything interesting happens.
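
Since the debug file can run to thousands of lines, a small filter may help to find those entries. This is only a sketch, assuming Python 3; the function names `matching_lines` and `filter_debug_file` are hypothetical.

```python
def matching_lines(lines, needles):
    """Yield (line_number, text) for every line containing any of the needles."""
    for number, line in enumerate(lines, start=1):
        if any(needle in line for needle in needles):
            yield number, line.rstrip("\n")


def filter_debug_file(path, needles):
    # errors="replace" keeps the scan going if the file contains odd bytes.
    with open(path, "r", errors="replace") as handle:
        return list(matching_lines(handle, needles))
```

For example, `filter_debug_file("/usr/local/nagios/var/nagios.debug", ["Processed host event handler command line:", "send_sns3.py"])` would return only the lines that mention the event handler command or the script, together with their line numbers in the file.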

Re: Use Amazon SNS for alerts?

Posted: Wed Jun 26, 2019 5:03 am
by tanod
I'm not familiar with the boto3 library either, but I can't see how an error could happen in these two lines: first, they are written as shown in the AWS documentation; second, the script works manually with any user.

I think I've done something wrong in Nagios.

Here are some more interesting logs (from /var/log/messages). I don't remember whether they weren't printed before or whether I just didn't share them:

Code: Select all

Jun 26 11:33:07 localhost nagios: HOST ALERT: DebugSNS;DOWN;HARD;1;azerty
Jun 26 11:33:14 localhost systemd: Started Session c36 of user root.
Jun 26 11:33:37 localhost nagios: job 166 (pid=23222): read() returned error 11
Jun 26 11:33:37 localhost nagios: wproc: Core Worker 6616: job 166 (pid=23222) timed out. Killing it
Jun 26 11:33:37 localhost nagios: wproc: HOST EVENTHANDLER job 166 from worker Core Worker 6616 timed out after 30.02s
Jun 26 11:33:37 localhost nagios: wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Jun 26 11:33:37 localhost nagios: Warning: Host event handler command '/usr/local/nagios/libexec/send_sns3.py "myARN" " Host Alert: DebugSNS is DOWN"' timed out after 0.00 seconds
Jun 26 11:33:38 localhost nagios: wproc: Core Worker 6616: job 166 (pid=23222): Dormant child reaped
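
The numbers in "read() returned error 11" and "error_code=62" look like operating system errno values. That is an assumption about how Nagios reports them, not something confirmed in this thread. If it holds, they can be translated with a short Python sketch (the function name `describe_errno` is hypothetical):

```python
import errno
import os


def describe_errno(code):
    """Return 'number NAME: message' for an errno value on the current platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return "%d %s: %s" % (code, name, os.strerror(code))
```

On Linux, 11 is EAGAIN and 62 is ETIME, so under that assumption both values would be consistent with a worker that waited for output from the script and then gave up after the 30 second timeout. The mapping is platform specific, so `describe_errno(11)` and `describe_errno(62)` should be run on the Nagios server itself.
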
I've changed my script a bit:

Code: Select all

#!/usr/bin/env python
import boto3
import sys
file = open("snslog", "a")
sys.stdout = file
print "script started"
if len(sys.argv) < 3:
	print "USAGE : " + sys.argv[0] + " sns_topic_arn message"
	exit()
arn = sys.argv[1]
msg = sys.argv[2]
sns = boto3.client('sns')
response = sns.publish(TopicArn = arn, Message = msg,)
print response
print "\n\n"
file.close()
As you can see, the first thing I do is print something to my log file. It works manually (as any user), but not from Nagios XI.
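
One thing that can differ between a manual run and a run started by Nagios is the process environment. For example, the relative path "snslog" resolves against the current working directory, and boto3 can read credentials from `AWS_*` environment variables or from files under the user's home directory. Whether either of these is the cause here is not established in this thread. The sketch below, assuming Python 3 on Linux, records those details in a file with an absolute path so the two runs can be compared; the function names are hypothetical.

```python
import os


def environment_snapshot():
    """Collect details that may differ between a login shell and a daemon."""
    return {
        "uid": os.geteuid(),
        "cwd": os.getcwd(),
        "home": os.environ.get("HOME"),
        "path": os.environ.get("PATH"),
        # Names only, so no secret values end up in the log.
        "aws_env_vars": sorted(k for k in os.environ if k.startswith("AWS_")),
    }


def append_snapshot(log_path):
    """Append one key=value line per snapshot entry to log_path."""
    with open(log_path, "a") as log:
        for key, value in sorted(environment_snapshot().items()):
            log.write("%s=%r\n" % (key, value))
```

Calling `append_snapshot("/tmp/send_sns_env.log")` (a hypothetical path) as the first statement of the script, before importing boto3, would leave a record even if a later step hangs or fails.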

Here are my Nagios configs:

I uploaded the plugin in Admin -> Manage plugin
plugin.png
I defined the command:
command.png
I configured my host:
hostcheck.png
(see next post, I can't upload more than 3 attachments)

Re: Use Amazon SNS for alerts?

Posted: Wed Jun 26, 2019 5:06 am
by tanod
Alert settings:
hostalert.png
And here is how I test:
test.png
Maybe I forgot something? Or did I do something wrong?

This is Nagios XI 5.6.3.

Thank you for your replies.