SOLVED...External command for notification won't work
Hello.
I have an external command script for Slack notification which had been working fine but stopped a couple months ago. I believe it stopped right after doing a Debian upgrade.
I'm also unable to execute commands from the GUI due to: "Error: Could not open command file '/usr/local/nagios/var/rw/nagios.cmd' for update!".
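That GUI error means the CGI could not open the command pipe for writing. Two common causes are that Nagios is not running, so the pipe does not exist, or that the web server user (www-data on Debian) has no write permission on it. Below is a minimal sketch of a hypothetical helper, check_cmd_pipe, that reports which case applies for whichever user runs it. It uses only the shell's -e, -p and -w tests, which do not open the pipe.

```bash
#!/bin/bash
# Hypothetical helper: check that the Nagios command file exists, is a named
# pipe (FIFO), and is writable by the current user. Does not open the pipe.
check_cmd_pipe() {
  local pipe=$1
  if [ ! -e "$pipe" ]; then
    echo "missing: $pipe (is Nagios running?)"
    return 2
  fi
  if [ ! -p "$pipe" ]; then
    echo "not a named pipe: $pipe"
    return 3
  fi
  if [ -w "$pipe" ]; then
    echo "ok: uid $(id -u) can write to $pipe"
  else
    echo "uid $(id -u) cannot write to $pipe"
    return 1
  fi
}
```

To check it from the web server's point of view, run it as root with `sudo -u www-data bash -c "$(declare -f check_cmd_pipe); check_cmd_pipe /usr/local/nagios/var/rw/nagios.cmd"`.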
Steps:
1) Installed Nagios Core 4.2.3 and plugins on Debian 8.8 following this guide: https://support.nagios.com/kb/article/n ... tml#Debian
2) Set up a host, a slack user and a notification command, and configured them according to: http://matthewcmcmillan.blogspot.com/20 ... -with.html (minus the Slack shell script naming error in that post).
3) Watched the logs: saw errors and received emails (email notification was enabled as well), but no Slack messages.
4) Running the Slack script manually DOES send a notification to Slack.
So, something is broken somewhere and I can't figure out what.
Let me know what info is needed, as I don't want to pollute the board with endless gobs of data from config files that may be inconsequential.
Thanks
Last edited by pilotmc on Thu Aug 03, 2017 1:34 pm, edited 1 time in total.
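Step 4 runs the script from an interactive shell, which is not quite how Nagios runs it: the daemon runs it as the nagios user, passes NAGIOS_* variables only when enable_environment_macros is on, and kills the job after notification_timeout (30 seconds by default). Below is a sketch of a hypothetical helper, run_like_nagios, that approximates those conditions by clearing the environment with env -i and bounding the run with timeout. The NAGIOS_* values are placeholders, and an empty environment is stricter than what the daemon really provides. One hypothesis it can help rule out (not confirmed in this thread) is a setting such as https_proxy that exists in a login shell but not in the daemon's environment.

```bash
#!/bin/bash
# Hypothetical helper: approximate how the Nagios daemon launches a notification
# command. Usage: run_like_nagios SECONDS COMMAND [ARGS...]
run_like_nagios() {
  local secs=$1
  shift
  # env -i starts from an empty environment (stricter than the real daemon);
  # only PATH and a few placeholder NAGIOS_* macros are passed through.
  # timeout kills the command after SECONDS, similar to the worker's limit.
  env -i \
    PATH=/usr/bin:/bin \
    NAGIOS_HOSTNAME="test-host" \
    NAGIOS_SERVICEDISPLAYNAME="Test" \
    NAGIOS_SERVICESTATE="CRITICAL" \
    NAGIOS_SERVICEOUTPUT="manual test via run_like_nagios" \
    timeout "$secs" "$@"
}
```

For example, as root: `sudo -u nagios bash -c "$(declare -f run_like_nagios); run_like_nagios 30 /usr/local/bin/slack_nagios.sh"`. If this hangs or fails while the plain manual run succeeds, the difference lies in the user or the environment rather than in the script itself.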
Re: External command for notification won't work
From nagios.log:
[1497407589] Warning: Notifying contact 'nagiosadmin' of service 'Load' on host 'newlive4.playnet.com' by command '/usr/local/bin/slack_nagios.sh > /tmp/slack.log 2>&1' timed out after 0.00 seconds
Re: External command for notification won't work
contacts.cfg:
Code: Select all
define contact{
    contact_name                    nagiosadmin
    use                             generic-contact
    alias                           Nagios Admin
    service_notification_commands   notify-service-by-slack
    host_notification_commands      notify-host-by-slack
    email                           nagios@localhost
    }
/usr/local/bin/slack_nagios.sh:
Code: Select all
#!/bin/bash
WEBHOST_NAGIOS="monitor.mydomain.com"
SLACK_CHANNEL="#alerts"
SLACK_BOTNAME="Nagios"
WEBHOOK_URL="https://hooks.slack.com/services/********/**********/********************"
# Set the message icon based on the Nagios service state
if [ "$NAGIOS_SERVICESTATE" = "OK" ]
then
    ICON_EMOJI=":thumbsup:"
elif [ "$NAGIOS_SERVICESTATE" = "WARNING" ]
then
    ICON_EMOJI=":warning:"
elif [ "$NAGIOS_SERVICESTATE" = "CRITICAL" ]
then
    ICON_EMOJI=":error:"
elif [ "$NAGIOS_SERVICESTATE" = "UNKNOWN" ]
then
    ICON_EMOJI=":troll:"
else
    ICON_EMOJI=":octocat:"
fi
# Request to post to the channel (webhook URL quoted so it is passed as one argument)
curl -X POST --data "payload={\"channel\": \"${SLACK_CHANNEL}\", \"username\": \"${SLACK_BOTNAME}\", \"icon_emoji\": \":vertical_traffic_light:\", \"text\": \"${ICON_EMOJI} HOST: ${NAGIOS_HOSTNAME} SERVICE: ${NAGIOS_SERVICEDISPLAYNAME} STATE: ${NAGIOS_SERVICESTATE} MESSAGE: ${NAGIOS_SERVICEOUTPUT} <http://${WEBHOST_NAGIOS}/cgi-bin/nagios3/extinfo.cgi?host=${NAGIOS_HOSTNAME}|See Nagios>\"}" "${WEBHOOK_URL}"
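The payload above is built by pasting $NAGIOS_SERVICEOUTPUT straight into a JSON string inside a form field. That works for plain output, but a double quote or backslash in plugin output would produce invalid JSON, Slack's message formatting treats &, < and > specially, and with --data an unencoded & is also read as a form-field separator. This is probably not the cause of the timeout, but it is a latent bug. Below is a minimal sketch of a hypothetical escaping helper, slack_json_escape, built on sed. It assumes single-line output (which is what $SERVICEOUTPUT carries) and should be paired with curl's --data-urlencode instead of --data so the added entities survive form encoding.

```bash
#!/bin/bash
# Hypothetical helper: make one line of plugin output safe to embed in the
# "text" field of a Slack JSON payload.
slack_json_escape() {
  # Order matters: backslashes before quotes, and & before < and >
  # so the entities added for < and > are not escaped a second time.
  printf '%s' "$1" | sed \
    -e 's/\\/\\\\/g' \
    -e 's/"/\\"/g' \
    -e 's/&/\&amp;/g' \
    -e 's/</\&lt;/g' \
    -e 's/>/\&gt;/g'
}
```

In slack_nagios.sh this would be used as `MSG=$(slack_json_escape "$NAGIOS_SERVICEOUTPUT")`, with `${MSG}` in place of `${NAGIOS_SERVICEOUTPUT}` in the payload. The `<http://...|See Nagios>` link has to stay unescaped, so only the variable parts go through the helper.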
Re: External command for notification won't work
When a command is run by the Nagios daemon, it is run as the nagios user account, and if the script's permissions are not set correctly, it may not run.
Log in to the Nagios server as root and run the following commands to change the ownership and permissions of the script.
Code: Select all
chown nagios:nagios /usr/local/bin/slack_nagios.sh
chmod a+x /usr/local/bin/slack_nagios.sh
After that, test it out and see if the notification is sent.
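A shell script needs more than the execute bit: the interpreter has to read the file, so the nagios user needs read and execute permission on the script (and search permission on every directory leading to it). Below is a sketch of a hypothetical helper, check_script_perms, that tests exactly that for whichever user runs it, using only the shell's -e, -r and -x tests.

```bash
#!/bin/bash
# Hypothetical helper: report whether the current user could run PATH as a
# script (needs both read and execute permission). Returns non-zero if not.
check_script_perms() {
  local p=$1
  if [ ! -e "$p" ]; then
    echo "missing: $p"
    return 2
  fi
  if [ ! -r "$p" ]; then
    echo "uid $(id -u) cannot read $p"
    return 1
  fi
  if [ ! -x "$p" ]; then
    echo "uid $(id -u) cannot execute $p"
    return 1
  fi
  echo "ok: uid $(id -u) can read and execute $p"
}
```

Run it as the daemon's user, for example `sudo -u nagios bash -c "$(declare -f check_script_perms); check_script_perms /usr/local/bin/slack_nagios.sh"`.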
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: External command for notification won't work
Thanks, tgriep. This was how the perms were. However, I re-issued those commands just to say I did.
Still not working:
Code: Select all
[1497420041] wproc: host=host4.domain.com; service=Disk Space; contact=nagiosadmin
[1497420041] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1497420041] Warning: Notifying contact 'nagiosadmin' of service 'Disk Space' on host 'host4.domain.com' by command '/usr/local/bin/slack_nagios.sh > /tmp/slack.log 2>&1' timed out after 0.00 seconds
One weird thing is that Nagios says it timed out immediately, yet slack.log shows curl running for about 30 seconds:
Code: Select all
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:29 --:--:--     0
The script relies on the Nagios environment variables being set. I wonder if they're not set when the script is called.
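One reading of this slack.log (an interpretation, not confirmed in the thread): curl had sent and received zero bytes after 29 seconds, so it apparently never got past resolving or connecting to hooks.slack.com, and the Nagios worker killed it at the 30-second notification timeout before curl could report why. Because the script gives curl no timeout of its own, nothing useful reaches the log. The sketch below bounds curl with --connect-timeout and --max-time (10 and 20 seconds are assumed values, chosen to stay under Nagios' default notification_timeout of 30) and translates common curl exit codes into a message. post_to_slack and explain_curl_exit are hypothetical names, and --data-urlencode replaces the original --data.

```bash
#!/bin/bash
# Hypothetical helper: translate the curl exit codes most likely behind a
# stalled webhook call into a short explanation.
explain_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    6)  echo "could not resolve host (DNS)" ;;
    7)  echo "could not connect (firewall or proxy?)" ;;
    28) echo "operation timed out" ;;
    35) echo "TLS/SSL handshake failed" ;;
    60) echo "server certificate could not be verified (CA bundle?)" ;;
    *)  echo "curl exit code $1" ;;
  esac
}

# Hypothetical helper. Usage: post_to_slack WEBHOOK_URL JSON_PAYLOAD
post_to_slack() {
  local url=$1 payload=$2 rc
  # Give up well before the Nagios worker would kill us, so the reason is logged.
  curl -sS --connect-timeout 10 --max-time 20 \
       --data-urlencode "payload=${payload}" "$url"
  rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "slack post failed: $(explain_curl_exit "$rc")" >&2
  fi
  return "$rc"
}
```

With the script's output still redirected to /tmp/slack.log, a failed run would then leave curl's own error plus a line such as "slack post failed: could not connect (firewall or proxy?)" instead of a bare progress meter.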
Re: External command for notification won't work
If environment macros are not enabled, that is probably what's causing the issue.
Edit the nagios.cfg file, set the following option to 1, and restart the Nagios daemon.
Code: Select all
enable_environment_macros=1
If that doesn't work, the link you provided in your first post has examples of how to set up the command in Nagios so that the macros are passed directly as arguments, without enabling the environment variables.
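A sketch of the argument-passing approach: the command definition passes macros as quoted arguments, and the script reads them positionally, falling back to the environment macros when an argument is missing or empty. The argument order is an assumption; it mirrors the invocation that later shows up in the log (host, service, state, output, notification type). resolve_inputs and the HOST/SERVICE/STATE/OUTPUT/NOTIFY_TYPE variable names are hypothetical.

```bash
#!/bin/bash
# Hypothetical matching command definition (argument order is an assumption):
#   define command{
#       command_name    notify-service-by-slack
#       command_line    /usr/local/bin/slack_nagios.sh "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATE$" "$SERVICEOUTPUT$" "$NOTIFICATIONTYPE$"
#       }
# Hypothetical helper: take macros from the positional arguments, falling back
# to the NAGIOS_* environment macros when an argument is missing or empty.
resolve_inputs() {
  HOST=${1:-$NAGIOS_HOSTNAME}
  SERVICE=${2:-$NAGIOS_SERVICEDESC}
  STATE=${3:-$NAGIOS_SERVICESTATE}
  OUTPUT=${4:-$NAGIOS_SERVICEOUTPUT}
  NOTIFY_TYPE=${5:-$NAGIOS_NOTIFICATIONTYPE}
}
```

Calling `resolve_inputs "$@"` near the top of slack_nagios.sh and then using ${HOST}, ${STATE} and so on in the payload keeps the script working whether or not enable_environment_macros is set.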
Re: External command for notification won't work
So this was enabled and server restarted. Still not getting notification to my script.
I did see something in the log I don't understand...
Aug 1 05:57:01 monitor nagios: Nagios 4.3.2 starting... (PID=3755)
Aug 1 05:57:01 monitor nagios: Local time is Tue Aug 01 05:57:01 UTC 2017
Aug 1 05:57:01 monitor nagios: LOG VERSION: 2.0
Aug 1 05:57:01 monitor nagios: qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
Aug 1 05:57:01 monitor nagios: qh: core query handler registered
Aug 1 05:57:01 monitor nagios: nerd: Channel hostchecks registered successfully
Aug 1 05:57:01 monitor nagios: nerd: Channel servicechecks registered successfully
Aug 1 05:57:01 monitor nagios: nerd: Channel opathchecks registered successfully
Aug 1 05:57:01 monitor nagios: nerd: Fully initialized and ready to rock!
Aug 1 05:57:01 monitor nagios: wproc: Successfully registered manager as @wproc with query handler
Aug 1 05:57:01 monitor nagios: wproc: Registry request: name=Core Worker 3759;pid=3759
Aug 1 05:57:01 monitor nagios: wproc: Registry request: name=Core Worker 3758;pid=3758
Aug 1 05:57:01 monitor nagios: wproc: Registry request: name=Core Worker 3757;pid=3757
Aug 1 05:57:01 monitor nagios: wproc: Registry request: name=Core Worker 3756;pid=3756
Aug 1 05:57:07 monitor nagios: Successfully launched command file worker with pid 3765
Aug 1 05:59:04 monitor nagios: SERVICE ALERT: staff;Test;CRITICAL;HARD;1;(1) < YOWZA
Aug 1 05:59:04 monitor nagios: SERVICE NOTIFICATION: slack;staff;Test;CRITICAL;notify-service-by-slack;(1) < YOWZA
Aug 1 05:59:34 monitor nagios: job 2 (pid=3820): read() returned error 11
Aug 1 05:59:34 monitor nagios: wproc: Core Worker 3756: job 2 (pid=3820) timed out. Killing it
Aug 1 05:59:34 monitor nagios: wproc: NOTIFY job 2 from worker Core Worker 3756 timed out after 30.04s
Aug 1 05:59:34 monitor nagios: wproc: host=staff; service=Test; contact=slack
Aug 1 05:59:34 monitor nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Aug 1 05:59:34 monitor nagios: Warning: Notifying contact 'slack' of service 'Test' on host 'staff' by command '/usr/local/bin/slack_nagios.sh "staff" "Test" "CRITICAL" "(1) YOWZA" "PROBLEM"' timed out after 0.00 seconds
This is for a test monitor that checks a logfile for the word YOWZA.
I can see that Nagios gets the alert. It emails it successfully to one of my contact's email addresses.
But, then when it tries to use my notification script, the worker times out with what appears to be a read error, but there is no indication of what was trying to be read.
Any ideas?
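A note on reading these lines: error 11 in the read() message is EAGAIN on Linux, and error_code=62 matches Linux's ETIME ("Timer expired"), which fits early_timeout=1: the worker killed the job after 30.04 seconds, even though the final Warning line reports 0.00 seconds. The hypothetical notify_timeouts filter below pulls the real duration out of the wproc lines so it can be compared across runs. It is a plain sed expression written against the log format shown here and may need adjusting for other Nagios versions.

```bash
#!/bin/bash
# Hypothetical filter: read syslog/nagios.log text on stdin and print
# "<worker> <seconds>" for every NOTIFY job that a worker killed on timeout.
notify_timeouts() {
  sed -n 's/.*wproc: NOTIFY job [0-9]* from worker \(.*\) timed out after \([0-9.]*\)s.*/\1 \2/p'
}
```

For example: `grep 'wproc: NOTIFY' /var/log/syslog | notify_timeouts`.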
Re: External command for notification won't work
The "Aug 1 05:59:34 monitor nagios: job 2 (pid=3820): read() returned error 11" message sounds like it is a logging bug:
https://github.com/NagiosEnterprises/na ... issues/172
The real error is
wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62
and it looks like Nagios tried to run the command, but it didn't work for some reason.
You may want to echo the command in the script to see if the variables get passed to the script.
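For the echo test, a small logging helper can record exactly what the process received. Worth knowing when reading the result: notification commands are started directly by a Nagios worker process (the wproc lines above), whereas nagios.cmd is the inbound external-command pipe that the CGIs write to, so it is not involved in running notification scripts. log_invocation is a hypothetical helper, and the default log path is an assumption; any file the nagios user can write to will do.

```bash
#!/bin/bash
# Hypothetical debug helper: append what this process actually received
# (arguments and NAGIOS_* environment macros) to $LOG.
log_invocation() {
  local i=1 arg
  {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') pid=$$ uid=$(id -u)"
    echo "argc=$#"
    for arg in "$@"; do
      printf 'arg%d=[%s]\n' "$i" "$arg"
      i=$((i + 1))
    done
    # Lists nothing when enable_environment_macros=0
    env | grep '^NAGIOS_' | sort
  } >> "${LOG:-/tmp/slack_debug.log}"
}
```

Make `log_invocation "$@"` the first command in slack_nagios.sh. If no entry appears after a notification, the worker never started the script; if entries appear without NAGIOS_* lines, the environment macros are not reaching it.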
Re: External command for notification won't work
Thanks, tgriep.
Yes, I've tested the command from the command line and it works all the time. It's a curl call to Slack's API in order to send alerts to a specific channel.
I'm going to replace the script with a very simple one that just logs what's being passed in (again) to see if Nagios actually ever sends the command through nagios.cmd
Re: External command for notification won't work
Just as I thought... Nagios never even calls the external command on the notification... it just says "timed out after 0.00 seconds".
So, Nagios can't or won't send the notification
Any other ideas?