SOLVED...External command for notification won't work

pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Post by pilotmc »

Hello.

I have an external command script for Slack notification which had been working fine but stopped a couple months ago. I believe it stopped right after doing a Debian upgrade.
I'm also unable to execute commands from the GUI due to: "Error: Could not open command file '/usr/local/nagios/var/rw/nagios.cmd' for update!".
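
The "Could not open command file" error usually points at permissions between the web server account and the command file. Below is a minimal sketch of a check, assuming the web server runs as `www-data` (the Debian default for Apache, not stated in this post) and that GNU `stat` is available. The helper names `user_in_group` and `check_cmd_file_access` are made up for illustration. It only tests group membership, so the mode bits on the file and its directory still need a separate look.

```shell
#!/bin/bash
# Hypothetical helpers, not part of Nagios.

# Return 0 if the user belongs to the group, 1 otherwise.
user_in_group() {
    local user="$1" group="$2" g
    for g in $(id -nG "$user" 2>/dev/null); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# Report whether the user is a member of the group that owns the
# Nagios command file. The default path is the one from the error message.
check_cmd_file_access() {
    local user="$1"
    local cmd_file="${2:-/usr/local/nagios/var/rw/nagios.cmd}"
    local group

    if [ ! -e "$cmd_file" ]; then
        echo "missing: $cmd_file"
        return 2
    fi

    # GNU stat: %G prints the name of the owning group.
    group=$(stat -c '%G' "$cmd_file")

    if user_in_group "$user" "$group"; then
        echo "ok: $user is in group $group"
    else
        echo "problem: $user is not in group $group"
        return 1
    fi
}
```

Run as root, `check_cmd_file_access www-data` would show whether the assumed web server account shares a group with the command file.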

Steps:

1) Install Nagios Core 4.2.3 and plugins on Debian 8.8 following this guide: https://support.nagios.com/kb/article/n ... tml#Debian
2) Set up a host, a Slack user, and a notification command, and configure them according to: http://matthewcmcmillan.blogspot.com/20 ... -with.html (minus the Slack shell script naming error in that post).
3) Watch the logs: errors show up and emails arrive (email notification was enabled as well), but no Slack messages.
4) Running the Slack script manually DOES send a notification to Slack.

So, something is broken somewhere and I can't figure out what.

Let me know what info is needed, as I don't want to pollute the board with endless gobs of data from config files that may be inconsequential.
Thanks
Last edited by pilotmc on Thu Aug 03, 2017 1:34 pm, edited 1 time in total.
pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Post by pilotmc »

From nagios.log:

[1497407589] Warning: Notifying contact 'nagiosadmin' of service 'Load' on host 'newlive4.playnet.com' by command '/usr/local/bin/slack_nagios.sh > /tmp/slack.log 2>&1' timed out after 0.00 seconds
pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Post by pilotmc »

contacts.cfg:

define contact{
    contact_name                    nagiosadmin
    use                             generic-contact
    alias                           Nagios Admin
    service_notification_commands   notify-service-by-slack
    host_notification_commands      notify-host-by-slack
    email                           nagios@localhost
}

/usr/local/bin/slack_nagios.sh:

#!/bin/bash

WEBHOST_NAGIOS="monitor.mydomain.com"
SLACK_CHANNEL="#alerts"
SLACK_BOTNAME="Nagios"
WEBHOOK_URL="https://hooks.slack.com/services/********/**********/********************"

#Set the message icon based on Nagios service state
if [ "$NAGIOS_SERVICESTATE" = "OK" ]
then
    ICON_EMOJI=":thumbsup:"
elif [ "$NAGIOS_SERVICESTATE" = "WARNING" ]
then
    ICON_EMOJI=":warning:"
elif [ "$NAGIOS_SERVICESTATE" = "CRITICAL" ]
then
    ICON_EMOJI=":error:"
elif [ "$NAGIOS_SERVICESTATE" = "UNKNOWN" ]
then
    ICON_EMOJI=":troll:"
else
    ICON_EMOJI=":octocat:"
fi


#request for posting to a channel
curl -X POST --data "payload={\"channel\": \"${SLACK_CHANNEL}\", \"username\": \"${SLACK_BOTNAME}\", \"icon_emoji\": \":vertical_traffic_light:\", \"text\": \"${ICON_EMOJI} HOST: ${NAGIOS_HOSTNAME}   SERVICE: ${NAGIOS_SERVICEDISPLAYNAME} STATE: ${NAGIOS_SERVICESTATE} MESSAGE: ${NAGIOS_SERVICEOUTPUT} <http://${WEBHOST_NAGIOS}/cgi-bin/nagios3/extinfo.cgi?host=${NAGIOS_HOSTNAME}|See Nagios>\"}" ${WEBHOOK_URL}
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: External command for notification won't work

Post by tgriep »

When the Nagios daemon runs a command, it runs it as the nagios user account, so if the script's permissions are not set correctly, it may not run.
Log in to the Nagios server as root and run the following commands to change the ownership and permissions of the script.

chown nagios:nagios /usr/local/bin/slack_nagios.sh
chmod a+x /usr/local/bin/slack_nagios.sh
After that, test it out and see if the notification is sent.
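
Since the daemon runs commands as the nagios user, one way to test the permissions is to run a check like the following as that user. This is only a sketch; `can_run_script` is a hypothetical helper, not part of Nagios.

```shell
#!/bin/bash
# Check that the current user can execute a script: the file must be
# visible, readable (the interpreter has to read it) and executable.
can_run_script() {
    local script="$1"

    # Also fails when a parent directory cannot be traversed by this user.
    if [ ! -f "$script" ]; then
        echo "not found: $script"
        return 1
    fi
    if [ ! -r "$script" ]; then
        echo "not readable: $script"
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable: $script"
        return 1
    fi
    echo "ok: $script"
}
```

Saved to a file (say `check_script.sh`, a made-up name) with a final line `can_run_script /usr/local/bin/slack_nagios.sh`, it could be run as `sudo -u nagios bash check_script.sh`.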
Be sure to check out our Knowledgebase for helpful articles and solutions!
pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Post by pilotmc »

Thanks, tgriep. The permissions were already set that way, but I re-ran those commands anyway to be sure.
Still not working:

[1497420041] wproc:   host=host4.domain.com; service=Disk Space; contact=nagiosadmin
[1497420041] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1497420041] Warning: Notifying contact 'nagiosadmin' of service 'Disk Space' on host 'host4.domain.com' by command '/usr/local/bin/slack_nagios.sh > /tmp/slack.log 2>&1' timed out after 0.00 seconds
One weird thing is that Nagios says the command timed out immediately, yet slack.log shows curl running for about 30 seconds without transferring anything:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:29 --:--:--     0
The script relies on the Nagios environment variables being set. I wonder whether they are actually set when Nagios calls the script.
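
One way to test that suspicion is to have the script report which of the variables it reads are missing. A minimal sketch follows; the variable list is taken from the script posted above, and `missing_nagios_vars` is a hypothetical helper name.

```shell
#!/bin/bash
# Print the name of every NAGIOS_* variable used by the Slack script
# that is unset or empty in the current environment.
missing_nagios_vars() {
    local name value
    for name in NAGIOS_HOSTNAME NAGIOS_SERVICEDISPLAYNAME \
                NAGIOS_SERVICESTATE NAGIOS_SERVICEOUTPUT; do
        # Look up the variable whose name is stored in $name.
        # The names come from the fixed list above, so eval is safe here.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}
```

Calling `missing_nagios_vars >> /tmp/slack.log` near the top of the script would leave a record of any variables that were not set when Nagios ran it.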
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: External command for notification won't work

Post by tgriep »

If the environment variables are not enabled, that is probably causing the issue.
Edit the nagios.cfg file, set the following option to 1, and restart the Nagios daemon.

enable_environment_macros=1

If that doesn't work, the link you provided in the first post has examples of how to set up the command in Nagios to pass the macros directly, without enabling the environment variables.
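
If the macros are passed as command-line arguments instead, the script has to read positional parameters rather than NAGIOS_* variables. Here is a rough sketch of that side, assuming the argument order host, service, state, plugin output (an assumption, not taken from the linked post). The emoji mapping is copied from the script posted earlier in this thread, and the function names are illustrative.

```shell
#!/bin/bash
# Map a service state to the emoji used by the original script.
icon_for_state() {
    case "$1" in
        OK)       echo ":thumbsup:" ;;
        WARNING)  echo ":warning:" ;;
        CRITICAL) echo ":error:" ;;
        UNKNOWN)  echo ":troll:" ;;
        *)        echo ":octocat:" ;;
    esac
}

# Build the message text from positional arguments
# (assumed order: host, service, state, plugin output).
slack_text() {
    local host="$1" service="$2" state="$3" output="$4"
    printf '%s HOST: %s   SERVICE: %s STATE: %s MESSAGE: %s' \
        "$(icon_for_state "$state")" "$host" "$service" "$state" "$output"
}
```

The result of `slack_text "$1" "$2" "$3" "$4"` could then take the place of the text portion of the payload in the existing curl call.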
pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Post by pilotmc »

I enabled this and restarted the server, but my script still isn't receiving notifications.
I did see something in the log that I don't understand...

Aug 1 05:57:01 monitor nagios: Nagios 4.3.2 starting... (PID=3755)
Aug 1 05:57:01 monitor nagios: Local time is Tue Aug 01 05:57:01 UTC 2017
Aug 1 05:57:01 monitor nagios: LOG VERSION: 2.0
Aug 1 05:57:01 monitor nagios: qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
Aug 1 05:57:01 monitor nagios: qh: core query handler registered
Aug 1 05:57:01 monitor nagios: nerd: Channel hostchecks registered successfully
Aug 1 05:57:01 monitor nagios: nerd: Channel servicechecks registered successfully
Aug 1 05:57:01 monitor nagios: nerd: Channel opathchecks registered successfully
Aug 1 05:57:01 monitor nagios: nerd: Fully initialized and ready to rock!
Aug 1 05:57:01 monitor nagios: wproc: Successfully registered manager as @wproc with query handler
Aug 1 05:57:01 monitor nagios: wproc: Registry request: name=Core Worker 3759;pid=3759
Aug 1 05:57:01 monitor nagios: wproc: Registry request: name=Core Worker 3758;pid=3758
Aug 1 05:57:01 monitor nagios: wproc: Registry request: name=Core Worker 3757;pid=3757
Aug 1 05:57:01 monitor nagios: wproc: Registry request: name=Core Worker 3756;pid=3756
Aug 1 05:57:07 monitor nagios: Successfully launched command file worker with pid 3765
Aug 1 05:59:04 monitor nagios: SERVICE ALERT: staff;Test;CRITICAL;HARD;1;(1) < YOWZA
Aug 1 05:59:04 monitor nagios: SERVICE NOTIFICATION: slack;staff;Test;CRITICAL;notify-service-by-slack;(1) < YOWZA
Aug 1 05:59:34 monitor nagios: job 2 (pid=3820): read() returned error 11
Aug 1 05:59:34 monitor nagios: wproc: Core Worker 3756: job 2 (pid=3820) timed out. Killing it
Aug 1 05:59:34 monitor nagios: wproc: NOTIFY job 2 from worker Core Worker 3756 timed out after 30.04s
Aug 1 05:59:34 monitor nagios: wproc: host=staff; service=Test; contact=slack
Aug 1 05:59:34 monitor nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Aug 1 05:59:34 monitor nagios: Warning: Notifying contact 'slack' of service 'Test' on host 'staff' by command '/usr/local/bin/slack_nagios.sh "staff" "Test" "CRITICAL" "(1) YOWZA" "PROBLEM"' timed out after 0.00 seconds

This is for a test monitor that checks a logfile for the word YOWZA.
I can see that Nagios gets the alert. It emails it successfully to one of my contact's email addresses.
But when it then tries to use my notification script, the worker times out with what appears to be a read error, and there is no indication of what it was trying to read.

Any ideas?
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: External command for notification won't work

Post by tgriep »

The message
"Aug 1 05:59:34 monitor nagios: job 2 (pid=3820): read() returned error 11"
sounds like a logging bug.
https://github.com/NagiosEnterprises/na ... issues/172

The real error is
wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62
and it looks like Nagios tried to run the command, but it failed for some reason.

You may want to have the script echo what it receives, to see whether the variables actually get passed to it.
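
A minimal sketch of that kind of debug logging, which records the arguments and any NAGIOS_* environment variables each time the script is invoked. The function name `log_invocation` and the log path in the usage note are made up for illustration.

```shell
#!/bin/bash
# Append a record of one invocation to a log file: a timestamp, the
# calling user, each positional argument, and any NAGIOS_* variables.
log_invocation() {
    local logfile="$1" i=1 arg
    shift
    {
        echo "--- $(date '+%Y-%m-%d %H:%M:%S') user=$(id -un) args=$#"
        for arg in "$@"; do
            echo "arg$i=$arg"
            i=$((i + 1))
        done
        # grep exits non-zero when nothing matches; that is not an error here.
        env | grep '^NAGIOS_' | sort || true
    } >> "$logfile"
}
```

Placed at the top of the notification script as `log_invocation /tmp/slack_debug.log "$@"`, it leaves a trace even when the curl call later hangs or is killed. If the log stays empty after a notification, the script was never started.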
pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Post by pilotmc »

Thanks, tgriep.

Yes, I've tested the command from the command line and it works all the time. It's a curl call to Slack's API in order to send alerts to a specific channel.
I'm going to replace the script with a very simple one that just logs what's being passed in (again) to see if Nagios actually ever sends the command through nagios.cmd
pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Post by pilotmc »

Just as I thought... Nagios never even calls the external command for the notification. It just says "timed out after 0.00 seconds".
So Nagios can't or won't send the notification.

Any other ideas?