SOLVED...External command for notification won't work

Postby pilotmc » Tue Jun 13, 2017 5:52 pm

Hello.

I have an external command script for Slack notification which had been working fine but stopped a couple months ago. I believe it stopped right after doing a Debian upgrade.
I'm also unable to execute commands from the GUI due to: "Error: Could not open command file '/usr/local/nagios/var/rw/nagios.cmd' for update!".

Steps:

1) Install Nagios Core 4.2.3 and plugins on Debian 8.8 following this guide: https://support.nagios.com/kb/article/n ... tml#Debian
2) Set up a host, a slack user, a notification command and configure them according to: http://matthewcmcmillan.blogspot.com/20 ... -with.html (except without the slack shell script naming error on that post).
3) Watched the logs and saw errors and got emails (email notification was enabled as well), but no slack messages.
4) Running the slack script manually DOES send a notification to Slack.

So, something is broken somewhere and I can't figure out what.

Let me know what info is needed, as I don't want to pollute the board with endless gobs of data from config files that may be inconsequential.
Thanks
Last edited by pilotmc on Thu Aug 03, 2017 1:34 pm, edited 1 time in total.
pilotmc
 
Posts: 20
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Postby pilotmc » Tue Jun 13, 2017 9:46 pm

From nagios.log:

[1497407589] Warning: Notifying contact 'nagiosadmin' of service 'Load' on host 'newlive4.playnet.com' by command '/usr/local/bin/slack_nagios.sh > /tmp/slack.log 2>&1' timed out after 0.00 seconds

Re: External command for notification won't work

Postby pilotmc » Tue Jun 13, 2017 9:53 pm

contacts.cfg:

Code: Select all
define contact{
    contact_name                    nagiosadmin
    use                             generic-contact
    alias                           Nagios Admin
    service_notification_commands   notify-service-by-slack
    host_notification_commands      notify-host-by-slack
    email                           nagios@localhost
}


/usr/local/bin/slack_nagios.sh:

Code: Select all
#!/bin/bash

WEBHOST_NAGIOS="monitor.mydomain.com"
SLACK_CHANNEL="#alerts"
SLACK_BOTNAME="Nagios"
WEBHOOK_URL="https://hooks.slack.com/services/********/**********/********************"

#Set the message icon based on Nagios service state
if [ "$NAGIOS_SERVICESTATE" = "OK" ]
then
    ICON_EMOJI=":thumbsup:"
elif [ "$NAGIOS_SERVICESTATE" = "WARNING" ]
then
    ICON_EMOJI=":warning:"
elif [ "$NAGIOS_SERVICESTATE" = "CRITICAL" ]
then
    ICON_EMOJI=":error:"
elif [ "$NAGIOS_SERVICESTATE" = "UNKNOWN" ]
then
    ICON_EMOJI=":troll:"
else
    ICON_EMOJI=":octocat:"
fi


#request for posting to a channel
curl -X POST --data "payload={\"channel\": \"${SLACK_CHANNEL}\", \"username\": \"${SLACK_BOTNAME}\", \"icon_emoji\": \":vertical_traffic_light:\", \"text\": \"${ICON_EMOJI} HOST: ${NAGIOS_HOSTNAME}   SERVICE: ${NAGIOS_SERVICEDISPLAYNAME} STATE: ${NAGIOS_SERVICESTATE} MESSAGE: ${NAGIOS_SERVICEOUTPUT} <http://${WEBHOST_NAGIOS}/cgi-bin/nagios3/extinfo.cgi?host=${NAGIOS_HOSTNAME}|See Nagios>\"}" "${WEBHOOK_URL}"

Re: External command for notification won't work

Postby tgriep » Wed Jun 14, 2017 10:26 am

When the Nagios daemon runs a command, it runs it as the nagios user account. If the script's ownership or permissions are not set correctly, it may not run.
Log in to the Nagios server as root and run the following commands to set the ownership and permissions of the script.
Code: Select all
chown nagios:nagios /usr/local/bin/slack_nagios.sh
chmod a+x /usr/local/bin/slack_nagios.sh


After that, test it out and see if the notification is sent.
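
A minimal sketch for confirming the result of those commands, assuming GNU stat as shipped with Debian. The function name check_script_perms is made up for illustration; it reports the owner, group, and mode of a file and whether the user running it may execute that file.

```shell
#!/bin/bash
# Hypothetical helper: print owner:group and octal mode of a script and
# report whether the user running this check is allowed to execute it.
check_script_perms() {
    local script="$1"
    if [ ! -e "$script" ]; then
        echo "missing: $script"
        return 1
    fi
    # %U = owner name, %G = group name, %a = octal mode (GNU coreutils stat)
    stat -c '%U:%G %a' "$script"
    if [ -x "$script" ]; then
        echo "executable by $(id -un)"
    else
        echo "NOT executable by $(id -un)"
        return 1
    fi
}
```

The check only tells you something about the nagios account if it is run as that account, for example from a shell opened as the nagios user, with /usr/local/bin/slack_nagios.sh as the argument.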
Be sure to check out our Knowledgebase for helpful articles and solutions!
tgriep
Madmin
 
Posts: 6019
Joined: Thu Oct 30, 2014 9:02 am

Re: External command for notification won't work

Postby pilotmc » Wed Jun 14, 2017 2:12 pm

Thanks, tgriep. This was how the perms were. However, I re-issued those commands just to say I did.
Still not working:

Code: Select all
[1497420041] wproc:   host=host4.domain.com; service=Disk Space; contact=nagiosadmin
[1497420041] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1497420041] Warning: Notifying contact 'nagiosadmin' of service 'Disk Space' on host 'host4.domain.com' by command '/usr/local/bin/slack_nagios.sh > /tmp/slack.log 2>&1' timed out after 0.00 seconds


One weird thing is that Nagios says it times out immediately, yet slack.log shows curl running for almost 30 seconds:

Code: Select all
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:29 --:--:--     0


The script relies on nagios env variables being set. I wonder if they're maybe not set when the script is called.
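
One way to test that idea by hand is to run the script with an almost empty environment and supply the NAGIOS_* variables explicitly. An interactive shell usually carries variables (PATH, HOME, proxy settings and so on) that a daemon's environment may lack, so a script that works from the command line can still behave differently when a daemon runs it. The following is only a sketch: the function name run_with_clean_env and the values assigned to the variables are made up for the test.

```shell
#!/bin/bash
# Hypothetical helper: run a command with everything stripped from the
# environment except a basic PATH and four hand-set NAGIOS_* variables.
# The values below are placeholders, not real alert data.
run_with_clean_env() {
    env -i \
        PATH=/usr/bin:/bin \
        NAGIOS_HOSTNAME="testhost" \
        NAGIOS_SERVICEDISPLAYNAME="Test" \
        NAGIOS_SERVICESTATE="CRITICAL" \
        NAGIOS_SERVICEOUTPUT="manual test" \
        "$@"
}
```

For example, `run_with_clean_env /usr/local/bin/slack_nagios.sh`, ideally from a shell opened as the nagios user. If the script hangs this way but not from a normal login shell, the difference is somewhere in the environment.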

Re: External command for notification won't work

Postby tgriep » Thu Jun 15, 2017 11:20 am

If the environment variables are not enabled, that is probably causing the issue.
Edit the nagios.cfg file, set the following option to 1, and restart the nagios daemon.
Code: Select all
enable_environment_macros=1


If that doesn't work, the link you provided in the first post has examples of how to set up the command in Nagios to pass the macros directly, without enabling the environment variables.
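
As a rough sketch of the second approach, the script can read its values from positional arguments and fall back to the NAGIOS_* environment variables only when an argument is missing. The argument order used here (host, service, state, output, notification type) matches the command line that appears in the log later in this thread. The function name resolve_fields is made up, and the matching command_line in the Nagios command definition is assumed to pass something like "$HOSTNAME$" "$SERVICEDESC$" "$SERVICESTATE$" "$SERVICEOUTPUT$" "$NOTIFICATIONTYPE$".

```shell
#!/bin/bash
# Hypothetical helper: prefer positional arguments and fall back to the
# NAGIOS_* environment macros when an argument is absent or empty.
# Assumed argument order: host, service, state, output, notification type.
resolve_fields() {
    HOST="${1:-${NAGIOS_HOSTNAME:-}}"
    SERVICE="${2:-${NAGIOS_SERVICEDISPLAYNAME:-}}"
    STATE="${3:-${NAGIOS_SERVICESTATE:-}}"
    OUTPUT="${4:-${NAGIOS_SERVICEOUTPUT:-}}"
    TYPE="${5:-${NAGIOS_NOTIFICATIONTYPE:-}}"
}
```

Calling `resolve_fields "$@"` at the top of the script and then using $HOST, $SERVICE, $STATE and $OUTPUT in the curl payload would let the same script work with either setup.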

Re: External command for notification won't work

Postby pilotmc » Tue Aug 01, 2017 1:10 am

So this was enabled and the server restarted. I'm still not getting notifications through my script.
I did see something in the log I don't understand...

Aug 1 05:57:01 monitor nagios: Nagios 4.3.2 starting... (PID=3755)
Aug 1 05:57:01 monitor nagios: Local time is Tue Aug 01 05:57:01 UTC 2017
Aug 1 05:57:01 monitor nagios: LOG VERSION: 2.0
Aug 1 05:57:01 monitor nagios: qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
Aug 1 05:57:01 monitor nagios: qh: core query handler registered
Aug 1 05:57:01 monitor nagios: nerd: Channel hostchecks registered successfully
Aug 1 05:57:01 monitor nagios: nerd: Channel servicechecks registered successfully
Aug 1 05:57:01 monitor nagios: nerd: Channel opathchecks registered successfully
Aug 1 05:57:01 monitor nagios: nerd: Fully initialized and ready to rock!
Aug 1 05:57:01 monitor nagios: wproc: Successfully registered manager as @wproc with query handler
Aug 1 05:57:01 monitor nagios: wproc: Registry request: name=Core Worker 3759;pid=3759
Aug 1 05:57:01 monitor nagios: wproc: Registry request: name=Core Worker 3758;pid=3758
Aug 1 05:57:01 monitor nagios: wproc: Registry request: name=Core Worker 3757;pid=3757
Aug 1 05:57:01 monitor nagios: wproc: Registry request: name=Core Worker 3756;pid=3756
Aug 1 05:57:07 monitor nagios: Successfully launched command file worker with pid 3765
Aug 1 05:59:04 monitor nagios: SERVICE ALERT: staff;Test;CRITICAL;HARD;1;(1) < YOWZA
Aug 1 05:59:04 monitor nagios: SERVICE NOTIFICATION: slack;staff;Test;CRITICAL;notify-service-by-slack;(1) < YOWZA
Aug 1 05:59:34 monitor nagios: job 2 (pid=3820): read() returned error 11
Aug 1 05:59:34 monitor nagios: wproc: Core Worker 3756: job 2 (pid=3820) timed out. Killing it
Aug 1 05:59:34 monitor nagios: wproc: NOTIFY job 2 from worker Core Worker 3756 timed out after 30.04s
Aug 1 05:59:34 monitor nagios: wproc: host=staff; service=Test; contact=slack
Aug 1 05:59:34 monitor nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Aug 1 05:59:34 monitor nagios: Warning: Notifying contact 'slack' of service 'Test' on host 'staff' by command '/usr/local/bin/slack_nagios.sh "staff" "Test" "CRITICAL" "(1) YOWZA" "PROBLEM"' timed out after 0.00 seconds

This is for a test monitor that checks a logfile for the word YOWZA.
I can see that Nagios gets the alert. It emails it successfully to one of my contact's email addresses.
But then, when it tries to use my notification script, the worker times out with what appears to be a read error, and there is no indication of what it was trying to read.

Any ideas?

Re: External command for notification won't work

Postby tgriep » Tue Aug 01, 2017 9:10 am

The message
"Aug 1 05:59:34 monitor nagios: job 2 (pid=3820): read() returned error 11"
looks like a logging bug:
https://github.com/NagiosEnterprises/na ... issues/172

The real error is
wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62

It looks like Nagios tried to run the command, but it failed for some reason.

You may want to have the script echo what it receives to a file, to see whether the variables actually get passed to it.
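
A minimal sketch of such a debugging stand-in, which sends nothing and only records how it was invoked. The function name log_invocation, the DEBUG_LOG variable, and the default log path are all assumptions made for this example.

```shell
#!/bin/bash
# Hypothetical stand-in for the notification script: it posts nothing to
# Slack and only appends a record of each invocation to a log file.
log_invocation() {
    # Assumed log location; override by setting DEBUG_LOG.
    local log="${DEBUG_LOG:-/tmp/slack_debug.log}"
    local i=1 arg
    {
        echo "=== $(date '+%F %T') user=$(id -un) argc=$#"
        for arg in "$@"; do
            echo "arg[$i]=$arg"
            i=$((i + 1))
        done
        # List whatever NAGIOS_* variables the caller exported, if any.
        env | grep '^NAGIOS_' | sort
    } >> "$log"
}

log_invocation "$@"
```

If a notification fires and the log file gains no new entry, the script was never started. If an entry appears but the arguments or NAGIOS_* lines are missing, the problem is in how the values are passed.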

Re: External command for notification won't work

Postby pilotmc » Tue Aug 01, 2017 9:15 am

Thanks, tgriep.

Yes, I've tested the command from the command line and it works all the time. It's a curl call to Slack's API in order to send alerts to a specific channel.
I'm going to replace the script (again) with a very simple one that just logs what's being passed in, to see if Nagios ever actually sends the command through nagios.cmd.

Re: External command for notification won't work

Postby pilotmc » Tue Aug 01, 2017 10:03 am

Just as I thought... Nagios never even calls the external command on the notification... it just says "timed out after 0.00 seconds".
So, Nagios can't or won't send the notification.

Any other ideas?
