SOLVED...External command for notification won't work

tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: External command for notification won't work

Post by tgriep »

Can you post exactly how you test the command from the command line, so I can see the steps you take to run it?
Then, run the following commands and post the output.

Code:

su - nagios
ls -l /tmp
ls -l /usr/local/bin
/usr/local/bin/slack_nagios.sh

Thanks
Be sure to check out our Knowledgebase for helpful articles and solutions!
pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Post by pilotmc »

nagios.debug shows the command being run, but nagios.log shows it timing out, and the alert never reached Slack:

nagios.debug:
[1501609307.790615] [032.2] [pid=27309] ** Notifying contact 'slack'
[1501609307.790623] [032.2] [pid=27309] Raw notification command: /usr/bin/php /usr/local/bin/slack_nagios.php $HOSTNAME$ $SERVICEDESC$ $SERVICESTATE$ "$SERVICEOUTPUT$" $NOTIFICATIONTYPE$
[1501609307.790633] [032.2] [pid=27309] Processed notification command: /usr/bin/php /usr/local/bin/slack_nagios.php staff Test OK "Log check ok - 0 pattern matches found" RECOVERY
[1501609307.790755] [032.2] [pid=27309] Calculating next valid notification time...
[1501609307.790761] [032.2] [pid=27309] Default interval: 10.000000
[1501609307.790767] [032.2] [pid=27309] Interval used for calculating next valid notification time: 10.000000
[1501609307.790776] [032.0] [pid=27309] 3 contacts were notified. Next possible notification time: Tue Aug 1 17:51:47 2017
[1501609307.790781] [032.0] [pid=27309] 3 contacts were notified.

nagios.log:
[1501609307] SERVICE ALERT: staff;Test;OK;HARD;1;Log check ok - 0 pattern matches found
[1501609307] SERVICE NOTIFICATION: pilotmc;staff;Test;OK;notify-service-by-email;Log check ok - 0 pattern matches found
[1501609307] SERVICE NOTIFICATION: gadget;staff;Test;OK;notify-service-by-email;Log check ok - 0 pattern matches found
[1501609307] SERVICE NOTIFICATION: slack;staff;Test;OK;notify-service-by-slack;Log check ok - 0 pattern matches found
[1501609337] wproc: Core Worker 27313: job 4 (pid=27375) timed out. Killing it
[1501609337] wproc: NOTIFY job 4 from worker Core Worker 27313 timed out after 30.03s
[1501609337] wproc: host=staff; service=Test; contact=slack
[1501609337] wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1501609337] Warning: Notifying contact 'slack' of service 'Test' on host 'staff' by command '/usr/bin/php /usr/local/bin/slack_nagios.php staff Test OK "Log check ok - 0 pattern matches found" RECOVERY' timed out after 0.00 seconds
[1501609337] wproc: Core Worker 27313: job 4 (pid=27375): Dormant child reaped

So, on one hand, it looks like the notification was processed fine by Nagios and sent (which it wasn't). On the other hand, nagios.log shows a different PID (27375, the job run by Core Worker 27313) handling the notification and timing out.
If I take the "Processed notification command" and run it exactly as shown from the command line, the alert shows up in Slack as it should.
Both of my email notifications arrive correctly.

Does the notification service expect some sort of return value from the scripts?
I'm stymied. Any input would be greatly appreciated.
pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Post by pilotmc »

Thanks, tgriep...

Code:

root@monitor:/usr/local/nagios/var# su - nagios
nagios@monitor:~$ ls -l /tmp
total 2520
drwxrwxr-x 17 root    root      4096 Jun 12 05:47 nagios-plugins-release-2.2.1
-rw-r--r--  1 root    root   2049050 Jun 12 05:44 nagios-plugins.tar.gz
drwxrwxr-x 10 root    root      4096 Jun 12 05:52 nrpe-nrpe-3.1.1
-rw-r--r--  1 root    root    515243 Jun 12 05:49 nrpe.tar.gz
-rwxrwxrwx  1 nagios  nagios       0 Aug  1 05:56 slack.log
-rw-rw-rw-  1 root    root         0 Aug  1 17:21 slackapi.log
drwx------  2 pilotmc staff     4096 Aug  1 16:03 ssh-CW1Ff9uoTk
nagios@monitor:~$ ls -l /usr/local/bin
total 20
-rw-r--r-- 1 nagios  nagios 1441 Aug  1 17:32 slack_nagios.php
-rwxr-xr-x 1 root    staff  1654 May 23 21:00 slack_nagios.sh-ORIG
-rwxr-xr-x 1 nagios  nagios 1455 Jun 22 21:35 slack_nagios2.sh
-rwxr-xr-x 1 root    staff    78 Jun 22 18:06 test.php
-rwxr-xr-x 1 pilotmc staff   293 May 23 05:55 test_alert.sh
nagios@monitor:~$ /usr/bin/php /usr/local/bin/slack_nagios.php test test OK test test
nagios@monitor:~$ 

This then generates the alert properly in Slack. I've removed the logging to /tmp in order to thin the script down to the bare minimum.
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: External command for notification won't work

Post by tgriep »

It looks like you are no longer using the shell script and are using a PHP file instead, so that changes things.
It looks like Nagios is running it, but my guess is that the command as defined is not passing the macros to the PHP script, and that is why the notification is not being sent.
Verify that the command is set up correctly and test it again.
pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Post by pilotmc »

Yes, I'm trying another language in case the perl is somehow the problem (it isn't...it's always worked in the past...it just stopped working at some point).

Here are the command configs:

Code:

# 'notify-service-by-slack' command definition
define command {
	command_name	notify-service-by-slack
	command_line	/usr/bin/php /usr/local/bin/slack_nagios.php $HOSTNAME$ $SERVICEDESC$ $SERVICESTATE$ "$SERVICEOUTPUT$" $NOTIFICATIONTYPE$
}

# 'notify-host-by-slack' command definition
define command {
	command_name	notify-host-by-slack
	command_line	/usr/bin/php /usr/local/bin/slack_nagios.php $HOSTNAME$ $SERVICEDESC$ $SERVICESTATE$ "$SERVICEOUTPUT$" $NOTIFICATIONTYPE$
}

So, I'm passing the values in as parameters to the script, just as the example email command does.

Nagios processes the command fine:

Code:

[1501694687.405609] [032.2] [pid=27309] ** Notifying contact 'slack'
[1501694687.405619] [032.2] [pid=27309] Raw notification command: /usr/bin/php /usr/local/bin/slack_nagios.php $HOSTNAME$ $SERVICEDESC$ $SERVICESTATE$ "$SERVICEOUTPUT$" $NOTIFICATIONTYPE$
[1501694687.405646] [032.2] [pid=27309] Processed notification command: /usr/bin/php /usr/local/bin/slack_nagios.php staff Test CRITICAL "(1)  YOWZA" PROBLEM
[1501694687.405774] [032.2] [pid=27309] Calculating next valid notification time...
[1501694687.405782] [032.2] [pid=27309] Default interval: 10.000000
[1501694687.405791] [032.2] [pid=27309] Interval used for calculating next valid notification time: 10.000000
[1501694687.405802] [032.0] [pid=27309] 3 contacts were notified.  Next possible notification time: Wed Aug  2 17:34:47 2017
[1501694687.405808] [032.0] [pid=27309] 3 contacts were notified.

At this point, I can copy and paste the "processed" notification command into the terminal and it works fine, every time.
But nagios.log shows it timing out after 30 seconds:

Code:

[1501694687] SERVICE ALERT: staff;Test;CRITICAL;HARD;1;(1) < YOWZA
[1501694687] SERVICE NOTIFICATION: opsteam;staff;Test;CRITICAL;notify-service-by-email;(1) < YOWZA
[1501694687] SERVICE NOTIFICATION: admins;staff;Test;CRITICAL;notify-service-by-email;(1) < YOWZA
[1501694687] SERVICE NOTIFICATION: slack;staff;Test;CRITICAL;notify-service-by-slack;(1) < YOWZA
[1501694717] wproc: Core Worker 27312: job 2242 (pid=6403) timed out. Killing it
[1501694717] wproc: NOTIFY job 2242 from worker Core Worker 27312 timed out after 30.03s
[1501694717] wproc:   host=staff; service=Test; contact=slack
[1501694717] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1501694717] Warning: Notifying contact 'slack' of service 'Test' on host 'staff' by command '/usr/bin/php /usr/local/bin/slack_nagios.php staff Test CRITICAL "(1)  YOWZA" PROBLEM' timed out after 0.00 seconds
[1501694717] wproc: Core Worker 27312: job 2242 (pid=6403): Dormant child reaped
[1501694747] SERVICE ALERT: staff;Test;OK;HARD;1;Log check ok - 0 pattern matches found
[1501694747] SERVICE NOTIFICATION: opsteam;staff;Test;OK;notify-service-by-email;Log check ok - 0 pattern matches found
[1501694747] SERVICE NOTIFICATION: admins;staff;Test;OK;notify-service-by-email;Log check ok - 0 pattern matches found
[1501694747] SERVICE NOTIFICATION: slack;staff;Test;OK;notify-service-by-slack;Log check ok - 0 pattern matches found
[1501694777] wproc: Core Worker 27312: job 2246 (pid=6494) timed out. Killing it
[1501694777] wproc: NOTIFY job 2246 from worker Core Worker 27312 timed out after 30.03s
[1501694777] wproc:   host=staff; service=Test; contact=slack
[1501694777] wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
[1501694777] Warning: Notifying contact 'slack' of service 'Test' on host 'staff' by command '/usr/bin/php /usr/local/bin/slack_nagios.php staff Test OK "Log check ok - 0 pattern matches found" RECOVERY' timed out after 0.00 seconds
[1501694777] wproc: Core Worker 27312: job 2246 (pid=6494): Dormant child reaped

The recovery notification times out as well; both runs hit the 30-second limit. It shouldn't take 30 seconds to run a single command from the command line. What's blocking Nagios from doing so?
This smells like a Nagios bug to me.
Maybe the next step is to open a bug report.
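
One way to get closer to what the worker does is to run the processed command with an empty environment and a time limit shorter than the 30 seconds seen in nagios.log. This is only a minimal sketch: it assumes GNU coreutils `env` and `timeout` are available, `run_clean_bounded` is a hypothetical helper name rather than a Nagios tool, and an empty environment only approximates what the daemon actually hands to its workers.

```shell
#!/bin/sh
# run_clean_bounded LIMIT CMD [ARGS...]
# Hypothetical helper (not part of Nagios). Runs CMD with an empty
# environment (only PATH is kept) under GNU `timeout`, then reports on
# stderr how it ended. `timeout` exits with 124 when it kills the command.
run_clean_bounded() {
    limit="$1"
    shift
    status=0
    env -i PATH="$PATH" timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    else
        echo "exit status ${status}: $*" >&2
    fi
    return "$status"
}

# Example, run as the nagios user with the processed command from nagios.debug:
# run_clean_bounded 10 /usr/bin/php /usr/local/bin/slack_nagios.php staff Test OK "Log check ok - 0 pattern matches found" RECOVERY
```

If the command succeeds from a login shell but times out here, something in the login environment is probably making the difference.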
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: External command for notification won't work

Post by tgriep »

It looks like the Nagios process is running the script and passing the macros to it, but the script is not completing.
Where did you get the PHP script from?
Does it rely on any environment variables being set in order to run?
Can you post it here so we can view it?
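
To see which environment variables the running Nagios process actually has, as opposed to what a login shell has, one option on Linux is to read its entry under /proc. This is a minimal sketch: it assumes a Linux /proc filesystem, enough privileges to read the target process (same user or root), and a daemon process named `nagios`; `proc_env_grep` is a hypothetical helper name.

```shell
#!/bin/sh
# proc_env_grep PID PATTERN
# Hypothetical helper. Prints the environment entries of a running process
# whose NAME=value line matches PATTERN (case-insensitive).
# /proc/PID/environ holds NUL-separated entries as they were at exec time.
proc_env_grep() {
    tr '\0' '\n' < "/proc/$1/environ" | grep -i -- "$2"
}

# Example: compare the daemon's view with the login shell's view.
# proc_env_grep "$(pgrep -o -x nagios)" proxy   # oldest process named exactly "nagios"
# env | grep -i proxy                            # what the current shell has
```

A variable that shows up in the second command but not in the first is set for login shells only and is not visible to notification commands.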
pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Post by pilotmc »

Thanks, tgriep.

I wrote it.
It does not rely on any environment variables, just what is passed in from Nagios through the command.

Code:

<?php
	// Nagios-Slack Interface
	//
	//	Sends alerts to Slack using the Slack API
	//
	
	/* Arguments from the notification command:
	$argv[1] = $HOSTNAME$
	$argv[2] = $SERVICEDESC$
	$argv[3] = $SERVICESTATE$
	$argv[4] = $SERVICEOUTPUT$
	$argv[5] = $NOTIFICATIONTYPE$
	*/
	
	$server="monitor.mydomain.com";
	$token="*******/*******/************************";
	$channel="#alerts";
	$botname="Nagios";
	
	if( $argv[3] == "OK" )
		$icon = ":white_check_mark:";
	else if( $argv[3] == "WARNING" )
		$icon = ":warning:";
	else if( $argv[3] == "CRITICAL" )
		$icon = ":error:";
	else
		$icon = ":grey_question:";
	
	$payload = array();

	$payload['channel'] = $channel;
	$payload['username'] = $botname;
	$payload['icon_emoji'] = $icon;
	$payload['text'] = sprintf( "HOST: %s\nSERVICE: %s\nMESSAGE: %s. See <https://%s/nagios/cgi-bin/extinfo.cgi?type=1&host=%s|See Nagios>", $argv[1], $argv[2], $argv[4], $server, $argv[1] );
	
	$post = json_encode($payload, JSON_UNESCAPED_SLASHES );
	
	$ch = curl_init("https://hooks.slack.com/services/$token");
	curl_setopt( $ch, CURLOPT_POSTFIELDS, $post );
	curl_setopt( $ch, CURLOPT_POST, 1 );
	curl_setopt( $ch, CURLOPT_HTTPHEADER, array("Content-type: application/json") );
	curl_setopt( $ch, CURLOPT_SSL_VERIFYHOST, 0 );
	curl_setopt( $ch, CURLOPT_SSL_VERIFYPEER, 0 );
	curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );
	
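	$result = curl_exec( $ch );
	if( $result === false )
		// Log the failure so a blocked request is visible in syslog
		syslog( LOG_ERR, "slack-nagios: curl error: " . curl_error( $ch ) );
	else
		syslog( LOG_INFO, "slack-nagios: {$payload['text']}");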
?>

As mentioned earlier, if you take the processed command and paste it into a shell on the server, it works fine and generates the appropriate message in Slack.

thanks,
Mike
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: External command for notification won't work

Post by dwhitfield »

pilotmc wrote: it just stopped working at some point
Do you have the historical logs to grep through? Knowing when the issue started could help tie it to an upgrade or permissions issue.

We're not really set up to debug custom/third-party programming, but obviously we can try to help.
pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Post by pilotmc »

No, I don't.
What can we do moving forward? How can I get you guys the information you need to help get this resolved?
pilotmc
Posts: 21
Joined: Tue May 23, 2017 3:33 pm

Re: External command for notification won't work

Post by pilotmc »

I found the issue. This host was reconfigured to use a proxy, and though the proxy was set in /etc/profile.d/ and /etc/wgetrc, the Nagios workers never picked it up, so they really WERE timing out waiting on a connection that never got through.

So, a word to the wise: check whether the host uses a proxy and, if it does, put the proxy definition in your scripts. It may not get loaded from the usual places you'd set it on your system!
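
The same idea can also be applied outside the script, by having the notification command call a small wrapper that sets the proxy explicitly. This is a minimal sketch: it assumes the script's HTTP client honors the conventional `http_proxy` and `https_proxy` variables (libcurl, which PHP's curl extension uses, generally does), and both the proxy URL and the name `with_proxy` are placeholders rather than values from this thread.

```shell
#!/bin/sh
# Placeholder proxy URL (an assumption for illustration); replace with the real one.
PROXY_URL="${PROXY_URL:-http://proxy.example.com:3128}"

# with_proxy CMD [ARGS...]
# Hypothetical helper. Runs CMD with the proxy variables set explicitly, so the
# result does not depend on /etc/profile.d having been sourced by the caller.
with_proxy() {
    env http_proxy="$PROXY_URL" https_proxy="$PROXY_URL" "$@"
}

# Example: what a notification command_line could invoke instead of calling php directly.
# with_proxy /usr/bin/php /usr/local/bin/slack_nagios.php "$@"
```

Inside the PHP script itself, the equivalent would be a `curl_setopt( $ch, CURLOPT_PROXY, ... )` call with the proxy address.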

Thanks for all the help, dwhitfield and tgriep. :)