Custom Plugin fails with "Service Check Timed Out"

Posted: Fri Apr 05, 2019 10:23 am
by amaclay
A custom plugin I've written for Nagios Core is failing with the status CRITICAL - (Service Check Timed Out). The plugin loads a Shiny app and reports any errors it finds on the page. When run as the nagios user from the command line, it finishes in ~23 seconds and returns the appropriate exit code ("1" in the example below). service_check_timeout is set to 60. When I enable the plugin in the Nagios config, the service dashboard shows this failure. Why does it time out on the dashboard but not on the command line?

Nagios Core Version 3.5.1
Service: Shiny App Contents - Error Tracking
Status: CRITICAL
Last Check: 2019-04-05 11:09:03
Duration: 3d 23h 17m 31s
Attempt: 4/4
Status Information: (Service Check Timed Out)

Code:

define command{
        command_name    check_shinycontents
        command_line    /usr/lib/nagios/plugins/check_appshot $ARGS1$
        }

Code:

# Define a service to check for errors within shiny apps
define service{
        use                             long-interval-service
        host_name                       localhost
        service_description             Shiny App Contents - Error Tracking
        check_command                   check_shinycontents!error_tracking
        }

Code:

nagios@hostname:~$ time /usr/lib/nagios/plugins/check_appshot error_tracking
WARNING - "invalid first argument"

real    0m23.564s
user    0m2.751s
sys     0m0.611s
nagios@hostname:~$ /usr/lib/nagios/plugins/check_appshot error_tracking
WARNING - "invalid first argument"
nagios@hostname:~$ echo $?
1

Code:

user@hostname:~$ sudo grep -r timeout /etc/nagios3/
/etc/nagios3/nagios.cfg:service_check_timeout=60
/etc/nagios3/nagios.cfg:host_check_timeout=30
/etc/nagios3/nagios.cfg:event_handler_timeout=30
/etc/nagios3/nagios.cfg:notification_timeout=30
/etc/nagios3/nagios.cfg:ocsp_timeout=5
/etc/nagios3/nagios.cfg:perfdata_timeout=5
/etc/nagios3/nagios.cfg:service_check_timeout_state=c

Re: Custom Plugin fails with "Service Check Timed Out"

Posted: Fri Apr 05, 2019 2:09 pm
by npolovenko
Hello, @amaclay. Is WARNING - "invalid first argument" the expected output from the service check? Can you upload your plugin in this thread?
Also, can you increase the timeout to 200 seconds, restart the Nagios process and let me know if that changes anything?

Re: Custom Plugin fails with "Service Check Timed Out"

Posted: Mon Apr 08, 2019 9:25 am
by amaclay
Thank you for your reply. WARNING - "invalid first argument" is the expected output. The plugin below opens a Shiny app, waits 20 seconds for it to load, then triggers an error and writes that error to a file. The plugin then reads that file, echoes the error message, and exits with the specified code.

Increasing the timeout to 200s changes the plugin behavior. Instead of CRITICAL: (Service Check Timed Out), I now get WARNING: (null). It still runs all 4 attempts.
Current Status: WARNING (for 0d 0h 1m 36s)
Status Information: (null)
Performance Data:
Current Attempt: 4/4 (HARD state)
Last Check Time: 2019-04-08 10:05:07
Check Type: ACTIVE
Check Latency / Duration: 14.042 / 60.619 seconds
Next Scheduled Check: 2019-04-08 10:35:07
Last State Change: 2019-04-08 10:05:07
Last Notification: 2019-04-08 10:06:17 (notification 2)
Is This Service Flapping? NO (6.25% state change)
In Scheduled Downtime? NO
Last Update: 2019-04-08 10:06:37 ( 0d 0h 0m 6s ago )
Output file:
"Error Tracking",2019-04-08 09:49:52,1,"invalid first argument"
Custom plugin:

Code:

#!/bin/bash

# Store target shiny app from command line param
program=$1
# Generate random string for unique error file
fileString=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 8 | head -n 1)
# Specify log file location and file name
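logFile="/var/tmp/shiny-server/${program}_log_${fileString}"
# Clean up the log file on every exit path; each branch of the loop below
# exits before the final rm is reached
trap 'rm -f "$logFile"' EXIT

cd "/srv/shiny-server/development/$program"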

# Open shiny session for 20 seconds
appshotCall="library(webshot); appshot(getwd(), file = 'project_status.png', port = getOption('shiny.port'), envvars = c(T5 = 'Yes', logFile = '$logFile'), delay = 20)"
Rscript -e "$appshotCall"

# Read error file and process results
input=$logFile
while IFS=, read -r program ts error_code error_text
do
        if (( $error_code==0 )); then
                echo "OK - $error_text"
                exit 0
        elif (( $error_code==1 )); then
                echo "WARNING - $error_text"
                exit 1
        elif (( $error_code==2 )); then
                echo "CRITICAL - $error_text"
                exit 2
        else
                echo "UNKNOWN - $error_text"
                exit 3
        fi
done < "$input"

rm $logFile

Re: Custom Plugin fails with "Service Check Timed Out"

Posted: Mon Apr 08, 2019 11:28 am
by mcapra
Does the nagios user have perms to run Rscript? Also to import the webshot library? A pretty common gotcha with R is that the user is completely unable to bring in certain libraries to the R runtime due to perms errors.

I would suggest hard-coding the full path to your Rscript binary. Nagios Core does not run plugins from a login shell, so commands that resolve through your PATH in an interactive session may not be found when the check runs.

Try su to nagios, run your script, and share the output. That should help identify permissions related concerns, if they exist.
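
A rough way to act on both suggestions, as a sketch rather than a reproduction of what Nagios does internally: resolve the interpreter to an absolute path once, and run the plugin under an emptied environment with a time limit. The helper names `resolve_cmd` and `run_minimal` and the `PATH` value are made up for illustration, and `timeout` is the GNU coreutils tool, assumed to be installed.

```shell
#!/bin/bash

# Hypothetical helper: print where an external command resolves to, or
# report UNKNOWN (exit code 3) if it cannot be found on the current PATH.
resolve_cmd() {
    local resolved
    if ! resolved=$(command -v "$1"); then
        echo "UNKNOWN - $1 not found in PATH=$PATH"
        return 3
    fi
    echo "$resolved"
}

# Hypothetical helper: run a command with an emptied environment and a
# time limit in seconds. This only approximates a daemon's sparse
# environment; it is not how Nagios itself launches checks.
run_minimal() {
    local limit=$1
    shift
    env -i PATH=/usr/bin:/bin timeout "$limit" "$@"
}
```

Run as the nagios user, this might look like `run_minimal 60 /usr/lib/nagios/plugins/check_appshot error_tracking`, with the `Rscript` call in the plugin replaced by whatever `resolve_cmd Rscript` prints. An exit status of 124 from `timeout` means the time limit was hit.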

Re: Custom Plugin fails with "Service Check Timed Out"

Posted: Mon Apr 08, 2019 12:19 pm
by amaclay
The nagios user appears to be able to run the plugin from command line.

Code:

nagios@hostname:~$ /usr/lib/nagios/plugins/check_appshot error_tracking
WARNING - "invalid first argument"
nagios@hostname:~$ time /usr/lib/nagios/plugins/check_appshot error_tracking
WARNING - "invalid first argument"

real    0m22.436s
user    0m2.736s
sys     0m0.693s

The R code also behaves as expected when I run the command explicitly in R from the command line as the nagios user.

Re: Custom Plugin fails with "Service Check Timed Out"

Posted: Mon Apr 08, 2019 1:15 pm
by npolovenko
@amaclay, I think I found the issue in this block:
define command{
        command_name    check_shinycontents
        command_line    /usr/lib/nagios/plugins/check_appshot $ARGS1$
        }
The macro for the argument is called $ARG1$, not $ARGS1$. Please change the command to:
define command{
        command_name    check_shinycontents
        command_line    /usr/lib/nagios/plugins/check_appshot $ARG1$
        }
Restart the Nagios process and let me know if this fixes the issue.
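
One defensive option, sketched here under assumptions rather than taken from the plugin above: have the plugin validate its first argument before starting R, so that a command definition that passes the wrong value fails immediately with a message showing what was actually received. The function name `validate_app` is invented for illustration, and the `app_root` parameter stands in for `/srv/shiny-server/development`.

```shell
#!/bin/bash

# Hypothetical guard: succeed only if the argument names an existing app
# directory under app_root; otherwise print a Nagios-style UNKNOWN line
# that echoes the received value and return 3 (the UNKNOWN exit code).
validate_app() {
    local app_root=$1 program=$2
    if [ -z "$program" ] || [ ! -d "$app_root/$program" ]; then
        echo "UNKNOWN - no app directory for argument '$program' under $app_root"
        return 3
    fi
}

# Example use at the top of a plugin (app root assumed from the thread):
# validate_app /srv/shiny-server/development "$1" || exit $?
```

With a guard like this, a mistyped macro would likely surface on the dashboard as an UNKNOWN result naming the bad argument, rather than as a timeout.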

Re: Custom Plugin fails with "Service Check Timed Out"

Posted: Mon Apr 08, 2019 1:56 pm
by amaclay
Oh wow. That would certainly do it, now it works perfectly. Thank you!

Re: Custom Plugin fails with "Service Check Timed Out"

Posted: Mon Apr 08, 2019 2:34 pm
by npolovenko
@amaclay, No problem! ;) I'll close this thread as resolved but feel free to open a new one if anything else comes up.