Custom Plugin fails with "Service Check Timed Out"


Postby amaclay » Fri Apr 05, 2019 10:23 am

A custom plugin I've written for Nagios Core is failing with status CRITICAL - (Service Check Timed Out). The plugin loads a Shiny app and reports any errors it finds on the page. When run as the nagios user from the command line, it finishes in ~23 seconds and returns the appropriate exit code ("1" in the example below). service_check_timeout is set to 60. When I enable the plugin in the Nagios config, it shows this failure on the Nagios service dashboard. Why is it timing out on the dashboard and not on the command line?

Nagios Core Version 3.5.1

Shiny App Contents - Error Tracking CRITICAL 2019-04-05 11:09:03 3d 23h 17m 31s 4/4 (Service Check Timed Out)


Code: Select all
define command{
        command_name    check_shinycontents
        command_line    /usr/lib/nagios/plugins/check_appshot $ARGS1$
        }


Code: Select all
# Define a service to check for errors within shiny apps
define service{
        use                             long-interval-service
        host_name                       localhost
        service_description             Shiny App Contents - Error Tracking
        check_command                   check_shinycontents!error_tracking
        }


Code: Select all
nagios@hostname:~$ time /usr/lib/nagios/plugins/check_appshot error_tracking
WARNING - "invalid first argument"

real    0m23.564s
user    0m2.751s
sys     0m0.611s
nagios@hostname:~$ /usr/lib/nagios/plugins/check_appshot error_tracking
WARNING - "invalid first argument"
nagios@hostname:~$ echo $?
1


Code: Select all
user@hostname:~$ sudo grep -r timeout /etc/nagios3/
/etc/nagios3/nagios.cfg:service_check_timeout=60
/etc/nagios3/nagios.cfg:host_check_timeout=30
/etc/nagios3/nagios.cfg:event_handler_timeout=30
/etc/nagios3/nagios.cfg:notification_timeout=30
/etc/nagios3/nagios.cfg:ocsp_timeout=5
/etc/nagios3/nagios.cfg:perfdata_timeout=5
/etc/nagios3/nagios.cfg:service_check_timeout_state=c
amaclay
 
Posts: 4
Joined: Mon Apr 01, 2019 11:36 am

Re: Custom Plugin fails with "Service Check Timed Out"

Postby npolovenko » Fri Apr 05, 2019 2:09 pm

Hello, @amaclay. Is WARNING - "invalid first argument" the expected output from the service check? Can you upload your plugin in this thread?
Also, can you increase the timeout to 200 seconds, restart the Nagios process and let me know if that changes anything?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
npolovenko
Support Tech
 
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Custom Plugin fails with "Service Check Timed Out"

Postby amaclay » Mon Apr 08, 2019 9:25 am

Thank you for your reply. WARNING - "invalid first argument" is the expected output. The plugin below opens a Shiny app, waits 20 seconds for it to load, then triggers an error and writes that error to a file. The plugin then reads that file, echoes the error message, and exits with the specified code.

Increasing the timeout to 200s changes the plugin behavior. Instead of CRITICAL: (Service Check Timed Out), I now get WARNING: (null). It still runs all 4 attempts.
Current Status: WARNING (for 0d 0h 1m 36s)
Status Information: (null)
Performance Data:
Current Attempt: 4/4 (HARD state)
Last Check Time: 2019-04-08 10:05:07
Check Type: ACTIVE
Check Latency / Duration: 14.042 / 60.619 seconds
Next Scheduled Check: 2019-04-08 10:35:07
Last State Change: 2019-04-08 10:05:07
Last Notification: 2019-04-08 10:06:17 (notification 2)
Is This Service Flapping? NO (6.25% state change)
In Scheduled Downtime? NO
Last Update: 2019-04-08 10:06:37 ( 0d 0h 0m 6s ago )


Output file:
"Error Tracking",2019-04-08 09:49:52,1,"invalid first argument"


Custom plugin:
Code: Select all
#!/bin/bash

# Store target shiny app from command line param
program=$1
# Generate random string for unique error file
fileString=$(tr -dc 'a-zA-Z0-9' < /dev/urandom | fold -w 8 | head -n 1)
# Specify log file location and file name
logFile="/var/tmp/shiny-server/${program}_log_${fileString}"
# Remove the log file on exit; the exit calls in the loop below would skip a trailing rm
trap 'rm -f "$logFile"' EXIT

# Stop early if the app directory is missing instead of running in the wrong place
cd "/srv/shiny-server/development/$program" || { echo "UNKNOWN - cannot cd to app directory for '$program'"; exit 3; }

# Open shiny session for 20 seconds
appshotCall="library(webshot); appshot(getwd(), file = 'project_status.png', port = getOption('shiny.port'), envvars = c(T5 = 'Yes', logFile = '$logFile'), delay = 20)"
Rscript -e "$appshotCall"

# Read error file and process results
input=$logFile
while IFS=, read -r program ts error_code error_text
do
        if (( $error_code==0 )); then
                echo "OK - $error_text"
                exit 0
        elif (( $error_code==1 )); then
                echo "WARNING - $error_text"
                exit 1
        elif (( $error_code==2 )); then
                echo "CRITICAL - $error_text"
                exit 2
        else
                echo "UNKNOWN - $error_text"
                exit 3
        fi
done < "$input"
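
A plugin like the one above could also enforce its own time limit, so that a hung Rscript call produces a readable UNKNOWN result rather than a generic scheduler timeout. The following is only a sketch: run_with_limit is a hypothetical helper name, the message text is made up, and it assumes GNU coreutils timeout is installed (it exits with status 124 when the limit is hit).

```shell
#!/bin/bash
# Hypothetical helper: run a command under a time limit (in seconds) and
# translate a timeout into the UNKNOWN plugin state (exit status 3).
run_with_limit() {
    local limit="$1"
    shift
    timeout "$limit" "$@"
    local rc=$?
    # GNU timeout reports an expired limit with status 124
    if [ "$rc" -eq 124 ]; then
        echo "UNKNOWN - command exceeded ${limit}s"
        return 3
    fi
    return "$rc"
}
```

Inside the plugin this might be called as run_with_limit 45 Rscript -e "$appshotCall" || exit $?, where 45 is an assumed value chosen to stay below the service_check_timeout of 60. A command that itself exits with 124 would be misreported as a timeout.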

Re: Custom Plugin fails with "Service Check Timed Out"

Postby mcapra » Mon Apr 08, 2019 11:28 am

Does the nagios user have perms to run Rscript? Also to import the webshot library? A pretty common gotcha with R is that the user is completely unable to bring in certain libraries to the R runtime due to perms errors.

I would suggest hard-coding the path to your Rscript binary, as Nagios Core does not execute plugins with a particular shell and evaluations of commands can sometimes get lost.

Try su to nagios, run your script, and share the output. That should help identify permissions related concerns, if they exist.
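
One way to check whether the plugin depends on the login environment is to run it with an emptied environment and a minimal PATH. This is only an approximation and not how Nagios Core itself launches checks; run_clean is a hypothetical helper name and the PATH value is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: run a command with an emptied environment and a
# minimal PATH, then print its exit status after its output.
run_clean() {
    env -i PATH=/usr/bin:/bin "$@"
    local rc=$?
    echo "exit status: $rc"
    return "$rc"
}
```

After switching to the nagios user, something like run_clean /usr/lib/nagios/plugins/check_appshot error_tracking would show whether the plugin still finds Rscript and its libraries. If it only works from a normal login shell, a hard-coded path to the Rscript binary is worth trying.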
Former Nagios employee
http://www.mcapra.com/
mcapra
 
Posts: 3560
Joined: Thu May 05, 2016 3:54 pm

Re: Custom Plugin fails with "Service Check Timed Out"

Postby amaclay » Mon Apr 08, 2019 12:19 pm

The nagios user appears to be able to run the plugin from command line.

Code: Select all
nagios@hostname:~$ /usr/lib/nagios/plugins/check_appshot error_tracking
WARNING - "invalid first argument"
nagios@hostname:~$ time /usr/lib/nagios/plugins/check_appshot error_tracking
WARNING - "invalid first argument"

real    0m22.436s
user    0m2.736s
sys     0m0.693s


The R code also behaves as expected when I run the command explicitly in R from the command line as the nagios user.

Re: Custom Plugin fails with "Service Check Timed Out"

Postby npolovenko » Mon Apr 08, 2019 1:15 pm

@amaclay, I think I found the issue in this block:
define command{
        command_name    check_shinycontents
        command_line    /usr/lib/nagios/plugins/check_appshot $ARGS1$
        }


The macro for the argument is called $ARG1$, not $ARGS1$. Please change the command to:
define command{
        command_name    check_shinycontents
        command_line    /usr/lib/nagios/plugins/check_appshot $ARG1$
        }


Restart the Nagios process and let me know if this fixes the issue.
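
Since a misnamed macro means the plugin never receives the app name it expects, a small guard at the top of the plugin could make that kind of mistake visible in the status output. This is a sketch only: validate_app_arg is a hypothetical helper, the messages are made up, and the app root directory is passed in by the caller rather than taken from any Nagios setting.

```shell
#!/bin/bash
# Hypothetical helper: verify the app name before doing any work.
# $1 is the app name handed to the plugin, $2 is the directory that
# holds one subdirectory per app (an assumption of this sketch).
validate_app_arg() {
    local program="$1"
    local app_root="$2"
    if [ -z "$program" ]; then
        echo "UNKNOWN - no app name given (check the macro in command_line)"
        return 3
    fi
    if [ ! -d "$app_root/$program" ]; then
        echo "UNKNOWN - no app directory for '$program' under $app_root"
        return 3
    fi
    return 0
}
```

In the plugin from this thread it could be called as validate_app_arg "$1" /srv/shiny-server/development || exit $? before the cd, so a wrong or missing argument returns UNKNOWN immediately with a message that points at the command definition.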

Re: Custom Plugin fails with "Service Check Timed Out"

Postby amaclay » Mon Apr 08, 2019 1:56 pm

Oh wow. That would certainly do it; now it works perfectly. Thank you!

Re: Custom Plugin fails with "Service Check Timed Out"

Postby npolovenko » Mon Apr 08, 2019 2:34 pm

@amaclay, No problem! ;) I'll close this thread as resolved but feel free to open a new one if anything else comes up.

