A custom plugin I've written for Nagios Core is failing with the status CRITICAL - (Service Check Timed Out). The plugin loads a Shiny app and reports any errors it finds on the page. When I run it from the command line as the nagios user, it finishes in ~23 seconds and returns the appropriate exit code ("1" in the example below). service_check_timeout is set to 60. When I enable the plugin in the Nagios config, the timeout failure shows up on the Nagios service dashboard. Why is it timing out on the dashboard but not on the command line?
# Define a service to check for errors within shiny apps
define service{
    use                     long-interval-service
    host_name               localhost
    service_description     Shiny App Contents - Error Tracking
    check_command           check_shinycontents!error_tracking
}
Hello, @amaclay. Is WARNING - "invalid first argument" the expected output from the service check? Can you upload your plugin in this thread?
Also, can you increase the timeout to 200 seconds, restart the Nagios process and let me know if that changes anything?
Thank you for your reply. WARNING - "invalid first argument" is the expected output. The plugin below opens a Shiny app, waits 20 seconds for it to load, then triggers an error and writes that error to a file. The plugin then reads that file, echoes the error message, and exits with the specified code.
Increasing the timeout to 200s changes the plugin behavior. Instead of CRITICAL: (Service Check Timed Out), I now get WARNING: (null). It still runs all 4 attempts.
Current Status: WARNING (for 0d 0h 1m 36s)
Status Information: (null)
Performance Data:
Current Attempt: 4/4 (HARD state)
Last Check Time: 2019-04-08 10:05:07
Check Type: ACTIVE
Check Latency / Duration: 14.042 / 60.619 seconds
Next Scheduled Check: 2019-04-08 10:35:07
Last State Change: 2019-04-08 10:05:07
Last Notification: 2019-04-08 10:06:17 (notification 2)
Is This Service Flapping? NO (6.25% state change)
In Scheduled Downtime? NO
Last Update: 2019-04-08 10:06:37 ( 0d 0h 0m 6s ago )
Output file:
"Error Tracking",2019-04-08 09:49:52,1,"invalid first argument"
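The reporting step described above (read the output file, echo the message, exit with the recorded code) could look roughly like the following in Python. This is only a sketch and not the poster's actual plugin, which is not shown in the thread. The function name parse_result_line is hypothetical, and the four-column layout (name, timestamp, code, message) is an assumption based on the single output line above.

```python
import csv
import io

# Standard Nagios plugin exit codes and their labels.
STATUS_LABELS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_result_line(line):
    """Turn one CSV line from the output file into (exit_code, status_text).

    Assumed layout: name, timestamp, exit code, message.
    Anything that does not fit is reported as UNKNOWN (3).
    """
    rows = list(csv.reader(io.StringIO(line)))
    if not rows or len(rows[0]) != 4:
        return 3, "UNKNOWN - could not parse result line"
    _name, _timestamp, code_field, message = rows[0]
    try:
        code = int(code_field)
    except ValueError:
        return 3, "UNKNOWN - non-numeric exit code in result line"
    if code not in STATUS_LABELS:
        return 3, "UNKNOWN - unexpected exit code %d" % code
    # Always produce non-empty text so the dashboard has something to show.
    return code, '%s - "%s"' % (STATUS_LABELS[code], message)
```

A plugin built on this would print the returned text on stdout and pass the returned code to sys.exit(). Falling back to UNKNOWN with a message means a missing or malformed file still gives the dashboard a readable status.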
Does the nagios user have perms to run Rscript? Also to import the webshot library? A pretty common gotcha with R is that the user is completely unable to bring in certain libraries to the R runtime due to perms errors.
I would suggest hard-coding the full path to your Rscript binary. Nagios Core does not run plugins from your login shell, so settings you rely on interactively (PATH in particular) may not be in effect when the check runs.
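As an illustration of the hard-coded-path suggestion, here is a minimal Python sketch of a wrapper that calls a command by absolute path without a shell, enforces its own timeout, and never returns empty output. The path /usr/bin/Rscript and the function name run_check are assumptions for the example; check the real location with `which Rscript` as the nagios user.

```python
import subprocess

# Assumption: replace with the location reported by `which Rscript`.
RSCRIPT = "/usr/bin/Rscript"


def run_check(cmd, timeout_seconds):
    """Run cmd (a list of arguments, no shell) and return (exit_code, text)."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout_seconds,
        )
    except OSError as exc:
        # Covers a missing binary as well as permission problems.
        return 3, "UNKNOWN - could not start %s: %s" % (cmd[0], exc)
    except subprocess.TimeoutExpired:
        return 3, "UNKNOWN - no result after %d seconds" % timeout_seconds
    output = proc.stdout.strip()
    if not output:
        # Report the first line of stderr instead of an empty status.
        detail = proc.stderr.strip() or "no output on stdout or stderr"
        return 3, "UNKNOWN - %s" % detail.splitlines()[0]
    # Anything outside the four plugin exit codes is treated as UNKNOWN.
    code = proc.returncode if proc.returncode in (0, 1, 2, 3) else 3
    return code, output
```

It could be called as run_check([RSCRIPT, "/path/to/check.R", "error_tracking"], 50), where the script path is hypothetical. Keeping the wrapper's timeout below service_check_timeout lets the plugin report its own failure before Nagios cuts the check off.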
Try su to nagios, run your script, and share the output. That should help identify any permissions-related problems.
nagios@hostname:~$ /usr/lib/nagios/plugins/check_appshot error_tracking
WARNING - "invalid first argument"
nagios@hostname:~$ time /usr/lib/nagios/plugins/check_appshot error_tracking
WARNING - "invalid first argument"
real 0m22.436s
user 0m2.736s
sys 0m0.693s
The R code also behaves as expected when I run the command explicitly in R from the command line as the nagios user.
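Since the check works in an interactive session, one possible next step is to compare the environment the plugin sees in each case. The Python sketch below writes a few settings that often differ between a login shell and a daemon to a file. The function names and the choice of variables are assumptions for illustration, not something taken from the plugin.

```python
import os


def environment_snapshot():
    """Collect settings that may differ between a login shell and a daemon."""
    return {
        "uid": os.getuid(),
        "cwd": os.getcwd(),
        "PATH": os.environ.get("PATH", ""),
        "HOME": os.environ.get("HOME", ""),
        "DISPLAY": os.environ.get("DISPLAY", ""),
        # False when stdin is not attached to a terminal.
        "stdin_is_tty": os.isatty(0),
    }


def write_snapshot(path):
    """Write the snapshot as key=value lines so two runs can be diffed."""
    snapshot = environment_snapshot()
    with open(path, "w") as handle:
        for key in sorted(snapshot):
            handle.write("%s=%s\n" % (key, snapshot[key]))
    return snapshot
```

Calling write_snapshot() once from the command line and once from inside the check, with a different file name each time, gives two files to diff. Any difference in PATH, HOME or the working directory is a candidate explanation for the different behavior.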