check_log plugin getting Service Check Timed Out

bosco
Posts: 40
Joined: Thu Nov 24, 2016 5:34 am

Post by bosco »

We are using the check_log plugin to monitor the logs of our servers. Log monitoring works fine, but sometimes a check returns "Service Check Timed Out" and is flagged as a false alert in Nagios. Could you please share a solution to fix this?

eg: PROBLEM: Log file error detected on XXX server XXX .LogcheckService is CRITICAL.
Additional info: (Service Check Timed Out)
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Post by mcapra »

"Service check timed out after x seconds" means the plugin's execution exceeded the Nagios configuration's service_check_timeout setting, so Nagios stopped it.

Is the log file particularly large when the plugin experiences a timeout? I can't imagine diff or grep (which is what check_log uses) having a particularly tough time unless the log files are quite large.

How are you executing this plugin? Are you using NRPE? SSH? NCPA? You might try running check_log by hand from the CLI of the machine carrying the logs to see if the plugin is running for a long time. If it's consistently finishing well under 60 seconds regardless of the log size, then you may have problems with the method of executing check_log.
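
To make the timeout behaviour described at the top of this post concrete, here is a minimal Python sketch of a scheduler giving up on a check that runs past its limit. This is not Nagios Core's actual code: `run_check` is a hypothetical helper, and the child command is only a stand-in for a slow plugin. The exit-code mapping follows the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess
import sys

# Standard plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

def run_check(cmd, timeout_s):
    """Run a check command and give up if it exceeds timeout_s seconds.

    Hypothetical helper for illustration only. Returns (state, output).
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child is killed and the result is reported as a timeout,
        # regardless of what the plugin would eventually have returned.
        return "CRITICAL", "(Service Check Timed Out)"
    return STATES.get(proc.returncode, "UNKNOWN"), proc.stdout.strip()

if __name__ == "__main__":
    # Stand-in for a plugin that takes 5 seconds to finish.
    slow_plugin = [sys.executable, "-c", "import time; time.sleep(5); print('done')"]
    print(run_check(slow_plugin, timeout_s=1))
```

The point of the sketch is that the timeout result says nothing about the log contents; it only says the plugin did not finish in time.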

You might also consider leveraging Nagios Log Server if you have a lot of log monitoring needs. It's probably a bit heavy-handed for only a handful of log files, though.
Is there a free edition of Nagios Log Server?

Yes! Nagios started in the open source community and we hold strong to our roots. Nagios Log Server is free to use for up to 500MB of log data per day. This makes it easy to monitor small environments or to try it in your environment before you buy it!
Former Nagios employee
https://www.mcapra.com/
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Post by dwhitfield »

mcapra wrote: You might try running check_log by hand from the CLI of the machine carrying the logs to see if the plugin is running for a long time.
This is an excellent suggestion, but I'd like to add that you can put time in front of the command to get the actual run time. In the first example below you can see that the command after time does execute, so make sure you don't run this on something that you don't actually want to run. In the second, you can see that it's pretty accurate (you know, assuming you trust sleep).

Code: Select all

[root@dw-licensedxi etc]# time cat /usr/local/nagios/etc/nrpe/common.cfg

### GENERIC SERVICES ###
command[check_init_service]=sudo /usr/local/nagios/libexec/check_init_service $ARG1$
command[check_services]=/usr/local/nagios/libexec/check_services -p $ARG1$

### MISC SYSTEM METRICS ###
#command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_users]=/usr/local/nagios/libexec/check_users $ARG1$
command[check_load]=/usr/local/nagios/libexec/check_load $ARG1$
command[check_swap]=/usr/local/nagios/libexec/check_swap $ARG1$
command[check_cpu_stats]=/usr/local/nagios/libexec/check_cpu_stats.sh $ARG1$
command[check_mem]=/usr/local/nagios/libexec/custom_check_mem -n $ARG1$

### SYSTEM UPDATES ###
command[check_yum]=/usr/local/nagios/libexec/check_yum
command[check_apt]=/usr/local/nagios/libexec/check_apt

### DISK ###
command[check_disk]=/usr/local/nagios/libexec/check_disk $ARG1$
command[check_ide_smart]=/usr/local/nagios/libexec/check_ide_smart $ARG1$

### PROCESSES ###
command[check_all_procs]=/usr/local/nagios/libexec/custom_check_procs
command[check_procs]=/usr/local/nagios/libexec/check_procs $ARG1$

### OPEN FILES ###
command[check_open_files]=/usr/local/nagios/libexec/check_open_files.pl $ARG1$

### NETWORK CONNECTIONS ###
command[check_netstat]=/usr/local/nagios/libexec/check_netstat.pl -p $ARG1$ $ARG2$
real    0m0.001s
user    0m0.000s
sys     0m0.000s

Code: Select all

[root@dw-licensedxi etc]# time sleep 5

real    0m5.013s
user    0m0.000s
sys     0m0.000s
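
Since the timeouts in this thread are intermittent, a single timed run may not catch a slow one. As a rough sketch, the same measurement can be repeated from Python. `time_command` is a hypothetical helper, and the command shown is a stand-in; as with time, whatever command you pass in really is executed on every run.

```python
import subprocess
import sys
import time

def time_command(cmd, runs=5):
    """Run cmd several times and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; only the elapsed time matters here.
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations

if __name__ == "__main__":
    # Stand-in command; replace with the real check invocation.
    cmd = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    d = time_command(cmd, runs=3)
    print(f"min={min(d):.3f}s max={max(d):.3f}s")
```

A large gap between the minimum and maximum would suggest the plugin is only occasionally slow, which matches the "sometimes" in the original report.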
bosco

Post by bosco »

Thanks for the response. Yes, we are using check_nrpe to execute the check_log plugin. We also have about 6 different log checks using the same plugin. Could that be an issue, putting load on the plugin? Would it make sense to keep separate copies of the plugin in the scripts folder (e.g. check_log, check_log1, check_log2, check_log3) to reduce the load on the single plugin?
kyang

Post by kyang »

I don't think that's the issue. It should run just fine for each check you defined.

The question @mcapra raised is: how large are the log files that are being checked?

Could you do an ls -l on the 6 different files you are using the check_log plugin on? Show us the output, thanks!
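
If ls is not available on the machine holding the logs, something like the following Python sketch would give the same size information. `matching_sizes` is a hypothetical helper, and the directory and pattern in the usage lines are placeholders, not values from this thread.

```python
import fnmatch
import os

def matching_sizes(directory, pattern):
    """Return (name, size_in_bytes) for each regular file in directory matching a glob pattern."""
    results = []
    for entry in os.scandir(directory):
        if entry.is_file() and fnmatch.fnmatch(entry.name, pattern):
            results.append((entry.name, entry.stat().st_size))
    return sorted(results)

if __name__ == "__main__":
    # Placeholder directory and pattern; substitute the real log location.
    for name, size in matching_sizes(".", "*.log"):
        print(f"{size:>15,d}  {name}")
```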

Also, could you run the check from the command line with time in front of your command, like so: time ./check_nrpe -H <remotehost> -c check_log

Here is an example of what to show us:

Code: Select all

[root@localhost libexec]# time ./check_nrpe -H 192.168.4.174
NRPE v3.2.0

real    0m0.022s
user    0m0.009s
sys     0m0.002s
Thanks!
bosco

Post by bosco »

Thanks. The plugin works fine, but sometimes it throws an alert with (Service Check Timed Out), and a false CRITICAL alert is generated, which we need to avoid. Below is the command used to execute the plugin. The timeout is set with -t 60; would increasing it give the plugin sufficient time to run properly?

# check_logcontent_POSLogImporter command definition (using nrpe)
define command{
command_name check_logcontent_POSLogImporter
command_line $USER1$/check_nrpe -u -t 60 -H $HOSTADDRESS$ -c check_logcontent_POSLogImporter
}

mcapra

Post by mcapra »

You can increase the check_nrpe timeout indefinitely but it won't have any effect on the service_check_timeout setting in the main Nagios core configuration file. I believe this is the constraint you are encountering, not something within NRPE saying "this ran too long" but something within Nagios Core saying "this ran too long".
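
As a small illustration of why raising -t alone does not help, here is a Python sketch that treats each timeout as an independent limit and reports which one fires first. `first_timeout` is a hypothetical helper, and the 60 second value for service_check_timeout is an assumption; check the value in your own Nagios configuration.

```python
def first_timeout(limits):
    """Given {layer_name: seconds}, return the (layer, seconds) pair that fires first."""
    layer = min(limits, key=limits.get)
    return layer, limits[layer]

if __name__ == "__main__":
    limits = {
        "check_nrpe -t": 300,         # raised from the 60 in the command definition above
        "service_check_timeout": 60,  # assumed value, not taken from this thread
    }
    print(first_timeout(limits))  # ('service_check_timeout', 60)
```

Whatever value is passed to check_nrpe, the check is still cut off at the smaller of the two limits.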

Regarding multiple copies of check_log under different command definitions, I do not believe this would make check_log more efficient.

I'd be very interested in knowing the size of the log files and doing some benchmarking of the plugin to see what it can realistically handle. If nothing else, that would make sure we aren't running into some hard cap on what the plugin can handle in a reasonable amount of time.

It would also be useful to know, on your remote machine (not the Nagios Core machine), what version of check_log is being used. check_log --version should be able to give us this information.
kyang

Post by kyang »

Thanks @mcapra!

bosco, please give us the information that mcapra requested. It would certainly help!
bosco

Post by bosco »

The log file folder is around 10 GB in size. The following settings and check definitions are used in NSClient++:
********************************************************
allow arguments = true
allow_nasty_meta_chars = true
port = 5666
check_logcontent_XXXLogImporter=scripts\check_log3.exe -logfile=d:\XXXApps\logfiles\ -m "*XXXLogImporter.txt" -p "File has been moved to" -t most_recent -c 1
check_logcontent_XXXincoming_Channel_is_busy=scripts\check_log3.exe -logfile=d:\XXXApps\logfiles\ -m "Tillincoming*" -p "Channel is busy" -t most_recent -c 1
check_logcontent_XXXincoming_TrxValue=scripts\check_log3.exe -logfile=d:\XXXApps\logfiles\ -m "XXXincoming*" -p "TrxValue" -t most_recent -w 1
check_logcontent_XXXOUTPUT=scripts\check_log3.exe -logfile=d:\XXXApps\logfiles\ -m "XXOUTPUT*" -p "An error occurred while executing the command definition" -t most_recent -c 1
check_logcontent_XXXService_fatalerror=scripts\check_log3.exe -logfile=d:\XXXApps\logfiles\ -m "XXXService*" -p "fatal error" -t most_recent -c 1
check_logcontent_XXXService_sending_email=scripts\check_log3.exe -logfile=d:\XXXApps\logfiles\ -m "B2BService*" -p "sending e-mail" -t most_recent -c 1
check_logcontent_XXXService_Socketerror=scripts\check_log3.exe -logfile=d:\XXXApps\logfiles\ -m "XXXService*" -p "Socket Error" -t most_recent -c 1
check_logcontent_XXXService_ANKER999=scripts\check_log3.exe -logfile=d:\XXXApps\logfiles\ -m "XXXService*" -p "ANKER999 file not found" -t most_recent -c 1
check_logcontent_XXXService_POALERT=scripts\check_log3.exe -logfile=d:\XXXApps\logfiles\ -m "XXXService*" -p "POALERT - Unable to send Email" -t most_recent -c 1
mcapra

Post by mcapra »

Ah, it's very useful to know which specific plugin is being used. check_log3 and check_log operate in different ways.

That said, can you share which specific versions of check_log3 and NSClient++ are being used on the remote machine?

You might consider leveraging the seekfile argument to allow the plugin to "pick up where it left off" rather than parsing the entire 10GB worth of log files on each execution of check_log3. Here are some usage instructions from the plugin's help section:

Code: Select all

[-s seekfile|basedir]

-s, --seekfile=
    The temporary file to store the seek position of the last scan.  If not
    specified, it will be automatically generated in /tmp, based on the
    log file's base name.  If this is a directory, the seek file will be auto-
    generated there instead of in /tmp.
    If you specify the system's null device (/dev/null), the entire log file
    will be read every time.
Based on that, my guess is that there should be a bunch of "seekfiles" in C:\Temp or D:\XXXApps\logfiles\. If there aren't, that's likely part of the performance problem. If the seekfiles aren't being written there, or they aren't being persisted for very long, you might give the NSClient++ service account a specific directory to write those "seekfiles" to.
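
To show why a missing seekfile hurts performance, here is a minimal Python sketch of the general seek-file technique. It is not check_log3's code (that plugin is written in Perl and handles far more cases); `scan_new_lines` is a hypothetical helper that does plain substring matching and only a basic truncation check.

```python
import os

def scan_new_lines(log_path, seek_path, pattern):
    """Count lines containing pattern that were appended since the last scan.

    The byte offset reached is stored in seek_path so the next call can skip
    everything that was already read.
    """
    offset = 0
    if os.path.exists(seek_path):
        with open(seek_path) as f:
            offset = int(f.read().strip() or 0)

    # If the log shrank (for example after rotation), start again from the top.
    if offset > os.path.getsize(log_path):
        offset = 0

    matches = 0
    needle = pattern.encode()
    with open(log_path, "rb") as log:
        log.seek(offset)
        for line in log:
            if needle in line:
                matches += 1
        offset = log.tell()

    # Persist the new position; if this write fails or the file is deleted,
    # the next run has to read the whole log again.
    with open(seek_path, "w") as f:
        f.write(str(offset))
    return matches
```

With the offset persisted, each run costs time proportional to the newly appended data. Without it, every run is proportional to the full size of the log.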

You could also double check the Windows TMP and TEMP environment variables since File::Spec leverages them in check_log3 to figure out the proper paths for the temp directory.
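
As a rough way to check this, the following Python sketch reports the TMP and TEMP values and whether the resolved temp directory is actually writable. `temp_dir_report` is a hypothetical helper. Python's lookup rules are not guaranteed to match File::Spec's, so treat this as an approximation, and note that it is only meaningful when run under the same account as the NSClient++ service.

```python
import os
import tempfile

def temp_dir_report():
    """Return the TMP/TEMP values and whether the resolved temp directory is writable."""
    resolved = tempfile.gettempdir()
    try:
        # Creating and removing a file is a more reliable test than inspecting permissions.
        with tempfile.NamedTemporaryFile(dir=resolved):
            writable = True
    except OSError:
        writable = False
    return {
        "TMP": os.environ.get("TMP"),
        "TEMP": os.environ.get("TEMP"),
        "resolved": resolved,
        "writable": writable,
    }

if __name__ == "__main__":
    for key, value in temp_dir_report().items():
        print(f"{key}: {value}")
```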