Unfortunately, I still can't quite get this working.
I forgot to put this in the original post, but here's the relevant version information:
Nagios Server:
Red Hat Enterprise Linux Server release 7.8 (Maipo)
Nagios XI 5.6.14
TARGET_HOSTs:
NRPE: Version: 2.15 OS: RHEL 6.10 and RHEL 7.7
NRPE: Version: 4.0.3 OS: CentOS8
Our Nagios configuration has had this in it since long before I joined the team. If I add a new Linux host using the configuration wizard, it uses this check_nrpe command. (I added the -2 a few months ago because our logs were filling up with version errors, and a thread on this forum said to do that.)
Code:
define command {
command_name check_nrpe
command_line $USER1$/check_nrpe -2 -H $HOSTADDRESS$ -t 30 -c $ARG1$
}
$ARG1$ = check_$CMD -a '$OPTS'
In this specific instance, ARG1 = check_procs -a '-c 1:1024 -Csshd', and the vast majority of our services are configured the same way. For example, the time drift check uses ARG1 = check_time -a '-w 15 -c 30'.
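To make the expansion concrete, here is a minimal sketch of how the service's ARG1 ends up inside command_line, assuming Nagios substitutes the macros textually before running the result. The shell variables are stand-ins for the Nagios macros, the USER1 path is assumed from the CLI examples in this post, and expand_command_line is a hypothetical helper, not a Nagios tool.

```shell
#!/bin/sh
# Stand-ins for the Nagios macros; the values are assumptions for illustration.
USER1=/usr/local/nagios/libexec
HOSTADDRESS=TARGET_HOST

# Hypothetical helper: builds the line that the check_nrpe command
# definition would produce for a given ARG1.
expand_command_line() {
    printf '%s/check_nrpe -2 -H %s -t 30 -c %s\n' "$USER1" "$HOSTADDRESS" "$1"
}

# ARG1 exactly as configured on the sshd service.
expand_command_line "check_procs -a '-c 1:1024 -Csshd'"
```

Pasting the printed line into a shell as the nagios user should be the closest CLI equivalent of what the service check runs.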
Like I said, that worked for years, and I have no idea why or when it stopped. This is particularly worrisome because I only caught it by chance. Every service reports OK in Nagios even though almost none of the commands actually work, which is a massive failure on Nagios's part. My boards should be glowing red and my email should be flooded with service failures, but it took adding and testing a new check for me to find this. I'm not sure whether you work for Nagios or whether I should submit this feedback another way, but this needs to get fixed even if the syntax I'm using is no longer valid.
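One thing worth checking here is the exit status rather than the output text, since Nagios takes the service state from the plugin's return code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The following is only a sketch: state_name and run_check are hypothetical helpers, and the two commands at the bottom are placeholders for a real check_nrpe invocation.

```shell
#!/bin/sh
# Map a plugin exit status to the state name used by Nagios.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo OUT-OF-RANGE ;;   # outside the documented 0-3 plugin range
    esac
}

# Run any command, let its output through, then report its exit status.
run_check() {
    status=0
    "$@" || status=$?
    echo "exit=$status state=$(state_name "$status")"
}

# Placeholders: swap in the real check_nrpe command line to test it.
run_check true
run_check sh -c 'exit 2'
```

If a malformed check prints a WARNING or CRITICAL line but still exits 0, that would explain a green board.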
Some relevant excerpts from TARGET_HOST's nrpe config file:
dont_blame_nrpe=1
include_dir=/etc/nrpe.d/
Contents of the config file in /etc/nrpe.d:
Code:
command[check_swap]=/usr/lib64/nagios/plugins/check_swap $ARG1$
command[check_disk]=/usr/lib64/nagios/plugins/check_disk $ARG1$
command[check_load]=/usr/lib64/nagios/plugins/check_load $ARG1$
command[check_procs]=/usr/lib64/nagios/plugins/check_procs $ARG1$
command[check_time]=/usr/lib64/nagios/plugins/check_ntp_time -H net-ntp00.uchicago.edu $ARG1$
command[check_zombie_procs]=/usr/lib64/nagios/plugins/check_procs -s Z $ARG1$
command[check_mem]=/etc/nagios/scripts/check_mem.sh $ARG1$
command[check_mount]=/etc/nagios/scripts/check_mount.sh
command[check_disk_stat]=/etc/nagios/scripts/check_diskstat.sh $ARG1$
command[check_network_stats]=/etc/nagios/scripts/stat_net.pl
command[check_cpu_stats]=/etc/nagios/scripts/check_cpu_stats.sh $ARG1$
command[check_openmanage]=/etc/nagios/scripts/check_openmanage $ARG1$
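As a rough model of what these definitions do with dont_blame_nrpe=1, the sketch below replaces the literal $ARG1$ token with the first value sent after -a. This assumes NRPE performs a plain textual substitution and then hands the resulting line to a shell; expand_nrpe is a hypothetical helper, not part of NRPE.

```shell
#!/bin/sh
# Hypothetical helper: substitute the first occurrence of the literal token
# $ARG1$ in a command[...] template with the value passed via -a.
expand_nrpe() {
    template=$1
    arg1=$2
    token='$ARG1$'
    case $template in
        *"$token"*)
            before=${template%%"$token"*}   # text before the token
            after=${template#*"$token"}     # text after the token
            printf '%s%s%s\n' "$before" "$arg1" "$after"
            ;;
        *)
            # No placeholder in the template: the argument is not used.
            printf '%s\n' "$template"
            ;;
    esac
}

# check_procs definition from above, with ARG1 = "-C sshd -c 1:"
expand_nrpe '/usr/lib64/nagios/plugins/check_procs $ARG1$' '-C sshd -c 1:'
```

Under this model the whole quoted string travels as a single ARG1 and is only split into separate options once the expanded line runs on TARGET_HOST.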
Trying your suggestion from the CLI:
Code:
[root@nagios]# /usr/local/nagios/libexec/check_nrpe -H $TARGET_HOST -c check_procs '-a -C sshd -c 1:'
PROCS WARNING: 1 process with args '-C' | procs=1;sshd;1:;0;
This is incorrect, though. Actual sshd processes running on TARGET_HOST:
Code:
[root@TARGET_HOST]# pgrep sshd | wc -l
3
Figuring it was probably treating the options as arguments, I moved the first apostrophe over, and it works.
Code:
[root@nagios]# /usr/local/nagios/libexec/check_nrpe -H $TARGET_HOST -c check_procs -a '-C sshd -c 1:'
PROCS OK: 3 processes with command name 'sshd' | procs=3;;1:;0;
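The difference between the two attempts is only in how the local shell splits the line before check_nrpe ever sees it. Here is a small sketch that makes the split visible; show_args is a hypothetical stand-in for check_nrpe that just prints its arguments.

```shell
#!/bin/sh
# Hypothetical stand-in for check_nrpe: print the argument count, then each
# argument in brackets so embedded spaces are visible.
show_args() {
    echo "argc=$#"
    for a in "$@"; do
        printf '  [%s]\n' "$a"
    done
}

# First attempt: -a is inside the quotes, so it is part of one long word.
show_args -H TARGET_HOST -c check_procs '-a -C sshd -c 1:'

# Second attempt: -a is its own word, followed by one quoted value.
show_args -H TARGET_HOST -c check_procs -a '-C sshd -c 1:'
```

In the first form there is no standalone -a for check_nrpe to recognize, only a single word that happens to begin with "-a ".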
Replicating this in the WebUI:
This fails no matter how I structure the syntax with this command:
Code:
define command {
command_name check_nrpe
command_line $USER1$/check_nrpe -2 -H $HOSTADDRESS$ -t 30 -c $ARG1$
}
So I created a new command called check_procs. I did this through the WebUI, after which I applied the configuration.
Code:
define command {
command_name check_procs
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_procs $ARG1$
}
ARG1 =
'-a -C sshd -c 1:'
Code:
SQL Error [nagiosxi] : ERROR: syntax error at or near "sshd"
LINE 1: ...TARGET_HOST -c check_procs \'-a -C sshd -c 1:...
^
[nagios@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H aaaTARGET_HOST -c check_procs '-a -C sshd -c 1:'
Error submitting command.
ARG1 =
-a '-C sshd -c 1:'
Code:
SQL Error [nagiosxi] : ERROR: syntax error at or near "sshd"
LINE 1: ...TARGET_HOST -c check_procs -a \'-C sshd -c 1:...
^
[nagios@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H aaaTARGET_HOST -c check_procs -a '-C sshd -c 1:'
Error submitting command.
So I modified this in TARGET_HOST's nrpe.cfg:
Code:
command[check_procs]=/usr/lib64/nagios/plugins/check_procs -a $ARG1$
I then updated the command in Nagios and applied the config again.
Code:
define command {
command_name check_procs
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_procs -a $ARG1$
}
ARG1 =
-C sshd -c 1:
Code:
[nagios@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H aaaTARGET_HOST -c check_procs -a -C sshd -c 1:
ARG1 =
'-C sshd -c 1:'
Code:
SQL Error [nagiosxi] : ERROR: syntax error at or near "sshd"
LINE 1: ...TARGET_HOST -c check_procs -a \'-C sshd -c 1:...
^
[nagios@nagios ~]$ /usr/local/nagios/libexec/check_nrpe -H aaaTARGET_HOST -c check_procs -a '-C sshd -c 1:'
Error submitting command.
That's broken from the CLI, too, so I guess it's just not good syntax.
Code:
[root@nagios]# /usr/local/nagios/libexec/check_nrpe -H $TARGET_HOST -c check_procs -a '-C sshd -c 1:'
PROCS CRITICAL: 0 processes with args '-Csshd' | procs=0;;1:;0;
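The "with args" wording suggests the plugin consumed part of the line as the value of its own -a option (the argument filter, as I understand check_procs) once -a was added to nrpe.cfg. Below is a sketch of that effect using the shell's getopts; parse_like_plugin is a hypothetical helper that only assumes -a, -C and -c each take a value, and it is not meant to reproduce the exact plugin output above.

```shell
#!/bin/sh
# Hypothetical parser in which -a, -C and -c each require a value.
parse_like_plugin() {
    OPTIND=1
    args_filter=
    command_name=
    critical=
    while getopts 'a:C:c:' opt; do
        case $opt in
            a) args_filter=$OPTARG ;;
            C) command_name=$OPTARG ;;
            c) critical=$OPTARG ;;
            *) return 1 ;;
        esac
    done
    echo "args_filter=[$args_filter] command_name=[$command_name] critical=[$critical]"
}

# Template without the extra -a: the options arrive as intended.
parse_like_plugin -C sshd -c 1:

# Template with "-a $ARG1$": -a takes the next word (-C) as its value.
# Shell getopts then stops at the first non-option word (sshd); a GNU-style
# parser may keep scanning, so the remaining fields can differ in practice.
parse_like_plugin -a -C sshd -c 1:
```

Either way, the -C no longer acts as the command name option once a plugin-side -a sits in front of it.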