Some NRPE checks are in UNKNOWN state after fresh install
Hello XI support
This topic replaces all my other threads (which I have asked to close). Please consider this new case instead; it has the best information collected.
Given:
- NRPE agent 2.15 from NAGIOS assets (for SuSe 11-12/RedHat 6.x)
- SuSe sles 11 SP3 servers (in our case required 2 consecutive glibc updates and reboots before we were able to proceed with the script)
- RedHat EL 6.4 servers
- ./fullscript configures xinetd for access (not nrpe.cfg), BUT other checks work (was thinking about it too)
- pre-reqs (most are there)
After the NRPE agent install and XI configuration via the Linux Server Wizard:
- CPU Stats Unknown 2h 51m 38s 5/5 2016-03-03 19:03:14 NRPE: Unable to read output
- Memory Usage Unknown 2h 50m 12s 5/5 2016-03-03 19:04:40 NRPE: Unable to read output
- Open Files Unknown 2h 50m 16s 5/5 2016-03-03 19:05:06 NRPE: Unable to read output
- the rest of the standard checks produced by the Wizard are operational (except for YUM in the RedHat case)
When I run the same checks via CLI from XI, I get the same "Unable to read output" result.
When I run the same checks locally on the servers (output attached), all of them RUN.
------------------------------------------------------------------------------------------------------
LAB installs of SuSe sles 11 SP3 (fully patched), CentOS 6.x and OpenSuse 12 are fully operational in the same scenario.
Is there a way to figure out which dependencies/updates are missing and/or required for these 3 checks to run?
Re: Some NRPE checks are in UNKNOWN state after fresh instal
Can you go to CCM, open the service and click test?
Then go to a host and run the content of the command from there. Note that a command isn't a script or binary, but a definition of how to run a script or binary with arguments.
On an unrelated topic, be careful using the wizards--they are best for creating a base service to modify and make generic. I use templates to define monitoring, contacts, etc. and hostgroups with hosts attached via the hostgroup, not the hosts. Then attach hostgroups to services. This way things are modular and to change monitoring intervals, contacts, retries, and so on, I only need to make the change in one place. My hosts contain the host info, and a template. Services contain service info, hostgroups, and a template. Templates contain monitoring info and contacts.
Re: Some NRPE checks are in UNKNOWN state after fresh instal
Thanks @gormank! Templates will help immensely as an environment grows.
On the RHEL and SUSE machine, can you run ls -l /usr/local/nagios/libexec/? I suspect the permissions aren't right, which is preventing NRPE from running your script.
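As a complement to reading the ls -l listing by eye, a small helper along these lines could report why a plugin might not be runnable. This is only a sketch: plugin_perm_status is a hypothetical name, and the answer is only meaningful when it is run as the account NRPE uses (assumed here to be nagios).

```shell
# Sketch: classify a plugin file by the permission problem it has, if any.
# plugin_perm_status is a hypothetical helper, not part of NRPE or Nagios XI.
# Run it as the account NRPE uses so the tests reflect that account's access.
plugin_perm_status() {
    plugin="$1"
    if [ ! -e "$plugin" ]; then
        echo "missing"
    elif [ ! -r "$plugin" ]; then
        # Interpreted plugins (.sh, .pl) must be readable as well as executable.
        echo "not readable"
    elif [ ! -x "$plugin" ]; then
        echo "not executable"
    else
        echo "ok"
    fi
}
```

For example, plugin_perm_status /usr/local/nagios/libexec/check_cpu_stats.sh prints one of the four words above.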
Former Nagios Employee
Re: Some NRPE checks are in UNKNOWN state after fresh instal
gormank wrote: Can you go to CCM, open the service and click test?
We use the Wizard for the initial service creation (the services are then copied and modified for various needs). The assumption is that the Wizard is an "out of the box" solution that has to work by default.
Host command runs are attached already. The XI CLI produced the same result as the GUI-based notifications: "OUTPUT: NRPE: Unable to read output"
Re: Some NRPE checks are in UNKNOWN state after fresh instal
Sorry, I didn't open the attachments...
The commands are executed as the nagios user by nrpe on the hosts. Try running them that way.
[root@fihp-alfdev04 ~]# /usr/local/nagios/libexec/check_open_files.pl -w 30 -c 50
Are the commands defined in the nrpe.cfg or commands.cfg?
There's a troubleshooting NRPE doc that pretty much resolves all issues like this. It has a section that addresses this topic specifically.
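To illustrate running a plugin as the nagios user instead of root, here is a minimal sketch. It assumes sudo is installed on the remote host and that the NRPE account is called nagios; NRPE_USER, DRY_RUN and run_as_nrpe_user are illustrative names, not NRPE settings.

```shell
# Sketch: run a plugin under the account NRPE uses, not as root.
# NRPE_USER, DRY_RUN and run_as_nrpe_user are hypothetical names for this example.
NRPE_USER="${NRPE_USER:-nagios}"

run_as_nrpe_user() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be executed.
        echo "sudo -u $NRPE_USER $*"
    else
        sudo -u "$NRPE_USER" "$@"
    fi
}
```

A call such as run_as_nrpe_user /usr/local/nagios/libexec/check_open_files.pl -w 30 -c 50 then shows what the plugin prints when it runs without root privileges.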
Re: Some NRPE checks are in UNKNOWN state after fresh instal
gormank wrote: The commands are executed as nagios by nrpe on the hosts. Try that.
Commands on the host work as anything. It is the remote part that does not work.
The troubleshooting NRPE manual does not apply in this case:
- the same checks work on the LAB servers.
- this is the out-of-the-box NRPE agent (from Nagios Assets)
Zipped configuration files from the host are attached.
Re: Some NRPE checks are in UNKNOWN state after fresh instal
The host I'm referring to is the remote host.
Re: Some NRPE checks are in UNKNOWN state after fresh instal
"Commands on the host work as anything. It is the remote part that does not work."
Can you show us the actual commands run from the command line on the Nagios XI server along with the output?
Examples:
/usr/local/nagios/libexec/check_nrpe -H <remote ip> -c <command> -a '<arguments>'
[root@localhost nagiosxi]# /usr/local/nagios/libexec/check_nrpe -H x.x.x.x -c check_cpu_stats -a '-w 85 -c 95'
CPU STATISTICS OK : user=1.01% system=0.00% iowait=0.00% idle=98.99% nice=0.00% steal=0.00% | CpuUser=1.01;CpuSystem=0.00;CpuIoWait=0.00;CpuIdle=98.99;CpuNice=0.00;CpuSteal=0.00;85;95
[root@localhost nagiosxi]# /usr/local/nagios/libexec/check_nrpe -H x.x.x.x -c check_mem -a '-w 20 -c 10'
OK - 1060 / 3962 MB (26%) Free Memory, Used: 3704 MB, Shared: 0 MB, Buffers: 191 MB, Cached: 802 MB | total=3962MB free=1060MB used=3704MB shared=0 buffers=191MB cached=802MB
[root@localhost nagiosxi]# /usr/local/nagios/libexec/check_nrpe -H x.x.x.x -c check_open_files -a '-w 30 -c 50'
OK: 1984 open files (0% of max 398525)|opened_files=1984;119557;199262
fihp-tcmigra1:/usr/local/nagios/etc # /usr/local/nagios/libexec/check_cpu_stats.sh -w 85 -c 95
/usr/local/nagios/libexec/check_cpu_stats.sh: line 139: [: 0.00: integer expression expected
/usr/local/nagios/libexec/check_cpu_stats.sh: line 142: [: 0.00: integer expression expected
/usr/local/nagios/libexec/check_cpu_stats.sh: line 152: [: 0.30: integer expression expected
/usr/local/nagios/libexec/check_cpu_stats.sh: line 156: [: 0.20: integer expression expected
/usr/local/nagios/libexec/check_cpu_stats.sh: line 160: [: 0.00: integer expression expected
/usr/local/nagios/libexec/check_cpu_stats.sh: line 164: [: 99.50: integer expression expected
CPU STATISTICS OK: user=0.30% system=0.20% iowait=0.00% idle=99.50% | user=0.30% system=0.20% iowait=0.00%;85;95 idle=99.50%
In order to eliminate this error, you will need to update your "check_cpu_stats.sh" plugin. Download the newer version of the plugin from here:
https://exchange.nagios.org/directory/P ... ed/details
Make a backup of the original plugin before replacing it (just in case). Hope this helps.
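For background on the "integer expression expected" messages: the shell's [ builtin compares only integers with operators like -gt and -ge, so a value such as 0.00 is rejected. The sketch below shows one common way to compare decimal values by delegating to awk. It is not necessarily how the updated plugin does it, and float_ge is a hypothetical helper name.

```shell
# Sketch: compare two decimal numbers in a shell script.
# The shell's [ A -ge B ] accepts integers only, so [ 0.00 -ge 85 ]
# fails with "integer expression expected". awk compares them numerically.
# float_ge is a hypothetical name used only for this example.
float_ge() {
    # Exit status 0 (success) when $1 >= $2, otherwise 1.
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a >= b) }'
}
```

It can then be used in a condition, for example: if float_ge "$cpu_user" 85; then echo WARNING; fi (cpu_user being whatever variable holds the measured value).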
Re: Some NRPE checks are in UNKNOWN state after fresh instal
lmiltchev wrote: In order to eliminate this error, you will need to update your "check_cpu_stats.sh" plugin.
Updating the .sh script has removed the "integer expression" lines.
However, NAGIOS is still unable to run these checks remotely: both GUI and CLI attempts return "NRPE: Unable to read output".
CPU Stats, Memory Usage and Open Files only work locally on the host. Other checks generated by the Linux Server Wizard (say, 'Total Processes') do work remotely:
- 3 cases (2 x RedHat EL 6.4 and 1 x SuSe 11 SP3)
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.102 -c check_cpu_stats -a '-w 85 -c 95'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.95 -c check_cpu_stats -a '-w 85 -c 95'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.121 -c check_cpu_stats -a '-w 85 -c 95'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.102 -c check_mem -a '-w 20 -c 10'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.95 -c check_mem -a '-w 20 -c 10'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.121 -c check_mem -a '-w 20 -c 10'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.102 -c check_open_files -a '-w 30 -c 50'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.95 -c check_open_files -a '-w 30 -c 50'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.121 -c check_open_files -a '-w 30 -c 50'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]#
Re: Some NRPE checks are in UNKNOWN state after fresh instal
You're using a different host in every single one. Can we please stick to troubleshooting one machine at a time to avoid any confusion?
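One way to keep the testing on a single machine is to wrap the three failing checks in a small loop that takes one host. This is a rough sketch using the check_nrpe path and thresholds already shown in this thread; run_wizard_checks and the CHECK_NRPE variable are hypothetical names.

```shell
# Sketch: run the three failing wizard checks against ONE host.
# CHECK_NRPE and run_wizard_checks are hypothetical names for this example;
# the path and thresholds are the ones used earlier in this thread.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

run_wizard_checks() {
    host="$1"
    for spec in "check_cpu_stats|-w 85 -c 95" \
                "check_mem|-w 20 -c 10" \
                "check_open_files|-w 30 -c 50"; do
        cmd="${spec%%|*}"    # text before the "|"
        args="${spec#*|}"    # text after the "|"
        # Capture the output and exit code without aborting on failure.
        output=$("$CHECK_NRPE" -H "$host" -c "$cmd" -a "$args") && rc=0 || rc=$?
        echo "$cmd (exit $rc): $output"
    done
}
```

Calling run_wizard_checks 10.x.x.102 (with the real address) prints one labelled line per check, including the exit code, where 3 corresponds to UNKNOWN in the plugin conventions.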
I didn't see the permissions verified. Can you please run the command from above?
"On the RHEL and SUSE machine, can you run ls -l /usr/local/nagios/libexec/? I suspect the permissions aren't right, which is preventing NRPE from running your script."
What is the result now when you run /usr/local/nagios/libexec/check_cpu_stats.sh -w 85 -c 95 with the fixed bash script?
Also - I don't see an entry for check_cpu_stats in your NRPE configuration. You'll need one similar to this -
command[check_cpu_stats]=/usr/local/nagios/libexec/check_cpu_stats.sh -w 85 -c 95
Now, from the XI machine try running a check against the machine you just modified (tcmigra1). /usr/local/nagios/libexec/check_nrpe -H tcmigra1 -c check_cpu_stats. What is the result?
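To confirm whether a command definition like the one above is present, a quick grep over the NRPE configuration can help. This is a sketch only: nrpe_has_command is a hypothetical helper, the config path in the usage note is an assumption based on the directory seen in this thread, and definitions pulled in through include or include_dir files would need to be checked separately.

```shell
# Sketch: succeed if a config file contains an uncommented
# "command[NAME]=" definition. nrpe_has_command is a hypothetical name.
# NAME is assumed to contain only letters, digits and underscores.
nrpe_has_command() {
    cfg="$1"
    name="$2"
    grep -Eq "^[[:space:]]*command\[$name\][[:space:]]*=" "$cfg"
}
```

For instance, nrpe_has_command /usr/local/nagios/etc/nrpe.cfg check_cpu_stats && echo defined would print "defined" only when the entry exists and is not commented out.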
Former Nagios Employee