Some NRPE checks are in UNKNOWN state after fresh install

dlukinski
Posts: 1130
Joined: Tue Oct 06, 2015 9:42 am

Some NRPE checks are in UNKNOWN state after fresh install

Post by dlukinski »

Hello XI support

This topic replaces my other threads on this issue (which I have asked to be closed). Please treat this as the new case; it has the best information collected so far.

Given:
- NRPE agent 2.15 from Nagios assets (for SUSE 11-12 / Red Hat 6.x)
- SUSE SLES 11 SP3 servers (in our case these required 2 consecutive glibc updates and reboots before we were able to proceed with the script)
- Red Hat EL 6.4 servers
- ./fullscript configures access through xinetd (not nrpe.cfg), but other checks work (I had considered this as a possible cause too)
- prerequisites (most are installed)


After installing the NRPE agent and configuring XI via the Linux Server Wizard:

- CPU Stats Unknown 2h 51m 38s 5/5 2016-03-03 19:03:14 NRPE: Unable to read output
- Memory Usage Unknown 2h 50m 12s 5/5 2016-03-03 19:04:40 NRPE: Unable to read output
- Open Files Unknown 2h 50m 16s 5/5 2016-03-03 19:05:06 NRPE: Unable to read output

- the rest of the standard checks produced by the Wizard are operational (except for YUM on Red Hat)

When running the same checks via the CLI from XI, we get the same "Unable to read output" result.
When running the same checks locally on the servers (output attached), all of them run.

------------------------------------------------------------------------------------------------------
LAB installs of SUSE SLES 11 SP3 (fully patched), CentOS 6.x and openSUSE 12 are fully operational in the same scenario.

Is there a way to figure out which dependencies or updates are missing or required for these 3 checks to run?
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Some NRPE checks are in UNKNOWN state after fresh instal

Post by gormank »

Can you go to CCM, open the service and click test?
Then go to a host and run the content of the command from there. Note that a command isn't a script or binary, but a definition of how to run a script or binary with arguments.

On an unrelated topic, be careful using the wizards--they are best for creating a base service to modify and make generic. I use templates to define monitoring, contacts, etc. and hostgroups with hosts attached via the hostgroup, not the hosts. Then attach hostgroups to services. This way things are modular and to change monitoring intervals, contacts, retries, and so on, I only need to make the change in one place. My hosts contain the host info, and a template. Services contain service info, hostgroups, and a template. Templates contain monitoring info and contacts.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Some NRPE checks are in UNKNOWN state after fresh instal

Post by rkennedy »

Thanks @gormank! Templates will help immensely as an environment grows.

On the RHEL and SUSE machine, can you run ls -l /usr/local/nagios/libexec/? I suspect the permissions aren't right, which is preventing NRPE from running your script.
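
As a rough complement to reading the ls -l output by eye, the sketch below lists plugin files that are not readable and executable by everyone. The function name list_unusable_plugins is made up for illustration, and the check only looks at the "other" permission bits, so a file owned by nagios with mode 0750 would be listed even though NRPE could still run it.

```shell
#!/bin/sh
# Hypothetical helper: list regular files in a plugin directory that are
# NOT both readable and executable by "other" users (mode bits 005).
# Heuristic only: ownership and group membership are ignored.
list_unusable_plugins() {
    dir=${1:-/usr/local/nagios/libexec}
    find "$dir" -maxdepth 1 -type f ! -perm -005 -print
}
```

Calling list_unusable_plugins with no argument inspects /usr/local/nagios/libexec; an empty result means every plugin there passes this particular test, not that permissions are ruled out entirely.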
Former Nagios Employee
dlukinski
Posts: 1130
Joined: Tue Oct 06, 2015 9:42 am

Re: Some NRPE checks are in UNKNOWN state after fresh instal

Post by dlukinski »

We use the Wizard for the initial service creation (the services are then copied and modified for various needs). The assumption is that the Wizard is an "out of the box" solution that has to work by default.
The host command runs are already attached. The XI CLI produced the same result as the GUI-based notifications: "OUTPUT: NRPE: Unable to read output".
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Some NRPE checks are in UNKNOWN state after fresh instal

Post by gormank »

Sorry, I didn't open the attachments...
The commands are executed as the nagios user by NRPE on the hosts. Try running them that way.

[root@fihp-alfdev04 ~]# /usr/local/nagios/libexec/check_open_files.pl -w 30 -c 50

Are the commands defined in the nrpe.cfg or commands.cfg?

There's a troubleshooting NRPE doc that resolves most issues like this. It has a section that addresses this topic specifically.
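
A small sketch of why running the plugin as nagios matters. It assumes (this is not stated in the thread) that NRPE only passes back what the plugin writes to stdout, so a plugin that fails to start, or only writes to stderr, would leave NRPE with nothing to read. The helper name stdout_seen is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command with stderr discarded and report
# whether it produced anything on stdout.
# Returns 0 when stdout is non-empty, 1 when it is empty.
stdout_seen() {
    out=$("$@" 2>/dev/null)
    rc=$?
    if [ -n "$out" ]; then
        printf 'stdout: %s (exit %s)\n' "$out" "$rc"
        return 0
    fi
    printf 'no stdout (exit %s)\n' "$rc"
    return 1
}
```

Run from a shell owned by the nagios user on the remote host, for example stdout_seen /usr/local/nagios/libexec/check_open_files.pl -w 30 -c 50. A "no stdout" result there, while the same command prints normally as root, would point at permissions or environment rather than at the plugin logic.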
dlukinski
Posts: 1130
Joined: Tue Oct 06, 2015 9:42 am

Re: Some NRPE checks are in UNKNOWN state after fresh instal

Post by dlukinski »

Commands run locally on the host work fine. It is the remote part that does not work.
The troubleshooting NRPE manual does not apply in this case:
- the same checks work on the LAB servers
- this is the out-of-the-box NRPE agent (from Nagios Assets)

Attached are the zipped configuration files from the host.
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Some NRPE checks are in UNKNOWN state after fresh instal

Post by gormank »

The host I'm referring to is the remote host.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Some NRPE checks are in UNKNOWN state after fresh instal

Post by lmiltchev »

dlukinski wrote:Commands on the host work as anything. It is the remote part that does not work.
Can you show us the actual commands run from the command line on the Nagios XI server along with the output?

/usr/local/nagios/libexec/check_nrpe -H <remote ip> -c <command> -a '<arguments>'
Examples:

[root@localhost nagiosxi]# /usr/local/nagios/libexec/check_nrpe -H x.x.x.x -c check_cpu_stats -a '-w 85 -c 95'
CPU STATISTICS OK : user=1.01% system=0.00% iowait=0.00% idle=98.99% nice=0.00% steal=0.00% | CpuUser=1.01;CpuSystem=0.00;CpuIoWait=0.00;CpuIdle=98.99;CpuNice=0.00;CpuSteal=0.00;85;95

[root@localhost nagiosxi]# /usr/local/nagios/libexec/check_nrpe -H x.x.x.x -c check_mem -a '-w 20 -c 10'
OK - 1060 / 3962 MB (26%) Free Memory, Used: 3704 MB, Shared: 0 MB, Buffers: 191 MB, Cached: 802 MB | total=3962MB free=1060MB used=3704MB shared=0 buffers=191MB cached=802MB

[root@localhost nagiosxi]# /usr/local/nagios/libexec/check_nrpe -H x.x.x.x -c check_open_files -a '-w 30 -c 50'
OK: 1984 open files (0% of max 398525)|opened_files=1984;119557;199262
fihp-tcmigra1:/usr/local/nagios/etc # /usr/local/nagios/libexec/check_cpu_stats.sh -w 85 -c 95
/usr/local/nagios/libexec/check_cpu_stats.sh: line 139: [: 0.00: integer expression expected
/usr/local/nagios/libexec/check_cpu_stats.sh: line 142: [: 0.00: integer expression expected
/usr/local/nagios/libexec/check_cpu_stats.sh: line 152: [: 0.30: integer expression expected
/usr/local/nagios/libexec/check_cpu_stats.sh: line 156: [: 0.20: integer expression expected
/usr/local/nagios/libexec/check_cpu_stats.sh: line 160: [: 0.00: integer expression expected
/usr/local/nagios/libexec/check_cpu_stats.sh: line 164: [: 99.50: integer expression expected

CPU STATISTICS OK: user=0.30% system=0.20% iowait=0.00% idle=99.50% | user=0.30% system=0.20% iowait=0.00%;85;95 idle=99.50%
In order to eliminate the "integer expression expected" errors shown in your local run above, you will need to update your "check_cpu_stats.sh" plugin. Download the newer version of the plugin from here:

https://exchange.nagios.org/directory/P ... ed/details

Make a backup of the original plugin before replacing it (just in case). Hope this helps.
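
For context on the error itself: the shell's [ ... ] test only compares integers with operators like -gt and -ge, so a value such as 0.00 triggers "integer expression expected". The sketch below shows one common way to compare decimals by handing the comparison to awk. It is an illustration only, not necessarily how the updated plugin does it, and the function name float_ge is made up.

```shell
#!/bin/sh
# Hypothetical helper: succeed when decimal $1 is >= decimal $2.
# awk performs the floating-point comparison that [ ... -ge ... ] cannot.
float_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !((a + 0) >= (b + 0)) }'
}

idle=99.50
if float_ge "$idle" 85; then
    echo "idle ${idle}% is at or above 85"
fi
```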
dlukinski
Posts: 1130
Joined: Tue Oct 06, 2015 9:42 am

Re: Some NRPE checks are in UNKNOWN state after fresh instal

Post by dlukinski »

Updating the .sh script has removed the "integer expression" lines.
However, Nagios is still unable to run these checks remotely: both GUI and CLI attempts return "NRPE: Unable to read output".
CPU Stats, Memory Usage and Open Files only work locally on the host. Other checks generated by the Linux Server Wizard (say, 'Total Processes') do work remotely.
- 3 cases (2 x Red Hat EL 6.4 and 1 x SUSE 11 SP3):

[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.102 -c check_cpu_stats -a '-w 85 -c 95'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.95 -c check_cpu_stats -a '-w 85 -c 95'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.121 -c check_cpu_stats -a '-w 85 -c 95'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.102 -c check_mem -a '-w 20 -c 10'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.95 -c check_mem -a '-w 20 -c 10'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.121 -c check_mem -a '-w 20 -c 10'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.102 -c check_open_files -a '-w 30 -c 50'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.95 -c check_open_files -a '-w 30 -c 50'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.x.x.121 -c check_open_files -a '-w 30 -c 50'
NRPE: Unable to read output
[root@fikc-nagxidev01 ~]#
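
To keep the results for one machine together, the same three checks can be looped over a single host. This is only a sketch: the function name probe_host is hypothetical, the CHECK_NRPE variable exists purely so the path can be overridden, and the thresholds are the ones used in the commands above.

```shell
#!/bin/sh
# Path to check_nrpe as used in this thread; can be overridden.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# Hypothetical helper: run the three failing checks against ONE host,
# with the same thresholds as above, labelling each result line.
probe_host() {
    host=$1
    for spec in 'check_cpu_stats|-w 85 -c 95' \
                'check_mem|-w 20 -c 10' \
                'check_open_files|-w 30 -c 50'; do
        cmd=${spec%%|*}     # text before the first "|"
        args=${spec#*|}     # text after the first "|"
        printf '%s %s: ' "$host" "$cmd"
        "$CHECK_NRPE" -H "$host" -c "$cmd" -a "$args"
    done
}
```

For example, probe_host 10.x.x.102 (with the real address filled in) prints one labelled line per check for that machine only.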
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Some NRPE checks are in UNKNOWN state after fresh instal

Post by rkennedy »

You're using a different host in every single one, can we please stick to troubleshooting one machine at a time to avoid any confusion?
On the RHEL and SUSE machine, can you run ls -l /usr/local/nagios/libexec/? I suspect the permissions aren't right, which is preventing NRPE from running your script.
I didn't see the permissions verified, can you please run the command from above?

What is the result now when you run /usr/local/nagios/libexec/check_cpu_stats.sh -w 85 -c 95 with the fixed bash script?

Also - I don't see an entry for check_cpu_stats in your NRPE configuration. You'll need one similar to this -

command[check_cpu_stats]=/usr/local/nagios/libexec/check_cpu_stats.sh -w 85 -c 95
Then restart xinetd to update your configuration - service xinetd restart

Now, from the XI machine try running a check against the machine you just modified (tcmigra1). /usr/local/nagios/libexec/check_nrpe -H tcmigra1 -c check_cpu_stats. What is the result?
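
A quick way to confirm whether a command is actually defined on the remote host is to search the NRPE configuration for an uncommented definition. The sketch below assumes the config lives at /usr/local/nagios/etc/nrpe.cfg (inferred from the install prefix in this thread, not confirmed); the function name nrpe_command_def is hypothetical, and definitions pulled in from other included files would not be found by it.

```shell
#!/bin/sh
# Hypothetical helper: print the definition of an NRPE command from a
# config file, or fail if no uncommented definition exists.
# The default path is an assumption based on the install prefix above.
nrpe_command_def() {
    name=$1
    cfg=${2:-/usr/local/nagios/etc/nrpe.cfg}
    grep -E "^[[:space:]]*command\[$name\][[:space:]]*=" "$cfg"
}
```

For example, nrpe_command_def check_cpu_stats prints the matching line or returns non-zero if there is none. Note that a definition with fixed thresholds, like the one above, does not use anything passed with -a; passing thresholds from XI would, as far as I know, also need a $ARG1$ placeholder in the definition and dont_blame_nrpe=1 in the NRPE config.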
Former Nagios Employee