Nagios Support Forum • Monitoring Docker containers

Monitoring Docker containers

Posted: Wed May 09, 2018 8:21 am

by jankogaga

Hi,

I am running Nagios Core 4.2.0 on a KVM guest with CentOS 7.
I followed this guide:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf

The check_jmx plugin has been installed on the remote server running JMX.
nrpe.cfg on the JMX server has been configured to support JMX checks by adding:
command[check_jmx]=/usr/lib64/nagios/check_jmx $ARG1$

On the Nagios Core server, I have created a .cfg file for the remote server running JMX.
Please find the attached jmx.cfg file.

After implementing all of the above and restarting the appropriate services,
I can see in the Nagios Core GUI that
the Heap memory usage service is being checked, but it shows the error:
(Return code of 255 is out of bounds).
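
For context on that message: Nagios plugins are expected to exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), and Nagios reports any other exit value as out of bounds. A minimal shell sketch of that mapping is below; `nagios_state` is a hypothetical helper name used only for illustration, not something shipped with Nagios.

```shell
# Hypothetical helper: translate a plugin exit code into the state Nagios would show.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "out of bounds" ;;
    esac
}

# Example: capture the exit code of any command run by hand, then classify it.
rc=0
sh -c 'exit 255' || rc=$?
nagios_state "$rc"   # prints "out of bounds"
```

Running the real check by hand and passing its exit code to the helper in the same way shows whether the plugin itself or something in front of it is returning 255.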

Thanks,
Dragan

Re: Monitoring JMX

Posted: Wed May 09, 2018 8:30 am

by mcapra

Your service definition's check_command directive has a syntax error in it:

Code: Select all

define service {
        use                             generic-service
        host_name                       test-difin.abz-testing.de
        service_description             Heap memory usage
        check_command                   check_nrpe!check_jmx!-a '-U service:jmx:rmi:///jndi/rmi://127.0.0.1:1099/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used I HeapMemoryUsage -J used -vvvv -w 4248302272 -c 5498760192'
        notifications_enabled           1
}

There should be a dash in -I HeapMemoryUsage. Try changing that and see if it makes a difference.

If that doesn't help, can you share the output of the following commands executed from the CLI of the remote machine (10.30.30.33):

Code: Select all

/usr/lib64/nagios/check_jmx -U service:jmx:rmi:///jndi/rmi://127.0.0.1:1099/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 4248302272 -c 5498760192
ls -al /usr/lib64/nagios/
echo $JAVA_HOME

Also, if we could see your command definition for the check_nrpe command, that may be useful.

Re: Monitoring JMX

Posted: Wed May 09, 2018 9:54 am

by jankogaga

Inserting the dash doesn't make a difference.
The requested output is:

Code: Select all

[root@test-difin /]# /usr/lib64/nagios/check_jmx -U service:jmx:rmi:///jndi/rmi://127.0.0.1:1099/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 4248302272 -c 5498760192
JMX OK - HeapMemoryUsage.used=226035976 | HeapMemoryUsage.used=226035976,committed=2263351296;init=1054867456;max=14974713856;used=226035976
[root@test-difin /]# ls -al /usr/lib64/nagios/
total 24
drwxr-xr-x  2 root   root   4096 Feb 20 16:15 .
dr-xr-xr-x 66 root   root   4096 Feb 21 09:20 ..
-rwxr-xr-x  1 nagios nagios  140 Jan 16 15:10 check_jmx
-rwxr-xr-x  1 nagios nagios 9625 Jan 16 15:10 jmxquery.jar
[root@test-difin /]# echo $JAVA_HOME

[root@test-difin /]#

I can see that /usr/local/nagios/libexec/check_nrpe on the Nagios Core server is a binary file.
The file is attached; please just remove the .txt extension.

Thanks,
Dragan

Re: Monitoring JMX

Posted: Thu May 10, 2018 4:11 pm

by tgriep

Here are a couple of things to check.

In the nrpe.cfg config file on the remote system, this option has to be set to 1 to allow NRPE to accept arguments. Make sure it is set.

Code: Select all

dont_blame_nrpe=1

Change it if needed and restart NRPE to see if that fixes the issue.
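
A quick way to confirm the setting is active (and not just present in a commented-out line) might look like the sketch below. The helper name `nrpe_args_enabled` and the config path are assumptions; adjust the path to wherever nrpe.cfg lives on the remote system. Note that NRPE also has to be built with command-argument support for this option to have any effect.

```shell
# Hypothetical helper: succeed if the given nrpe.cfg has an active dont_blame_nrpe=1 line.
# Lines starting with '#' do not match, so commented-out settings are ignored.
nrpe_args_enabled() {
    grep -Eq '^[[:space:]]*dont_blame_nrpe[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Assumed location; not taken from this thread.
cfg=/usr/local/nagios/etc/nrpe.cfg
if [ -r "$cfg" ] && nrpe_args_enabled "$cfg"; then
    echo "NRPE accepts command arguments"
else
    echo "dont_blame_nrpe=1 not found in $cfg"
fi
```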

Also, make sure the check_nrpe command is set to the following:

Code: Select all

$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ $ARG2$ $ARG3$

To test whether the NRPE agent can be accessed by the Nagios server, run the following on the Nagios server; it should display the NRPE agent's version.

Code: Select all

/usr/local/nagios/libexec/check_nrpe -H 10.30.30.33

Let us know what you find.

Re: Monitoring JMX

Posted: Fri May 11, 2018 4:33 am

by jankogaga

I have changed dont_blame_nrpe to 1, restarted nrpe, but nothing changes.

I have found in /usr/local/nagios/etc/objects/commands.cfg

Code: Select all

command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

but I would rather not change it, since the other Nagios checks work fine with it.

The output of the command:

Code: Select all

[root@monitor ~]# /usr/local/nagios/libexec/check_nrpe -H 10.30.30.33
connect to address 10.30.30.33 port 5666: Connection refused

Port 5666 is open on the host where the JMX server is located (and no additional firewall is enabled on it).
I have found the following in nrpe.cfg:

Code: Select all

server_port=5666
allowed_hosts=127.0.0.1,192.168.0.0/16,172.17.0.0/16,10.0.0.0/8

Since the IP address of the Nagios server (the VPN one) is 10.9.0.66, my understanding is
that the Nagios server is allowed to access the JMX server.

Thanks,
Dragan

Re: Monitoring JMX

Posted: Fri May 11, 2018 8:14 am

by mcapra

jankogaga wrote: I have found in /usr/local/nagios/etc/objects/commands.cfg
Code: Select all
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

Your check_nrpe command definition only accepts 1 argument $ARG1$ and in your service's check_command directive you are attempting to pass it 2 arguments:

Code: Select all

check_command                   check_nrpe!check_jmx!-a '-U service:jmx:rmi:///jndi/rmi://127.0.0.1:1099/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 4248302272 -c 5498760192'

Essentially, this means everything following check_jmx in your check_command directive isn't making it to check_nrpe. You'll need to create a separate command definition that accepts multiple arguments. Or, for a slightly less clean solution, change your service's check_command directive to only pass in a single argument like so:

Code: Select all

check_command                   check_nrpe!check_jmx -a '-U service:jmx:rmi:///jndi/rmi://127.0.0.1:1099/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 4248302272 -c 5498760192'
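
To make the difference between the two forms visible, here is a rough shell sketch of how a check_command value is split on `!` into the command name and the `$ARGn$` macros. `split_check_command` is a hypothetical helper written for this illustration, and it ignores escaped `\!` characters, so treat it as a simplification rather than Nagios's actual parser.

```shell
# Hypothetical helper: split a check_command value on '!' and print each piece.
split_check_command() {
    rest=$1
    i=0
    while :; do
        field=${rest%%!*}            # text up to the next '!'
        if [ "$i" -eq 0 ]; then
            printf 'command=%s\n' "$field"
        else
            printf 'ARG%d=%s\n' "$i" "$field"
        fi
        case $rest in
            *!*) rest=${rest#*!} ;;  # drop the piece just printed
            *) break ;;
        esac
        i=$((i + 1))
    done
}

# Two-argument form: the -a '...' part lands in ARG2, which the command definition never uses.
split_check_command "check_nrpe!check_jmx!-a '-U ... -w 4248302272 -c 5498760192'"
# Single-argument form: everything after the first '!' is ARG1.
split_check_command "check_nrpe!check_jmx -a '-U ... -w 4248302272 -c 5498760192'"
```

With a command_line that only references `$ARG1$`, the first form sends just `check_jmx` to the remote agent, while the second sends the whole string.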

jankogaga wrote: 5666 port is enabled on the host where JMX server is located (and no additional firewall is "on" on it).

Can you share the output of the following command executed from the CLI of your Nagios Core machine:

Code: Select all

nmap -sS -O -p5666 10.30.30.33

You may need to install the nmap or net-tools package on your Nagios Core machine if the command cannot be found.
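
If installing nmap is not an option, a plain TCP connect attempt gives a rougher answer to the same question. The sketch below assumes bash and the coreutils `timeout` command are available on the Nagios Core machine; `port_open` is a hypothetical helper. Unlike nmap, it cannot tell a closed port from a filtered one.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens within 3 seconds.
# Relies on bash's /dev/tcp redirection, so bash is invoked explicitly.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

if port_open 10.30.30.33 5666; then
    echo "port 5666 is accepting connections"
else
    echo "port 5666 is closed, filtered, or the host is unreachable"
fi
```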

Re: Monitoring JMX

Posted: Fri May 11, 2018 11:52 am

by tgriep

Thanks mcapra for the help.

If you are going to use arguments in your service check, then you will have to change the check_nrpe command as follows.

Code: Select all

$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ $ARG2$ $ARG3$

If not, the Heap memory usage check will not work.
Making this change should not affect your other service checks, as they only use $ARG1$.

Can you run this command on the remote system as root and post it here?

Code: Select all

netstat -anp |grep 5666

If it shows that it is run by xinetd, then you will need to add the IP address of the Nagios server to this file.

Code: Select all

/etc/xinetd.d/nrpe

When the agent is started by xinetd, it does not use the allowed_hosts from the nrpe.cfg file.
This is the option you have to edit to add the addresses; they need to be separated by a space, not a comma.

Code: Select all

only_from       = 127.0.0.1 192.168.112.130
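
Since allowed_hosts in nrpe.cfg is comma-separated and only_from is space-separated, a small conversion can save a typo. This is just a sketch; `to_only_from` is a hypothetical helper, and 10.9.0.66 is the Nagios server address mentioned earlier in the thread.

```shell
# Hypothetical helper: turn a comma-separated host list into an xinetd only_from line.
to_only_from() {
    printf 'only_from       = %s\n' "$(printf '%s' "$1" | tr ',' ' ' | tr -s ' ')"
}

to_only_from "127.0.0.1,10.9.0.66"   # prints: only_from       = 127.0.0.1 10.9.0.66
```

xinetd has to be restarted after the file is edited for the change to apply.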

Re: Monitoring JMX

Posted: Mon May 14, 2018 3:07 am

by jankogaga

Thank you both for great support.
I have changed the check_nrpe command to

Code: Select all

command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ $ARG2$ $ARG3$

and have restarted the Nagios service.

The heap memory usage error is still showing.

Here are the outputs:
On Nagios Core

Code: Select all

[root@monitor ~]# nmap -sS -O -p5666 10.30.30.33

Starting Nmap 6.40 ( http://nmap.org ) at 2018-05-14 09:34 CEST
Nmap scan report for 10.30.30.33
Host is up (0.00035s latency).
PORT     STATE  SERVICE
5666/tcp closed nrpe
Too many fingerprints match this host to give specific OS details

OS detection performed. Please report any incorrect results at http://nmap.org/submit/ 
Nmap done: 1 IP address (1 host up) scanned in 3.49 seconds

On Remote JMX server:

Code: Select all

[root@test-difin /]# netstat -anp |grep 5666

doesn't return anything.

Re: Monitoring JMX

Posted: Mon May 14, 2018 7:55 am

by mcapra

Simply put, based on this nmap output:

Code: Select all

[root@monitor ~]# nmap -sS -O -p5666 10.30.30.33

Starting Nmap 6.40 ( http://nmap.org ) at 2018-05-14 09:34 CEST
Nmap scan report for 10.30.30.33
Host is up (0.00035s latency).
PORT     STATE  SERVICE
5666/tcp closed nrpe

Port 5666 on `10.30.30.33` is reported as closed when probed from the `monitor` machine, so nothing is accepting connections there. I would suggest double-checking your network configuration for this setup, both on the machines themselves and on any equipment between them.

It would also appear based on this output:

Code: Select all

[root@test-difin /]# netstat -anp |grep 5666

That there is no process on the `test-difin` machine that is listening on port 5666. Are you certain either the NRPE or xinetd daemon is running?
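
A bare `grep 5666` can also match PIDs or remote ports, so a slightly stricter filter may be useful when rechecking. The sketch below reads `netstat -ant` style output on stdin and only accepts lines in the LISTEN state whose local address ends in the given port; `has_listener` is a hypothetical helper, and the column positions assume the usual Linux netstat layout.

```shell
# Hypothetical helper: succeed if stdin (netstat -ant output) shows a listener on port $1.
# Column 4 is the local address and column 6 is the socket state.
has_listener() {
    awk -v suffix=":$1" '
        $6 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
        END { exit !found }
    '
}

# Example with a captured line; on the remote host, pipe real output in instead:
#   netstat -ant | has_listener 5666
sample='tcp        0      0 0.0.0.0:5666            0.0.0.0:*               LISTEN'
if printf '%s\n' "$sample" | has_listener 5666; then
    echo "something is listening on 5666"
fi
```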

Re: Monitoring JMX

Posted: Mon May 14, 2018 11:57 am

by tgriep

Thanks @mcapra for the help.

Since the netstat output does not show the NRPE agent listening, the check cannot be run from the Nagios server.
Try reinstalling the NRPE agent on that remote server and see if that helps.
The link below has the instructions for compiling NRPE from source.
https://support.nagios.com/kb/article.php?id=515