Re: Alternative to NRPE for Linux monitoring with Nagios Cor

Posted: Thu Jul 07, 2016 6:09 am
by neworderfac33
Good afternoon,

First of all, thanks to everyone who contributed - it's moving along well now... up to a point.

I added the following to the nrpe.cfg file:

Code: Select all

command[check_sda]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sda
command[check_sdb]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sdb 
I added the following to commands.cfg:

Code: Select all

Define command{
	command_name	check_nrpe
	command_line	$USER1$/check_nrpe –H $HOSTADDRESS$ -c $ARG1$
}
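One thing worth ruling out in the definition as posted: the option in front of $HOSTADDRESS$ is written with an en dash (–H) rather than an ASCII hyphen (-H). That may only be a side effect of pasting into the forum, but if the same character is in commands.cfg, the plugin would not see a -H option at all. A minimal Python sketch for spotting such lookalikes; the helper name find_dash_lookalikes is made up for illustration:

```python
# Unicode dashes that are easy to mistake for the ASCII hyphen-minus.
LOOKALIKES = {
    "\u2010": "HYPHEN",
    "\u2011": "NON-BREAKING HYPHEN",
    "\u2012": "FIGURE DASH",
    "\u2013": "EN DASH",
    "\u2014": "EM DASH",
    "\u2212": "MINUS SIGN",
}

def find_dash_lookalikes(text):
    """Return (position, name) for every non-ASCII dash in text (hypothetical helper)."""
    return [(i, LOOKALIKES[ch]) for i, ch in enumerate(text) if ch in LOOKALIKES]

# The command_line as it appears in the post, and the same line with a plain hyphen.
posted = "$USER1$/check_nrpe \u2013H $HOSTADDRESS$ -c $ARG1$"
plain = "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$"

print(find_dash_lookalikes(posted))
print(find_dash_lookalikes(plain))
```

Running the same check over the real commands.cfg would show whether the character is in the file or only in the forum post.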
From the command prompt, the command /usr/local/nagios/libexec/check_nrpe -H 99.99.99.99 -c check_sda
returns:
DISK OK - free space: / 40586 MB (79% inode=92%);| /=10603MB;45839;48536;0;53929

For the purposes of testing, the monitoring checks are being carried out on the Linux host itself.
I have created a separate .cfg file (Linux.cfg) and referenced it in /usr/local/nagios/etc/nagios.cfg. It looks like this:

Code: Select all

define host {
    use                     linux-server
    host_name               MyServerID
    hostgroups              Linux-Build
    alias                   MyServerID
    address                 99.99.99.99
    register                1
}

define hostgroup{
        hostgroup_name  Linux-Build
        alias           Linux-Build
}

define service{
       use                     generic-service
       hostgroup_name          Linux-Build, UAT-Bet-Placement
       service_description     Drive Space - /dev/sda
       check_command           check_nrpe!check_sda
       }

define service{
       use                     generic-service
       hostgroup_name          Linux-Build, UAT-Bet-Placement
       service_description     Drive Space - /dev/sdb
       check_command           check_nrpe!check_sdb
       }
When I verify and restart Nagios, the host specified in Linux.cfg is correctly displayed in the web interface, but the status information shows:

Code: Select all

(No output on stdout) stderr: Could not resolve hostname -c: Name or service not known
The other services (check_load, check_users, check_total_procs and check_zombie_process) are also specified in Linux.cfg (omitted here for ease of viewing), and they all show the same error message.
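The wording of that error suggests that whatever program Nagios ends up running is treating "-c" as the host name. That is what happens when an option that expects a value (such as -H) is immediately followed by -c because nothing was substituted in between. The sketch below reproduces this parsing behaviour with Python's standard getopt module, which follows the same conventions as the C getopt(). The option string "H:c:" is an assumption chosen to mirror the two flags used in this thread, not a copy of check_nrpe's real option table:

```python
import getopt
import shlex

def parse(command_line):
    """Split the line like a shell would, then parse -H <host> and -c <command>."""
    argv = shlex.split(command_line)[1:]        # drop the program name
    opts, rest = getopt.getopt(argv, "H:c:")    # assumed option string
    return dict(opts), rest

# Host value present, as in the working command-prompt test.
good = parse("check_nrpe -H 99.99.99.99 -c check_sda")

# Host value missing: -H swallows the next word, which is "-c".
bad = parse("check_nrpe -H  -c check_sda")

print(good)
print(bad)
```

In the second case the "host" becomes -c and check_sda is left over as a stray argument, which would be consistent with a "Could not resolve hostname -c" message. It does not say why the host value would be empty.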

Any suggestions as to where I might be going wrong, please?

Thanks in advance

Pete

Re: Alternative to NRPE for Linux monitoring with Nagios Cor

Posted: Thu Jul 07, 2016 9:49 am
by neworderfac33
Something which may relate to the above:
If I enter:

Code: Select all

/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_sda
or
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_sda
I get:

Code: Select all

DISK OK - free space: / 40585 MB (79% inode=92%);| /=10604MB;45839;48536;0;53929
but if I enter

Code: Select all

/usr/local/nagios/libexec/check_nrpe -H 99.99.99.99 -c check_sda
I get:

Code: Select all

CHECK_NRPE: Error - Could not complete SSL handshake.
Note that I am trying to monitor the Nagios server itself, but something is preventing it from talking to its own IP address, whereas if I specify it as localhost or 127.0.0.1, it works fine.

Cheers

Pete

Re: Alternative to NRPE for Linux monitoring with Nagios Cor

Posted: Thu Jul 07, 2016 11:46 am
by rkennedy
Just to clarify, are you now trying to run check_nrpe from the Nagios machine against the client machine?

Take a look at your /etc/xinetd.d/nrpe file: does the only_from = line include the IP of the Nagios machine? It needs to be there for the two to communicate.

If it's already there, take a look at the /var/log/messages file on the client machine, and see why it's rejecting the NRPE connection from your Nagios machine. Please post this for us to look at as well.

Re: Alternative to NRPE for Linux monitoring with Nagios Cor

Posted: Fri Jul 08, 2016 3:28 am
by neworderfac33
Yes, I'm attempting to monitor the Nagios host FROM the Nagios host.
My /etc/xinetd.d/nrpe file contains the following:
only_from = 127.0.0.1, 99.99.99.99
where 99.99.99.99 is my Nagios host.

Here's the output from the error log, interspersed with what I actually typed in:
Jul 8 09:33:55 MyNagiosHost nagios: Event broker module 'NERD' deinitialized successfully.

/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -c check_sda
Jul 8 09:34:00 MyNagiosHost xinetd[23747]: START: nrpe pid=8644 from=127.0.0.1
Jul 8 09:34:00 MyNagiosHost xinetd[23747]: EXIT: nrpe status=0 pid=8644 duration=0(sec)

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_sda
Jul 8 09:34:20 MyNagiosHost xinetd[23747]: START: nrpe pid=8649 from=127.0.0.1
Jul 8 09:34:20 MyNagiosHost xinetd[23747]: EXIT: nrpe status=0 pid=8649 duration=0(sec)

/usr/local/nagios/libexec/check_nrpe -H 99.99.99.99 -c check_sda
Jul 8 09:34:36 MyNagiosHost xinetd[23747]: START: nrpe pid=8656 from=99.99.99.99
Jul 8 09:34:36 MyNagiosHost xinetd[8656]: FAIL: nrpe address from=99.99.99.99
Jul 8 09:34:36 MyNagiosHost xinetd[23747]: EXIT: nrpe status=0 pid=8656 duration=0(sec)
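The third attempt is the interesting one: xinetd logs a "FAIL: nrpe address" line for 99.99.99.99, so the connection appears to be turned away by xinetd's own address check rather than by anything further down the line. When the log is busier than this, a short filter helps. A rough Python sketch, assuming the line layout shown above; the regular expression and the name failed_sources are illustrative only:

```python
import re

# Matches xinetd failure lines of the form quoted above, e.g.
# "... xinetd[8656]: FAIL: nrpe address from=99.99.99.99"
FAIL = re.compile(
    r"xinetd\[\d+\]: FAIL: (?P<service>\S+) (?P<reason>\S+) from=(?P<source>\S+)"
)

def failed_sources(log_text):
    """Return (service, reason, source) for each xinetd FAIL line."""
    return [m.group("service", "reason", "source") for m in FAIL.finditer(log_text)]

sample = """\
Jul 8 09:34:20 MyNagiosHost xinetd[23747]: START: nrpe pid=8649 from=127.0.0.1
Jul 8 09:34:20 MyNagiosHost xinetd[23747]: EXIT: nrpe status=0 pid=8649 duration=0(sec)
Jul 8 09:34:36 MyNagiosHost xinetd[23747]: START: nrpe pid=8656 from=99.99.99.99
Jul 8 09:34:36 MyNagiosHost xinetd[8656]: FAIL: nrpe address from=99.99.99.99
Jul 8 09:34:36 MyNagiosHost xinetd[23747]: EXIT: nrpe status=0 pid=8656 duration=0(sec)
"""

print(failed_sources(sample))
```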

Cheers
Pete

Re: Alternative to NRPE for Linux monitoring with Nagios Cor

Posted: Fri Jul 08, 2016 10:11 am
by rkennedy
The error indicates that it is not working. Have you restarted the service after making the change to your only_from field?

I'm confused as to how your NRPE is even starting, because with a comma in the only_from field, NRPE will not start.

Code: Select all

Jul  8 10:07:05 localhost xinetd[3180]: Address: 127.0.0.1, has a comma in it - remove the comma [file=/etc/xinetd.d/nrpe] [line=15]
If you're checking the Nagios host from itself, both 127.0.0.1 and its internal IP will work. Make sure you restart xinetd after making changes. Here are my settings, for example:

/etc/xinetd.d/nrpe

Code: Select all

# default: on
# description: NRPE (Nagios Remote Plugin Executor)
service nrpe
{
        flags           = REUSE
        socket_type     = stream
        port            = 5666
        wait            = no
        user            = nagios
        group           = nagios
        server          = /usr/local/nagios/bin/nrpe
        server_args     = -c /usr/local/nagios/etc/nrpe.cfg --inetd
        log_on_failure  += USERID
        disable         = no
        only_from       = 127.0.0.1 192.168.4.179
}
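Note that the addresses in only_from are separated by spaces here, and, as the log line above shows, xinetd complains about any entry with a comma in it. A small Python sketch that flags such entries in a config snippet; only_from_commas is a hypothetical helper and it only understands a plain "only_from = ..." line:

```python
def only_from_commas(config_text):
    """Return the only_from entries that contain a comma."""
    bad = []
    for line in config_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "only_from":
            # xinetd splits the value on whitespace, so a comma stays glued to its entry.
            bad.extend(tok for tok in value.split() if "," in tok)
    return bad

comma_separated = "only_from = 127.0.0.1, 99.99.99.99"
space_separated = "        only_from       = 127.0.0.1 192.168.4.179"

print(only_from_commas(comma_separated))
print(only_from_commas(space_separated))
```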
ifconfig

Code: Select all

[root@localhost libexec]# ifconfig
eno33554952: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.4.179  netmask 255.255.0.0  broadcast 192.168.255.255
        inet6 fe80::250:56ff:fe84:cc8a  prefixlen 64  scopeid 0x20<link>
        ether 00:50:56:84:cc:8a  txqueuelen 1000  (Ethernet)
        RX packets 2438087  bytes 190701024 (181.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 227595  bytes 38038080 (36.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 129025  bytes 31631341 (30.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 129025  bytes 31631341 (30.1 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

tests

Code: Select all

[root@localhost libexec]# ./check_nrpe -H 127.0.0.1
NRPE v2.15
[root@localhost libexec]# ./check_nrpe -H 192.168.4.179
NRPE v2.15
/var/log/messages

Code: Select all

Jul  8 10:10:38 localhost xinetd[3859]: START: nrpe pid=4184 from=::ffff:127.0.0.1
Jul  8 10:10:38 localhost xinetd[3859]: EXIT: nrpe status=0 pid=4184 duration=0(sec)
Jul  8 10:10:41 localhost xinetd[3859]: START: nrpe pid=4190 from=::ffff:192.168.4.179
Jul  8 10:10:41 localhost xinetd[3859]: EXIT: nrpe status=0 pid=4190 duration=0(sec)

Re: Alternative to NRPE for Linux monitoring with Nagios Cor

Posted: Fri Jul 08, 2016 10:52 am
by neworderfac33
Thank you VERY much - it was the comma in the /etc/xinetd.d/nrpe file that was causing the problem.

I can now successfully issue the command from the prompt against the NAGIOS IP.

I'm still getting the same response from the front end though:
(No output on stdout) stderr: Could not resolve hostname -c: Name or service not known

Here's my Linux.cfg again, for reference, to save trawling back to the first page of the thread.

Code: Select all

define host {
    use                     linux-server
    host_name               MyServerID
    hostgroups              Linux-Build
    alias                   MyServerID
    address                 99.99.99.99
    register                1
}
define hostgroup{
        hostgroup_name  Linux-Build
        alias           Linux-Build
}
define hostgroup{
        hostgroup_name  UAT-Bet-Placement
        alias           UAT-Bet-Placement
}

define service{
       use                     generic-service
       hostgroup_name          Linux-Build, UAT-Bet-Placement
       service_description     Drive Space - /dev/sda
       check_command           check_nrpe!check_sda
       }
Other services are defined too, but if I can figure out why THIS one isn't working, the others should hopefully follow suit.

Here are the key contents of /usr/local/nagios/nrpe.cfg:

Code: Select all

log_facility=daemon
pid_file=/var/run/nrpe.pid
server_port=5666
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1, 99.99.99.99 (I expect that it's THIS comma that I confused with the one in /etc/xinetd.d/nrpe!)
dont_blame_nrpe=0
allow_bash_command_substitution=0
debug=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sda
command[check_sdb]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sdb
All these commands run successfully from the command prompt.

I don't suppose there's a connection between the "-c" in the error message:
(No output on stdout) stderr: Could not resolve hostname -c: Name or service not known

and the "-c" in /etc/xinetd.d/nrpe
server_args = -c /usr/local/nagios/etc/nrpe.cfg --inetd

is there? For info, here's the check_nrpe command as specified in /usr/local/nagios/etc/objects/commands.cfg

Code: Select all

define command{
        command_name    check_nrpe
        command_line    /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS -c $ARG1$
}
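A detail that stands out in the definition as posted: Nagios macros are written with a dollar sign on each side ($HOSTADDRESS$, $ARG1$), but the command_line above has $HOSTADDRESS with no closing $. Whether that explains the error is not something established in the thread so far, and it could be a transcription slip, but it is cheap to check. A minimal Python sketch, assuming it is enough to look at the text between consecutive $ signs; suspicious_macros is an invented name:

```python
def suspicious_macros(command_line):
    """Return the text between '$' pairs that contains whitespace.

    A well-formed macro such as $HOSTADDRESS$ has no spaces inside it, so a
    candidate with spaces usually means a '$' is missing somewhere.
    Empty candidates (from '$$') are skipped.
    """
    pieces = command_line.split("$")
    candidates = pieces[1::2]   # odd positions sit between an opening and a closing '$'
    return [c for c in candidates if c and any(ch.isspace() for ch in c)]

posted = "/usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS -c $ARG1$"
balanced = "/usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$"

print(suspicious_macros(posted))
print(suspicious_macros(balanced))
```

For the posted line the check reports "HOSTADDRESS -c " as the text sitting where a macro name should be; for the balanced line it reports nothing.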

Thank you very much for your assistance and have a nice weekend when it gets to you.

Pete

Re: Alternative to NRPE for Linux monitoring with Nagios Cor

Posted: Fri Jul 08, 2016 11:19 am
by lmiltchev
You didn't add these services directly to the host with NRPE installed (99.99.99.99) but indirectly (via hostgroups). "MyServerID" host is a member of the "Linux-Build" hostgroup. What about "UAT-Bet-Placement"? Does this hostgroup have any members defined? Are NRPE & Nagios Plugins installed on these hosts? Are the "check_sda" and "check_sdb" commands defined?
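To answer the membership question from the config itself, the object definitions can be scanned for hosts and the hostgroups they name. A rough Python sketch, assuming simple "define type { ... }" blocks like the ones posted in this thread (no templates, no members directive on the hostgroup side, nothing defined in other files); hostgroup_members and parse_objects are invented names:

```python
import re

BLOCK = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.S)

def parse_objects(cfg_text):
    """Return a list of (object_type, directives) for each define block."""
    objects = []
    for kind, body in BLOCK.findall(cfg_text):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()    # drop trailing ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((kind, directives))
    return objects

def hostgroup_members(cfg_text):
    """Map each declared hostgroup to the hosts that list it under 'hostgroups'."""
    objects = parse_objects(cfg_text)
    members = {d["hostgroup_name"]: [] for kind, d in objects if kind == "hostgroup"}
    for kind, d in objects:
        if kind != "host":
            continue
        for group in d.get("hostgroups", "").split(","):
            if group.strip():
                members.setdefault(group.strip(), []).append(d["host_name"])
    return members

linux_cfg = """
define host {
    use                     linux-server
    host_name               MyServerID
    hostgroups              Linux-Build
    address                 99.99.99.99
}
define hostgroup{
        hostgroup_name  Linux-Build
        alias           Linux-Build
}
define hostgroup{
        hostgroup_name  UAT-Bet-Placement
        alias           UAT-Bet-Placement
}
"""

print(hostgroup_members(linux_cfg))
```

An empty list for a hostgroup only means that no host in the scanned text names it; members could still be defined in another file.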

Re: Alternative to NRPE for Linux monitoring with Nagios Cor

Posted: Fri Jul 08, 2016 11:21 am
by neworderfac33
Something that might be of relevance:
When I remove the -c $ARG1$ from the nrpe command in commands.cfg, the web interface reports the NRPE version number for EACH of the services!

command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS -c $ARG1$

Host: MyServerID

Service                  Status  Last Check           Duration      Attempt  Status Information
CPU Load                 OK      07-08-2016 17:18:31  0d 0h 5m 11s  1/3      NRPE v2.15
Current Users            OK      07-08-2016 17:14:29  0d 0h 4m 13s  1/3      NRPE v2.15
Drive Space - /dev/sda   OK      07-08-2016 17:16:30  0d 0h 2m 12s  1/3      NRPE v2.15
Drive Space - /dev/sdb   OK      07-08-2016 17:15:38  0d 0h 3m 4s   1/3      NRPE v2.15
Total Processes          OK      07-08-2016 17:18:31  0d 0h 5m 11s  1/3      NRPE v2.15
Zombie Processes         OK      07-08-2016 17:14:29  0d 0h 4m 13s  1/3      NRPE v2.15

Whilst this isn't what I want, at least it proves that the NAGIOS server is talking to itself (at this point in the day, I feel like I'm talking to myself too! :-))

Pete

Re: Alternative to NRPE for Linux monitoring with Nagios Cor

Posted: Fri Jul 08, 2016 11:32 am
by neworderfac33
For the time being, I have removed the references to the hostgroup and referenced the host directly in the service definitions:

Code: Select all

define service{
       use                     generic-service
       host_name               MyServerID
       #hostgroup_name          Linux-Build, UAT-Bet-Placement
       service_description     Drive Space - /dev/sda
       check_command           check_nrpe!check_sda
       }
define service{
       use                     generic-service
       host_name               MyServerID
       #hostgroup_name          Linux-Build, UAT-Bet-Placement
       service_description     Drive Space - /dev/sdb
       check_command           check_nrpe!check_sdb
       }
define service{
        use                     generic-service
        host_name               MyServerID
        #hostgroup_name         Linux-Build, UAT-Bet-Placement
        service_description     CPU Load
        check_command           check_nrpe!check_load
}
define service{
        use                     generic-service
        host_name               MyServerID
        #hostgroup_name          Linux-Build, UAT-Bet-Placement
        service_description     Current Users
        check_command           check_nrpe!check_users
}
define service{
        use                     generic-service
        host_name               MyServerID
        #hostgroup_name          Linux-Build, UAT-Bet-Placement
        service_description     Total Processes
        check_command           check_nrpe!check_total_procs
}
define service{
        use                     generic-service
        host_name              MyServerID
        #hostgroup_name          Linux-Build, UAT-Bet-Placement
        service_description     Zombie Processes
        check_command           check_nrpe!check_zombie_process
}
But sadly, the error message in the browser is the same for each service:
(No output on stdout) stderr: Could not resolve hostname -c: Name or service not known
The commands are defined in /usr/local/nagios/etc/nrpe.cfg:

Code: Select all

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sda
command[check_sdb]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sdb
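Since every service goes through check_nrpe!<name>, it is worth confirming that each <name> has a matching command[<name>] entry in nrpe.cfg. In the definitions posted above, the Zombie Processes service asks for check_zombie_process while nrpe.cfg defines check_zombie_procs. That would not explain the hostname error, but it would probably surface as a separate failure once the main problem is out of the way. A Python sketch of the comparison, with undefined_nrpe_commands as a made-up helper name:

```python
import re

def undefined_nrpe_commands(services_cfg, nrpe_cfg):
    """Return names used as check_nrpe!<name> that have no command[<name>] entry."""
    requested = set(re.findall(r"check_command\s+check_nrpe!([^!\s]+)", services_cfg))
    defined = set(re.findall(r"^\s*command\[([^\]]+)\]\s*=", nrpe_cfg, re.M))
    return sorted(requested - defined)

# check_command lines taken from the service definitions in this post.
services = """
check_command           check_nrpe!check_sda
check_command           check_nrpe!check_sdb
check_command           check_nrpe!check_load
check_command           check_nrpe!check_users
check_command           check_nrpe!check_total_procs
check_command           check_nrpe!check_zombie_process
"""

# command[...] lines taken from the nrpe.cfg excerpt in this post.
nrpe = """
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200
command[check_sda]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sda
command[check_sdb]=/usr/local/nagios/libexec/check_disk -w 15% -c 10% -p /dev/sdb
"""

print(undefined_nrpe_commands(services, nrpe))
```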

Re: Alternative to NRPE for Linux monitoring with Nagios Cor

Posted: Fri Jul 08, 2016 11:37 am
by neworderfac33
I am reluctant to finish for the day given that with your help I'm making such excellent progress, but I am on driving duties for a 70th birthday party tonight! :-)
So, if I don't respond to any subsequent replies from you until Monday, I'm not being ignorant, I'm just not here, although I may try to have a look from home over the weekend.

Thanks again for your help
Pete