Re: Fixing damaged and/or partial installs of Nagios

Posted: Wed Sep 03, 2014 1:31 pm
by agenerette
No, the only change I've made in about 6 months is adding the apt-if-appropriate code, for installing apt only on Debian/Ubuntu VMs.

Do you happen to know what steps I would need to follow to get the check_disk command set up? If I can get it down to a manual process, I'll be able to add it to my Chef config.

The primary problem at this point is the "NRPE: Command 'check_disk' not defined" error that's showing on a number of the nodes (see the attached screenshot). I believe I'll be able to take care of the timeout and SSH-handshake alerts myself.

-Anthony

Posted: Wed Sep 03, 2014 1:35 pm
by eloyd
"check_disk" is a pretty standard NRPE check.

In commands.cfg:

Code:

define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a "$ARG2$"
}
In services.cfg:

Code:

define service{
        use                     nrpe-service
        service_description     Root Partition
        hostgroups              private
        servicegroups           System,NRPE
        check_command           check_nrpe!check_disk!-w 20% -c 10% -p /
}
(Note: we use hostgroups to associate service checks, so you may want to use host_name instead.)

On the client, somewhere in nrpe.cfg:

Code:

command[check_disk]=/usr/local/nagios/libexec/check_disk $ARG1$
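For readers piecing the server and client halves together: in a `check_command`, `!` separates arguments, so in `check_nrpe!check_disk!-w 20% -c 10% -p /` the command object is `check_nrpe`, `$ARG1$` is `check_disk` and `$ARG2$` is `-w 20% -c 10% -p /`. The bash sketch below only imitates that substitution for illustration; `expand_check` is a hypothetical name, it ignores escaped `\!`, and it leaves other macros such as `$USER1$` and `$HOSTADDRESS$` unexpanded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: imitate how Nagios fills $ARGn$ macros in a
# command_line from a check_command string such as
# 'check_nrpe!check_disk!-w 20% -c 10% -p /'.
# Simplifications: escaped '\!' is not handled, and macros other than
# $ARGn$ (e.g. $USER1$, $HOSTADDRESS$) are left untouched.
expand_check() {
  local tmpl=$1 check=$2
  local -a fields
  # Split on '!': fields[0] is the command name, fields[1..] the arguments.
  IFS='!' read -r -a fields <<< "$check"
  local i val
  # $ARG1$ .. $ARG32$; arguments that were not supplied become empty.
  for (( i = 1; i <= 32; i++ )); do
    val=${fields[i]-}
    tmpl=${tmpl//"\$ARG${i}\$"/"$val"}
  done
  printf '%s\n' "$tmpl"
}
```

Feeding it the `check_nrpe` definition from the grep output later in the thread (`-c $ARG1$ -t 20`, with no `-a`) shows that anything after the second `!` would not be passed along with that definition.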

Posted: Wed Sep 03, 2014 2:16 pm
by agenerette
After looking over your last post, I ran this, on the Nagios server:

root@ip-10-244-20-90:/etc# grep -R check_nrpe *
nagios3/conf.d/commands.cfg: command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_nagios -t 20
nagios3/conf.d/commands.cfg: command_name check_nrpe_alive
nagios3/conf.d/commands.cfg: command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 20
nagios3/conf.d/commands.cfg: command_name check_nrpe
nagios3/conf.d/commands.cfg: command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 20
nagios3/conf.d/services.cfg:# check_command check_nrpe!check_smtp
nagios3/conf.d/services.cfg: check_command check_nrpe!check_disk
nagios3/conf.d/services.cfg: check_command check_nrpe!check_disk
nagios3/conf.d/services.cfg.bak:# check_command check_nrpe!check_smtp
nagios-plugins/config/check_nrpe.cfg: command_name check_nrpe
nagios-plugins/config/check_nrpe.cfg: command_line /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
nagios-plugins/config/check_nrpe.cfg: command_name check_nrpe_1arg
nagios-plugins/config/check_nrpe.cfg: command_line /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

So, I started looking through the commands.cfg and services.cfg files... It looks like the check_nrpe command and two check_disk services are already in place. On the portal-production node, though, the directory /usr/local/nagios/libexec/ doesn't exist. On that machine, I find:

[root@ip-10-160-23-32 objects]# find / -name check_disk -print
/var/chef/cache/nagios-plugins-1.4.16/plugins/check_disk
/usr/lib64/nagios/plugins/check_disk
[root@ip-10-160-23-32 objects]#

I tried adding

command[check_disk]=/usr/lib64/nagios/plugins/check_disk $ARG1$

at the end of /etc/nagios/nrpe.cfg on the portal-production node and restarting all of the services, but it doesn't appear to have helped.

-Anthony
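The plugins live in different places depending on how they were installed: `/usr/local/nagios/libexec` for the source install assumed in the nrpe.cfg example above, `/usr/lib/nagios/plugins` on the Debian/Ubuntu server, and `/usr/lib64/nagios/plugins` on this node. A small bash sketch (the name `first_plugin_dir` is hypothetical) that picks the first of those directories holding an executable `check_disk`, so the matching path can be written into nrpe.cfg:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first directory that contains an executable
# check_disk. The default list is just the three plugin paths seen in this
# thread; pass your own directories as arguments to override it.
first_plugin_dir() {
  local dir
  if (( $# == 0 )); then
    set -- /usr/lib64/nagios/plugins /usr/lib/nagios/plugins /usr/local/nagios/libexec
  fi
  for dir in "$@"; do
    if [[ -x $dir/check_disk ]]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1  # none of the candidates has an executable check_disk
}
```

For example, `dir=$(first_plugin_dir) && printf 'command[check_disk]=%s/check_disk $ARG1$\n' "$dir"` prints an nrpe.cfg line pointing at whichever directory was found.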

Posted: Wed Sep 03, 2014 2:41 pm
by eloyd
Is NRPE running on the remote host?

Code:

netstat -na | grep 5666
or
lsof -i:5666
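If only netstat is at hand, the same check can be scripted. A rough bash sketch, assuming Linux-style `netstat -na` columns (local address in column 4, state in the last column) and NRPE on its default port 5666; `nrpe_listening` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `netstat -na` output on stdin and succeed only if
# some TCP socket on port 5666 is in LISTEN state.
# Column 4 is the local address; the last column is the socket state.
nrpe_listening() {
  awk '$1 ~ /^tcp/ && $4 ~ /[:.]5666$/ && $NF == "LISTEN" { found = 1 }
       END { exit !found }'
}
```

Usage: `netstat -na | nrpe_listening && echo "NRPE is listening"`.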

Posted: Wed Sep 03, 2014 4:47 pm
by agenerette
Yeah, from portal-production (the monitored node), I get:
[root@ip-10-160-23-32 ~]# netstat -na | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
tcp 0 0 :::5666


And from the Nagios server, I get:
root@ip-10-244-20-90:~# /usr/lib/nagios/plugins/check_nrpe -H 50.18.168.192
NRPE v2.15
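The `NRPE v2.15` reply shows that check_nrpe, run with only `-H`, can reach the daemon on the node; it says nothing about which commands that daemon has defined. The next step is `check_nrpe -H <host> -c check_disk`. A bash sketch that turns either reply into a short hint; `nrpe_hint` is hypothetical, and it only knows the two NRPE messages quoted in this thread:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a check_nrpe reply to a short hint. Only the two
# NRPE messages seen in this thread are recognized; anything else is assumed
# to be the plugin's own output and is passed through.
nrpe_hint() {
  local reply=$1
  case $reply in
    "NRPE v"*)
      echo "daemon reachable: now try -c <command>" ;;
    "NRPE: Command '"*"' not defined"*)
      echo "daemon reachable, but the command is missing from the client's nrpe.cfg" ;;
    *)
      printf 'plugin reply: %s\n' "$reply" ;;
  esac
}
```

Usage: `nrpe_hint "$(/usr/lib/nagios/plugins/check_nrpe -H 50.18.168.192 -c check_disk)"`.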

Posted: Thu Sep 04, 2014 7:49 am
by eloyd
On the remote host, what happens when you log in as nagios (or su to nagios) and type:

Code:

/usr/lib64/nagios/plugins/check_disk !-w 20% -c 10% -p /
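When running a plugin by hand like this, the exit status matters as much as the text: Nagios plugins report 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. A minimal bash sketch (the name `run_plugin` is hypothetical) that runs any plugin command and prints the state its exit code corresponds to:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a plugin with its arguments, print the first line
# of its stdout, then the Nagios state implied by the exit status.
# Returns the plugin's own exit status.
run_plugin() {
  local out rc first
  out=$("$@") && rc=0 || rc=$?
  IFS= read -r first <<< "$out"
  printf '%s\n' "$first"
  local -a states=(OK WARNING CRITICAL UNKNOWN)
  if (( rc >= 0 && rc <= 3 )); then
    printf 'state: %s (exit %d)\n' "${states[rc]}" "$rc"
  else
    printf 'state: exit %d is outside the 0-3 range\n' "$rc"
  fi
  return "$rc"
}
```

For example, from a shell already running as the nagios user: `run_plugin /usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /`.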

Posted: Thu Sep 04, 2014 11:10 am
by agenerette
I get:

[nagios@ip-10-160-23-32 ~]$ /usr/lib64/nagios/plugins/check_disk !-w 20% -c 10% -p /
/usr/lib64/nagios/plugins/check_disk which ohai 20% -c 10% -p /
DISK CRITICAL - which is not accessible: No such file or directory

But,

[root@ip-10-160-23-32 ~]# which ohai
/usr/bin/ohai

Posted: Thu Sep 04, 2014 11:14 am
by eloyd
Whoops! I had an extra character in there. Try this:

Code:

/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
without the ! :-)

Posted: Thu Sep 04, 2014 11:29 am
by agenerette
Oh, I saw your comment after I posted that edit to my last...

[root@ip-10-160-23-32 ~]# /usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
DISK OK - free space: / 5204 MB (65% inode=79%);| /=2777MB;6450;7256;0;8063
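The part after `|` is performance data in the plugin format `label=value[unit];warn;crit;min;max`. The thresholds line up with the options: 80% and 90% of the 8063 MB maximum are 6450 and 7256 MB (truncated), which is what `-w 20% -c 10%` of free space means in terms of used space. A bash sketch that splits the first perfdata item; `parse_perfdata` is a hypothetical name, and it assumes a simple label with no spaces or quotes:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split the first perfdata item of a plugin reply
# (the text after '|') into label, value, unit and thresholds.
# Assumes a simple item: no spaces or quotes inside the label.
parse_perfdata() {
  local reply=$1 item label rest value uom warn crit min max
  item=${reply#*"|"}        # drop the human-readable part
  item=${item# }            # drop one leading space, if present
  item=${item%% *}          # keep only the first item
  label=${item%%=*}
  rest=${item#*=}
  IFS=';' read -r value warn crit min max <<< "$rest"
  uom=${value##*[0-9.]}     # trailing unit such as MB (may be empty)
  value=${value%"$uom"}
  printf 'label=%s value=%s uom=%s warn=%s crit=%s min=%s max=%s\n' \
    "$label" "$value" "$uom" "$warn" "$crit" "$min" "$max"
}
```

Run on the reply above, it prints `label=/ value=2777 uom=MB warn=6450 crit=7256 min=0 max=8063`.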


So I started looking around in /etc/nagios3/conf.d on the Nagios server. Having both /etc/nagios and /etc/nagios3 directories on that machine has confused me up to now. .../nagios3/conf.d/commands.cfg, though, seems to be the place where check_disk needs to be defined, and I saw no definition there for that command. So I added the following to the file and restarted the Nagios services on both the server and the node:

define command {
        command_name    check_disk
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk -a '-w 10% -c 5% -p /'
}

Again, no luck.
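One thing worth checking on the client before passing arguments from the server (the `-a '...'` above, or `$ARG1$` in an nrpe.cfg `command[...]` line): NRPE 2.x only accepts command arguments when nrpe.cfg has `dont_blame_nrpe=1` and the daemon was built with `--enable-command-args`. A minimal bash sketch that reads that setting from one file; `nrpe_allows_args` is hypothetical, and it assumes the last uncommented assignment wins and does not follow `include=` or `include_dir=`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the given nrpe.cfg sets dont_blame_nrpe=1.
# Assumptions: the last uncommented assignment is the effective one, and
# files pulled in with include= / include_dir= are not examined.
nrpe_allows_args() {
  local cfg=$1
  awk -F= '
    /^[[:space:]]*#/ { next }                      # skip comment lines
    $1 ~ /^[[:space:]]*dont_blame_nrpe[[:space:]]*$/ {
      v = $2; gsub(/[[:space:]]/, "", v)           # trim whitespace
    }
    END { exit !(v == "1") }
  ' "$cfg"
}
```

Usage: `nrpe_allows_args /etc/nagios/nrpe.cfg || echo "command arguments are disabled"`.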

Posted: Fri Sep 05, 2014 10:50 am
by sreinhardt
At this point, it looks like you just need to reconfigure the NRPE side of things. Nagios knows about your NRPE systems and is trying to communicate with them, but it is being stopped by two main things. First, at least in the configs shown, the commands are not defined for NRPE, so it doesn't know how to execute check_disk and the others. Second, and possibly also an issue, the configs reference /usr/lib/ instead of /usr/lib64/, which it appears you need to use.

To clarify the directories you are seeing: most of the time, nagios3 comes from an Ubuntu/Debian package (deb) install of Nagios Core. The /etc/nagios directory could be NRPE installed from an RPM, a source install of any number of Nagios products, or another version of Core installed on that system.

To define the command on the NRPE system with plugins in lib64, your NRPE command would look like:

Code:

command[check_disk]=/usr/lib64/nagios/plugins/check_disk $ARG1$
After adding that, restart xinetd or the NRPE daemon (I'm not sure whether you are using xinetd at this point), and you should be all set to run an immediate check from Nagios and get some results!
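Since the goal earlier in the thread was a manual process that could go into Chef, here is a minimal bash sketch of the client-side change made idempotent. `ensure_nrpe_command` is a hypothetical helper: it edits a single nrpe.cfg, replaces an existing `command[NAME]` line (dropping any duplicates) or appends one, and leaves restarting xinetd or the NRPE daemon to you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure nrpe.cfg defines command[NAME] with exactly
# the given command line. The first existing definition is replaced and any
# later duplicates are dropped; if there is none, the line is appended.
# Restarting NRPE (or xinetd) afterwards is left to the caller.
ensure_nrpe_command() {
  local cfg=$1 name=$2 cmdline=$3
  local tmp
  tmp=$(mktemp) || return 1
  # Values go through ENVIRON so awk does not reinterpret backslashes.
  KEY="command[$name]=" WANT="command[$name]=$cmdline" awk '
    index($0, ENVIRON["KEY"]) == 1 { if (!done) print ENVIRON["WANT"]; done = 1; next }
    { print }
    END { if (!done) print ENVIRON["WANT"] }
  ' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"   # cat keeps cfg ownership/permissions
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

Usage: `ensure_nrpe_command /etc/nagios/nrpe.cfg check_disk '/usr/lib64/nagios/plugins/check_disk $ARG1$'`; the single quotes keep the shell from expanding `$ARG1$`, and running it twice leaves a single definition.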