Re: Fixing damaged and/or partial installs of Nagios
Posted: Wed Sep 03, 2014 1:31 pm
by agenerette
No, the only change I've made in about six months is to add the apt-if-appropriate code, which installs apt only on Debian/Ubuntu VMs.
Do you happen to know what steps I would need to follow to get the check_disk command set up? If I can just get that down to a manual process, I'll be able to add it to my Chef config.
The primary problem, at this point, is the "NRPE: Command 'check_disk' not defined" error that's showing on a number of the nodes (see the attached screenshot). The timeout and ssh-handshake alerts, I believe I'll be able to take care of.
-Anthony
Re: Fixing damaged and/or partial installs of Nagios
Posted: Wed Sep 03, 2014 1:35 pm
by eloyd
"check_disk" is a pretty standard NRPE check.
In commands.cfg:
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a "$ARG2$"
}
In services.cfg:
define service{
use nrpe-service
service_description Root Partition
hostgroups private
servicegroups System,NRPE
check_command check_nrpe!check_disk!-w 20% -c 10% -p /
}
(Note, we use hostgroups to associate service checks, so you may want to use a hostname instead.)
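To make the macro substitution in the two definitions above concrete: Nagios splits the check_command value on "!", so the first piece names the command, the second becomes $ARG1$ and the third becomes $ARG2$. The sketch below only imitates that split in shell for illustration; the function name split_check_command is made up, and Nagios itself does this internally.

```shell
#!/bin/sh
# Illustration only: mimic how a check_command value is split on "!" into
# the command name, $ARG1$ and $ARG2$. This is not how Nagios is invoked.
split_check_command() {
    old_ifs=$IFS
    IFS='!'
    set -- $1          # word-split the value on "!" into $1, $2, $3
    IFS=$old_ifs

    echo "command_name=$1"
    echo "ARG1=$2"
    echo "ARG2=$3"
}
```

Calling split_check_command 'check_nrpe!check_disk!-w 20% -c 10% -p /' prints check_nrpe as the command name, check_disk as ARG1 and the threshold string as ARG2, which is what ends up after -c and -a on the check_nrpe command line.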
On the client, somewhere in nrpe.cfg:
command[check_disk]=/usr/local/nagios/libexec/check_disk $ARG1$
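One caveat with a definition that ends in $ARG1$: the NRPE daemon only accepts arguments sent with -a when it was built with command-argument support and nrpe.cfg sets dont_blame_nrpe=1. A rough sketch, assuming you pass in the path to the client's nrpe.cfg, that reports whether a command is defined and whether arguments are enabled; the function name nrpe_cfg_report is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report on one NRPE command in a given nrpe.cfg.
# Usage: nrpe_cfg_report <path-to-nrpe.cfg> <command-name>
nrpe_cfg_report() {
    cfg=$1
    name=$2

    # Look for an active (uncommented) line such as command[check_disk]=...
    if grep -q "^[[:space:]]*command\[$name\]=" "$cfg"; then
        echo "defined: yes"
    else
        echo "defined: no"
    fi

    # dont_blame_nrpe=1 lets the daemon accept arguments passed with -a.
    if grep -q "^[[:space:]]*dont_blame_nrpe[[:space:]]*=[[:space:]]*1" "$cfg"; then
        echo "arguments: enabled"
    else
        echo "arguments: disabled"
    fi
}
```

For example, nrpe_cfg_report /etc/nagios/nrpe.cfg check_disk, where the path is an assumption and depends on how NRPE was installed on the client.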
Re: Fixing damaged and/or partial installs of Nagios
Posted: Wed Sep 03, 2014 2:16 pm
by agenerette
After looking over your last post, I ran this, on the Nagios server:
root@ip-10-244-20-90:/etc# grep -R check_nrpe *
nagios3/conf.d/commands.cfg: command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_nagios -t 20
nagios3/conf.d/commands.cfg: command_name check_nrpe_alive
nagios3/conf.d/commands.cfg: command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 20
nagios3/conf.d/commands.cfg: command_name check_nrpe
nagios3/conf.d/commands.cfg: command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t 20
nagios3/conf.d/services.cfg:# check_command check_nrpe!check_smtp
nagios3/conf.d/services.cfg: check_command check_nrpe!check_disk
nagios3/conf.d/services.cfg: check_command check_nrpe!check_disk
nagios3/conf.d/services.cfg.bak:# check_command check_nrpe!check_smtp
nagios-plugins/config/check_nrpe.cfg: command_name check_nrpe
nagios-plugins/config/check_nrpe.cfg: command_line /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
nagios-plugins/config/check_nrpe.cfg: command_name check_nrpe_1arg
nagios-plugins/config/check_nrpe.cfg: command_line /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
So, I started looking through the commands.cfg and services.cfg files... It looks like the check_nrpe command and two check_disk services are already in place. On the portal-production node, though, the directory /usr/local/nagios/libexec/ doesn't exist. On that machine, I find:
[root@ip-10-160-23-32 objects]# find / -name check_disk -print
/var/chef/cache/nagios-plugins-1.4.16/plugins/check_disk
/usr/lib64/nagios/plugins/check_disk
[root@ip-10-160-23-32 objects]#
I tried adding
command[check_disk]=/usr/lib64/nagios/plugins/check_disk $ARG1$
at the end of /etc/nagios/nrpe.cfg on the portal-production node and restarting all of the services, but this doesn't appear to have helped.
-Anthony
Re: Fixing damaged and/or partial installs of Nagios
Posted: Wed Sep 03, 2014 2:41 pm
by eloyd
Is NRPE running on the remote host?
netstat -na | grep 5666
or
lsof -i:5666
Re: Fixing damaged and/or partial installs of Nagios
Posted: Wed Sep 03, 2014 4:47 pm
by agenerette
Yeah, from portal-production (the monitored node), I get:
[root@ip-10-160-23-32 ~]# netstat -na | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN
tcp 0 0 :::5666
And from the Nagios server, I get:
root@ip-10-244-20-90:~# /usr/lib/nagios/plugins/check_nrpe -H 50.18.168.192
NRPE v2.15
Re: Fixing damaged and/or partial installs of Nagios
Posted: Thu Sep 04, 2014 7:49 am
by eloyd
On the remote host, what happens when you log in as nagios (or su to nagios) and type:
/usr/lib64/nagios/plugins/check_disk !-w 20% -c 10% -p /
Re: Fixing damaged and/or partial installs of Nagios
Posted: Thu Sep 04, 2014 11:10 am
by agenerette
I get:
[nagios@ip-10-160-23-32 ~]$ /usr/lib64/nagios/plugins/check_disk !-w 20% -c 10% -p /
/usr/lib64/nagios/plugins/check_disk which ohai 20% -c 10% -p /
DISK CRITICAL - which is not accessible: No such file or directory
But,
[root@ip-10-160-23-32 ~]# which ohai
/usr/bin/ohai
Re: Fixing damaged and/or partial installs of Nagios
Posted: Thu Sep 04, 2014 11:14 am
by eloyd
Whoops! I had an extra character in there. Try this:
/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
without the !

Re: Fixing damaged and/or partial installs of Nagios
Posted: Thu Sep 04, 2014 11:29 am
by agenerette
Oh, I saw your comment after I posted that edit to my last...
[root@ip-10-160-23-32 ~]# /usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
DISK OK - free space: / 5204 MB (65% inode=79%);| /=2777MB;6450;7256;0;8063
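The output line above follows the usual plugin convention: human-readable status text, then "|", then performance data in label=value;warn;crit;min;max form. A minimal sketch, assuming that layout and a single performance-data item, of splitting such a line in shell so the numbers are easier to read; the function name split_plugin_output is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: split one line of plugin output at the first "|".
# Assumes exactly one perfdata item of the form label=value;warn;crit;min;max.
split_plugin_output() {
    line=$1
    status=${line%%|*}     # text before the first "|"
    perf=${line#*|}        # text after the first "|"
    perf=${perf# }         # drop one leading space, if present

    label=${perf%%=*}      # e.g. "/"
    values=${perf#*=}      # e.g. "2777MB;6450;7256;0;8063"

    # Split the ";"-separated fields into $1..$5.
    old_ifs=$IFS
    IFS=';'
    set -- $values
    IFS=$old_ifs

    echo "status: $status"
    echo "label:  $label"
    echo "value=$1 warn=$2 crit=$3 min=$4 max=$5"
}
```

Run against the line above, this reports warn=6450 and crit=7256 against max=8063, which is about 80% and 90% of the maximum and so appears consistent with the -w 20% -c 10% free-space thresholds.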
So, I just started looking around in /etc/nagios3/conf.d, on the Nagios server. The fact that there are /etc/nagios and /etc/nagios3 directories on that machine has caused me some confusion, up to now. .../nagios3/conf.d/commands.cfg, though, seems to be the place where check_disk needs to be defined, but I saw no code there for the command. So, I added the following code to the file and restarted the Nagios services on both server and node:
define command {
command_name check_disk
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk -a '-w 10% -c 5% -p /'
}
Again, no luck.
Re: Fixing damaged and/or partial installs of Nagios
Posted: Fri Sep 05, 2014 10:50 am
by sreinhardt
At this point, it looks like you just need to reconfigure the NRPE side of things. Nagios knows about your NRPE systems and is trying to communicate with them, but it is being stopped by two main things. First, the commands are not defined, at least in the configs shown, so NRPE does not know how to execute check_disk and others. Second, a possible issue is the reference to /usr/lib/ instead of /usr/lib64, which it appears you need to use.
To clarify the directories you are seeing: most of the time, nagios3 comes from an Ubuntu/Debian package install of Nagios Core. The /etc/nagios dir could be NRPE from a package, a source install of any number of Nagios products, or another version of Core installed on that system.
To define the command for the NRPE system with plugins in lib64, your NRPE command would look like:
command[check_disk]=/usr/lib64/nagios/plugins/check_disk $ARG1$
After adding that, restart xinetd or the NRPE daemon (I'm not sure which you are using at this point), and you should be all set to run an immediate check from Nagios to get some results!
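Since the goal earlier in the thread was to reduce this to a repeatable manual step, here is a rough sketch of adding the definition only when it is missing. It assumes the node's config is /etc/nagios/nrpe.cfg and the plugin is under /usr/lib64, as reported earlier in the thread; the function name ensure_nrpe_command is made up. The example call hard-codes the thresholds on the client, the same style the sample nrpe.cfg uses, because the services found by the grep on the server call check_nrpe!check_disk without passing any arguments.

```shell
#!/bin/sh
# Hypothetical helper: append an NRPE command definition unless an active
# definition with the same name already exists in the file.
# Usage: ensure_nrpe_command <nrpe.cfg> <name> <command line>
ensure_nrpe_command() {
    cfg=$1
    name=$2
    cmdline=$3

    if grep -q "^[[:space:]]*command\[$name\]=" "$cfg"; then
        echo "unchanged"
    else
        printf 'command[%s]=%s\n' "$name" "$cmdline" >> "$cfg"
        echo "added"
    fi
}

# Example (as root on the monitored node; paths are assumptions):
#   ensure_nrpe_command /etc/nagios/nrpe.cfg check_disk \
#       '/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /'
# Restart NRPE or xinetd afterwards, then test from the Nagios server:
#   /usr/lib/nagios/plugins/check_nrpe -H <node address> -c check_disk
```

If a check_disk line is already present, as it would be after the manual edit described earlier, the helper reports "unchanged" and the existing line has to be adjusted by hand.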