Fixing damaged and/or partial installs of Nagios


Post by agenerette »

This is VERY confusing. It looks to me like all the commands are getting defined in /etc/nagios3/conf.d/commands.cfg. Beyond the code that I've added to that file for check_disk, do I also need to add the

command[check_disk]=/usr/lib64/nagios/plugins/check_disk $ARG1$

to /etc/nagios/nrpe.cfg? That's the only instance of nrpe.cfg that I find on my Nagios server. Whenever someone speaks of making changes to files, it's often not clear which side the changes need to be made on: server or monitored node.

I just realized that this (from /etc/nagios3/conf.d/services.cfg) might be part of the problem, though:

define service {
        use                     default-service
        hostgroup_name          portal
        service_description     Disk Space
        check_command           check_nrpe!check_disk
}

I'm thinking that maybe I didn't make all of the changes that eloyd recommended in his posting. So, I've just made the following additions:

# To /etc/nagios3/conf.d/commands.cfg...
define command {
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a "$ARG2$"
}

# To /etc/nagios3/conf.d/services.cfg...
define service {
        use                     nrpe-service
        hostgroup_name          portal
        service_description     Disk Space
        check_command           check_nrpe!check_disk!-w 20% -c 10% -p /
}

# To /etc/nagios/nrpe.cfg...
command[check_disk]=/usr/lib64/nagios/plugins/check_disk $ARG1$

On the Nagios server, I ran:
# service nagios3 restart; service nagios-nrpe-server restart

and on the monitored node (since it's running Amazon Linux), I ran:
# service nagios restart

The error displayed on the "Service Status" screen changed, somewhat (see attached). At least the command is now being recognized as having been defined. The last lines in /var/log/messages on the monitored node show:

Sep 5 16:50:34 ip-10-160-23-32 nrpe[11195]: Error: Request contained command arguments, but argument option is not enabled!
Sep 5 16:50:34 ip-10-160-23-32 nrpe[11195]: Client request was invalid, bailing out...

So, it looks like there's something wrong with the way I formatted one of those lines that I added. I'm gonna Google the error...
Attachments
Screen Shot 2014-09-05 at 9.48.27 AM.png

Post by eloyd »

TL;DR: NRPE is configured as follows (paths may vary depending on how it was installed; my notes below assume you installed from source):

On the Nagios server, you will update /usr/local/nagios/etc/commands.cfg to include a check_nrpe command:

define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a "$ARG2$"
}

You will update /usr/local/nagios/etc/services.cfg to define a service that executes a remote command via check_nrpe:

define service{
        use                     nrpe-service
        service_description     Boot Partition
        hostgroups              private
        servicegroups           System,NRPE
        check_command           check_nrpe!check_disk!-w 20% -c 10% -p /boot
}

(Note that we use hostgroups to associate service checks with hosts, so you may need to change hostgroups to host_name.)

On the CLIENT SIDE, you need to update /usr/local/nagios/etc/nrpe.cfg (or possibly /usr/local/nagios/etc/nrpe/<something.cfg> if you use includes) to include the check_disk command (using my example above):

command[check_disk]=/usr/local/nagios/libexec/check_disk $ARG1$

Make sure NRPE is running or is accessible via xinetd and you're good to go.

Does that help with the overall concept of how NRPE works?
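
To make the flow above concrete, here is a minimal shell sketch of how the server side turns a check_command into the check_nrpe command line. It is only an illustration, not Nagios's own code: expand_check_nrpe is a hypothetical helper, 192.0.2.10 is a placeholder address standing in for $HOSTADDRESS$, and it handles only $ARG1$ and $ARG2$ (anything after a further "!" stays in $ARG2$ here, whereas Nagios would split it into $ARG3$ and so on).

```shell
#!/bin/sh
# Hypothetical helper: mimic how a check_command is split on "!" and how
# the pieces fill $ARG1$ / $ARG2$ in the check_nrpe command_line.
expand_check_nrpe() {
    check_command=$1   # e.g. 'check_nrpe!check_disk!-w 20% -c 10% -p /boot'
    host_address=$2    # stands in for $HOSTADDRESS$
    plugin_dir=$3      # stands in for $USER1$

    rest=${check_command#*!}   # drop the command name ("check_nrpe")
    arg1=${rest%%!*}           # text up to the next "!" becomes $ARG1$
    case $rest in
        *!*) arg2=${rest#*!} ;;   # everything after it becomes $ARG2$
        *)   arg2= ;;             # no second field: $ARG2$ is empty
    esac

    # Same shape as: $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a "$ARG2$"
    printf '%s/check_nrpe -H %s -c %s -a "%s"\n' \
        "$plugin_dir" "$host_address" "$arg1" "$arg2"
}

expand_check_nrpe 'check_nrpe!check_disk!-w 20% -c 10% -p /boot' \
    192.0.2.10 /usr/local/nagios/libexec
```

The printed line is what the server runs. The -c value names the command[...] entry that NRPE looks up in nrpe.cfg on the monitored node, and the quoted -a string is what arrives there as that file's $ARG1$.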

Post by agenerette »

I believe I'm getting a clearer sense of how things work. Thanks. Working with a few more services will likely get me to where I'm good on it all.

The main problem that I'm seeing, now, is that the line in the attached screen-shot is showing for the monitored host that I'm currently focusing on. When I check the last few lines of that host's /var/log/messages, I see:

Sep 5 22:40:31 ip-10-177-5-132 nrpe[32478]: Error: Request contained command arguments, but argument option is not enabled!
Sep 5 22:40:31 ip-10-177-5-132 nrpe[32478]: Client request was invalid, bailing out...

So, it looks like there's something wrong with the way I formatted the lines in, perhaps, commands.cfg...


Server: /etc/nagios3/conf.d/commands.cfg:
define command {
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a "$ARG2$"
}

Server: /etc/nagios3/conf.d/services.cfg:
define service {
        use                     default-service
        hostgroup_name          portal
        service_description     Disk Space
#       check_command           check_nrpe!check_disk
        check_command           check_nrpe!check_disk!-w 20% -c 10% -p /
}

Monitored-node: /etc/nagios/nrpe.cfg:
command[check_disk]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /
Attachments
Screen Shot 2014-09-05 at 3.40.16 PM.png

Post by eloyd »

It looks like your NRPE daemon may not be accepting command arguments. Try recompiling it, with this as the first step:

./configure --enable-command-args
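
As far as I know, recompiling is only half of it: NRPE also reads a dont_blame_nrpe setting from nrpe.cfg on the monitored node, and it refuses command arguments unless that is set to 1. Below is a rough shell sketch for checking it. nrpe_args_enabled is a hypothetical helper, and taking the last uncommented setting and treating a missing one as 0 are assumptions of this sketch, so check them against your NRPE version.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given nrpe.cfg enables command arguments.
nrpe_args_enabled() {
    cfg=$1
    # Pick up uncommented "dont_blame_nrpe=0|1" lines and keep the last one.
    value=$(sed -n 's/^[[:space:]]*dont_blame_nrpe[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' "$cfg" | tail -n 1)
    # Assumption: no setting at all means arguments are disabled.
    [ "${value:-0}" = 1 ]
}

# Path taken from this thread; adjust it for a source install.
cfg=/etc/nagios/nrpe.cfg
if [ -r "$cfg" ]; then
    if nrpe_args_enabled "$cfg"; then
        echo "command arguments enabled in $cfg"
    else
        echo "command arguments disabled in $cfg (set dont_blame_nrpe=1)"
    fi
fi
```

Restart the NRPE daemon on the monitored node after changing the setting. The sample nrpe.cfg flags this option as a security risk, so it is worth restricting allowed_hosts if you turn it on.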

Post by agenerette »

I'm wondering if it might be OK to simply call the command without arguments. I just added this to nrpe.cfg on the problem hosts:

command[check_disk]=/usr/lib/nagios/plugins/check_disk /

Now I'm seeing what's shown in the attached screen-shot. So, transform-staging is the only problem machine at this point. That VM hasn't been managed by Chef up to now and, consequently, it's not getting certain iptables stuff needed to make basic connectivity between it and the Nagios server work. So, I should be able to take care of that pretty quickly.

I still need to sort out a problem that I'm having, though, with Chef. I've got an email out to the Support folks at opscode.com on it.

Thanks for all of your help on this issue. I believe I'm close to having everything cleaned up.
Attachments
Screen Shot 2014-09-08 at 10.47.17 AM.png

Post by eloyd »

You can certainly hard-code the disk to check, but if you need to monitor a new disk, then you need to make a new check. It may be easier to set up the parameters capability, but of course it's entirely up to you.

Post by slansing »

Yep, as Eric said, the downside is that you would now have to manually create a separate command for each disk check. The nice thing about arguments is that you can specify those values on the XI server on a host-by-host or service-by-service basis. Let us know how the Chef deal goes; it won't be hard to push out new configs to change that once it's back up and running properly.
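
As a rough illustration of that trade-off, this shell sketch prints the hard-coded nrpe.cfg entries you would need, one per mount point. gen_disk_commands and the check_disk_<name> naming are made up for the example; the plugin path and thresholds are the ones used earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: print one hard-coded nrpe.cfg command per mount point.
gen_disk_commands() {
    plugin=$1
    shift
    for mount in "$@"; do
        # "/" gives an empty name (replaced by "root"); "/var/log" gives "var_log".
        name=$(printf '%s' "$mount" | sed 's|^/||; s|/|_|g')
        printf 'command[check_disk_%s]=%s -w 20%% -c 10%% -p %s\n' \
            "${name:-root}" "$plugin" "$mount"
    done
}

gen_disk_commands /usr/lib/nagios/plugins/check_disk / /boot
```

Each new disk means another line here plus a matching service on the server. With argument passing enabled, a single command[check_disk]=... $ARG1$ entry covers all of them, and the mount point moves into the service definition on the server.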