monitoring linux disks
-
- Posts: 34
- Joined: Wed Jan 09, 2013 5:11 pm
I have a number of Linux servers (RHEL 6.2) that I'm trying to monitor, predominantly using NRPE. Although there is a canned service definition to monitor the local /dev/hda1 drive, it's not useful for my purposes, as I have sda drives and I'd much prefer to monitor mounted filesystems (due to using LVM). I did some research and tried to use the check_disk command, verified that the check_disk plugin is located in the /usr/local/nagios/libexec directory on both the Nagios server and the monitored Linux server, and set up the following on the Nagios server:
a definition in commands.cfg:
# 'check_disk' command definition
define command{
command_name check_disk
command_line $USER1$/check_disk -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p $ARG3$
}
and then set up a service definition in my linux.cfg file (service configs for all my Linux servers):
define service{
use generic-service
host_name lmprodwa1.lmgroup.com
service_description Root Partition Free Space
check_command i/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
}
and on the monitored Linux server, I set up the following in my nrpe.cfg file:
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
After I restart my Nagios service and refresh the console GUI, I keep getting the same Unknown status and the message "Incorrect command line arguments supplied" (see attached screen capture).
The check_disk info states that you can use either the explicit path to the disk (e.g. /dev/sda1) or the mounted filesystem name (e.g. /boot). The / partition that I'd like to monitor is actually the /dev/mapper/vg_octemplate-lv logical volume. Is that the issue, that check_disk can't monitor logical volumes? Or is it a misconfiguration in one of my settings? Any ideas?
thanks in advance,
Jeremy
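On the logical volume question: as far as I know, check_disk with -p works from the statistics of the mounted filesystem, so the device behind / (a plain partition or an LVM volume) should not matter. Below is a minimal Python sketch of the threshold logic behind -w 20% -c 10%, assuming free space is compared as a percentage of total size. The function names are made up for illustration, and the real plugin handles more cases (reserved blocks, inodes, units) than this does.

```python
import shutil

# Exit codes used by Nagios plugins.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_free_space(free_pct, warn_pct=20.0, crit_pct=10.0):
    """Map a free-space percentage to a Nagios-style state.

    Mirrors the idea of `-w 20% -c 10%`: alert when the free
    percentage drops below a threshold. Critical is tested first.
    """
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_mount(path="/", warn_pct=20.0, crit_pct=10.0):
    """Check a mounted filesystem by path, whatever device backs it."""
    try:
        usage = shutil.disk_usage(path)
    except OSError:
        # Path missing or unreadable: report UNKNOWN, not a false alarm.
        return UNKNOWN, None
    if usage.total == 0:
        return UNKNOWN, None
    free_pct = 100.0 * usage.free / usage.total
    return classify_free_space(free_pct, warn_pct, crit_pct), free_pct
```

Calling check_mount("/") returns a (state, free_percentage) pair. The path is a mount point, not a device name, which is why the same call applies to LVM-backed filesystems.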
Re: monitoring linux disks
If your post was an exact copy/paste, you may have a problem with:
Code: Select all
define service{
use generic-service
host_name lmprodwa1.lmgroup.com
service_description Root Partition Free Space
check_command i/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
}
The last line "check_command i/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /" has an errant "i", happens to the best of us vim users.
Also, you need to use nrpe. Sam will get you up to speed.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
-
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: monitoring linux disks
You need to pass the check through NRPE. check_disk is a local command, so you must change your command definition to reflect that by using /check_nrpe -H $HOSTADDRESS$ -c check_disk rather than /check_disk. Add the rest of the information after that; -c is the check flag, which names the check to call on the remote system.
And for the check_command in the service definition, call the name of the command you used for check_disk, as you did in your previous thread. You do not need to re-add the directory path and whatnot.
Please see the following link for more information:
http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf
Re: monitoring linux disks
abrist: LOLOLOL yeah, that's totally my typo!! I've changed the service and command config files so many times today, that's just the most recent issue!
slangsing: Thanks for the info, and please pardon my Nagios ignorance... : )
so what you're suggesting I do is to change the commands.cfg file on the Nagios server to:
# 'check_disk' command definition
define command{
command_name check_disk
command_line $USER1$/check_nrpe -c check_disk -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p $ARG3$
}
and the linux.cfg file (linux server configuration file) on the Nagios server to:
# 'check_disk' command definition
define service{
use generic-service
host_name lmprodwa1.lmgroup.com
service_description Root Partition Free Space
check_command check_nrpe!check_disk -w 20% -c 10% -p /
}
and the nrpe.cfg file on the monitored linux server to include:
command[check_disk]=/usr/local/nagios/libexec/check_procs -w 20% -c 10% -p /
did I understand you correctly or am I still way off base?
Re: monitoring linux disks
Update: I took the below information (or what I think I understood from it), and now when I attempt to run preflight before restarting my Nagios server, I get the following error:
Error: Service check command 'check_nrpe -c check_disk -w 20% -c 10% -p /' specified in service 'Root Partition Free Space' for host 'lmprodwa1.lmgroup.com' not defined anywhere!
I'm still doing something wrong... sigh!
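The "not defined anywhere" error fits how Nagios reads check_command: everything before the first "!" is taken as the command name, and each further "!"-separated field becomes $ARG1$, $ARG2$, and so on. Below is a rough Python sketch of that expansion. The helper names are hypothetical, the template is the local check_disk form from the first post, and details such as escaped "!" characters are ignored.

```python
def split_check_command(check_command):
    """Split a service's check_command into (command_name, args).

    The text before the first "!" is the command name; the
    remaining "!"-separated fields fill $ARG1$, $ARG2$, ...
    """
    name, *args = check_command.split("!")
    return name, args


def expand_command_line(command_line, args, macros):
    """Substitute $ARGn$ and named macros into a command_line template."""
    expanded = command_line
    for i, value in enumerate(args, start=1):
        expanded = expanded.replace(f"$ARG{i}$", value)
    for macro, value in macros.items():
        expanded = expanded.replace(f"${macro}$", value)
    return expanded


# With "!" separators, the name and the three arguments are distinct.
name, args = split_check_command("check_disk!20%!10%!/")
template = "$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$"
line = expand_command_line(
    template, args, {"USER1": "/usr/local/nagios/libexec"}
)

# Without "!" separators, the whole string is treated as the name,
# so no command definition with that name can be found.
bad_name, bad_args = split_check_command(
    "check_nrpe -c check_disk -w 20% -c 10% -p /"
)
```

Here name is "check_disk" and args holds the three values, while bad_name is the entire string with no arguments, which matches the quoted text in the error message.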
Re: monitoring linux disks
In the service definition:
Code: Select all
# 'check_disk' command definition
define service{
use generic-service
host_name lmprodwa1.lmgroup.com
service_description Root Partition Free Space
check_command check_nrpe!check_disk -w 20% -c 10% -p /
}
The last line should be:
Code: Select all
check_command check_disk!20%!10%!/
In the nrpe.cfg file:
Code: Select all
command[check_disk]=/usr/local/nagios/libexec/check_procs -w 20% -c 10% -p /
should be:
Code: Select all
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
Also, make sure you have:
Code: Select all
dont_blame_nrpe=1
set in your remote host's nrpe.cfg file and:
Code: Select all
check_external_commands=1
set in your nagios server's nagios.cfg file.
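To show why dont_blame_nrpe matters for the $ARGn$ form, here is a simplified Python model of what happens on the remote side. It assumes the request arrives as the command name followed by "!"-separated arguments and that arguments are refused unless dont_blame_nrpe is enabled. The function name and error messages are invented for this sketch and are not NRPE's own.

```python
def nrpe_expand(request, commands, dont_blame_nrpe):
    """Resolve an incoming request against nrpe.cfg-style commands.

    request:  "name!arg1!arg2..." as sent by the Nagios server
    commands: mapping of command name to its command[...] line
    """
    name, *args = request.split("!")
    if args and not dont_blame_nrpe:
        # Arguments are only honoured when explicitly allowed.
        raise PermissionError("arguments sent but dont_blame_nrpe=0")
    if name not in commands:
        raise KeyError(f"command '{name}' is not defined")
    line = commands[name]
    for i, value in enumerate(args, start=1):
        line = line.replace(f"$ARG{i}$", value)
    return line


commands = {
    "check_disk": "/usr/local/nagios/libexec/check_disk "
                  "-w $ARG1$ -c $ARG2$ -p $ARG3$",
}
resolved = nrpe_expand("check_disk!20%!10%!/", commands, True)
```

With arguments allowed, resolved is the full check_disk command line with the thresholds and path filled in. With dont_blame_nrpe off, the same request is rejected, which is one way a check can fail even though the configuration passes preflight.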
Re: monitoring linux disks
One more thing:
Code: Select all
# 'check_disk' command definition
define command{
command_name check_disk
command_line $USER1$/check_nrpe -c check_disk -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p $ARG3$
}
Should be:
Code: Select all
# 'check_disk' command definition
define command{
command_name check_disk
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}
Re: monitoring linux disks
abrist, you rock and thank you for all that concisely worded info... but it didn't work. I made every change exactly as you listed them, and although now I don't get any preflight errors when I test the config, I still get an "Incorrect command line arguments" error in the Nagios console display (Status Information column), as per the attached screen capture. Anything else that I can check/reconfigure?
Re: monitoring linux disks
Hi jeremy.garman,
Personally, I couldn't get NRPE working using $ARG#$ in nrpe.cfg.
What I do is add the arguments into the nrpe.cfg on the remote machine.
So to do this you would change the check_disk in nrpe.cfg on the remote machine to the following:
Code: Select all
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /
Then on the Nagios server edit the command definition in commands.cfg to:
Code: Select all
# 'check_disk' command definition
define command{
command_name check_disk
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c check_disk
}
Then change the service definition to:
Code: Select all
# 'check_disk' command definition
define service{
use generic-service
host_name lmprodwa1.lmgroup.com
service_description Root Partition Free Space
check_command check_disk
}
This should work for now, but the problem is that changes to the warning/critical values must be made in each server's nrpe.cfg file, and new partitions to monitor must be added there too.
Hope this helps.
Kind regards,
Gary Shergill
Re: monitoring linux disks
Try running your command from the command line on the remote system first, then on the Nagios server.
On the remote host:
Code: Select all
cd /usr/local/nagios/libexec
./check_disk -w 20% -c 10% -p /
On the Nagios server:
Code: Select all
cd /usr/local/nagios/libexec
./check_nrpe -H lmprodwa1.lmgroup.com -c check_disk -a '-w 20% -c 10% -p /'
If all checks out, then you must have a syntax error in your setup.
gshergill's method is certainly valid. It is actually preferred for environments where security is an issue as you can turn off the external commands. You could also try wrapping the entire list of arguments into one $ARG1$:
In the service definition:
Code: Select all
define service{
use generic-service
host_name lmprodwa1.lmgroup.com
service_description Root Partition Free Space
check_command check_disk! -w 20% -c 10% -p /
}
In the nrpe.cfg file:
Code: Select all
command[check_disk]=/usr/local/nagios/libexec/check_disk -a $ARG1$
commands.cfg definition:
Code: Select all
define command{
command_name check_disk
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk -a $ARG1$
}
The switch "-a" allows for a full string to be passed as an argument from the service definition.
Could you post your current service and host cfg, your commands.cfg, and your nrpe.cfg?