Re: monitoring linux disks
Posted: Fri Jan 11, 2013 12:46 pm
by jeremy.garman
thanks to both gshergill and abrist for the responses. I'm going to try them this afternoon and reply to the forum, but I just wanted to post the event log entry pertaining to this issue; maybe it will shed additional light?
"PROBLEM Service Alert: lmprodwa1/Root Partition Free Space is UNKNOWN **" nagios@localhost" resulted in a return code of 127. Make sure the script or binary you are trying to execute actually exists..."
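A note on that return code: 127 is the status a POSIX shell reports when it cannot find the command it was told to run, which matches the hint about the script or binary not existing. A minimal sketch to see it for yourself; the command name `no_such_plugin_demo` is made up for illustration and is assumed not to exist on your system:

```shell
# Ask a shell to run a command that does not exist (hypothetical name),
# then capture the exit status it reports.
status=0
sh -c 'no_such_plugin_demo' 2>/dev/null || status=$?

echo "exit status: $status"
```

So the thing to chase is which binary in the failing command line is missing or not at the path given, rather than the thresholds or the partition.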
Re: monitoring linux disks
Posted: Fri Jan 11, 2013 2:41 pm
by abrist
Let us know what the output was from running the commands from the CLI, and we will move on from there.
Re: monitoring linux disks
Posted: Mon Jan 14, 2013 2:20 pm
by jeremy.garman
gshergill: after actually reading your post, I'm only afraid that this adds a large amount of work to being able to configure additional partitions on that linux server (or am I totally out to lunch?).
If I modify the check_disk command itself, rather than passing a specific check_disk command with associated partition to the check_nrpe command, aren't I limiting myself to the explicitly listed partition?
abrist: when I ran the check_disk command on the linux server, it worked fine. when I ran the check_nrpe command on the nagios server, I got the following error message:
CHECK_NRPE: Received 0 bytes from daemon. Check the remote server logs for error messages
when I look at the linux server logs, this is what's associated with the manual test:
nrpe[30526]: Error: Request contained command arguments!
nrpe[30526]: Client request was invalid, bailing out...
xinetd[20707]: EXIT: nrpe status=0 pid=30526 duration=0(sec)
which config file has the bad arguments? I'll post all three that you asked for in my next post...
Re: monitoring linux disks
Posted: Mon Jan 14, 2013 2:49 pm
by jeremy.garman
OK, here are the contents of the linux server's nrpe.cfg file (I've cut out all the commented-out components, so this is just the live config):
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 180 -c 200
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
Here are the contents of the Nagios server's linux.cfg file (also only the uncommented, live config):
# Individual Host Definitions:
define host{
use linux-server ; Inherit default values from a template
host_name lmprodwa1.lmgroup.com ; The name we're giving to this server
alias lmprodwa1 ; A longer name for the server
address 192.168.115.69 ; IP address of the server
}
define host{
use linux-server ; Inherit default values from a template
host_name lmprodwa2.lmgroup.com ; The name we're giving to this server
alias lmprodwa2 ; A longer name for the server
address 192.168.115.70 ; IP address of the server
}
define host{
use linux-server ; Inherit default values from a template
host_name ocqadb1.lmgroup.com ; The name we're giving to this server
alias ocqadb1 ; A longer name for the server
address 192.168.115.52 ; IP address of the server
}
define host{
use linux-server ; Inherit default values from a template
host_name ocqadb2.lmgroup.com ; The name we're giving to this server
alias ocqadb2 ; A longer name for the server
address 192.168.115.53 ; IP address of the server
}
# Service Definitions:
define service{
use generic-service
host_name lmprodwa1.lmgroup.com
service_description OS Boot Partition Free Space
check_command check_disk!20%!10%!/boot
}
define service{
use generic-service
host_name lmprodwa1.lmgroup.com
service_description Root Partition Free Space
check_command check_disk!20%!10%!/
}
define service{
use generic-service
host_name lmprodwa2.lmgroup.com
service_description /dev/hda1 Free Space
check_command check_nrpe!check_hda1
}
define service{
use generic-service
host_name ocqadb1.lmgroup.com
service_description /dev/hda1 Free Space
check_command check_nrpe!check_hda1
}
define service{
use generic-service
host_name ocqadb2.lmgroup.com
service_description /dev/hda1 Free Space
check_command check_nrpe!check_hda1
}
# Number of Total Processes:
define service{
use generic-service
host_name lmprodwa1.lmgroup.com
service_description Total Processes
check_command check_nrpe!check_total_procs
}
define service{
use generic-service
host_name lmprodwa2.lmgroup.com
service_description Total Processes
check_command check_nrpe!check_total_procs
}
define service{
use generic-service
host_name ocqadb1.lmgroup.com
service_description Total Processes
check_command check_nrpe!check_total_procs
}
define service{
use generic-service
host_name ocqadb2.lmgroup.com
service_description Total Processes
check_command check_nrpe!check_total_procs
}
Here are the contents of the Nagios server's commands.cfg file (also only the uncommented, live config):
# 'notify-host-by-email' command definition
define command{
command_name notify-host-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/bin/sendmail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" $CONTACTEMAIL$
}
# 'notify-service-by-email' command definition
define command{
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$\n" | /usr/bin/sendmail -s "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}
# 'check-host-alive' command definition
define command{
command_name check-host-alive
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
}
# 'check_local_disk' command definition
define command{
command_name check_local_disk
command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}
# 'check_local_load' command definition
define command{
command_name check_local_load
command_line $USER1$/check_load -w $ARG1$ -c $ARG2$
}
# 'check_local_procs' command definition
define command{
command_name check_local_procs
command_line $USER1$/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
}
# 'check_local_users' command definition
define command{
command_name check_local_users
command_line $USER1$/check_users -w $ARG1$ -c $ARG2$
}
# 'check_local_swap' command definition
define command{
command_name check_local_swap
command_line $USER1$/check_swap -w $ARG1$ -c $ARG2$
}
# 'check_local_mrtgtraf' command definition
define command{
command_name check_local_mrtgtraf
command_line $USER1$/check_mrtgtraf -F $ARG1$ -a $ARG2$ -w $ARG3$ -c $ARG4$ -e $ARG5$
}
# NOTE: The following 'check_...' commands are used to monitor services on
# both local and remote hosts.
# 'check_nrpe' command definition
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
# 'check_disk' command definition
define command{
command_name check_disk
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}
# 'check_ftp' command definition
define command{
command_name check_ftp
command_line $USER1$/check_ftp -H $HOSTADDRESS$ $ARG1$
}
# 'check_hpjd' command definition
define command{
command_name check_hpjd
command_line $USER1$/check_hpjd -H $HOSTADDRESS$ $ARG1$
}
# 'check_snmp' command definition
define command{
command_name check_snmp
command_line $USER1$/check_snmp -H $HOSTADDRESS$ $ARG1$
}
# 'check_http' command definition
define command{
command_name check_http
command_line $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
}
# 'check_ssh' command definition
define command{
command_name check_ssh
command_line $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
}
# 'check_dhcp' command definition
define command{
command_name check_dhcp
command_line $USER1$/check_dhcp $ARG1$
}
# 'check_ping' command definition
define command{
command_name check_ping
command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}
# 'check_pop' command definition
define command{
command_name check_pop
command_line $USER1$/check_pop -H $HOSTADDRESS$ $ARG1$
}
# 'check_imap' command definition
define command{
command_name check_imap
command_line $USER1$/check_imap -H $HOSTADDRESS$ $ARG1$
}
# 'check_smtp' command definition
define command{
command_name check_smtp
command_line $USER1$/check_smtp -H $HOSTADDRESS$ $ARG1$
}
# 'check_tcp' command definition
define command{
command_name check_tcp
command_line $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}
# 'check_udp' command definition
define command{
command_name check_udp
command_line $USER1$/check_udp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
}
# 'check_nt' command definition
define command{
command_name check_nt
command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$
}
# 'process-host-perfdata' command definition
define command{
command_name process-host-perfdata
command_line /usr/bin/printf "%b" "$LASTHOSTCHECK$\t$HOSTNAME$\t$HOSTSTATE$\t$HOSTATTEMPT$\t$HOSTSTATETYPE$\t$HOSTEXECUTIONTIME$\t$HOSTOUTPUT$\t$HOSTPERFDATA$\n" >> /usr/local/nagios/var/host-perfdata.out
}
# 'process-service-perfdata' command definition
define command{
command_name process-service-perfdata
command_line /usr/bin/printf "%b" "$LASTSERVICECHECK$\t$HOSTNAME$\t$SERVICEDESC$\t$SERVICESTATE$\t$SERVICEATTEMPT$\t$SERVICESTATETYPE$\t$SERVICEEXECUTIONTIME$\t$SERVICELATENCY$\t$SERVICEOUTPUT$\t$SERVICEPERFDATA$\n" >> /usr/local/nagios/var/service-perfdata.out
}
Re: monitoring linux disks
Posted: Mon Jan 14, 2013 3:06 pm
by abrist
Did you compile nrpe with command-args enabled?
Also, make sure you have:
In /usr/local/nagios/etc/nrpe.cfg on the checked host:
In /usr/local/nagios/etc/nagios.cfg on the nagios server:
Try these on the Nagios Server:
Code:
# 'check_disk' command definition
define command{
command_name check_disk
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_disk $ARG1$
}
Code:
# Service Definitions:
define service{
use generic-service
host_name lmprodwa1.lmgroup.com
service_description Root Partition Free Space
check_command check_disk! -w 20% -c 10% -p /
}
And this on the checked host:
Code:
command[check_disk]=/usr/local/nagios/libexec/check_disk $ARG1$
Setting up the commands this way allows you to pass the whole set of arguments as one string in $ARG1$.
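To make the single-string idea concrete, here is a rough sketch of what happens to the nrpe.cfg line on the checked host, assuming NRPE substitutes `$ARG1$` textually before running the command. The `sed` call only imitates that substitution; it is not how NRPE itself is implemented.

```shell
# The nrpe.cfg command body from the post above, with the $ARG1$ placeholder.
template='/usr/local/nagios/libexec/check_disk $ARG1$'

# Everything the service definition passes as its one argument.
arg1='-w 20% -c 10% -p /'

# Imitate the macro substitution: replace the literal text $ARG1$ with the string.
expanded=$(printf '%s\n' "$template" | sed 's|\$ARG1\$|'"$arg1"'|')

echo "$expanded"
```

The flags, thresholds and partition all travel inside one argument, so adding another partition only means adding another service definition on the Nagios server.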
Re: monitoring linux disks
Posted: Tue Jan 15, 2013 5:52 am
by gshergill
Hi jeremy.garman,
I agree, my way is very limited.
To get this working I had to reconfigure my nrpe install on the Remote Server.
Personally, I encountered no problems when I tried to reconfigure nrpe on an existing install, but there may be problems I am unaware of, so it might be worth waiting for confirmation of some sort (I'm just a Nagios user like yourself) that my method won't do something unexpected to NRPE.
The following are the steps I did:
cd into the nrpe download directory (the one with the makefiles, etc.).
Configure with command-args:
Remake some files:
Code:
make all
make install-plugin
make install-daemon
Make a backup of your nrpe.cfg then reinstall the config:
Code:
cp /usr/local/nagios/etc/nrpe.cfg ~/old-nrpe.cfg
make install-daemon-config
Edit the new nrpe.cfg based on your changes and set dont_blame_nrpe to 1:
That should get the nrpe working with command arguments.
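As a rough sketch of the two steps above that are described only in words: NRPE's configure script accepts `--enable-command-args`, and `dont_blame_nrpe` is a plain `key=value` line in nrpe.cfg. The snippet below edits a scratch copy rather than the real file, so the path and the sample contents are stand-ins, not your actual config.

```shell
# In the NRPE source directory the configure step would be (not run here):
#   ./configure --enable-command-args

# Scratch file standing in for /usr/local/nagios/etc/nrpe.cfg (sample contents).
cfg=$(mktemp)
printf 'allowed_hosts=127.0.0.1\ndont_blame_nrpe=0\n' > "$cfg"

# Flip the setting to 1; write to a new file first so a failed edit loses nothing.
sed 's/^dont_blame_nrpe=.*/dont_blame_nrpe=1/' "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"

# Show the resulting line.
grep '^dont_blame_nrpe=' "$cfg"
```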
My setup of the checks is slightly different from abrist's suggestion, but this is what I did:
in nrpe.cfg:
Code:
command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
in commands.cfg:
Code:
# 'check_nrpe_args' command definition
define command {
command_name check_nrpe_args
command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ -a $ARG2$
}
in services.cfg:
Code:
define service {
use generic-service
hosts grindervms
service_description Disk Test
check_command check_nrpe_args!check_disk!20% 10% /boot
}
This is working perfectly for me, but of course you should change as required and choose what's best for you.
Hope this helps.
Kind Regards,
Gary Shergill
Re: monitoring linux disks
Posted: Tue Jan 15, 2013 10:47 am
by jeremy.garman
abrist: so I made all the config file changes that you suggested. I compiled everything exactly as per the default procedure from the main website, and I don't recall using --command-args. Does your question suggest that I ought to recompile and use that? If so, I'll do that and see what changes... : )
gshergill: OK, thanks for the procedure, much appreciated. If I can't get the process to work in a standard way, I'll follow your steps and report back.
Re: monitoring linux disks
Posted: Tue Jan 15, 2013 11:15 am
by slansing
Hello Jeremy,
I assume the arguments you are talking about are the $ARG1$ etc. added onto the commands? They let you push arguments to that check when calling it from the Nagios server, since you will no doubt want to set warning and critical thresholds, or other options the plugin allows!

Re: monitoring linux disks
Posted: Tue Jan 15, 2013 12:39 pm
by jeremy.garman
correct. the issues I've been experiencing are undefined as of yet though. It might be my own stupid syntax, and it might be a miscompile of the client agent, or something altogether different.
I hadn't tried pushing all the remote command text via a single argument though ($ARG1$), I'd used three, each specifying the warning threshold, the critical threshold and the monitored partition.
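For comparison, a small sketch of how the three-argument form is split up, assuming Nagios separates the command name and `$ARG1$` to `$ARG3$` on the `!` character. The variable names here are just for illustration.

```shell
# check_command value from the Root Partition service definition.
check_command='check_disk!20%!10%!/'

# Split on "!" into the command name and three arguments.
IFS='!' read -r name arg1 arg2 arg3 <<EOF
$check_command
EOF

echo "command_name=$name"
echo "ARG1=$arg1 ARG2=$arg2 ARG3=$arg3"
```

Either form only reaches the remote check_disk if the NRPE daemon on the client is willing to accept arguments in the first place.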
Re: monitoring linux disks
Posted: Tue Jan 15, 2013 1:39 pm
by abrist
jeremy.garman wrote: correct. the issues I've been experiencing are undefined as of yet though. It might be my own stupid syntax, and it might be a miscompile of the client agent, or something altogether different.
I hadn't tried pushing all the remote command text via a single argument though ($ARG1$), I'd used three, each specifying the warning threshold, the critical threshold and the monitored partition.
Rebuild NRPE as suggested by gshergill. You absolutely need command-args enabled to pass $ARGn$s to the client. Without it, you must hard-code the nrpe.cfg commands for every client.
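If rebuilding is not an option, the hard-coded route could look something like this: one fixed nrpe.cfg entry per partition, generated by a small loop. The partition list, the thresholds, and the `check_disk_<label>` naming are assumptions for illustration, not something from the posts above.

```shell
# Hypothetical list of mount points to monitor on one client.
partitions="/ /boot"

entries=""
for part in $partitions; do
    # Derive a label for the command name: "/" becomes "root", "/boot" becomes "boot".
    if [ "$part" = "/" ]; then
        label="root"
    else
        label=$(printf '%s' "$part" | sed 's|^/||; s|/|_|g')
    fi
    # One fixed entry per partition, with thresholds written into the client config.
    line="command[check_disk_${label}]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p ${part}"
    entries="${entries}${line}
"
done

# Print the lines that would be appended to nrpe.cfg on the client.
printf '%s' "$entries"
```

Each generated entry would then be called from the Nagios server with something like check_nrpe!check_disk_root, the same way check_hda1 is used in the linux.cfg posted earlier. The cost is that every threshold change means editing nrpe.cfg on every client.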