Check_Load displays load even if monitored node is off
Hey guys,
I'm trying to create a monitoring structure for a Beowulf cluster.
Nagios + Pnp4Nagios is running on a Raspberry Pi with Raspbian Lite.
However, the check_load plugin gives me different values for every PC (60 in total, each with a different IP) even though they are not booted.
After a look at the special template graphs from Pnp4Nagios, it seems that all curves line up with the curve of the localhost (the Raspberry).
So how can it be that all values are a slightly different version of the localhost's CPU load?
And why is it possible that I get results from computers that are not running at all?
Is this a problem of the plugin or of Nagios itself?
Greetings,
Daniel
Re: Check_Load displays load even if monitored node is off
What version of Core are you using? Was it compiled from source or installed from distro repos? What version of pnp4nagios, and what version of the check_load plugin?
Could you upload your nagios.cfg?
All of that will give us some idea of where the problem is. It's possible it's not really a problem in any one thing, but some sort of compatibility issue.
Re: Check_Load displays load even if monitored node is off
I'm using version 3.5.1, installed from a repository. The Pnp4nagios version is 0.6.25.
I don't know the version of the check_load plugin; how do I find that out? The plugins were automatically installed from the repo.
Nagios.cfg:
http://pastebin.com/raw/d5Kbt65V
And here is a screenshot of the problem:
The localhost is the Raspberry, and all curves look similar.
Thanks for your help. I need to get this fixed, it's part of my thesis.
Re: Check_Load displays load even if monitored node is off
Can you post the full service definitions you are using for the load monitoring on the offline machines (which I assume are ei-rn-cluster-00*)?
How are you executing the checks on these remote (currently offline) machines? Are you using an agent (NRPE, NCPA), passive checks, SSH, etc.?
Former Nagios employee
https://www.mcapra.com/
Re: Check_Load displays load even if monitored node is off
These are the host and service definitions for one machine:
Code: Select all
define host{
        use                     generic-host
        host_name               ei-rn-cluster001
        alias                   ei-rn-cluster001
        address                 10.10.133.121
}
define service{
        use                     generic-service
        host_name               ei-rn-cluster001
        service_description     Check Load
        check_command           check_load!5.0!4.0!3.0!10.0!6.0!4.0
}
The generic host and service are the standard templates:
Code: Select all
define service{
        name                            generic-service ; The 'name' of this service template
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       ; We should obsess over this service (if necessary)
        check_freshness                 0       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        notification_interval           0       ; Only send notifications on status change by default.
        is_volatile                     0
        check_period                    24x7
        normal_check_interval           5
        retry_check_interval            1
        max_check_attempts              4
        notification_period             24x7
        notification_options            w,u,c,r
        contact_groups                  admins
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}
define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1       ; Host notifications are enabled
        event_handler_enabled           1       ; Host event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        check_command                   check-host-alive
        max_check_attempts              10
        notification_interval           0
        notification_period             24x7
        notification_options            d,u,r
        contact_groups                  admins
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}
NRPE is installed on the monitored machines, but that should not be the problem.
Re: Check_Load displays load even if monitored node is off
Can you also post the command definition for check_load for us to review? We'll need to see how it all correlates.
Former Nagios Employee
Re: Check_Load displays load even if monitored node is off
You mean the commands.cfg?
I can't find a specific command definition for check_load in it.
But if I set one like the definition below, e.g. in the server cfg, Nagios rejects it with the duplicate-definition error shown further down.
So where is the use of check_load defined? Maybe there is an error, but I can't find it.
Btw, the check_load plugin version is v2.1.1.
Edit:
Does check_load even have the option to check the load of other computers? The help says that it has no parameter for an IP or anything in that direction.
Code: Select all
###############################################################################
# COMMANDS.CFG - SAMPLE COMMAND DEFINITIONS FOR NAGIOS
###############################################################################
################################################################################
# NOTIFICATION COMMANDS
################################################################################
# 'notify-host-by-email' command definition
define command{
command_name notify-host-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRE$
}
# 'notify-service-by-email' command definition
define command{
command_name notify-service-by-email
command_line /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HO$
}
################################################################################
# HOST CHECK COMMANDS
################################################################################
# On Debian, check-host-alive is being defined from within the
# nagios-plugins-basic package
################################################################################
# PERFORMANCE DATA COMMANDS
################################################################################
define command {
command_name process-service-perfdata
command_line /usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl
}
define command {
command_name process-host-perfdata
command_line /usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}
Code: Select all
define command{
command_name check_load
command_line /usr/lib/nagios/plugins/check_load -H $HOSTADDRESS$
}
Code: Select all
Warning: Duplicate definition found for command 'check_load' (config file '/etc/nagios3/usercfg/ClusterServer.cfg', starting on line 20)
Error: Could not add object property in file '/etc/nagios3/usercfg/ClusterServer.cfg' on line 21.
Error processing object config files!
Code: Select all
pi@raspberrypi:/usr/lib/nagios/plugins $ ./check_load -h
check_load v2.1.1 (monitoring-plugins 2.1.1)
Copyright (c) 1999 Felipe Gustavo de Almeida <galmeida@linux.ime.usp.br>
Copyright (c) 1999-2007 Monitoring Plugins Development Team
<devel@monitoring-plugins.org>
This plugin tests the current system load average.
Usage:
check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15
Options:
-h, --help
Print detailed help screen
-V, --version
Print version information
--extra-opts=[section][@file]
Read options from an ini file. See
https://www.monitoring-plugins.org/doc/extra-opts.html
for usage and examples.
-w, --warning=WLOAD1,WLOAD5,WLOAD15
Exit with WARNING status if load average exceeds WLOADn
-c, --critical=CLOAD1,CLOAD5,CLOAD15
Exit with CRITICAL status if load average exceed CLOADn
the load average format is the same used by "uptime" and "w"
-r, --percpu
Divide the load averages by the number of CPUs (when possible)
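The usage line above has no host option, so a command_line that calls check_load directly always measures the machine the plugin runs on (here the Nagios server), whichever host the service is attached to. The duplicate-definition warning also suggests that check_load is already defined in some other file. A minimal sketch for locating that definition follows; the function name find_command_def is made up for illustration, and the directories in the commented example are assumptions about a Debian-style package layout, not something confirmed in this thread.

```shell
#!/bin/sh
# List every file and line that defines a given Nagios command name.
# Usage: find_command_def COMMAND_NAME DIR [DIR...]
find_command_def() {
    name="$1"
    shift
    # -r: search directories recursively, -n: print line numbers,
    # -E: extended regular expressions.
    # The name must be followed by whitespace or end of line, so that
    # "check_load" does not also match e.g. "check_load_nrpe".
    grep -rnE "^[[:space:]]*command_name[[:space:]]+${name}([[:space:]]|\$)" "$@"
}

# Example call (both directories are assumptions, adjust to your install):
# find_command_def check_load /etc/nagios3 /etc/nagios-plugins/config
```

Each hit is printed as file:line:text. Whatever command_line sits next to the hit is what the service check really executes.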
Re: Check_Load displays load even if monitored node is off
I would track down what's going on in /etc/nagios3/usercfg/ClusterServer.cfg. The command may already be defined there, which would mean it isn't needed in commands.cfg. Nagios object definitions can be put in any included config file, so it could have been set up differently.
Quote: Does check_load even has the option to check the load of other computers? The help says that is has no parameter for ip or anything in that direction
It appears so; check_load only checks the machine it runs on. You'll want to look at either using check_by_ssh to execute the script, or an agent such as NRPE / NCPA. (You would then have check_load copied locally onto the machine you want to monitor, and it would still run locally there. The agent serves as the communication layer between Nagios and the client system.)
NRPE - https://support.nagios.com/kb/category.php?id=22
NCPA - https://nagios.org/ncpa
check_by_ssh - guide is for XI, but same concept applies for Core - https://assets.nagios.com/downloads/nag ... ng_SSH.pdf
Former Nagios Employee
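As a rough sketch of the NRPE route, the shell function below prints a server-side command definition and one load service per node. Everything not shown earlier in the thread is an assumption: the node names ei-rn-cluster001 to ei-rn-cluster060, the command name check_nrpe_load, the check_nrpe path, and an nrpe.cfg entry on each node along the lines of command[check_load]=/usr/lib/nagios/plugins/check_load -w 5.0,4.0,3.0 -c 10.0,6.0,4.0.

```shell
#!/bin/sh
# Print Nagios object definitions for NRPE-based load checks.
# Usage: gen_nrpe_load_config NUMBER_OF_NODES
# Assumed, not taken from the thread: node naming scheme, command name,
# plugin path, and a matching command[check_load] entry in each node's nrpe.cfg.
gen_nrpe_load_config() {
    count="$1"

    # Server-side command: ask the NRPE daemon on the target host to run
    # its own "check_load" command. The quoted heredoc keeps the
    # $HOSTADDRESS$ macro literal so Nagios can expand it later.
    cat <<'EOF'
define command{
        command_name    check_nrpe_load
        command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c check_load
}
EOF

    # One service per node, named ei-rn-cluster001, ei-rn-cluster002, ...
    i=1
    while [ "$i" -le "$count" ]; do
        host=$(printf 'ei-rn-cluster%03d' "$i")
        cat <<EOF
define service{
        use                     generic-service
        host_name               $host
        service_description     Check Load
        check_command           check_nrpe_load
}
EOF
        i=$((i + 1))
    done
}
```

Running gen_nrpe_load_config 60 and redirecting the output into a file under the Nagios config directory would replace the existing per-node "Check Load" services. With this setup the load is measured on the node itself, so a powered-off node should produce a connection error from check_nrpe rather than a load value.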
Re: Check_Load displays load even if monitored node is off
Yeah, I definitely need an agent.
But I read about using SNMP instead of NRPE, like this:
https://exchange.nagios.org/directory/P ... MP/details
Will this do the same job?
Re: Check_Load displays load even if monitored node is off
Yes, you could go the SNMP route as long as the device supports SNMP and has it enabled.
For Linux -
https://assets.nagios.com/downloads/nag ... g_SNMP.pdf
Former Nagios Employee
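For the SNMP route, a Net-SNMP agent can expose the load averages through the laLoad column of UCD-SNMP-MIB, provided its access configuration allows reading that subtree. A small sketch for testing this by hand before wiring up a plugin; load_oid is a hypothetical helper name, and the commented example assumes SNMP v2c with a read community of public, neither of which is stated in this thread.

```shell
#!/bin/sh
# Map a load-average interval (1, 5 or 15 minutes) to the numeric OID
# of the matching UCD-SNMP-MIB::laLoad entry.
load_oid() {
    case "$1" in
        1)  echo ".1.3.6.1.4.1.2021.10.1.3.1" ;;   # laLoad.1, 1-minute average
        5)  echo ".1.3.6.1.4.1.2021.10.1.3.2" ;;   # laLoad.2, 5-minute average
        15) echo ".1.3.6.1.4.1.2021.10.1.3.3" ;;   # laLoad.3, 15-minute average
        *)  echo "unsupported interval: $1" >&2
            return 1 ;;
    esac
}

# Example query against one node (community string is an assumption):
# snmpget -v2c -c public 10.10.133.121 "$(load_oid 1)"
```

If the node is switched off, snmpget times out instead of returning a value, which is the behaviour the original check was missing.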