Check_Load displays load even if monitored node is off

Posted: Thu Jan 12, 2017 8:33 am
by Insep
Hey Guys,

I'm trying to create a monitoring structure for a Beowulf cluster.
Nagios + PNP4Nagios is running on a Raspberry Pi with Raspbian Lite.

However, the check_load plugin gives me different values for every PC (60 in total, each with a different IP), even though they are not booted.

After a look at the special template graphs from PNP4Nagios, it seems that all curves closely follow the curve of the localhost (the Raspberry Pi).

So how can it be that all values are a slightly different version of the localhost's CPU load?
And why is it possible that I get results from computers that are not running at all?
Is this a problem with the plugin or with Nagios itself?

Greetings,

Daniel

Re: Check_Load displays load even if monitored node is off

Posted: Thu Jan 12, 2017 1:26 pm
by dwhitfield
What version of Core are you using? Was it compiled from source or installed from distro repos? What version of pnp4nagios, and what version of the check_load plugin?

Could you upload your nagios.cfg?

All of that will give us some idea of where the problem is. It's possible it's not really a problem in any one thing, but some sort of compatibility issue.

Re: Check_Load displays load even if monitored node is off

Posted: Fri Jan 13, 2017 8:38 am
by Insep
I'm using version 3.5.1, installed from a repository. The PNP4Nagios version is 0.6.25.

I don't know the version of the check_load plugin; how do I find that out? The plugins were installed automatically from the repo.

Nagios.cfg:
http://pastebin.com/raw/d5Kbt65V

And here is a screenshot of the problem :?

Image

The localhost is the Raspberry, and all curves look similar.

Thanks for your help. I need to get this fixed; it's part of my thesis :)

Re: Check_Load displays load even if monitored node is off

Posted: Fri Jan 13, 2017 2:50 pm
by mcapra
Can you post the full service definitions you are using for the load monitoring on the offline machines (which I assume are ei-rn-cluster-00*)?

How are you executing the checks on these remote (currently offline) machines? Are you using an agent (NRPE, NCPA), passive checks, SSH, etc.?

Re: Check_Load displays load even if monitored node is off

Posted: Sat Jan 14, 2017 12:38 pm
by Insep
These are the host and service definitions for one machine:

Code:

define host{
    use                     generic-host
    host_name               ei-rn-cluster001
    alias                   ei-rn-cluster001
    address                 10.10.133.121
    }
define service{
    use                     generic-service
    host_name               ei-rn-cluster001
    service_description     Check Load
    check_command           check_load!5.0!4.0!3.0!10.0!6.0!4.0
    }

The generic host and service are the standard templates:

Code:

define service{
        name                            generic-service ; The 'name' of this service template
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       ; We should obsess over this service (if necessary)
        check_freshness                 0       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        notification_interval           0       ; Only send notifications on status change by default.
        is_volatile                     0
        check_period                    24x7
        normal_check_interval           5
        retry_check_interval            1
        max_check_attempts              4
        notification_period             24x7
        notification_options            w,u,c,r
        contact_groups                  admins
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }


define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1       ; Host notifications are enabled
        event_handler_enabled           1       ; Host event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        check_command                   check-host-alive
        max_check_attempts              10
        notification_interval           0
        notification_period             24x7
        notification_options            d,u,r
        contact_groups                  admins
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }

NRPE is installed on the monitored machines, but that should not be the problem.

Re: Check_Load displays load even if monitored node is off

Posted: Mon Jan 16, 2017 12:49 pm
by rkennedy
Can you also post the command definition for check_load for us to review? We'll need to see how it all correlates.

Re: Check_Load displays load even if monitored node is off

Posted: Tue Jan 17, 2017 8:30 am
by Insep
You mean the commands.cfg?

Code:

###############################################################################
# COMMANDS.CFG - SAMPLE COMMAND DEFINITIONS FOR NAGIOS
###############################################################################


################################################################################
# NOTIFICATION COMMANDS
################################################################################


# 'notify-host-by-email' command definition
define command{
        command_name    notify-host-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRE$
        }

# 'notify-service-by-email' command definition
define command{
        command_name    notify-service-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HO$
        }





################################################################################
# HOST CHECK COMMANDS
################################################################################

# On Debian, check-host-alive is being defined from within the
# nagios-plugins-basic package

################################################################################
# PERFORMANCE DATA COMMANDS
################################################################################


define command {
        command_name    process-service-perfdata
        command_line    /usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl
}

define command {
        command_name    process-host-perfdata
        command_line    /usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}

I can't find a specific command definition for check_load.
But if I set one like this, e.g. in the server cfg:

Code:

define command{
	command_name	check_load
	command_line	/usr/lib/nagios/plugins/check_load -H $HOSTADDRESS$
	}

Nagios tells me this:

Code:

Warning: Duplicate definition found for command 'check_load' (config file '/etc/nagios3/usercfg/ClusterServer.cfg', starting on line 20)
Error: Could not add object property in file '/etc/nagios3/usercfg/ClusterServer.cfg' on line 21.
   Error processing object config files!

So where is check_load actually defined? Maybe there is an error, but I can't find it :? :?
Btw, the check_load plugin version is v2.1.1 :D

Edit:

Does check_load even have the option to check the load of other computers? The help says that it has no parameter for an IP or anything in that direction :shock:

Code:

pi@raspberrypi:/usr/lib/nagios/plugins $ ./check_load -h
check_load v2.1.1 (monitoring-plugins 2.1.1)
Copyright (c) 1999 Felipe Gustavo de Almeida <galmeida@linux.ime.usp.br>
Copyright (c) 1999-2007 Monitoring Plugins Development Team
        <devel@monitoring-plugins.org>

This plugin tests the current system load average.

Usage:
check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15

Options:
 -h, --help
    Print detailed help screen
 -V, --version
    Print version information
 --extra-opts=[section][@file]
    Read options from an ini file. See
    https://www.monitoring-plugins.org/doc/extra-opts.html
    for usage and examples.
 -w, --warning=WLOAD1,WLOAD5,WLOAD15
    Exit with WARNING status if load average exceeds WLOADn
 -c, --critical=CLOAD1,CLOAD5,CLOAD15
    Exit with CRITICAL status if load average exceed CLOADn
    the load average format is the same used by "uptime" and "w"
 -r, --percpu
    Divide the load averages by the number of CPUs (when possible)

Re: Check_Load displays load even if monitored node is off

Posted: Tue Jan 17, 2017 11:26 am
by rkennedy
I would track down what's going on in /etc/nagios3/usercfg/ClusterServer.cfg. The command may already be defined there, which means it's not needed in commands.cfg. Nagios object definitions can be placed in any config file that Nagios loads, so your installation could have been set up differently.
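
A note on where to look, as a sketch only: the warning quoted earlier says that the definition in ClusterServer.cfg duplicates one that already exists somewhere else. Assuming a Debian-style package install (the /usr/lib/nagios/plugins paths in this thread suggest one), the plugin packages commonly ship their own command definitions, for example in a file such as /etc/nagios-plugins/config/load.cfg. Both that path and the contents below are assumptions to verify on the Pi, for instance with grep -r check_load /etc/nagios3 /etc/nagios-plugins:

```
# Assumed packaged definition; verify against the real file on your system.
define command{
        command_name    check_load
        command_line    /usr/lib/nagios/plugins/check_load --warning='$ARG1$,$ARG2$,$ARG3$' --critical='$ARG4$,$ARG5$,$ARG6$'
        }
```

If the existing definition looks like this, it would explain the symptoms: the six values in check_load!5.0!4.0!3.0!10.0!6.0!4.0 become $ARG1$ to $ARG6$, nothing in the command line refers to $HOSTADDRESS$, and the plugin therefore runs on the Nagios server itself for every host.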

Quote:
Does check_load even have the option to check the load of other computers? The help says that it has no parameter for an IP or anything in that direction :shock:

It appears it does not. You'll want to look at either using check_by_ssh to execute the plugin, or an agent such as NRPE / NCPA. (You would then have check_load copied onto the machine you want to monitor, where it still runs locally; the agent serves as the communication layer between Nagios <-> the client system.)

NRPE - https://support.nagios.com/kb/category.php?id=22
NCPA - https://nagios.org/ncpa
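
With NRPE, the setup might look roughly like the following sketch. The command name check_nrpe_load is made up for this example, the check_nrpe path assumes the same Debian plugin layout seen elsewhere in this thread, and the thresholds are copied from the service definition posted earlier. The agent-side lines are shown as comments because they belong in the NRPE configuration on each cluster node, not in the Nagios object files:

```
# On each monitored node, in the NRPE agent config
# (commonly /etc/nagios/nrpe.cfg; the path is an assumption):
#   allowed_hosts=127.0.0.1,<IP of the Nagios server>
#   command[check_load]=/usr/lib/nagios/plugins/check_load -w 5.0,4.0,3.0 -c 10.0,6.0,4.0

# On the Nagios server: ask the agent on the remote host to run its check_load.
# 'check_nrpe_load' is a hypothetical name chosen to avoid clashing with packaged definitions.
define command{
        command_name    check_nrpe_load
        command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c check_load
        }

define service{
        use                     generic-service
        host_name               ei-rn-cluster001
        service_description     Check Load
        check_command           check_nrpe_load
        }
```

Because the request now goes to $HOSTADDRESS$, a node that is switched off should produce a connection error state rather than a copy of the server's own load.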

check_by_ssh - guide is for XI, but same concept applies for Core - https://assets.nagios.com/downloads/nag ... ng_SSH.pdf
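
For the check_by_ssh route, a minimal sketch could look like this. It assumes that the user running Nagios can log in to each node with key-based SSH without a password prompt, and that check_load is installed at the same path on the nodes; the command name check_load_by_ssh is hypothetical:

```
# Hypothetical command name; assumes passwordless key-based SSH from the Nagios user to each node.
# $ARG1$ and $ARG2$ are the comma-separated warning and critical load triples.
define command{
        command_name    check_load_by_ssh
        command_line    /usr/lib/nagios/plugins/check_by_ssh -H $HOSTADDRESS$ -C "/usr/lib/nagios/plugins/check_load -w $ARG1$ -c $ARG2$"
        }

define service{
        use                     generic-service
        host_name               ei-rn-cluster001
        service_description     Check Load
        check_command           check_load_by_ssh!5.0,4.0,3.0!10.0,6.0,4.0
        }
```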

Re: Check_Load displays load even if monitored node is off

Posted: Tue Jan 17, 2017 12:17 pm
by Insep
Yeah, I definitely need an agent.

But I read about using SNMP instead of NRPE, like this:

https://exchange.nagios.org/directory/P ... MP/details

Will this do the same job?

Re: Check_Load displays load even if monitored node is off

Posted: Tue Jan 17, 2017 1:52 pm
by rkennedy
Yes, you could go the SNMP route, as long as the device supports SNMP and has it enabled.

For Linux:
https://assets.nagios.com/downloads/nag ... g_SNMP.pdf
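
As a rough sketch of the SNMP variant using the stock check_snmp plugin (not the Exchange plugin linked above): this assumes each node runs an SNMP daemon that exposes the UCD-SNMP-MIB load table, and that 'public' is replaced by your real community string. The command name check_snmp_load1 is hypothetical. The OID is laLoadInt.1, the 1-minute load average multiplied by 100, so the thresholds 500 and 1000 correspond to loads of 5.0 and 10.0:

```
# Hypothetical command name; 'public' is a placeholder community string.
# OID .1.3.6.1.4.1.2021.10.1.5.1 is UCD-SNMP-MIB laLoadInt.1 (1-minute load x 100).
define command{
        command_name    check_snmp_load1
        command_line    /usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -C public -o .1.3.6.1.4.1.2021.10.1.5.1 -w $ARG1$ -c $ARG2$
        }

define service{
        use                     generic-service
        host_name               ei-rn-cluster001
        service_description     Check Load SNMP
        check_command           check_snmp_load1!500!1000
        }
```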