Check_Load displays load even if monitored node is off

Insep
Posts: 9
Joined: Thu Jan 12, 2017 8:18 am

Check_Load displays load even if monitored node is off

Post by Insep »

Hey Guys,

I'm trying to create a monitoring structure for a Beowulf cluster.
Nagios + Pnp4Nagios is running on a Raspberry Pi with Raspbian Lite.

However, the Check_Load plugin gives me different values for every PC (60 in total, each with a different IP), even though they are not booted.

After a look at the special template graphs from Pnp4Nagios, it seems that all curves follow a shape similar to the localhost's curve (the Raspberry Pi).

So how can it be that all values are a slightly different version of the localhost's CPU load?
And why is it possible that I get results from computers that are not running at all?
Is this a problem with the plugin or with Nagios itself?

Greetings,

Daniel
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Check_Load displays load even if monitored node is off

Post by dwhitfield »

What version of Core are you using? Was it compiled from source or installed from distro repos? What version of pnp4nagios, and what version of the check_load plugin?

Could you upload your nagios.cfg?

All of that will give us some idea of where the problem is. It's possible it's not really a problem in any one thing, but some sort of compatibility issue.
Insep
Posts: 9
Joined: Thu Jan 12, 2017 8:18 am

Re: Check_Load displays load even if monitored node is off

Post by Insep »

I'm using version 3.5.1, installed from a repository. The Pnp4nagios version is 0.6.25.

I don't know the version of the check_load plugin. How do I find that out? The plugins were installed automatically from the repository.

Nagios.cfg:
http://pastebin.com/raw/d5Kbt65V

And here is a screenshot of the problem :?

Image

The localhost is the Raspberry, and all curves look similar.

Thanks for your help. I need to get this fixed, it's part of my thesis :)
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Check_Load displays load even if monitored node is off

Post by mcapra »

Can you post the full service definitions you are using for load monitoring on the offline machines (I assume these are ei-rn-cluster-00*)?

How are you executing the checks on these remote (currently offline) machines? Are you using an agent (NRPE, NCPA), passive checks, SSH, etc.?
Former Nagios employee
https://www.mcapra.com/
Insep
Posts: 9
Joined: Thu Jan 12, 2017 8:18 am

Re: Check_Load displays load even if monitored node is off

Post by Insep »

These are the host and service definitions for one machine:

Code:

define host{
    use                     generic-host
    host_name               ei-rn-cluster001
    alias                   ei-rn-cluster001
    address                 10.10.133.121
    }
define service{
    use                     generic-service
    host_name               ei-rn-cluster001
    service_description     Check Load
    check_command           check_load!5.0!4.0!3.0!10.0!6.0!4.0
    }
generic-host and generic-service are the standard templates:

Code:

define service{
        name                            generic-service ; The 'name' of this service template
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       ; We should obsess over this service (if necessary)
        check_freshness                 0       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        notification_interval           0       ; Only send notifications on status change by default.
        is_volatile                     0
        check_period                    24x7
        normal_check_interval           5
        retry_check_interval            1
        max_check_attempts              4
        notification_period             24x7
        notification_options            w,u,c,r
        contact_groups                  admins
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }


define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1       ; Host notifications are enabled
        event_handler_enabled           1       ; Host event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        check_command                   check-host-alive
        max_check_attempts              10
        notification_interval           0
        notification_period             24x7
        notification_options            d,u,r
        contact_groups                  admins
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }
NRPE is installed on the monitored machines, but that should not be the problem.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Check_Load displays load even if monitored node is off

Post by rkennedy »

Can you also post the command definition for check_load for us to review? We'll need to see how it all correlates.
Former Nagios Employee
Insep
Posts: 9
Joined: Thu Jan 12, 2017 8:18 am

Re: Check_Load displays load even if monitored node is off

Post by Insep »

You mean the commands.cfg?

Code:

###############################################################################
# COMMANDS.CFG - SAMPLE COMMAND DEFINITIONS FOR NAGIOS
###############################################################################


################################################################################
# NOTIFICATION COMMANDS
################################################################################


# 'notify-host-by-email' command definition
define command{
        command_name    notify-host-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRE$
        }

# 'notify-service-by-email' command definition
define command{
        command_name    notify-service-by-email
        command_line    /usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HO$
        }





################################################################################
# HOST CHECK COMMANDS
################################################################################

# On Debian, check-host-alive is being defined from within the
# nagios-plugins-basic package

################################################################################
# PERFORMANCE DATA COMMANDS
################################################################################


define command {
        command_name    process-service-perfdata
        command_line    /usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl
}

define command {
        command_name    process-host-perfdata
        command_line    /usr/bin/perl /usr/local/pnp4nagios/libexec/process_perfdata.pl -d HOSTPERFDATA
}
I can't find a specific command definition for check_load.
But if I define one myself like this, e.g. in the server cfg:

Code:

define command{
	command_name	check_load
	command_line	/usr/lib/nagios/plugins/check_load -H $HOSTADDRESS$
	}
Nagios tells me that:

Code:

Warning: Duplicate definition found for command 'check_load' (config file '/etc/nagios3/usercfg/ClusterServer.cfg', starting on line 20)
Error: Could not add object property in file '/etc/nagios3/usercfg/ClusterServer.cfg' on line 21.
   Error processing object config files!
So where is check_load defined? Maybe there is an error in that definition, but I can't find it :? :?
By the way, the check_load plugin version is v2.1.1 :D

Edit:

Does check_load even have an option to check the load of other computers? The help says that it has no parameter for an IP address or anything in that direction :shock:

Code:

pi@raspberrypi:/usr/lib/nagios/plugins $ ./check_load -h
check_load v2.1.1 (monitoring-plugins 2.1.1)
Copyright (c) 1999 Felipe Gustavo de Almeida <galmeida@linux.ime.usp.br>
Copyright (c) 1999-2007 Monitoring Plugins Development Team
        <devel@monitoring-plugins.org>

This plugin tests the current system load average.

Usage:
check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15

Options:
 -h, --help
    Print detailed help screen
 -V, --version
    Print version information
 --extra-opts=[section][@file]
    Read options from an ini file. See
    https://www.monitoring-plugins.org/doc/extra-opts.html
    for usage and examples.
 -w, --warning=WLOAD1,WLOAD5,WLOAD15
    Exit with WARNING status if load average exceeds WLOADn
 -c, --critical=CLOAD1,CLOAD5,CLOAD15
    Exit with CRITICAL status if load average exceed CLOADn
    the load average format is the same used by "uptime" and "w"
 -r, --percpu
    Divide the load averages by the number of CPUs (when possible)
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Check_Load displays load even if monitored node is off

Post by rkennedy »

I would track down what's going on in /etc/nagios3/usercfg/ClusterServer.cfg - the command may be defined there, which means it's not needed in commands.cfg. Nagios object definitions can live in any file that the main configuration includes, so it could have been set up differently.
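
For illustration only: the duplicate-definition warning suggests that a check_load command already exists somewhere in the loaded configuration, possibly shipped with the plugin package. Below is a rough sketch of what such a definition might look like. Its exact wording and location are assumptions, not taken from this thread. It takes six arguments, which matches the check_load!5.0!4.0!3.0!10.0!6.0!4.0 call in the service definition, and it contains no host macro, so whichever host the service is attached to, the plugin measures the machine Nagios itself runs on.

```cfg
# Hypothetical sketch of a packaged check_load command definition.
# Assumption: a definition of roughly this shape is already loaded from
# another config file. Note that there is no $HOSTADDRESS$ anywhere.
define command{
        command_name    check_load
        command_line    /usr/lib/nagios/plugins/check_load --warning='$ARG1$,$ARG2$,$ARG3$' --critical='$ARG4$,$ARG5$,$ARG6$'
        }
```

If the real definition looks like this, it would explain why every "remote" graph resembles the localhost curve.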
Insep wrote: Does check_load even have an option to check the load of other computers? The help says that it has no parameter for an IP address or anything in that direction :shock:
It appears it does not: check_load only reports the load of the machine it runs on. You'll want to look at either using check_by_ssh to execute the plugin remotely, or an agent such as NRPE / NCPA. (You would then have check_load copied locally onto the machine you want to monitor, and it would still run locally. The agent serves as the communication layer between Nagios and the client system.)

NRPE - https://support.nagios.com/kb/category.php?id=22
NCPA - https://nagios.org/ncpa

check_by_ssh - the guide is for XI, but the same concept applies to Core - https://assets.nagios.com/downloads/nag ... ng_SSH.pdf
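
A minimal sketch of the NRPE route, assuming the check_nrpe plugin is installed on the Nagios server in the same plugin directory as check_load, and that the NRPE daemon on each node has a command named check_load configured. Both are assumptions, and check_nrpe_load is a hypothetical command name chosen to avoid clashing with the existing check_load definition.

```cfg
# On the Nagios server: ask the NRPE daemon on the target host to run its
# own locally configured check_load command and hand back the result.
# "check_nrpe_load" is a hypothetical name; the plugin path is assumed.
define command{
        command_name    check_nrpe_load
        command_line    /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c check_load
        }

# The service from earlier in the thread, pointed at the NRPE-based command.
define service{
        use                     generic-service
        host_name               ei-rn-cluster001
        service_description     Check Load
        check_command           check_nrpe_load
        }
```

With a setup like this the thresholds live on the node, in the NRPE daemon's own configuration, and a powered-off node produces a connection error instead of a load value.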
Former Nagios Employee
Insep
Posts: 9
Joined: Thu Jan 12, 2017 8:18 am

Re: Check_Load displays load even if monitored node is off

Post by Insep »

Yeah, I definitely need an agent.

But I have read about using SNMP instead of NRPE, like this:

https://exchange.nagios.org/directory/P ... MP/details

Will this do the same job?
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Check_Load displays load even if monitored node is off

Post by rkennedy »

Yes, you could go the SNMP route, as long as the device supports SNMP and has it enabled.

For Linux:
https://assets.nagios.com/downloads/nag ... g_SNMP.pdf
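
A minimal sketch of the SNMP route, assuming each node runs an SNMP agent that exposes the UCD-SNMP-MIB load table with a read community of "public", and that check_snmp is installed in the same plugin directory. The command name check_snmp_load1, the community string, and the thresholds are hypothetical. The OID is laLoadInt.1, which reports the 1-minute load average multiplied by 100, so 500 corresponds to a load of 5.0.

```cfg
# Hypothetical command: query the 1-minute load (laLoadInt.1) over SNMP.
# Assumptions: community "public", UCD-SNMP-MIB available on the node.
# Thresholds are in hundredths: 500 = load 5.0, 1000 = load 10.0.
define command{
        command_name    check_snmp_load1
        command_line    /usr/lib/nagios/plugins/check_snmp -H $HOSTADDRESS$ -C public -o .1.3.6.1.4.1.2021.10.1.5.1 -w 500 -c 1000
        }
```

Unlike check_load, this command carries $HOSTADDRESS$, so each service queries its own node rather than the Raspberry Pi.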
Former Nagios Employee