##check_load usage issues ##

ashok · Post by **ashok** » Mon Sep 22, 2014 5:39 am

Hi All,

This is regarding the usage of check_load for Linux servers.

[root@nagxi libexec]# ./check_load --help
check_load v1991 (nagios-plugins 1.4.13)
Copyright (c) 1999 Felipe Gustavo de Almeida <galmeida@linux.ime.usp.br>
Copyright (c) 1999-2007 Nagios Plugin Development Team
        <nagiosplug-devel@lists.sourceforge.net>

This plugin tests the current system load average.

Usage:check_load [-r] -w WLOAD1,WLOAD5,WLOAD15 -c CLOAD1,CLOAD5,CLOAD15

Options:
 -h, --help
    Print detailed help screen
 -V, --version
    Print version information
 -w, --warning=WLOAD1,WLOAD5,WLOAD15
    Exit with WARNING status if load average exceeds WLOADn
 -c, --critical=CLOAD1,CLOAD5,CLOAD15
    Exit with CRITICAL status if load average exceed CLOADn
    the load average format is the same used by "uptime" and "w"
 -r, --percpu
    Divide the load averages by the number of CPUs (when possible)

When I run it with three different sets of arguments, it gives the same output:

[root@nagxi libexec]# ./check_load -w 5 -c 7
WARNING - load average: 5.90, 2.96, 2.36|load1=5.900;5.000;7.000;0; load5=2.960;5.000;7.000;0; load15=2.360;5.000;7.000;0;
[root@nagxi libexec]# ./check_load -w 15,10,5 -c 30,20,10
OK - load average: 5.51, 2.93, 2.35|load1=5.510;15.000;30.000;0; load5=2.930;10.000;20.000;0; load15=2.350;5.000;10.000;0;
[root@nagxi libexec]# ./check_load -w 15 -c 30
OK - load average: 5.51, 2.93, 2.35|load1=5.510;15.000;30.000;0; load5=2.930;15.000;30.000;0; load15=2.350;15.000;30.000;0;
[root@nagxi libexec]# w
 15:58:17 up 192 days,  3:21,  3 users,  load average: 5.15, 2.89, 2.34

All are giving the same output.

This server has 8 CPUs.

The question is:

How is it that the above commands give the same output?

Is giving the 5-minute and 15-minute average load thresholds optional?

Now comes my actual question.

If I have 8 cores,

should I give the arguments as

check_load -w 5 -c 7 for warning at approximately 70% and critical at approximately 90%? Or do 5 and 7 mean 50% and 70%?

Are the above arguments calculated from the number of cores and the required percentage thresholds?

A few websites say that it is load, not the number of CPUs and percentages, and that for some processes/application servers even CPU above 250% is OK or only a warning.

Can someone please explain how to use and understand the arguments for check_load as single values instead of three averages, as most of my configs use the single-value format?

And how should I calculate them if a server has 16 CPU cores?

Usage example for one server which has 16 processor cores:

define service {
        host_name                       host1
        service_description             Load Average
        use                             xiwizard_generic_service
        servicegroups                   CPU,UnixCPU
        check_command                   check_nrpe!check_load!-a '-w 7 -c 9'
        max_check_attempts              3
        check_interval                  15
        retry_interval                  1
        check_period                    24x7
        notification_interval           0
        notification_period             24x7
        contact_groups                  msatoc,unixserveradmin
        _xiwizard                       nrpe
        register                        1
        }

Thanks

slansing · Post by **slansing** » Mon Sep 22, 2014 2:00 pm

Load is a fairly complex calculation; it does not only take into account your CPU usage. Yes, with more CPUs or more cores per CPU, a higher load will not have as much of an impact as it would on a system with one CPU and 2 cores (as an example). You will want to get a better understanding of load to figure out what thresholds you want:

http://blog.scoutapp.com/articles/2009/ ... d-averages

That is not something we can really help you learn, as your environment will be different from ours. The thresholds are ultimately up to you and what you think you want to be alerted for.
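As a rough illustration of how core count can feed into thresholds, here is a minimal Python sketch. It assumes the rule of thumb that a load equal to the number of cores means the CPUs are fully busy, which is a simplification because load also counts processes waiting on things other than CPU. The function name `scaled_thresholds` is hypothetical, and the 70% and 90% fractions are the ones asked about in this thread, not recommended values.

```python
def scaled_thresholds(cores, warn_fraction=0.70, crit_fraction=0.90):
    """Return (warning, critical) load thresholds scaled by core count.

    Assumption: load == cores is treated as "100% busy". This is a
    rule of thumb, not something check_load computes itself.
    """
    warning = round(cores * warn_fraction, 2)
    critical = round(cores * crit_fraction, 2)
    return warning, critical


# Print candidate check_load arguments for the two core counts
# mentioned in this thread.
for cores in (8, 16):
    warn, crit = scaled_thresholds(cores)
    print(f"{cores} cores: check_load -w {warn} -c {crit}")
```

Under this assumption, 16 cores give roughly 11.2 and 14.4, and 8 cores give roughly 5.6 and 7.2. Whether those numbers are sensible alert levels still depends on the workload of each server.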

ashok · Post by **ashok** » Mon Sep 22, 2014 11:47 pm

Slansing,

Thank you very much for your clarification.

As per my understanding, if I have 16 CPU cores, then I can take 12 as warning at 70% and 14.5 as critical somewhere around 90%, correct?

But I'm bothered about the usage of it.

We have around 104 Linux/Unix servers.

For all the servers the CPU is checked using this command:

check_command check_nrpe!check_load!-a '-w 7 -c 9'

Are we missing anything because of this? Does this command mean that 7 and 9 will be applied to the 1, 5, and 15 minute averages, or is only the 1-minute load checked against the 7 and 9 thresholds?

All of this was configured by someone who has since left. I came into this position newly, 3 months back.
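The performance data in the first post hints at the answer: for `-w 5 -c 7`, the output lists `5.000;7.000` next to load1, load5, and load15, which suggests that a single value is reused for all three averages. Presumably `-w 7 -c 9` behaves the same way, though that is an inference from the posted output rather than something confirmed in this thread. The sketch below parses that output line to make the per-interval thresholds visible; `parse_perfdata` is a hypothetical helper, not part of Nagios.

```python
def parse_perfdata(plugin_output):
    """Map each perfdata label to its (value, warn, crit) as floats."""
    # Everything after the first "|" is performance data.
    _, _, perfdata = plugin_output.partition("|")
    thresholds = {}
    for item in perfdata.split():
        label, _, fields = item.partition("=")
        # Fields are value;warn;crit;min; (only the first three are used).
        value, warn, crit = fields.split(";")[:3]
        thresholds[label] = (float(value), float(warn), float(crit))
    return thresholds


# Output copied from the first post (./check_load -w 5 -c 7).
output = ("WARNING - load average: 5.90, 2.96, 2.36|"
          "load1=5.900;5.000;7.000;0; load5=2.960;5.000;7.000;0; "
          "load15=2.360;5.000;7.000;0;")

for label, (value, warn, crit) in parse_perfdata(output).items():
    print(f"{label}: value={value} warn={warn} crit={crit}")
```

Running the same parser on the output of a check with `-w 7 -c 9` on one of the servers would show directly whether 7 and 9 appear for all three intervals.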

Post by **lmiltchev** » Tue Sep 23, 2014 1:21 pm

With check_load, you could use the "-r" flag.

-r, --percpu
Divide the load averages by the number of CPUs (when possible)

See the plugin's usage by running the following in the CLI (from within the plugins directory):

./check_load -h

Perhaps this will give a bit more accurate output.
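To show what the `-r` description quoted above amounts to, here is a small Python sketch that divides load averages by a CPU count. It assumes plain division, as the help text describes, and is not the plugin's actual code. `per_cpu_load` is a hypothetical name, and the sample numbers are the load averages and the 8 CPUs from the first post.

```python
import os


def per_cpu_load(load_averages, cpu_count):
    """Divide each load average by the number of CPUs."""
    return tuple(load / cpu_count for load in load_averages)


# Values from the first post: 8 CPUs, load average 5.90, 2.96, 2.36.
sample = per_cpu_load((5.90, 2.96, 2.36), 8)
print("per-CPU load (sample):", ", ".join(f"{x:.2f}" for x in sample))

# The same calculation for the local machine, where the OS supports it.
if hasattr(os, "getloadavg"):
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    local = per_cpu_load(os.getloadavg(), cpus)
    print("per-CPU load (local):", ", ".join(f"{x:.2f}" for x in local))
```

If `-r` works this way, the thresholds passed with it would be per-CPU values (for example something near 0.7 and 0.9 rather than 7 and 9), so the same arguments could be reused across servers with different core counts.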

Nagios Support Forum
