All Linux Server CPU Spike at same time

Post by kwhogster » Thu Mar 02, 2017 8:55 pm

Nagios Core 4.1

Every day I get this on all my Linux servers:


Host         Service       Status    Last Check           Duration      Attempt  Status Information
RaspberryPi  Current Load  CRITICAL  03-02-2017 20:46:26  0d 0h 5m 42s  4/4      CRITICAL - load average: 0.99, 7.56, 4.89
TGCS018      Current Load  CRITICAL  03-02-2017 20:46:30  0d 0h 5m 38s  4/4      CRITICAL - load average: 0.91, 7.44, 4.86
localhost    Current Load  CRITICAL  03-02-2017 20:46:31  0d 0h 5m 37s  4/4      CRITICAL - load average: 0.91, 7.44, 4.86
vMA          Current Load  CRITICAL  03-02-2017 20:46:36  0d 0h 5m 32s  4/4      CRITICAL - load average: 0.83, 7.31, 4.84
(Notifications are disabled for all four hosts.)


The RaspberryPi is a physical device and the other three are VMs.

Are the Nagios checks causing this?

Any ideas why?


Thanks

Tom
kwhogster
 
Posts: 375
Joined: Wed Oct 14, 2015 6:51 pm
Location: Wood Ridge NJ USA

Re: All Linux Server CPU Spike at same time

Post by rkennedy » Thu Mar 02, 2017 10:51 pm

kwhogster wrote: Are the Nagios checks causing this?

Any ideas why?


Doubtful. I would investigate the hosts' logs and your configuration setup. Judging by how similar the values are, you may be checking the load on the localhost across the board.
rkennedy
 
Posts: 6545
Joined: Mon Oct 05, 2015 11:45 am

Re: All Linux Server CPU Spike at same time

Post by kwhogster » Thu Mar 02, 2017 11:03 pm

Which logs do you mean?

This is my check configuration:

Code:
define service{
        use                             local-service         ; Name of service template to use
        host_name                       vMA
        service_description             Current Load
        servicegroups                   CPULoad
        check_command                   check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        }


They are all the same.

Here is a top extract from the local host:

top - 23:02:08 up 53 days, 10:28, 2 users, load average: 0.04, 0.08, 0.23
Tasks: 194 total, 1 running, 193 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 0.6 sy, 0.0 ni, 98.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 8072680 total, 410528 free, 5597416 used, 2064736 buff/cache
KiB Swap: 8286204 total, 8263772 free, 22432 used. 2046844 avail Mem


Would it be best to run the Nagios server on a physical machine or on a VM? Could that help?

Thanks
kwhogster
 
Posts: 375
Joined: Wed Oct 14, 2015 6:51 pm
Location: Wood Ridge NJ USA

Re: All Linux Server CPU Spike at same time

Post by tgriep » Fri Mar 03, 2017 10:52 am

The log files you need to check to see why the load went up at that time are on the remote Linux systems, not on the Nagios server.
Take a look at the log files in /var/log on the remote hosts. The messages file might have some clues about what is happening at that time.
Be sure to check out our Knowledgebase for helpful articles and solutions!
tgriep
Madmin
 
Posts: 4313
Joined: Thu Oct 30, 2014 9:02 am

Re: All Linux Server CPU Spike at same time

Post by dwhitfield » Fri Mar 03, 2017 10:55 am

In addition to what @tgriep said, can you run tar -zcvf /tmp/supporttar.tar.gz /usr/local/nagios/etc and attach the file? If you are concerned about security, you can PM it to me. If you choose to PM, please make sure you update the thread so it shows back up on our support dashboard.
dwhitfield
The Doctor
 
Posts: 2091
Joined: Wed Sep 21, 2016 10:29 am
Location: Nagios Enterprises, LLC

Re: All Linux Server CPU Spike at same time

Post by kwhogster » Fri Mar 03, 2017 10:25 pm

I tried to send a PM but it is stuck in my outbox, so I am attaching the file here.

Note: the local host just went critical again, and when I ran top on the local host the numbers did not match. They were far lower than what Nagios was reporting.

Thoughts?

Also, which log file from /var/log?
Attachment: supporttar.tar.gz (tar file, 53.35 KiB)
kwhogster
 
Posts: 375
Joined: Wed Oct 14, 2015 6:51 pm
Location: Wood Ridge NJ USA

Re: All Linux Server CPU Spike at same time

Post by tmcdonald » Mon Mar 06, 2017 1:01 pm

The command you are using (check_local_load) to check the remote servers is checking the local Nagios machine, as @rkennedy suggested. That means that for all 4 of the hosts, you are not checking their load but rather that of the Nagios server. That's why they all appear to go critical at the same time, and the values are so close.

You will need to use NRPE or something to check the remote machines.
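
A minimal sketch of the difference, assuming the stock sample definitions are in use (your commands.cfg may differ). In the Nagios Core sample configuration, check_local_load runs the check_load plugin on the Nagios server itself and never references the host's address. A check_nrpe command passes $HOSTADDRESS$, so the check is executed by the NRPE daemon on the remote host:

```
# Sample-config style definition (assumed unchanged on this system).
# There is no $HOSTADDRESS$ here, so the plugin always measures the
# load of the Nagios server, whatever host_name the service uses.
define command{
        command_name    check_local_load
        command_line    $USER1$/check_load -w $ARG1$ -c $ARG2$
        }

# Commonly used check_nrpe definition (an assumption; add it only if
# commands.cfg does not already define check_nrpe).
# $ARG1$ is the name of a command[...] entry in the remote nrpe.cfg.
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
```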

Also, if a PM is stuck in the Outbox that just means the recipient has not yet read the message. Give it time and it should clear once they do.
tmcdonald
Support Manager
 
Posts: 8177
Joined: Mon Sep 23, 2013 8:40 am

Re: All Linux Server CPU Spike at same time

Post by kwhogster » Mon Mar 06, 2017 7:10 pm

Great.


I am using NRPE. This is from nrpe.cfg on the Linux host:

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

Do you have a check_nrpe sample or example?
kwhogster
 
Posts: 375
Joined: Wed Oct 14, 2015 6:51 pm
Location: Wood Ridge NJ USA

Re: All Linux Server CPU Spike at same time

Post by mcapra » Tue Mar 07, 2017 12:42 pm

With that command, this should be fine:
Code:
/usr/local/nagios/libexec/check_nrpe -H <host> -c check_load


Since you're not passing arguments, all you really need to do is pass the -c with the command name.
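
In a service definition, that could look roughly like the sketch below. It reuses the vMA service posted earlier in the thread and changes only check_command. It assumes a check_nrpe command is already defined on the Nagios server with a command line of the usual form `$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$`; if yours is defined differently, adjust accordingly.

```
define service{
        use                             local-service         ; Template kept from the original post
        host_name                       vMA
        service_description             Current Load
        servicegroups                   CPULoad
        ; check_load is the command[check_load] entry in the remote nrpe.cfg
        check_command                   check_nrpe!check_load
        }
```

With this setup the warning and critical thresholds are no longer taken from the service definition. They come from the command[check_load] line in nrpe.cfg on the remote host (-w 15,10,5 -c 30,25,20 in the post above).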
https://github.com/mcapra/
mcapra
Support Tech
 
Posts: 1960
Joined: Thu May 05, 2016 3:54 pm
Location: Nagios Enterprises

Re: All Linux Server CPU Spike at same time

Post by kwhogster » Tue Mar 07, 2017 8:35 pm

I ran this:

root@tgcs017:/usr/local/nagios/etc/objects# /usr/local/nagios/libexec/check_nrpe -H 10.2.8.7 -c check_load
NRPE: Unable to read output


Thoughts?
kwhogster
 
Posts: 375
Joined: Wed Oct 14, 2015 6:51 pm
Location: Wood Ridge NJ USA
