All Linux Server CPU Spike at same time
Nagios Core 4.1
Every day I get this on all my Linux servers:
Host         Service       Status    Last Check           Duration      Attempt  Status Information
RaspberryPi  Current Load  CRITICAL  03-02-2017 20:46:26  0d 0h 5m 42s  4/4      CRITICAL - load average: 0.99, 7.56, 4.89
TGCS018      Current Load  CRITICAL  03-02-2017 20:46:30  0d 0h 5m 38s  4/4      CRITICAL - load average: 0.91, 7.44, 4.86
localhost    Current Load  CRITICAL  03-02-2017 20:46:31  0d 0h 5m 37s  4/4      CRITICAL - load average: 0.91, 7.44, 4.86
vMA          Current Load  CRITICAL  03-02-2017 20:46:36  0d 0h 5m 32s  4/4      CRITICAL - load average: 0.83, 7.31, 4.84
(Notifications are disabled for all four hosts.)
The RaspberryPi is a physical device and the other three are VMs.
Are the Nagios checks causing this?
Any ideas why?
Thanks
Tom
Re: All Linux Server CPU Spike at same time
kwhogster wrote: Are the Nagios checks causing this? Any ideas why?
Doubtful. I would investigate the hosts' logs and your configuration setup. Judging by the similarity, you may be checking the load on localhost across the board.
Former Nagios Employee
Re: All Linux Server CPU Spike at same time
Which logs do you mean?
This is my check configuration:
define service{
use local-service ; Name of service template to use
host_name vMA
service_description Current Load
servicegroups CPULoad
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
They are all the same.
A top extract from the local host:
top - 23:02:08 up 53 days, 10:28, 2 users, load average: 0.04, 0.08, 0.23
Tasks: 194 total, 1 running, 193 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 0.6 sy, 0.0 ni, 98.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 8072680 total, 410528 free, 5597416 used, 2064736 buff/cache
KiB Swap: 8286204 total, 8263772 free, 22432 used. 2046844 avail Mem
Would it be best to make the Nagios server a physical machine or a VM? Could that help?
Thanks
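For reference, the thresholds in the service definition above can be checked against the reported figures by hand. The sketch below assumes that check_load raises an alert as soon as any one of the three averages exceeds its threshold; classify_load is a hypothetical helper written for illustration, not part of the Nagios plugins.

```shell
# classify_load LOAD1 LOAD5 LOAD15 WARN1,WARN5,WARN15 CRIT1,CRIT5,CRIT15
# Assumption: a single average above its threshold is enough to change state.
classify_load() {
    awk -v l="$1 $2 $3" -v w="$4" -v c="$5" 'BEGIN {
        split(l, load, " "); split(w, warn, ","); split(c, crit, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            # "+ 0" forces numeric rather than string comparison
            if (load[i] + 0 > crit[i] + 0) { state = "CRITICAL"; break }
            if (load[i] + 0 > warn[i] + 0) state = "WARNING"
        }
        print state
    }'
}

# Figures from the RaspberryPi row, thresholds from the service definition
classify_load 0.99 7.56 4.89 5.0,4.0,3.0 10.0,6.0,4.0
# Figures from the top extract, same thresholds
classify_load 0.04 0.08 0.23 5.0,4.0,3.0 10.0,6.0,4.0
```

Under that assumption the 5-minute figure alone (7.56 against a critical threshold of 6.0) is enough for CRITICAL, even though the 1-minute figure is already below 1.0. The 5- and 15-minute averages lag behind a spike that has ended, so a top run some time later can look idle.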
Re: All Linux Server CPU Spike at same time
The log files you would have to check to see why the load went up at that time are on the remote Linux systems and not the Nagios server.
Take a look at the /var/log folder for the log files on the remote hosts. The messages file (or syslog on Debian-based systems) might have some clues to what is happening at that time.
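Since the alerts arrive at about the same time every day, one way to narrow the search is to pull only the log lines from the minutes before the alert. This is a minimal sketch, assuming a traditional syslog layout with an HH:MM:SS time field. log_window is a hypothetical helper, and the file may be /var/log/syslog rather than /var/log/messages depending on the distribution.

```shell
# log_window FILE REGEX
# Print the lines of FILE that match REGEX (an extended regular expression).
log_window() {
    grep -E -- "$2" "$1"
}

# Example (not run here): lines stamped 20:35 through 20:46,
# the minutes leading up to the alert shown in the first post.
# log_window /var/log/messages ' 20:(3[5-9]|4[0-6]):'
```

Anything scheduled in that window (cron jobs, backups, log rotation) is a reasonable first suspect.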
Re: All Linux Server CPU Spike at same time
In addition to what @tgriep said, can you run tar -zcvf /tmp/supporttar.tar.gz /usr/local/nagios/etc and attach the file? If you are concerned about security, you can PM it to me. If you choose to PM, please make sure you update the thread so it shows back up on our support dashboard.
Re: All Linux Server CPU Spike at same time
I tried to send a PM but it is stuck in my outbox, so I am attaching it.
Note: the local host went critical again just now, and when I ran top on the local host the numbers did not match. They were far lower than what Nagios was reporting.
Thoughts?
Also, which log file from /var/log?
Attachment: supporttar.tar.gz (tar file, 53.35 KiB)
Re: All Linux Server CPU Spike at same time
The command you are using (check_local_load) to check the remote servers is checking the local Nagios machine, as @rkennedy suggested. That means that for all 4 of the hosts, you are not checking their load but rather that of the Nagios server. That's why they all appear to go critical at the same time, and the values are so close.
You will need to use NRPE or something to check the remote machines.
Also, if a PM is stuck in the Outbox that just means the recipient has not yet read the message. Give it time and it should clear once they do.
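One way to apply this is to generate a check_nrpe-based "Current Load" service for each remote host. The sketch below rests on several assumptions: nrpe_load_service is a hypothetical helper, it reuses the local-service template and CPULoad servicegroup from the definition posted earlier in the thread, a check_nrpe command like the one printed first is defined on the Nagios server, and each remote nrpe.cfg defines a check_load command.

```shell
# Assumed command definition for the Nagios server's command config.
# The quoted 'EOF' stops the shell from expanding the $...$ Nagios macros.
cat <<'EOF'
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
EOF

# Print a "Current Load" service that asks the NRPE agent on host $1
# to run its own check_load command.
nrpe_load_service() {
    printf 'define service{\n'
    printf '        use                     local-service\n'
    printf '        host_name               %s\n' "$1"
    printf '        service_description     Current Load\n'
    printf '        servicegroups           CPULoad\n'
    printf '        check_command           check_nrpe!check_load\n'
    printf '        }\n'
}

# The host named localhost is presumably the Nagios server itself,
# so it can keep check_local_load.
for host in RaspberryPi TGCS018 vMA; do
    nrpe_load_service "$host"
done
```

With this approach the warning and critical thresholds come from the check_load line in each remote nrpe.cfg rather than from the service definition.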
Former Nagios employee
Re: All Linux Server CPU Spike at same time
Great.
I am using NRPE. This is from the nrpe.cfg on the Linux host:
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
Do you have a check_nrpe sample or example?
Re: All Linux Server CPU Spike at same time
With that command, this should be fine:
/usr/local/nagios/libexec/check_nrpe -H <host> -c check_load
Since you're not passing arguments, all you really need to do is pass -c with the command name.
Former Nagios employee
https://www.mcapra.com/
Re: All Linux Server CPU Spike at same time
I ran this:
root@tgcs017:/usr/local/nagios/etc/objects# /usr/local/nagios/libexec/check_nrpe -H 10.2.8.7 -c check_load
NRPE: Unable to read output
Thoughts?
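A common cause of "NRPE: Unable to read output" is that the plugin path in the remote nrpe.cfg does not exist or cannot be executed by the user NRPE runs as, for example when the plugins were installed under a different directory. That is only one possible cause, not a confirmed diagnosis for this thread. A minimal sketch to run on the remote host follows; nrpe_plugin_status is a hypothetical helper.

```shell
# nrpe_plugin_status 'command[name]=/path/to/plugin args...'
# Report whether the plugin named on an nrpe.cfg command line is executable
# by the current user.
nrpe_plugin_status() {
    # Strip the "command[name]=" prefix, then everything after the first space.
    plugin=$(printf '%s\n' "$1" | sed -e 's/^command\[[^]]*]=//' -e 's/ .*//')
    if [ -x "$plugin" ]; then
        echo "executable: $plugin"
    else
        echo "missing or not executable: $plugin"
    fi
}

# The check_load line quoted from nrpe.cfg earlier in the thread
nrpe_plugin_status 'command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20'
```

Running it as the account the NRPE daemon uses gives the most meaningful answer, since a plugin can be executable for root and still fail for that account.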