All Linux Server CPU Spike at same time


Re: All Linux Server CPU Spike at same time

Postby mcapra » Wed Mar 08, 2017 3:58 pm

From 10.2.8.7, can you share the output of these commands:

Code: Select all
ls -al /usr/local/nagios/libexec/
ps aux | grep xinetd
ps aux | grep nrpe
cat /usr/local/nagios/etc/nrpe.cfg
Be sure to check out our Knowledgebase for helpful articles and solutions!

https://github.com/mcapra/
mcapra
Support Tech
 
Posts: 1960
Joined: Thu May 05, 2016 3:54 pm
Location: Nagios Enterprises

Re: All Linux Server CPU Spike at same time

Postby kwhogster » Wed Mar 08, 2017 9:35 pm

See attached file.
Attachment: CPU Spike.txt (CPU Issue, 12.55 KiB)
kwhogster
 
Posts: 375
Joined: Wed Oct 14, 2015 6:51 pm
Location: Wood Ridge NJ USA

Re: All Linux Server CPU Spike at same time

Postby mcapra » Thu Mar 09, 2017 1:24 pm

I notice in your nrpe.cfg that the hosts are not comma-delimited:
Code: Select all
allowed_hosts=127.0.0.1 10.2.8.79


Not sure if that's causing these problems, but I would throw a comma in between those 2 IPs and restart the nrpe service.

From the previous remote machine (not your Nagios Core machine), can you run the following commands and share their outputs:
Code: Select all
su nagios
/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
ls -al /usr/lib/nagios/plugins/
/usr/lib/nagios/plugins/check_nrpe -H 127.0.0.1
/usr/lib/nagios/plugins/check_nrpe -H 127.0.0.1 -c check_load
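
The comma fix suggested above could also be scripted. This is only a sketch: the function name fix_allowed_hosts is made up for illustration, and the default path is simply the nrpe.cfg location mentioned earlier in this thread. It keeps a .bak copy and rewrites only the allowed_hosts line.

```shell
#!/bin/sh
# Hypothetical helper (not part of NRPE): make the allowed_hosts line of an
# nrpe.cfg-style file comma-delimited. Pass the file path as $1.
fix_allowed_hosts() {
    cfg="${1:-/usr/local/nagios/etc/nrpe.cfg}"
    # Keep a backup copy, then rewrite the original from it.
    cp "$cfg" "$cfg.bak" || return 1
    # On the allowed_hosts line only: collapse runs of whitespace/commas into
    # one comma, then drop a comma left right after "=" or at end of line.
    sed '/^allowed_hosts=/{
        s/[[:space:],][[:space:],]*/,/g
        s/=,/=/
        s/,$//
    }' "$cfg.bak" > "$cfg"
}
```

Run it as `fix_allowed_hosts /usr/local/nagios/etc/nrpe.cfg` and then restart the nrpe service as described above.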

Re: All Linux Server CPU Spike at same time

Postby kwhogster » Thu Mar 09, 2017 8:55 pm

I made this change on my Nagios Core server:

allowed_hosts=127.0.0.1 10.2.8.79
to
allowed_hosts=127.0.0.1,10.2.8.79

Then I restarted the NRPE service.

On one of the remote Linux hosts I found the plugins; the results are in the attached file.

The check_nrpe commands failed.

Do I need to modify the nrpe.cfg on all the Linux boxes?
Attachment: CPU Spike 2.txt (Spike CPU 2, 5.58 KiB)

Re: All Linux Server CPU Spike at same time

Postby mcapra » Fri Mar 10, 2017 12:00 pm

Does cutting SSL out of the equation fix things? Try the -n argument:

Code: Select all
/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -n
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -n -c check_load
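
To compare the SSL and non-SSL results side by side, something like the following could be used. It is a sketch under assumptions: try_nrpe and the CHECK_NRPE override variable are made-up names, and it relies only on the -H and -n options already used in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: run check_nrpe against one host twice, first with
# SSL (the default) and then with -n (no SSL), and print exit code + output.
# CHECK_NRPE may point at a different check_nrpe binary (assumed variable).
try_nrpe() {
    check_nrpe="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"
    host="${1:-127.0.0.1}"
    for opts in "" "-n"; do
        # $opts is left unquoted on purpose so the empty value adds no argument.
        out=$("$check_nrpe" -H "$host" $opts 2>&1)
        rc=$?
        printf '%s: rc=%s %s\n' "${opts:-ssl}" "$rc" "$out"
    done
}
```

For example, `try_nrpe 127.0.0.1` prints one line labelled ssl and one labelled -n, which makes it easy to see whether only the SSL handshake is failing.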

Re: All Linux Server CPU Spike at same time

Postby kwhogster » Fri Mar 10, 2017 8:02 pm

[nagios@tgcs018 /]$ /usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
OK - load average: 0.00, 0.01, 0.05|load1=0.000;15.000;30.000;0; load5=0.010;10.000;25.000;0; load15=0.050;5.000;20.000;0;
[nagios@tgcs018 /]$ /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -n
CHECK_NRPE: Error receiving data from daemon.
[nagios@tgcs018 /]$ /usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -n -c check_load
CHECK_NRPE: Error receiving data from daemon.

Still getting this on the remote Linux box.

Re: All Linux Server CPU Spike at same time

Postby mcapra » Mon Mar 13, 2017 11:40 am

Can you run those commands again:
Code: Select all
/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -n
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1 -n -c check_load


Then shortly after, share the output of:
Code: Select all
tail -n 200 /var/log/messages | grep nrpe

Re: All Linux Server CPU Spike at same time

Postby kwhogster » Mon Mar 13, 2017 8:32 pm

Code: Select all
[root@tgcs018 /]# cd /usr/local/nagios/libexec
[root@tgcs018 libexec]# check_load -w 15,10,5 -c 30,25,20
-bash: check_load: command not found
[root@tgcs018 libexec]# ls
check_apt                    check_jabber         check_services
check_asterisk.pl            check_load           check_simap
check_asterisk_sip_peers.sh  check_log            check_sip
check_breeze                 check_mailq          check_smtp
check_by_ssh                 check_mrtg           check_spop
check_clamd                  check_mrtgtraf       check_ssh
check_cluster                check_nagios         check_ssmtp
check_cpu_stats.sh           check_netstat.pl     check_swap
check_dhcp                   check_nntp           check_tcp
check_dig                    check_nntps          check_time
check_disk                   check_nrpe           check_udp
check_disk_smb               check_nt             check_ups
check_dns                    check_ntp            check_uptime
check_dummy                  check_ntp_peer       check_users
check_file_age               check_ntp_time       check_wave
check_flexlm                 check_nwstat         check_yum
check_ftp                    check_open_files.pl  custom_check_mem
check_http                   check_oracle         custom_check_procs
check_icmp                   check_overcr         nagisk.pl
check_ide_smart              check_ping           negate
check_ifoperstatus           check_pop            send_nsca
check_ifstatus               check_procs          urlize
check_imap                   check_real           utils.pm
check_init_service           check_rpc            utils.sh
check_ircd                   check_sensors
[root@tgcs018 libexec]# ./check_load -w 15,10,5 -c 30,25,20
OK - load average: 0.05, 0.03, 0.05|load1=0.050;15.000;30.000;0; load5=0.030;10.000;25.000;0; load15=0.050;5.000;20.000;0;
[root@tgcs018 libexec]# ./check_nrpe -H 127.0.0.1 -n
CHECK_NRPE: Error receiving data from daemon.
[root@tgcs018 libexec]# ./check_nrpe -H 127.0.0.1 -n -c check_load
CHECK_NRPE: Error receiving data from daemon.
[root@tgcs018 libexec]# tail -n 200 /var/log/messages | grep nrpe
Mar 13 19:01:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=8273 duration=0(sec)
Mar 13 19:02:25 tgcs018 xinetd[883]: START: nrpe pid=8446 from=::ffff:10.2.8.79
Mar 13 19:02:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=8446 duration=0(sec)
Mar 13 19:03:25 tgcs018 xinetd[883]: START: nrpe pid=8619 from=::ffff:10.2.8.79
Mar 13 19:03:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=8619 duration=0(sec)
Mar 13 19:04:25 tgcs018 xinetd[883]: START: nrpe pid=8790 from=::ffff:10.2.8.79
Mar 13 19:04:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=8790 duration=0(sec)
Mar 13 19:05:25 tgcs018 xinetd[883]: START: nrpe pid=8961 from=::ffff:10.2.8.79
Mar 13 19:05:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=8961 duration=0(sec)
Mar 13 19:06:25 tgcs018 xinetd[883]: START: nrpe pid=9132 from=::ffff:10.2.8.79
Mar 13 19:06:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=9132 duration=0(sec)
Mar 13 19:07:25 tgcs018 xinetd[883]: START: nrpe pid=9303 from=::ffff:10.2.8.79
Mar 13 19:07:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=9303 duration=0(sec)
Mar 13 19:08:25 tgcs018 xinetd[883]: START: nrpe pid=9474 from=::ffff:10.2.8.79
Mar 13 19:08:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=9474 duration=0(sec)
Mar 13 19:09:25 tgcs018 xinetd[883]: START: nrpe pid=9657 from=::ffff:10.2.8.79
Mar 13 19:09:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=9657 duration=0(sec)
Mar 13 19:10:25 tgcs018 xinetd[883]: START: nrpe pid=9834 from=::ffff:10.2.8.79
Mar 13 19:10:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=9834 duration=1(sec)
Mar 13 19:11:25 tgcs018 xinetd[883]: START: nrpe pid=10005 from=::ffff:10.2.8.79
Mar 13 19:11:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=10005 duration=1(sec)
Mar 13 19:12:26 tgcs018 xinetd[883]: START: nrpe pid=10176 from=::ffff:10.2.8.79
Mar 13 19:12:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=10176 duration=0(sec)
Mar 13 19:13:26 tgcs018 xinetd[883]: START: nrpe pid=10347 from=::ffff:10.2.8.79
Mar 13 19:13:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=10347 duration=0(sec)
Mar 13 19:14:26 tgcs018 xinetd[883]: START: nrpe pid=10518 from=::ffff:10.2.8.79
Mar 13 19:14:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=10518 duration=0(sec)
Mar 13 19:15:25 tgcs018 xinetd[883]: START: nrpe pid=10689 from=::ffff:10.2.8.79
Mar 13 19:15:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=10689 duration=0(sec)
Mar 13 19:16:25 tgcs018 xinetd[883]: START: nrpe pid=10860 from=::ffff:10.2.8.79
Mar 13 19:16:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=10860 duration=0(sec)
Mar 13 19:17:25 tgcs018 xinetd[883]: START: nrpe pid=11031 from=::ffff:10.2.8.79
Mar 13 19:17:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=11031 duration=0(sec)
Mar 13 19:18:25 tgcs018 xinetd[883]: START: nrpe pid=11202 from=::ffff:10.2.8.79
Mar 13 19:18:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=11202 duration=0(sec)
Mar 13 19:19:25 tgcs018 xinetd[883]: START: nrpe pid=11373 from=::ffff:10.2.8.79
Mar 13 19:19:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=11373 duration=0(sec)
Mar 13 19:20:25 tgcs018 xinetd[883]: START: nrpe pid=11549 from=::ffff:10.2.8.79
Mar 13 19:20:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=11549 duration=0(sec)
Mar 13 19:21:25 tgcs018 xinetd[883]: START: nrpe pid=11720 from=::ffff:10.2.8.79
Mar 13 19:21:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=11720 duration=0(sec)
Mar 13 19:22:25 tgcs018 xinetd[883]: START: nrpe pid=11891 from=::ffff:10.2.8.79
Mar 13 19:22:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=11891 duration=0(sec)
Mar 13 19:23:25 tgcs018 xinetd[883]: START: nrpe pid=12068 from=::ffff:10.2.8.79
Mar 13 19:23:25 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=12068 duration=0(sec)
Mar 13 19:24:25 tgcs018 xinetd[883]: START: nrpe pid=12240 from=::ffff:10.2.8.79
Mar 13 19:24:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=12240 duration=1(sec)
Mar 13 19:25:26 tgcs018 xinetd[883]: START: nrpe pid=12411 from=::ffff:10.2.8.79
Mar 13 19:25:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=12411 duration=0(sec)
Mar 13 19:26:26 tgcs018 xinetd[883]: START: nrpe pid=12582 from=::ffff:10.2.8.79
Mar 13 19:26:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=12582 duration=0(sec)
Mar 13 19:27:26 tgcs018 xinetd[883]: START: nrpe pid=12753 from=::ffff:10.2.8.79
Mar 13 19:27:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=12753 duration=0(sec)
Mar 13 19:28:26 tgcs018 xinetd[883]: START: nrpe pid=12924 from=::ffff:10.2.8.79
Mar 13 19:28:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=12924 duration=0(sec)
Mar 13 19:29:26 tgcs018 xinetd[883]: START: nrpe pid=13112 from=::ffff:10.2.8.79
Mar 13 19:29:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=13112 duration=0(sec)
Mar 13 19:30:26 tgcs018 xinetd[883]: START: nrpe pid=13291 from=::ffff:10.2.8.79
Mar 13 19:30:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=13291 duration=0(sec)
Mar 13 19:31:26 tgcs018 xinetd[883]: START: nrpe pid=13465 from=::ffff:10.2.8.79
Mar 13 19:31:26 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=13465 duration=0(sec)
Mar 13 19:31:27 tgcs018 xinetd[883]: START: nrpe pid=13509 from=::ffff:127.0.0.1
Mar 13 19:31:27 tgcs018 xinetd[13509]: FAIL: nrpe address from=::ffff:127.0.0.1
Mar 13 19:31:27 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=13509 duration=0(sec)
Mar 13 19:31:44 tgcs018 xinetd[883]: START: nrpe pid=13541 from=::ffff:127.0.0.1
Mar 13 19:31:44 tgcs018 xinetd[13541]: FAIL: nrpe address from=::ffff:127.0.0.1
Mar 13 19:31:44 tgcs018 xinetd[883]: EXIT: nrpe status=0 pid=13541 duration=0(sec)
[root@tgcs018 libexec]#



Does this help?
Last edited by tmcdonald on Tue Mar 14, 2017 1:43 pm, edited 1 time in total.
Reason: Please use [code][/code] tags around long output

Re: All Linux Server CPU Spike at same time

Postby tgriep » Tue Mar 14, 2017 4:01 pm

It looks like the NRPE agent is being run by xinetd rather than in daemon mode, so can you edit the following file:
Code: Select all
/etc/xinetd.d/nrpe


Comment out the only_from line, as in the example below:
Code: Select all
#       only_from       = 127.0.0.1


Save the file and restart xinetd by running:
Code: Select all
service xinetd restart


This will allow any server to connect to the NRPE agent.

Then run these commands and post the output.
Code: Select all
/usr/local/nagios/libexec/check_nrpe -H 127.0.0.1
/usr/local/nagios/libexec/check_nrpe
/usr/local/nagios/bin/nrpe


Then try running this from the Nagios server to see if the changes worked. Replace xxx.xxx.xxx.xxx with the IP address of the remote system.
Code: Select all
/usr/local/nagios/libexec/check_nrpe -H xxx.xxx.xxx.xxx -c check_load
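
If you would rather script the xinetd edit than do it by hand, here is a rough sketch. The function name comment_only_from is hypothetical, and the default path is the xinetd file named above. It saves a .bak copy and prefixes any active only_from line with #.

```shell
#!/bin/sh
# Hypothetical helper: comment out the only_from line in an xinetd service
# file. Pass the file path as $1.
comment_only_from() {
    f="${1:-/etc/xinetd.d/nrpe}"
    # Keep a backup copy, then rewrite the original from it.
    cp "$f" "$f.bak" || return 1
    # Match an uncommented "only_from =" (any leading whitespace) and put a
    # "#" in front of it; lines that are already commented do not match.
    sed 's/^\([[:space:]]*\)\(only_from[[:space:]]*=\)/#\1\2/' "$f.bak" > "$f"
}
```

After running `comment_only_from /etc/xinetd.d/nrpe`, xinetd still needs the restart shown above before the change takes effect.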
Last edited by dwhitfield on Mon Mar 20, 2017 7:29 pm, edited 1 time in total.
Reason: ant to any
tgriep
Madmin
 
Posts: 4313
Joined: Thu Oct 30, 2014 9:02 am

Re: All Linux Server CPU Spike at same time

Postby ssax » Tue Mar 14, 2017 4:03 pm

EDIT - Try tgriep's solution first.

We can turn on NRPE debugging to collect more information.

On the remote machine (not the nagios server), edit the file:

Code: Select all
/usr/local/nagios/etc/nrpe.cfg


Change:

Code: Select all
debug=0


To:

Code: Select all
debug=1


Then restart xinetd:

Code: Select all
service xinetd restart


Now we need to add an option to the rsyslog configuration so that it processes debug messages. Edit this file:

Code: Select all
/etc/rsyslog.conf


Find the line for /var/log/messages; in the config file it will look like:

Code: Select all
*.info;mail.none;authpriv.none;cron.none /var/log/messages


We need to add ;daemon.debug to the selector so that the line reads:

Code: Select all
*.info;mail.none;authpriv.none;cron.none;daemon.debug /var/log/messages


Then restart rsyslog:

Code: Select all
service rsyslog restart
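
The two config edits above could be scripted along these lines. Treat it as a sketch: enable_nrpe_debug and add_daemon_debug are made-up names, and the default rsyslog path /etc/rsyslog.conf is an assumption based on the usual location on RHEL-family systems. Both functions keep a .bak copy of the file they change.

```shell
#!/bin/sh
# Hypothetical helper: switch debug=0 to debug=1 in an nrpe.cfg-style file.
enable_nrpe_debug() {
    nrpe_cfg="${1:-/usr/local/nagios/etc/nrpe.cfg}"
    cp "$nrpe_cfg" "$nrpe_cfg.bak" || return 1
    sed 's/^debug=0[[:space:]]*$/debug=1/' "$nrpe_cfg.bak" > "$nrpe_cfg"
}

# Hypothetical helper: append ;daemon.debug to the selector of the
# uncommented line that logs to /var/log/messages, unless already present.
add_daemon_debug() {
    rsyslog_conf="${1:-/etc/rsyslog.conf}"
    cp "$rsyslog_conf" "$rsyslog_conf.bak" || return 1
    sed '/^[^#].*[[:space:]]\/var\/log\/messages[[:space:]]*$/{
        /daemon\.debug/!s/^\([^[:space:]]*\)/\1;daemon.debug/
    }' "$rsyslog_conf.bak" > "$rsyslog_conf"
}
```

The xinetd and rsyslog restarts shown above are still required afterwards; the functions only edit the files.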


Now there should be more information logged in /var/log/messages.

From your nagios server execute this command:
- Change YOURREMOTEHOST to the IP or DNS name of your remote host

Code: Select all
/usr/local/nagios/libexec/check_nrpe -H YOURREMOTEHOST


Then from the remote machine, please run this command and send us the output:

Code: Select all
tail -n 100 /var/log/messages


Thank you
ssax
Dreams In Code
 
Posts: 2813
Joined: Wed Feb 11, 2015 12:54 pm
