Duration resets for all Services with Apply Config
Re: Duration resets for all Services with Apply Config
Here it is.
Re: Duration resets for all Services with Apply Config
Is the time set properly on your Nagios server?
Code: Select all
date
hwclock
How many nagios processes are running on your server?
Code: Select all
ps aux | grep nagios
How about your php time?
Code: Select all
grep timez /etc/php.ini
It might be worth running through the following procedure to see whether or not it helps out:
Code: Select all
service nagios stop
killall -9 nagios
rm /usr/local/nagios/var/retention.dat
service nagios start
Re: Duration resets for all Services with Apply Config
Time
Nagios Processes
php time
Clearing Retention.dat
Removing the retention.dat file didn't fix the problem.
Thanks for looking into this.
Code: Select all
[root@ltc099l ~]# date
Mon Jul 6 15:47:32 CDT 2015
[root@ltc099l ~]# hwclock
Mon 06 Jul 2015 03:47:42 PM CDT -0.596463 seconds
Code: Select all
# ps aux | grep nagios
postgres 1387 0.0 0.2 22000 7116 ? S 07:46 0:03 postgres: nagiosxi nagiosxi 127.0.0.1(35132) idle
nagios 3783 0.0 0.0 6932 512 ? Ss Jun05 0:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 3792 0.0 0.0 33724 808 ? S Jun05 3:27 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
postgres 4577 0.0 0.1 22000 4112 ? S 15:47 0:00 postgres: nagiosxi nagiosxi 127.0.0.1(51570) idle
nagios 4799 0.0 0.0 5980 1508 ? S 15:48 0:00 crond
nagios 4800 0.0 0.0 5980 1508 ? S 15:48 0:00 crond
nagios 4801 0.0 0.0 5980 1508 ? S 15:48 0:00 crond
nagios 4802 0.0 0.0 5980 1508 ? S 15:48 0:00 crond
nagios 4803 0.0 0.0 5980 1508 ? S 15:48 0:00 crond
nagios 4806 0.0 0.0 2516 892 ? Ss 15:48 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1
nagios 4807 0.0 0.0 2516 892 ? Ss 15:48 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios 4808 0.3 0.5 30480 15584 ? S 15:48 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
nagios 4809 0.3 0.5 30728 15748 ? S 15:48 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios 4814 0.0 0.0 2516 888 ? Ss 15:48 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
nagios 4815 0.3 0.4 30620 15224 ? S 15:48 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios 4816 0.0 0.0 2516 888 ? Ss 15:48 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
nagios 4817 0.4 0.6 36208 20848 ? S 15:48 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
nagios 4818 0.0 0.0 2516 888 ? Ss 15:48 0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1
nagios 4819 0.3 0.5 30684 16112 ? S 15:48 0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
postgres 4822 0.1 0.1 21928 4800 ? S 15:48 0:00 postgres: nagiosxi nagiosxi 127.0.0.1(51719) idle
postgres 4824 0.0 0.1 21928 4020 ? R 15:48 0:00 postgres: nagiosxi nagiosxi 127.0.0.1(51721) UPDATE
postgres 4826 0.0 0.1 21928 4360 ? S 15:48 0:00 postgres: nagiosxi nagiosxi 127.0.0.1(51723) idle
postgres 4830 0.0 0.1 21928 3988 ? S 15:48 0:00 postgres: nagiosxi nagiosxi 127.0.0.1(51724) idle
postgres 4843 0.0 0.1 21928 5060 ? S 15:48 0:00 postgres: nagiosxi nagiosxi 127.0.0.1(51727) idle
nagios 5599 0.0 0.0 4168 696 ? S 15:49 0:00 /usr/local/nagios/libexec/check_nt -H 10.90.18.219 -s -p 12489 -v COUNTER -l \Paging File(_Total)\% Usage Paging File usage is %.2f %% -w 85 -c 95
nagios 5603 0.0 0.0 4168 696 ? S 15:49 0:00 /usr/local/nagios/libexec/check_nt -H 10.90.18.23 -s -p 12489 -v COUNTER -l \Server\Errors System Login Errors since last reboot is %.f -w 2 -c 20
nagios 5607 0.0 0.0 4168 700 ? S 15:49 0:00 /usr/local/nagios/libexec/check_nt -H 10.90.18.177 -s -p 12489 -v COUNTER -l \Paging File(_Total)\% Usage Paging File usage is %.2f %% -w 85 -c 95
root 5610 0.0 0.0 4020 700 pts/0 S+ 15:49 0:00 grep nagios
nagios 6709 0.6 0.1 13584 5640 ? Ss 07:52 3:16 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios 6710 0.0 0.0 3080 720 ? S 07:52 0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6711 0.0 0.0 3080 724 ? S 07:52 0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6712 0.0 0.0 3080 728 ? S 07:52 0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6713 0.0 0.0 3080 728 ? S 07:52 0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6715 0.0 0.0 3080 720 ? S 07:52 0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6716 0.0 0.0 3080 720 ? S 07:52 0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6723 0.0 0.0 6932 1164 ? S 07:52 0:10 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 6724 0.4 0.0 8048 2308 ? S 07:52 2:05 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios 6765 0.0 0.1 12652 4156 ? S 07:52 0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
postgres 8225 0.0 0.2 22000 7464 ? S 04:35 0:07 postgres: nagiosxi nagiosxi 127.0.0.1(46946) idle
postgres 14149 0.0 0.2 22028 7640 ? S 05:22 0:06 postgres: nagiosxi nagiosxi 127.0.0.1(36036) idle
postgres 16808 0.0 0.2 22028 7324 ? S 07:25 0:04 postgres: nagiosxi nagiosxi 127.0.0.1(40160) idle
postgres 20450 0.0 0.1 22000 5140 ? S 15:27 0:00 postgres: nagiosxi nagiosxi 127.0.0.1(56549) idle
postgres 27390 0.0 0.1 22000 5024 ? S 15:35 0:00 postgres: nagiosxi nagiosxi 127.0.0.1(43590) idle
postgres 28943 0.0 0.1 22000 5124 ? S 15:37 0:00 postgres: nagiosxi nagiosxi 127.0.0.1(44471) idle
postgres 32611 0.0 0.2 22000 6900 ? S 07:44 0:03 postgres: nagiosxi nagiosxi 127.0.0.1(39366) idle
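A listing this long is hard to scan by eye when the question is simply how many main Nagios daemons are running. The following is a minimal sketch, not something posted in the thread: the function name count_nagios_daemons is made up for illustration, and it assumes the daemon shows up in ps output as the nagios binary started with -d, as in the listing above.

```shell
# Count main Nagios daemon processes in `ps aux`-style output read from stdin.
# Only command lines that run .../bin/nagios with -d are counted; worker
# processes (--worker) and helpers such as ndo2db or npcd are ignored.
# count_nagios_daemons is a hypothetical helper name, not a Nagios tool.
count_nagios_daemons() {
    grep -c -E '/bin/nagios +-d( |$)'
}

# Example (not executed here):
#   ps aux | count_nagios_daemons
```

The pattern requires the -d flag directly after the binary name, so the grep process itself and the --worker children do not inflate the count.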
Code: Select all
date.timezone = America/Chicago
Code: Select all
[root@ltc099l ~]# service nagios stop
Stopping nagios:. done.
[root@ltc099l ~]# killall -9 nagios
nagios: no process killed
[root@ltc099l ~]# rm /usr/local/nagios/var/retention.dat
rm: remove regular file `/usr/local/nagios/var/retention.dat'? y
[root@ltc099l ~]# service nagios start
Starting nagios: done.
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: Duration resets for all Services with Apply Config
SavaSC - I don't see in the thread where anyone has addressed this...
Is this only during an apply config or does it also happen when the monitoring engine is simply reloaded? I'm wondering if it's inherent to the core process being restarted or if it's happening as a side-effect of the apply-config process.
To test, restart the monitoring engine either from within the XI interface or from the command line:
Code: Select all
[root@localhost ~]# service nagios restart
Running configuration check...
Stopping nagios:. done.
Starting nagios: done.
Re: Duration resets for all Services with Apply Config
Ah, good catch. I didn't notice that before. It resets every time the Nagios service is restarted, not just on Apply Config.
**EDIT**
Just saw this. It appears that some of the times reset and some didn't. All of them acted like they were resetting but some have several hours of down time.
I have attached a screenshot.
Re: Duration resets for all Services with Apply Config
Could you upload the following file?
/usr/local/nagios/var/status.dat
Run these commands and post the output.
Code: Select all
tail -100 /usr/local/nagios/var/perfdata.log
tail -100 /usr/local/nagios/var/npcd.log
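Since status.dat is the file being requested here, one way to see exactly which services lose their duration is to snapshot the last state change of every service before and after a restart and compare the two. The following is a rough sketch only. It assumes status.dat holds servicestatus { ... } blocks with host_name=, service_description= and last_state_change= lines, and that the displayed duration is derived from last_state_change. The function name service_state_changes is invented for this example.

```shell
# Print "host;service;last_state_change" for every servicestatus block
# in a status.dat-style file whose path is given as $1.
# service_state_changes is a hypothetical helper name.
service_state_changes() {
    awk '
        /^[ \t]*servicestatus[ \t]*[{]/ { in_block = 1; host = ""; svc = ""; changed = "" }
        in_block {
            line = $0
            sub(/^[ \t]+/, "", line)                      # drop leading indentation
            if (line ~ /^host_name=/)                host = substr(line, 11)
            else if (line ~ /^service_description=/) svc = substr(line, 21)
            else if (line ~ /^last_state_change=/)   changed = substr(line, 19)
            else if (line ~ /^[}]/) { print host ";" svc ";" changed; in_block = 0 }
        }
    ' "$1"
}

# Example (not executed here):
#   service_state_changes /usr/local/nagios/var/status.dat > /tmp/before.txt
#   service nagios restart
#   service_state_changes /usr/local/nagios/var/status.dat > /tmp/after.txt
#   diff /tmp/before.txt /tmp/after.txt
```

Any service listed by the diff had its timestamp rewritten across the restart, while services missing from the diff kept their original duration.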
Re: Duration resets for all Services with Apply Config
perfdata.log
npcd.log
I have attached a zip of the /usr/local/nagios/var/status.dat file.
Code: Select all
tail -100 /usr/local/nagios/var/perfdata.log
2015-04-08 18:22:00 [8928] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-08 18:22:00 [8932] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535098.perfdata.service-PID-8932 deleted
2015-04-08 18:22:00 [8932] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-08 18:22:00 [8936] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535113.perfdata.service-PID-8936 deleted
2015-04-08 18:22:00 [8936] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-08 18:22:05 [9000] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-08 18:22:05 [9000] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-08 18:22:05 [9000] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-08 18:22:05 [9006] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-08 18:22:05 [9006] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-08 18:22:05 [9006] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-08 18:22:05 [9000] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535127.perfdata.service-PID-9000 deleted
2015-04-08 18:22:05 [9000] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-08 18:22:06 [9006] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535143.perfdata.service-PID-9006 deleted
2015-04-08 18:22:06 [9006] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:42:41 [17663] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:42:41 [17663] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:42:41 [17663] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:42:41 [17663] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535173.perfdata.service-PID-17663 deleted
2015-04-09 03:42:41 [17663] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:42:41 [17659] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:42:41 [17659] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:42:41 [17659] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:42:41 [17659] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535158.perfdata.service-PID-17659 deleted
2015-04-09 03:42:41 [17659] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:42:41 [17667] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:42:41 [17667] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:42:41 [17667] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:42:41 [17667] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535188.perfdata.service-PID-17667 deleted
2015-04-09 03:42:41 [17667] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:42:46 [17671] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:42:46 [17671] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:42:46 [17671] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:42:46 [17671] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535202.perfdata.service-PID-17671 deleted
2015-04-09 03:42:46 [17671] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:42:51 [17729] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:42:51 [17729] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:42:51 [17729] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:42:51 [17729] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535263.perfdata.service-PID-17729 deleted
2015-04-09 03:42:51 [17729] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:43:20 [17852] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:43:20 [17852] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:43:20 [17852] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:43:20 [17852] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535338.perfdata.service-PID-17852 deleted
2015-04-09 03:43:20 [17852] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:43:20 [17855] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:43:20 [17855] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:43:20 [17855] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:43:20 [17855] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535308.perfdata.service-PID-17855 deleted
2015-04-09 03:43:20 [17855] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:43:25 [17866] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:43:25 [17866] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:43:25 [17866] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:43:25 [17866] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535368.perfdata.service-PID-17866 deleted
2015-04-09 03:43:25 [17866] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:43:30 [17914] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:43:30 [17914] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:43:30 [17914] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:43:30 [17914] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535412.perfdata.service-PID-17914 deleted
2015-04-09 03:43:30 [17914] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:43:30 [17913] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:43:30 [17913] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:43:30 [17913] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:43:30 [17913] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535383.perfdata.service-PID-17913 deleted
2015-04-09 03:43:30 [17913] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:51:28 [9527] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:51:28 [9527] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:51:28 [9527] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:51:28 [9527] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573066.perfdata.service-PID-9527 deleted
2015-04-09 04:51:28 [9527] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:53:19 [11016] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:53:19 [11016] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:53:19 [11016] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:53:19 [11016] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573186.perfdata.service-PID-11016 deleted
2015-04-09 04:53:19 [11016] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:53:19 [11014] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:53:19 [11014] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:53:19 [11014] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:53:19 [11014] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573171.perfdata.service-PID-11014 deleted
2015-04-09 04:53:19 [11014] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:54:33 [12060] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:54:33 [12060] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:54:33 [12060] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:54:33 [12060] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573261.perfdata.service-PID-12060 deleted
2015-04-09 04:54:33 [12060] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:54:33 [12058] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:54:33 [12058] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:54:33 [12058] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:54:33 [12058] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573246.perfdata.service-PID-12058 deleted
2015-04-09 04:54:33 [12058] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:55:30 [12875] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:55:30 [12875] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:55:30 [12875] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:55:30 [12875] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573321.perfdata.service-PID-12875 deleted
2015-04-09 04:55:30 [12875] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:55:30 [12873] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:55:30 [12873] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:55:30 [12873] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:55:30 [12873] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573306.perfdata.service-PID-12873 deleted
2015-04-09 04:55:30 [12873] [0] *** process_perfdata.pl terminated on signal ALRM
Code: Select all
# tail -100 /usr/local/nagios/var/npcd.log
[04-09-2015 00:26:12] NPCD: WARN: MAX load reached: load 82.760000/10.000000 at i=18
[04-09-2015 00:26:30] NPCD: WARN: MAX load reached: load 80.370000/10.000000 at i=18
[04-09-2015 00:26:48] NPCD: WARN: MAX load reached: load 76.550000/10.000000 at i=18
[04-09-2015 00:27:22] NPCD: WARN: MAX load reached: load 75.610000/10.000000 at i=18
[04-09-2015 00:28:08] NPCD: WARN: MAX load reached: load 74.580000/10.000000 at i=18
[04-09-2015 00:28:26] NPCD: WARN: MAX load reached: load 76.980000/10.000000 at i=18
[04-09-2015 00:28:45] NPCD: WARN: MAX load reached: load 77.860000/10.000000 at i=18
[04-09-2015 00:29:03] NPCD: WARN: MAX load reached: load 78.730000/10.000000 at i=18
[04-09-2015 00:29:21] NPCD: WARN: MAX load reached: load 79.820000/10.000000 at i=18
[04-09-2015 00:29:43] NPCD: WARN: MAX load reached: load 81.330000/10.000000 at i=18
[04-09-2015 00:30:04] NPCD: WARN: MAX load reached: load 81.150000/10.000000 at i=18
[04-09-2015 00:30:28] NPCD: WARN: MAX load reached: load 81.100000/10.000000 at i=18
[04-09-2015 00:30:56] NPCD: WARN: MAX load reached: load 81.020000/10.000000 at i=18
[04-09-2015 00:31:15] NPCD: WARN: MAX load reached: load 81.160000/10.000000 at i=18
[04-09-2015 00:31:34] NPCD: WARN: MAX load reached: load 83.440000/10.000000 at i=18
[04-09-2015 00:33:56] NPCD: WARN: MAX load reached: load 86.460000/10.000000 at i=18
[04-09-2015 00:44:33] NPCD: WARN: MAX load reached: load 109.080000/10.000000 at i=18
[04-09-2015 00:45:43] NPCD: WARN: MAX load reached: load 110.430000/10.000000 at i=18
[04-09-2015 00:48:48] NPCD: WARN: MAX load reached: load 102.180000/10.000000 at i=18
[04-09-2015 00:49:06] NPCD: WARN: MAX load reached: load 103.850000/10.000000 at i=18
[04-09-2015 00:49:25] NPCD: WARN: MAX load reached: load 103.370000/10.000000 at i=18
[04-09-2015 00:50:07] NPCD: WARN: MAX load reached: load 102.760000/10.000000 at i=18
[04-09-2015 00:51:06] NPCD: WARN: MAX load reached: load 102.630000/10.000000 at i=18
[04-09-2015 01:00:05] NPCD: WARN: MAX load reached: load 110.110000/10.000000 at i=18
[04-09-2015 01:08:03] NPCD: WARN: MAX load reached: load 114.000000/10.000000 at i=18
[04-09-2015 01:19:18] NPCD: WARN: MAX load reached: load 123.320000/10.000000 at i=18
[04-09-2015 01:19:34] NPCD: WARN: MAX load reached: load 113.910000/10.000000 at i=18
[04-09-2015 01:19:57] NPCD: WARN: MAX load reached: load 107.670000/10.000000 at i=18
[04-09-2015 01:26:52] NPCD: WARN: MAX load reached: load 113.530000/10.000000 at i=18
[04-09-2015 01:29:47] NPCD: WARN: MAX load reached: load 124.360000/10.000000 at i=18
[04-09-2015 01:30:08] NPCD: WARN: MAX load reached: load 126.420000/10.000000 at i=18
[04-09-2015 01:30:28] NPCD: WARN: MAX load reached: load 126.660000/10.000000 at i=18
[04-09-2015 01:30:55] NPCD: WARN: MAX load reached: load 126.390000/10.000000 at i=18
[04-09-2015 01:35:00] NPCD: WARN: MAX load reached: load 126.600000/10.000000 at i=18
[04-09-2015 01:58:11] NPCD: WARN: MAX load reached: load 129.530000/10.000000 at i=18
[04-09-2015 02:00:43] NPCD: WARN: MAX load reached: load 134.710000/10.000000 at i=18
[04-09-2015 02:01:33] NPCD: WARN: MAX load reached: load 135.770000/10.000000 at i=18
[04-09-2015 02:01:55] NPCD: WARN: MAX load reached: load 135.040000/10.000000 at i=18
[04-09-2015 02:07:06] NPCD: WARN: MAX load reached: load 136.500000/10.000000 at i=18
[04-09-2015 02:37:38] NPCD: WARN: MAX load reached: load 145.340000/10.000000 at i=18
[04-09-2015 02:38:27] NPCD: WARN: MAX load reached: load 150.410000/10.000000 at i=18
[04-09-2015 02:39:21] NPCD: WARN: MAX load reached: load 150.390000/10.000000 at i=18
[04-09-2015 02:52:29] NPCD: WARN: MAX load reached: load 154.300000/10.000000 at i=18
[04-09-2015 02:55:08] NPCD: WARN: MAX load reached: load 158.460000/10.000000 at i=18
[04-09-2015 03:17:53] NPCD: WARN: MAX load reached: load 160.820000/10.000000 at i=18
[04-09-2015 03:24:52] NPCD: WARN: MAX load reached: load 165.890000/10.000000 at i=18
[04-09-2015 03:35:49] NPCD: WARN: MAX load reached: load 161.160000/10.000000 at i=18
[04-09-2015 03:38:48] NPCD: WARN: MAX load reached: load 156.440000/10.000000 at i=18
[04-09-2015 03:39:06] NPCD: WARN: MAX load reached: load 154.360000/10.000000 at i=18
[04-09-2015 03:39:21] NPCD: WARN: MAX load reached: load 142.470000/10.000000 at i=18
[04-09-2015 03:39:36] NPCD: WARN: MAX load reached: load 119.970000/10.000000 at i=18
[04-09-2015 03:39:51] NPCD: WARN: MAX load reached: load 95.860000/10.000000 at i=18
[04-09-2015 03:40:06] NPCD: WARN: MAX load reached: load 75.580000/10.000000 at i=18
[04-09-2015 03:40:21] NPCD: WARN: MAX load reached: load 59.840000/10.000000 at i=18
[04-09-2015 03:40:36] NPCD: WARN: MAX load reached: load 47.250000/10.000000 at i=18
[04-09-2015 03:40:51] NPCD: WARN: MAX load reached: load 37.700000/10.000000 at i=18
[04-09-2015 03:41:06] NPCD: WARN: MAX load reached: load 30.020000/10.000000 at i=18
[04-09-2015 03:41:21] NPCD: WARN: MAX load reached: load 24.500000/10.000000 at i=18
[04-09-2015 03:41:36] NPCD: WARN: MAX load reached: load 19.740000/10.000000 at i=18
[04-09-2015 03:41:51] NPCD: WARN: MAX load reached: load 16.370000/10.000000 at i=18
[04-09-2015 03:42:06] NPCD: WARN: MAX load reached: load 13.410000/10.000000 at i=18
[04-09-2015 03:42:21] NPCD: WARN: MAX load reached: load 11.640000/10.000000 at i=18
[04-09-2015 03:42:41] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:42:41] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535173.perfdata.service'
[04-09-2015 03:42:41] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:42:41] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535158.perfdata.service'
[04-09-2015 03:42:41] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:42:41] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535188.perfdata.service'
[04-09-2015 03:42:46] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:42:46] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535202.perfdata.service'
[04-09-2015 03:42:51] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:42:51] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535263.perfdata.service'
[04-09-2015 03:43:20] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:43:20] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535338.perfdata.service'
[04-09-2015 03:43:20] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:43:20] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535308.perfdata.service'
[04-09-2015 03:43:25] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:43:25] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535368.perfdata.service'
[04-09-2015 03:43:30] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:43:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535412.perfdata.service'
[04-09-2015 03:43:30] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:43:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535383.perfdata.service'
[04-09-2015 04:51:28] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:51:28] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573066.perfdata.service'
[04-09-2015 04:53:19] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:53:19] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573186.perfdata.service'
[04-09-2015 04:53:19] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:53:19] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573171.perfdata.service'
[04-09-2015 04:54:33] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:54:33] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573261.perfdata.service'
[04-09-2015 04:54:33] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:54:33] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573246.perfdata.service'
[04-09-2015 04:55:30] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:55:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573321.perfdata.service'
[04-09-2015 04:55:30] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:55:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573306.perfdata.service'
[06-05-2015 08:31:41] NPCD: Caught Termination Signal - Hasta la vista... baby
[06-05-2015 08:34:02] NPCD: npcd Daemon (0.4.14) started with PID=3792
[06-05-2015 08:34:02] NPCD: Please have a look at 'npcd -V' to get license information
[06-05-2015 08:34:02] NPCD: HINT: load_threshold is enabled - ('10.000000')
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: Duration resets for all Services with Apply Config
SavaSC wrote:
Code: Select all
NPCD: WARN: MAX load reached: load 165.890000/10.000000
Isn't that special.
So it would seem that there are some shortcomings in the resources being made available to your XI server. Does it have a good set of disks to run on? Enough CPU cores available? Have you made any changes recently that would affect the load dramatically? How many hosts/services are you monitoring?
You can partially work around this by adjusting load_threshold in /usr/local/nagios/etc/pnp/npcd.cfg - my guess though is that maybe it's time for you to do a ramdisk.
https://assets.nagios.com/downloads/nag ... giosXI.pdf
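To judge whether load_threshold is actually being hit right now, the configured value can be compared with the current load average. This is a sketch under assumptions rather than an official procedure: it assumes npcd.cfg contains a line of the form load_threshold = <number>, it reads the 1-minute load from /proc/loadavg (Linux only), and the function names npcd_threshold and npcd_over_threshold are invented for this example.

```shell
# Print the load_threshold value from an npcd.cfg-style file ($1).
# Commented-out lines are skipped because the key must start the line.
npcd_threshold() {
    awk -F'=' '
        /^[ \t]*load_threshold[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2; exit }
    ' "$1"
}

# Exit 0 when the 1-minute load average exceeds the configured threshold,
# 1 when it does not, and 2 when no threshold is found in the config.
# $1 = config file, $2 = loadavg file (defaults to /proc/loadavg).
npcd_over_threshold() {
    threshold=$(npcd_threshold "$1")
    [ -n "$threshold" ] || return 2
    load=$(cut -d' ' -f1 "${2:-/proc/loadavg}")
    awk -v l="$load" -v t="$threshold" 'BEGIN { exit !(l + 0 > t + 0) }'
}

# Example (not executed here):
#   npcd_over_threshold /usr/local/nagios/etc/pnp/npcd.cfg && echo "NPCD would pause"
```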
Re: Duration resets for all Services with Apply Config
We currently have the following allocated to the NagiosXI server:
- 4 vCPUs
- 4096MB RAM
- 70GB total HD space
According to vSphere, during these service restarts the following resources were used:
- 21% Max CPU usage
- 28% Max memory usage
I have changed load_threshold to 170.0 and then restarted the Nagios service. The following resources were used in the time during and after the restart:
- 33% Max memory usage
- 35% Max CPU usage
Also, I notice that the errors you reference are from 3 months ago. The last date referenced in npcd.log is a month ago, and the last date referenced in perfdata.log is 3 months ago. Have these logs gotten too big perhaps (npcd.log = 2GB, perfdata.log = 8GB)? Should I clear these out?
Here is the disk usage report:
Code: Select all
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
6.7G 3.2G 3.3G 50% /
/dev/sda1 99M 12M 82M 13% /boot
/dev/mapper/nagios_vg00-var_lib_mysql_lv
2.0G 398M 1.5G 21% /var/lib/mysql
/dev/mapper/nagios_vg00-var_lib_pgsql_lv
2.0G 268M 1.7G 14% /var/lib/pgsql
/dev/mapper/nagios_vg00-usr_local_lv
4.0G 3.2G 617M 84% /usr/local
/dev/mapper/nagios_vg00-db_backups_lv
6.0G 3.5G 2.2G 62% /store/backups
tmpfs 1.5G 0 1.5G 0% /dev/shm
-
jdalrymple
- Skynet Drone
- Posts: 2620
- Joined: Wed Feb 11, 2015 1:56 pm
Re: Duration resets for all Services with Apply Config
SavaSC wrote:
Also, I notice that the errors you reference are from 3 months ago. The last date referenced in the npcd.log is a month ago. The last date referenced in the perfdata.log is 3 months ago.
Ha - sorry about that. I didn't bother looking at the date.
SavaSC wrote:
Have these logs gotten too big perhaps? (npcd.log = 2GB perfdata.log = 8GB) Should I clear these out?
No - not related. We're barking up the wrong tree I'm afraid. Although - I do wonder what in perfdata.log is consuming 8GB of disk space. You can clear that out safely.
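For clearing out a log like that, emptying the file in place is usually gentler than deleting it. A minimal sketch, with truncate_log being a made-up helper name and the path in the example taken from earlier in the thread:

```shell
# Report the size of a log file ($1) in bytes, then empty it in place.
# Truncating rather than deleting keeps the path and inode intact for
# any process that still has the file open.
# truncate_log is a hypothetical helper name.
truncate_log() {
    [ -f "$1" ] || { echo "not a regular file: $1" >&2; return 1; }
    size=$(wc -c < "$1" | tr -d ' ')
    echo "$1: $size bytes before truncation"
    : > "$1"
}

# Example (not executed here):
#   truncate_log /usr/local/nagios/var/perfdata.log
```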
Back to the drawing board, what shows up in /usr/local/nagios/var/nagios.log at the time of a monitoring engine reload? Anything interesting?
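To pull out just the entries around a reload, the log can be filtered by timestamp. This is a sketch that assumes each nagios.log line starts with a bracketed Unix timestamp such as [1436215652]; the function name nagios_log_window is invented for illustration.

```shell
# Print lines from a Nagios-style log ($1) whose leading [epoch] timestamp
# falls between $2 and $3 inclusive (seconds since the Unix epoch).
# Lines without a leading [digits] prefix are skipped.
# nagios_log_window is a hypothetical helper name.
nagios_log_window() {
    awk -v from="$2" -v to="$3" '
        match($0, /^\[[0-9]+\]/) {
            ts = substr($0, 2, RLENGTH - 2) + 0   # digits between the brackets
            if (ts >= from + 0 && ts <= to + 0) print
        }
    ' "$1"
}

# Example (not executed here): show the last five minutes after a reload.
#   now=$(date +%s)
#   nagios_log_window /usr/local/nagios/var/nagios.log $((now - 300)) "$now"
```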