Re: Duration resets for all Services with Apply Config

Posted: Mon Jul 06, 2015 1:48 pm
by SavaSC
Here it is.

Re: Duration resets for all Services with Apply Config

Posted: Mon Jul 06, 2015 3:36 pm
by jolson
Is the time set properly on your Nagios server?

Code:

date

Code:

hwclock
How many nagios processes are running on your server?

Code:

ps aux | grep nagios
How about your PHP timezone setting?

Code:

grep timez /etc/php.ini
It might be worth running through the following procedure to see whether or not it helps out:

Code:

service nagios stop

Code:

killall -9 nagios

Code:

rm /usr/local/nagios/var/retention.dat

Code:

service nagios start
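
Deleting retention.dat discards all retained state, so a more cautious variant of the procedure above is to move the file aside instead. The following is only a sketch under that assumption; `backup_retention` is a hypothetical helper name, not part of Nagios, and the default path is the one quoted in this thread.

```shell
# Hypothetical helper: move a retention file aside instead of deleting it.
# Prints the path of the backup copy on success, returns non-zero otherwise.
backup_retention() {
    src=${1:-/usr/local/nagios/var/retention.dat}
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    # Timestamped suffix so repeated runs do not overwrite earlier backups.
    dest="$src.$(date +%Y%m%d%H%M%S).bak"
    mv -- "$src" "$dest" && printf '%s\n' "$dest"
}

# Possible use within the procedure (run as root; nothing is executed here):
#   service nagios stop
#   killall -9 nagios
#   backup_retention /usr/local/nagios/var/retention.dat
#   service nagios start
```

If the test makes no difference, the saved copy can be moved back while Nagios is stopped.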

Re: Duration resets for all Services with Apply Config

Posted: Mon Jul 06, 2015 3:56 pm
by SavaSC
Time

Code:

[root@ltc099l ~]# date
Mon Jul  6 15:47:32 CDT 2015
[root@ltc099l ~]# hwclock
Mon 06 Jul 2015 03:47:42 PM CDT  -0.596463 seconds
Nagios Processes

Code:

# ps aux | grep nagios
postgres  1387  0.0  0.2  22000  7116 ?        S    07:46   0:03 postgres: nagiosxi nagiosxi 127.0.0.1(35132) idle
nagios    3783  0.0  0.0   6932   512 ?        Ss   Jun05   0:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    3792  0.0  0.0  33724   808 ?        S    Jun05   3:27 /usr/local/nagios/bin/npcd -d -f /usr/local/nagios/etc/pnp/npcd.cfg
postgres  4577  0.0  0.1  22000  4112 ?        S    15:47   0:00 postgres: nagiosxi nagiosxi 127.0.0.1(51570) idle
nagios    4799  0.0  0.0   5980  1508 ?        S    15:48   0:00 crond
nagios    4800  0.0  0.0   5980  1508 ?        S    15:48   0:00 crond
nagios    4801  0.0  0.0   5980  1508 ?        S    15:48   0:00 crond
nagios    4802  0.0  0.0   5980  1508 ?        S    15:48   0:00 crond
nagios    4803  0.0  0.0   5980  1508 ?        S    15:48   0:00 crond
nagios    4806  0.0  0.0   2516   892 ?        Ss   15:48   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1
nagios    4807  0.0  0.0   2516   892 ?        Ss   15:48   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1
nagios    4808  0.3  0.5  30480 15584 ?        S    15:48   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php
nagios    4809  0.3  0.5  30728 15748 ?        S    15:48   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php
nagios    4814  0.0  0.0   2516   888 ?        Ss   15:48   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1
nagios    4815  0.3  0.4  30620 15224 ?        S    15:48   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php
nagios    4816  0.0  0.0   2516   888 ?        Ss   15:48   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1
nagios    4817  0.4  0.6  36208 20848 ?        S    15:48   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php
nagios    4818  0.0  0.0   2516   888 ?        Ss   15:48   0:00 /bin/sh -c /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1
nagios    4819  0.3  0.5  30684 16112 ?        S    15:48   0:00 /usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php
postgres  4822  0.1  0.1  21928  4800 ?        S    15:48   0:00 postgres: nagiosxi nagiosxi 127.0.0.1(51719) idle
postgres  4824  0.0  0.1  21928  4020 ?        R    15:48   0:00 postgres: nagiosxi nagiosxi 127.0.0.1(51721) UPDATE
postgres  4826  0.0  0.1  21928  4360 ?        S    15:48   0:00 postgres: nagiosxi nagiosxi 127.0.0.1(51723) idle
postgres  4830  0.0  0.1  21928  3988 ?        S    15:48   0:00 postgres: nagiosxi nagiosxi 127.0.0.1(51724) idle
postgres  4843  0.0  0.1  21928  5060 ?        S    15:48   0:00 postgres: nagiosxi nagiosxi 127.0.0.1(51727) idle
nagios    5599  0.0  0.0   4168   696 ?        S    15:49   0:00 /usr/local/nagios/libexec/check_nt -H 10.90.18.219 -s  -p 12489 -v COUNTER -l \Paging File(_Total)\% Usage Paging File usage is %.2f %% -w 85 -c 95
nagios    5603  0.0  0.0   4168   696 ?        S    15:49   0:00 /usr/local/nagios/libexec/check_nt -H 10.90.18.23 -s  -p 12489 -v COUNTER -l \Server\Errors System Login Errors since last reboot is %.f -w 2 -c 20
nagios    5607  0.0  0.0   4168   700 ?        S    15:49   0:00 /usr/local/nagios/libexec/check_nt -H 10.90.18.177 -s  -p 12489 -v COUNTER -l \Paging File(_Total)\% Usage Paging File usage is %.2f %% -w 85 -c 95
root      5610  0.0  0.0   4020   700 pts/0    S+   15:49   0:00 grep nagios
nagios    6709  0.6  0.1  13584  5640 ?        Ss   07:52   3:16 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    6710  0.0  0.0   3080   720 ?        S    07:52   0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    6711  0.0  0.0   3080   724 ?        S    07:52   0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    6712  0.0  0.0   3080   728 ?        S    07:52   0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    6713  0.0  0.0   3080   728 ?        S    07:52   0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    6715  0.0  0.0   3080   720 ?        S    07:52   0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    6716  0.0  0.0   3080   720 ?        S    07:52   0:04 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios    6723  0.0  0.0   6932  1164 ?        S    07:52   0:10 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    6724  0.4  0.0   8048  2308 ?        S    07:52   2:05 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios    6765  0.0  0.1  12652  4156 ?        S    07:52   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
postgres  8225  0.0  0.2  22000  7464 ?        S    04:35   0:07 postgres: nagiosxi nagiosxi 127.0.0.1(46946) idle
postgres 14149  0.0  0.2  22028  7640 ?        S    05:22   0:06 postgres: nagiosxi nagiosxi 127.0.0.1(36036) idle
postgres 16808  0.0  0.2  22028  7324 ?        S    07:25   0:04 postgres: nagiosxi nagiosxi 127.0.0.1(40160) idle
postgres 20450  0.0  0.1  22000  5140 ?        S    15:27   0:00 postgres: nagiosxi nagiosxi 127.0.0.1(56549) idle
postgres 27390  0.0  0.1  22000  5024 ?        S    15:35   0:00 postgres: nagiosxi nagiosxi 127.0.0.1(43590) idle
postgres 28943  0.0  0.1  22000  5124 ?        S    15:37   0:00 postgres: nagiosxi nagiosxi 127.0.0.1(44471) idle
postgres 32611  0.0  0.2  22000  6900 ?        S    07:44   0:03 postgres: nagiosxi nagiosxi 127.0.0.1(39366) idle
php time

Code:

date.timezone = America/Chicago
Clearing Retention.dat

Code:

[root@ltc099l ~]# service nagios stop
Stopping nagios:. done.
[root@ltc099l ~]# killall -9 nagios
nagios: no process killed
[root@ltc099l ~]# rm /usr/local/nagios/var/retention.dat
rm: remove regular file `/usr/local/nagios/var/retention.dat'? y
[root@ltc099l ~]# service nagios start
Starting nagios: done.
Removing the retention.dat file didn't fix the problem.

Thanks for looking into this.
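
Because `grep nagios` also matches postgres sessions, cron jobs, check plugins, and the grep itself, the question of how many Nagios processes are running can be easier to answer with a narrower filter. This is a rough sketch; `nagios_core_procs` is a hypothetical name, and the pattern assumes the binary path shown in the listing above.

```shell
# Hypothetical helper: read `ps aux` output on stdin and keep only the
# Nagios Core daemon (-d) and worker (--worker) lines.
nagios_core_procs() {
    grep -E -e '/nagios/bin/nagios (-d|--worker)'
}

# Possible use (not executed here):
#   ps aux | nagios_core_procs
#   ps aux | nagios_core_procs | wc -l
```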

Re: Duration resets for all Services with Apply Config

Posted: Mon Jul 06, 2015 5:02 pm
by jdalrymple
SavaSC - I don't see in the thread where anyone has addressed this...

Is this only during an apply config or does it also happen when the monitoring engine is simply reloaded? I'm wondering if it's inherent to the core process being restarted or if it's happening as a side-effect of the apply-config process.

To test, either restart the monitoring engine from within the XI interface or run this from the command line:

Code:

[root@localhost ~]# service nagios restart
Running configuration check...
Stopping nagios:. done.
Starting nagios: done.

Re: Duration resets for all Services with Apply Config

Posted: Tue Jul 07, 2015 7:40 am
by SavaSC
Ah, good catch. I didn't notice that before. It does reset every time the Nagios service is restarted, not just on Apply Config.

**EDIT**

Just saw this. It appears that some of the durations reset and some didn't. All of them looked like they were resetting, but some still show several hours of downtime.

I have attached a screenshot.

Re: Duration resets for all Services with Apply Config

Posted: Tue Jul 07, 2015 9:04 am
by tgriep
Could you upload the following file?
/usr/local/nagios/var/status.dat

Run these commands and post the output.

Code:

tail -100 /usr/local/nagios/var/perfdata.log
tail -100 /usr/local/nagios/var/npcd.log
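
Before reading the tail of a log, it can help to check whether the file is still being written to at all. A minimal sketch using `find`; `log_is_stale` is a hypothetical helper, and the one-day default is an arbitrary assumption.

```shell
# Hypothetical helper: succeed if FILE was last modified more than DAYS days
# ago (default 1). find's "-mtime +N" ignores fractional days, so +1 means
# the file is at least two full days old.
log_is_stale() {
    [ -n "$(find "$1" -prune -mtime +"${2:-1}" -print 2>/dev/null)" ]
}

# Possible use (not executed here):
#   for f in /usr/local/nagios/var/perfdata.log /usr/local/nagios/var/npcd.log; do
#       log_is_stale "$f" && echo "stale: $f"
#   done
```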

Re: Duration resets for all Services with Apply Config

Posted: Tue Jul 07, 2015 9:20 am
by SavaSC
perfdata.log

Code:

tail -100 /usr/local/nagios/var/perfdata.log
2015-04-08 18:22:00 [8928] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-08 18:22:00 [8932] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535098.perfdata.service-PID-8932 deleted
2015-04-08 18:22:00 [8932] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-08 18:22:00 [8936] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535113.perfdata.service-PID-8936 deleted
2015-04-08 18:22:00 [8936] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-08 18:22:05 [9000] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-08 18:22:05 [9000] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-08 18:22:05 [9000] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-08 18:22:05 [9006] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-08 18:22:05 [9006] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-08 18:22:05 [9006] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-08 18:22:05 [9000] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535127.perfdata.service-PID-9000 deleted
2015-04-08 18:22:05 [9000] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-08 18:22:06 [9006] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535143.perfdata.service-PID-9006 deleted
2015-04-08 18:22:06 [9006] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:42:41 [17663] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:42:41 [17663] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:42:41 [17663] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:42:41 [17663] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535173.perfdata.service-PID-17663 deleted
2015-04-09 03:42:41 [17663] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:42:41 [17659] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:42:41 [17659] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:42:41 [17659] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:42:41 [17659] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535158.perfdata.service-PID-17659 deleted
2015-04-09 03:42:41 [17659] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:42:41 [17667] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:42:41 [17667] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:42:41 [17667] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:42:41 [17667] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535188.perfdata.service-PID-17667 deleted
2015-04-09 03:42:41 [17667] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:42:46 [17671] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:42:46 [17671] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:42:46 [17671] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:42:46 [17671] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535202.perfdata.service-PID-17671 deleted
2015-04-09 03:42:46 [17671] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:42:51 [17729] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:42:51 [17729] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:42:51 [17729] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:42:51 [17729] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535263.perfdata.service-PID-17729 deleted
2015-04-09 03:42:51 [17729] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:43:20 [17852] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:43:20 [17852] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:43:20 [17852] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:43:20 [17852] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535338.perfdata.service-PID-17852 deleted
2015-04-09 03:43:20 [17852] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:43:20 [17855] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:43:20 [17855] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:43:20 [17855] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:43:20 [17855] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535308.perfdata.service-PID-17855 deleted
2015-04-09 03:43:20 [17855] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:43:25 [17866] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:43:25 [17866] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:43:25 [17866] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:43:25 [17866] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535368.perfdata.service-PID-17866 deleted
2015-04-09 03:43:25 [17866] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:43:30 [17914] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:43:30 [17914] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:43:30 [17914] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:43:30 [17914] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535412.perfdata.service-PID-17914 deleted
2015-04-09 03:43:30 [17914] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 03:43:30 [17913] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 03:43:30 [17913] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 03:43:30 [17913] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 03:43:30 [17913] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428535383.perfdata.service-PID-17913 deleted
2015-04-09 03:43:30 [17913] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:51:28 [9527] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:51:28 [9527] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:51:28 [9527] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:51:28 [9527] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573066.perfdata.service-PID-9527 deleted
2015-04-09 04:51:28 [9527] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:53:19 [11016] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:53:19 [11016] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:53:19 [11016] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:53:19 [11016] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573186.perfdata.service-PID-11016 deleted
2015-04-09 04:53:19 [11016] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:53:19 [11014] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:53:19 [11014] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:53:19 [11014] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:53:19 [11014] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573171.perfdata.service-PID-11014 deleted
2015-04-09 04:53:19 [11014] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:54:33 [12060] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:54:33 [12060] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:54:33 [12060] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:54:33 [12060] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573261.perfdata.service-PID-12060 deleted
2015-04-09 04:54:33 [12060] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:54:33 [12058] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:54:33 [12058] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:54:33 [12058] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:54:33 [12058] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573246.perfdata.service-PID-12058 deleted
2015-04-09 04:54:33 [12058] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:55:30 [12875] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:55:30 [12875] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:55:30 [12875] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:55:30 [12875] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573321.perfdata.service-PID-12875 deleted
2015-04-09 04:55:30 [12875] [0] *** process_perfdata.pl terminated on signal ALRM
2015-04-09 04:55:30 [12873] [0] *** TIMEOUT: Timeout after 5 Sec. ****
2015-04-09 04:55:30 [12873] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2015-04-09 04:55:30 [12873] [0] *** TIMEOUT: Please check your process_perfdata.cfg
2015-04-09 04:55:30 [12873] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//1428573306.perfdata.service-PID-12873 deleted
2015-04-09 04:55:30 [12873] [0] *** process_perfdata.pl terminated on signal ALRM
npcd.log

Code:

# tail -100 /usr/local/nagios/var/npcd.log
[04-09-2015 00:26:12] NPCD: WARN: MAX load reached: load 82.760000/10.000000 at i=18
[04-09-2015 00:26:30] NPCD: WARN: MAX load reached: load 80.370000/10.000000 at i=18
[04-09-2015 00:26:48] NPCD: WARN: MAX load reached: load 76.550000/10.000000 at i=18
[04-09-2015 00:27:22] NPCD: WARN: MAX load reached: load 75.610000/10.000000 at i=18
[04-09-2015 00:28:08] NPCD: WARN: MAX load reached: load 74.580000/10.000000 at i=18
[04-09-2015 00:28:26] NPCD: WARN: MAX load reached: load 76.980000/10.000000 at i=18
[04-09-2015 00:28:45] NPCD: WARN: MAX load reached: load 77.860000/10.000000 at i=18
[04-09-2015 00:29:03] NPCD: WARN: MAX load reached: load 78.730000/10.000000 at i=18
[04-09-2015 00:29:21] NPCD: WARN: MAX load reached: load 79.820000/10.000000 at i=18
[04-09-2015 00:29:43] NPCD: WARN: MAX load reached: load 81.330000/10.000000 at i=18
[04-09-2015 00:30:04] NPCD: WARN: MAX load reached: load 81.150000/10.000000 at i=18
[04-09-2015 00:30:28] NPCD: WARN: MAX load reached: load 81.100000/10.000000 at i=18
[04-09-2015 00:30:56] NPCD: WARN: MAX load reached: load 81.020000/10.000000 at i=18
[04-09-2015 00:31:15] NPCD: WARN: MAX load reached: load 81.160000/10.000000 at i=18
[04-09-2015 00:31:34] NPCD: WARN: MAX load reached: load 83.440000/10.000000 at i=18
[04-09-2015 00:33:56] NPCD: WARN: MAX load reached: load 86.460000/10.000000 at i=18
[04-09-2015 00:44:33] NPCD: WARN: MAX load reached: load 109.080000/10.000000 at i=18
[04-09-2015 00:45:43] NPCD: WARN: MAX load reached: load 110.430000/10.000000 at i=18
[04-09-2015 00:48:48] NPCD: WARN: MAX load reached: load 102.180000/10.000000 at i=18
[04-09-2015 00:49:06] NPCD: WARN: MAX load reached: load 103.850000/10.000000 at i=18
[04-09-2015 00:49:25] NPCD: WARN: MAX load reached: load 103.370000/10.000000 at i=18
[04-09-2015 00:50:07] NPCD: WARN: MAX load reached: load 102.760000/10.000000 at i=18
[04-09-2015 00:51:06] NPCD: WARN: MAX load reached: load 102.630000/10.000000 at i=18
[04-09-2015 01:00:05] NPCD: WARN: MAX load reached: load 110.110000/10.000000 at i=18
[04-09-2015 01:08:03] NPCD: WARN: MAX load reached: load 114.000000/10.000000 at i=18
[04-09-2015 01:19:18] NPCD: WARN: MAX load reached: load 123.320000/10.000000 at i=18
[04-09-2015 01:19:34] NPCD: WARN: MAX load reached: load 113.910000/10.000000 at i=18
[04-09-2015 01:19:57] NPCD: WARN: MAX load reached: load 107.670000/10.000000 at i=18
[04-09-2015 01:26:52] NPCD: WARN: MAX load reached: load 113.530000/10.000000 at i=18
[04-09-2015 01:29:47] NPCD: WARN: MAX load reached: load 124.360000/10.000000 at i=18
[04-09-2015 01:30:08] NPCD: WARN: MAX load reached: load 126.420000/10.000000 at i=18
[04-09-2015 01:30:28] NPCD: WARN: MAX load reached: load 126.660000/10.000000 at i=18
[04-09-2015 01:30:55] NPCD: WARN: MAX load reached: load 126.390000/10.000000 at i=18
[04-09-2015 01:35:00] NPCD: WARN: MAX load reached: load 126.600000/10.000000 at i=18
[04-09-2015 01:58:11] NPCD: WARN: MAX load reached: load 129.530000/10.000000 at i=18
[04-09-2015 02:00:43] NPCD: WARN: MAX load reached: load 134.710000/10.000000 at i=18
[04-09-2015 02:01:33] NPCD: WARN: MAX load reached: load 135.770000/10.000000 at i=18
[04-09-2015 02:01:55] NPCD: WARN: MAX load reached: load 135.040000/10.000000 at i=18
[04-09-2015 02:07:06] NPCD: WARN: MAX load reached: load 136.500000/10.000000 at i=18
[04-09-2015 02:37:38] NPCD: WARN: MAX load reached: load 145.340000/10.000000 at i=18
[04-09-2015 02:38:27] NPCD: WARN: MAX load reached: load 150.410000/10.000000 at i=18
[04-09-2015 02:39:21] NPCD: WARN: MAX load reached: load 150.390000/10.000000 at i=18
[04-09-2015 02:52:29] NPCD: WARN: MAX load reached: load 154.300000/10.000000 at i=18
[04-09-2015 02:55:08] NPCD: WARN: MAX load reached: load 158.460000/10.000000 at i=18
[04-09-2015 03:17:53] NPCD: WARN: MAX load reached: load 160.820000/10.000000 at i=18
[04-09-2015 03:24:52] NPCD: WARN: MAX load reached: load 165.890000/10.000000 at i=18
[04-09-2015 03:35:49] NPCD: WARN: MAX load reached: load 161.160000/10.000000 at i=18
[04-09-2015 03:38:48] NPCD: WARN: MAX load reached: load 156.440000/10.000000 at i=18
[04-09-2015 03:39:06] NPCD: WARN: MAX load reached: load 154.360000/10.000000 at i=18
[04-09-2015 03:39:21] NPCD: WARN: MAX load reached: load 142.470000/10.000000 at i=18
[04-09-2015 03:39:36] NPCD: WARN: MAX load reached: load 119.970000/10.000000 at i=18
[04-09-2015 03:39:51] NPCD: WARN: MAX load reached: load 95.860000/10.000000 at i=18
[04-09-2015 03:40:06] NPCD: WARN: MAX load reached: load 75.580000/10.000000 at i=18
[04-09-2015 03:40:21] NPCD: WARN: MAX load reached: load 59.840000/10.000000 at i=18
[04-09-2015 03:40:36] NPCD: WARN: MAX load reached: load 47.250000/10.000000 at i=18
[04-09-2015 03:40:51] NPCD: WARN: MAX load reached: load 37.700000/10.000000 at i=18
[04-09-2015 03:41:06] NPCD: WARN: MAX load reached: load 30.020000/10.000000 at i=18
[04-09-2015 03:41:21] NPCD: WARN: MAX load reached: load 24.500000/10.000000 at i=18
[04-09-2015 03:41:36] NPCD: WARN: MAX load reached: load 19.740000/10.000000 at i=18
[04-09-2015 03:41:51] NPCD: WARN: MAX load reached: load 16.370000/10.000000 at i=18
[04-09-2015 03:42:06] NPCD: WARN: MAX load reached: load 13.410000/10.000000 at i=18
[04-09-2015 03:42:21] NPCD: WARN: MAX load reached: load 11.640000/10.000000 at i=18
[04-09-2015 03:42:41] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:42:41] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535173.perfdata.service'
[04-09-2015 03:42:41] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:42:41] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535158.perfdata.service'
[04-09-2015 03:42:41] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:42:41] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535188.perfdata.service'
[04-09-2015 03:42:46] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:42:46] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535202.perfdata.service'
[04-09-2015 03:42:51] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:42:51] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535263.perfdata.service'
[04-09-2015 03:43:20] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:43:20] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535338.perfdata.service'
[04-09-2015 03:43:20] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:43:20] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535308.perfdata.service'
[04-09-2015 03:43:25] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:43:25] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535368.perfdata.service'
[04-09-2015 03:43:30] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:43:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535412.perfdata.service'
[04-09-2015 03:43:30] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 03:43:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428535383.perfdata.service'
[04-09-2015 04:51:28] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:51:28] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573066.perfdata.service'
[04-09-2015 04:53:19] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:53:19] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573186.perfdata.service'
[04-09-2015 04:53:19] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:53:19] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573171.perfdata.service'
[04-09-2015 04:54:33] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:54:33] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573261.perfdata.service'
[04-09-2015 04:54:33] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:54:33] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573246.perfdata.service'
[04-09-2015 04:55:30] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:55:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573321.perfdata.service'
[04-09-2015 04:55:30] NPCD: ERROR: Executed command exits with return code '1'
[04-09-2015 04:55:30] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1428573306.perfdata.service'
[06-05-2015 08:31:41] NPCD: Caught Termination Signal - Hasta la vista... baby
[06-05-2015 08:34:02] NPCD: npcd Daemon (0.4.14) started with PID=3792
[06-05-2015 08:34:02] NPCD: Please have a look at 'npcd -V' to get license information
[06-05-2015 08:34:02] NPCD: HINT: load_threshold is enabled - ('10.000000')
I have attached a zip of the /usr/local/nagios/var/status.dat file.

Re: Duration resets for all Services with Apply Config

Posted: Tue Jul 07, 2015 9:28 am
by jdalrymple
SavaSC wrote:

Code:

NPCD: WARN: MAX load reached: load 165.890000/10.000000
Isn't that special.

So it would seem that there are some shortcomings in the resources being made available to your XI server. Does it have a good set of disks to run on? Enough CPU cores available? Have you made any changes recently that would affect the load dramatically? How many hosts/services are you monitoring?

You can partially work around this by adjusting load_threshold in /usr/local/nagios/etc/pnp/npcd.cfg - my guess though is that maybe it's time for you to set up a ramdisk.

https://assets.nagios.com/downloads/nag ... giosXI.pdf
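
To see how high the load warnings in npcd.log went and when, the peak can be pulled out together with its timestamp. This is only a sketch that assumes the "MAX load reached: load CURRENT/THRESHOLD" line format shown in this thread; `peak_npcd_load` is a hypothetical helper name.

```shell
# Hypothetical helper: print the timestamp and value of the highest load
# reported in npcd.log "MAX load reached" warnings. Reads the named files,
# or stdin when no file is given. Prints nothing if no warning is found.
peak_npcd_load() {
    awk '
        /MAX load reached: load / {
            for (i = 1; i < NF; i++) {
                # Find the "load" token that is followed by CURRENT/THRESHOLD.
                if ($i == "load" && $(i + 1) ~ "^[0-9.]+/[0-9.]+$") {
                    split($(i + 1), parts, "/")
                    if (!seen || parts[1] + 0 > peak + 0) {
                        peak = parts[1]; stamp = $1 " " $2; seen = 1
                    }
                }
            }
        }
        END { if (seen) print stamp, peak }
    ' "$@"
}

# Possible use (not executed here):
#   peak_npcd_load /usr/local/nagios/var/npcd.log
```

Printing the timestamp next to the value makes it obvious whether the peak is recent or left over from an older incident.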

Re: Duration resets for all Services with Apply Config

Posted: Tue Jul 07, 2015 10:41 am
by SavaSC
We currently have the following allocated to the NagiosXI server:
- 4 vCPUs
- 4096MB RAM
- 70GB total HD space

According to vSphere, during these service restarts the following resources were used:
- 21% Max CPU usage
- 28% Max memory usage

I have set load_threshold = 170.0 and then restarted the Nagios service. The following resources were used during and after the restart:
- 33% Max memory usage
- 35% Max CPU usage

Also, I notice that the errors you reference are from 3 months ago. The last date referenced in the npcd.log is a month ago. The last date referenced in the perfdata.log is 3 months ago. Have these logs gotten too big perhaps? (npcd.log = 2GB perfdata.log = 8GB) Should I clear these out?

Here is the disk usage report:

Code:

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                      6.7G  3.2G  3.3G  50% /
/dev/sda1              99M   12M   82M  13% /boot
/dev/mapper/nagios_vg00-var_lib_mysql_lv
                      2.0G  398M  1.5G  21% /var/lib/mysql
/dev/mapper/nagios_vg00-var_lib_pgsql_lv
                      2.0G  268M  1.7G  14% /var/lib/pgsql
/dev/mapper/nagios_vg00-usr_local_lv
                      4.0G  3.2G  617M  84% /usr/local
/dev/mapper/nagios_vg00-db_backups_lv
                      6.0G  3.5G  2.2G  62% /store/backups
tmpfs                 1.5G     0  1.5G   0% /dev/shm
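
With /usr/local at 84% and multi-gigabyte logs mentioned above, listing oversized files in the Nagios var directory is a quick way to see what is taking the space. A small sketch; `files_larger_than` is a hypothetical helper, and the `k` size suffix is an extension supported by GNU and BSD find rather than strict POSIX.

```shell
# Hypothetical helper: list regular files under DIR whose size exceeds
# MIN_KIB kibibytes (sizes are rounded up to whole KiB by find).
files_larger_than() {
    find "$1" -type f -size +"$2"k -print
}

# Possible use (not executed here): files over roughly 1 GiB
#   files_larger_than /usr/local/nagios/var 1048576
```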

Re: Duration resets for all Services with Apply Config

Posted: Tue Jul 07, 2015 10:52 am
by jdalrymple
SavaSC wrote:Also, I notice that the errors you reference are from 3 months ago. The last date referenced in the npcd.log is a month ago. The last date referenced in the perfdata.log is 3 months ago.
Ha - sorry about that. I didn't bother looking at the date.
SavaSC wrote:Have these logs gotten too big perhaps? (npcd.log = 2GB perfdata.log = 8GB) Should I clear these out?
No - not related. We're barking up the wrong tree I'm afraid. Although - I do wonder what in perfdata.log is consuming 8GB of disk space. You can clear that out safely.

Back to the drawing board: what shows up in /usr/local/nagios/var/nagios.log at the time of a monitoring engine reload? Anything interesting?
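
One way to narrow nagios.log down to the moment of a reload is to filter on the epoch timestamp in square brackets that starts each log line. The sketch below assumes that format; `nagios_log_window` is a hypothetical helper name, and the 60-second default window is arbitrary.

```shell
# Hypothetical helper: print nagios.log lines whose leading [epoch] stamp
# lies within WINDOW seconds (default 60) of the given CENTRE epoch time.
# Reads the log on stdin.
nagios_log_window() {
    awk -v t="$1" -v w="${2:-60}" '
        /^\[[0-9]+\]/ {
            # Strip the surrounding brackets from the first field.
            ts = substr($1, 2, length($1) - 2) + 0
            if (ts >= t - w && ts <= t + w) print
        }
    '
}

# Possible use right after a restart (not executed here; date +%s is a
# common GNU/BSD extension):
#   nagios_log_window "$(date +%s)" 120 < /usr/local/nagios/var/nagios.log
```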