This support forum board is for support questions relating to
Nagios XI , our flagship commercial network monitoring solution.
CFT6Server
Posts: 506 Joined: Wed Apr 15, 2015 4:21 pm
by CFT6Server » Tue Aug 25, 2015 1:50 pm
So after restarting, the graphs are still not being produced. I want to figure out the root cause before rebooting the server. Any ideas?
I am also seeing these errors in nagios.log:
Code: Select all
[1440528808] Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1440528808.perfdata.host"
[1440528808] Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1440528808.perfdata.service"
I am seeing these repeat.
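To put a number on how often these warnings repeat, a small shell function can count them. This is only a sketch: the function name `count_fork_failures` is made up for illustration, and it assumes the warning text appears in the log exactly as quoted above.

```shell
# Hypothetical helper: count fork() failure warnings in a Nagios log read from stdin.
# -F treats the pattern as a fixed string, so the parentheses are matched literally.
count_fork_failures() {
    grep -c -F 'fork() in my_system_r() failed'
}
```

It could be run as `count_fork_failures < /usr/local/nagios/var/nagios.log`, using the log location shown later in this thread.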
jdalrymple
Skynet Drone
Posts: 2620 Joined: Wed Feb 11, 2015 1:56 pm
by jdalrymple » Tue Aug 25, 2015 1:58 pm
CFT6Server wrote: Warning: fork() in my_system_r() failed for command
It sounds like you may be hitting a ulimit or running out of memory.
It would probably be worthwhile to get a rough idea of how many files your nagios processes have open, the limits on the nagios process, and your memory usage:
Code: Select all
[root@localhost ~]# lsof | grep "^nagios" | wc -l
124
[root@limits ~]# cat /proc/`cat /usr/local/nagios/var/nagios.lock`/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 10485760 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 30385 30385 processes
Max open files 8192 8192 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 30385 30385 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
[root@localhost ~]# free
total used free shared buffers cached
Mem: 3908740 3195252 713488 28364 153964 2230408
-/+ buffers/cache: 810880 3097860
Swap: 2031612 0 2031612
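Since `lsof` counts open files rather than processes, it may also help to compare the number of running nagios tasks against the "Max processes" limit shown above. The following is a rough sketch, not an official procedure: both function names are hypothetical, and it assumes a procps-style `ps` and that the limit is counted per task (thread) for the user.

```shell
# Hypothetical helper: print the soft "Max processes" limit from
# /proc/<pid>/limits content supplied on stdin.
max_processes_soft() {
    awk '/^Max processes/ { print $3 }'
}

# Hypothetical helper: count tasks (including threads) owned by a user.
# Defaults to the "nagios" user; assumes procps ps with -L support.
nagios_task_count() {
    ps -L -u "${1:-nagios}" -o lwp= | wc -l
}
```

For example, `max_processes_soft < /proc/$(cat /usr/local/nagios/var/nagios.lock)/limits` prints the limit, which can then be compared with the output of `nagios_task_count`.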
tgriep
Madmin
Posts: 9190 Joined: Thu Oct 30, 2014 9:02 am
by tgriep » Tue Aug 25, 2015 2:01 pm
Can you run the following and post back the output?
Code: Select all
ls -l /usr/local/nagios/var/
ls -l /usr/local/nagios/var/spool
Be sure to check out our
Knowledgebase for helpful articles and solutions!
CFT6Server
Posts: 506 Joined: Wed Apr 15, 2015 4:21 pm
by CFT6Server » Tue Aug 25, 2015 2:20 pm
Code: Select all
# lsof | grep "^nagios" | wc -l
196
Code: Select all
# cat /proc/`cat /usr/local/nagios/var/nagios.lock`/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 10485760 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 46647 46647 processes
Max open files 4096 4096 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 46647 46647 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
Code: Select all
# free
total used free shared buffers cached
Mem: 5992380 5649632 342748 34584 194440 2076016
-/+ buffers/cache: 3379176 2613204
Swap: 2064380 2064380 0
Code: Select all
# ls -l /usr/local/nagios/var/
total 442184
drwxrwxr-x 2 nagios nagios 12288 Aug 25 00:00 archives
-rw-r--r-- 1 nagios nagios 28273674 Aug 25 12:15 host-perfdata
-rw-r--r-- 1 nagios nagios 775203 Aug 25 09:33 nagios.configtest
-rw-r--r-- 1 nagios nagios 6 Aug 21 15:05 nagios.lock
-rw-r--r-- 1 nagios nagios 7011869 Aug 25 12:15 nagios.log
-rw------- 1 nagios nagios 0 Jul 13 13:47 nagios.tmp77VgjH
-rw------- 1 nagios nagios 13989 Jul 13 14:58 nagios.tmpfFrDVQ
-rw------- 1 nagios nagios 292502 Jul 13 15:05 nagios.tmprtAOzi
-rw------- 1 nagios nagios 3767 Jul 13 14:16 nagios.tmpZ3NLEl
-rw-r--r-- 1 nagios nagios 5 Aug 21 09:10 ndo2db.lock
-rw-r--r-- 1 nagios nagios 0 Aug 21 15:05 ndomod.tmp
srwxr-xr-x 1 nagios nagios 0 Aug 21 09:10 ndo.sock
-rw-r--r-- 1 nagios nagios 5650754 Aug 25 10:55 npcd.log
-rw-r--r-- 1 nagios nagios 20781011 Aug 21 15:05 objects.cache
-rw-r--r-- 1 nagios nagios 20781011 Aug 25 09:33 objects.precache
-rw-rw-r-- 1 nagios nagios 344839 Aug 25 02:39 perfdata.log
-rw------- 1 nagios nagios 32669787 Aug 25 12:05 retention.dat
drwxrwsr-x 2 nagios nagcmd 4096 Aug 21 15:05 rw
-rw-r--r-- 1 nagios nagios 303525825 Aug 25 12:15 service-perfdata
drwxr-xr-x 5 nagios nagios 4096 Feb 24 09:26 spool
drwxr-xr-x 2 nagios nagios 4096 Aug 25 02:40 stats
-rw-rw-r-- 1 nagios nagios 32479310 Aug 25 12:15 status.dat
-rw-r--r-- 1 root root 105675 Jul 16 16:58 wmitest.txt
Code: Select all
# ls -l /usr/local/nagios/var/spool
total 340
drwxrwsr-x 2 nagios nagios 4096 Aug 25 12:12 checkresults
drwxr-xr-x 2 nagios nagios 335872 Aug 25 02:40 perfdata
drwxr-xr-x 2 nagios nagios 4096 Aug 25 02:40 xidpe
I think we'll definitely need to add memory to this server; swap is completely used.
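The `free` output above shows swap fully consumed (2064380 of 2064380 KB), which fits the memory exhaustion theory for the fork() failures. As a sketch, the swap percentage can be pulled out of `free` with awk. The function name `swap_used_percent` is an illustrative assumption, and it expects the `Swap:` line layout shown in this thread (total, used, free).

```shell
# Hypothetical helper: print swap usage as a whole-number percentage,
# reading `free` output from stdin. Prints "n/a" if no swap is configured.
swap_used_percent() {
    awk '/^Swap:/ {
        if ($2 > 0) printf "%d\n", ($3 * 100) / $2
        else print "n/a"
    }'
}
```

Usage would be `free | swap_used_percent`. A value at or near 100 alongside little free memory suggests the host cannot spare memory for new processes.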
tgriep
Madmin
Posts: 9190 Joined: Thu Oct 30, 2014 9:02 am
by tgriep » Tue Aug 25, 2015 2:43 pm
The service-perfdata and the host-perfdata files are very large.
Let's try restarting nagios to see if it will start processing them. Run the following.
Code: Select all
service nagios stop
killall -9 nagios
service nagios start
If that doesn't work, you may have to clear those files out and restart nagios.
Can you run the following and post the output?
Code: Select all
grep perfdata /usr/local/nagios/etc/nagios.cfg
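If the restart does not help, one possible reading of "clear those files out" is to truncate the two perfdata files while nagios is stopped. The sketch below is an assumption about how that could be done, not a documented Nagios XI procedure; the function name `truncate_perfdata` is hypothetical, and truncating discards any performance data that has not yet been processed.

```shell
# Hypothetical helper: empty each file passed as an argument.
# Paths that do not exist or are not regular files are skipped.
truncate_perfdata() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            : > "$f"   # truncate in place, keeping owner and permissions
        fi
    done
}
```

With nagios stopped (`service nagios stop`), it could be called as `truncate_perfdata /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/service-perfdata`, followed by `service nagios start`.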
CFT6Server
Posts: 506 Joined: Wed Apr 15, 2015 4:21 pm
by CFT6Server » Tue Aug 25, 2015 3:35 pm
Just restarted and looks like the files cleared. Here are the outputs.
Code: Select all
# grep perfdata /usr/local/nagios/etc/nagios.cfg
service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk
perfdata_timeout=5
Code: Select all
# ls -l /usr/local/nagios/var/
total 119820
drwxrwxr-x 2 nagios nagios 12288 Aug 25 00:00 archives
-rw-r--r-- 1 nagios nagios 0 Aug 25 13:35 host-perfdata
-rw-r--r-- 1 nagios nagios 775203 Aug 25 13:33 nagios.configtest
-rw-r--r-- 1 nagios nagios 6 Aug 25 13:33 nagios.lock
-rw-r--r-- 1 nagios nagios 9033916 Aug 25 13:35 nagios.log
-rw------- 1 nagios nagios 0 Jul 13 13:47 nagios.tmp77VgjH
-rw------- 1 nagios nagios 13989 Jul 13 14:58 nagios.tmpfFrDVQ
-rw------- 1 nagios nagios 292502 Jul 13 15:05 nagios.tmprtAOzi
-rw------- 1 nagios nagios 3767 Jul 13 14:16 nagios.tmpZ3NLEl
-rw-r--r-- 1 nagios nagios 5 Aug 21 09:10 ndo2db.lock
-rw-r--r-- 1 nagios nagios 0 Aug 25 13:32 ndomod.tmp
srwxr-xr-x 1 nagios nagios 0 Aug 21 09:10 ndo.sock
-rw-r--r-- 1 nagios nagios 5651755 Aug 25 13:34 npcd.log
-rw-r--r-- 1 nagios nagios 20781011 Aug 25 13:33 objects.cache
-rw-r--r-- 1 nagios nagios 20781011 Aug 25 13:33 objects.precache
-rw-rw-r-- 1 nagios nagios 347069 Aug 25 13:34 perfdata.log
-rw------- 1 nagios nagios 32517978 Aug 25 13:33 retention.dat
drwxrwsr-x 2 nagios nagcmd 4096 Aug 25 13:33 rw
-rw-r--r-- 1 nagios nagios 0 Aug 25 13:35 service-perfdata
drwxr-xr-x 5 nagios nagios 4096 Feb 24 09:26 spool
drwxr-xr-x 2 nagios nagios 4096 Aug 25 13:34 stats
-rw-rw-r-- 1 nagios nagios 32321053 Aug 25 13:35 status.dat
-rw-r--r-- 1 root root 105675 Jul 16 16:58 wmitest.txt
Code: Select all
# ls -l /usr/local/nagios/var/spool
total 340
drwxrwsr-x 2 nagios nagios 4096 Aug 25 13:33 checkresults
drwxr-xr-x 2 nagios nagios 335872 Aug 25 13:35 perfdata
drwxr-xr-x 2 nagios nagios 4096 Aug 25 13:35 xidpe
I am still waiting to see if the performance graphs are coming back or not. So far the gap is still there for anything that's missed during this "stuck" period.
CFT6Server
Posts: 506 Joined: Wed Apr 15, 2015 4:21 pm
by CFT6Server » Tue Aug 25, 2015 4:27 pm
It looks like I am still seeing some errors, but the performance graphs are coming back.
Code: Select all
==> npcd.log <==
[08-25-2015 13:42:40] NPCD: ERROR: Executed command exits with return code '7'
[08-25-2015 13:42:40] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1440535300.perfdata.service'
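To see whether return code 7 is the only failure and how often it occurs, the npcd.log entries can be tallied by return code. This is a minimal sketch that assumes the log line format shown above; the function name `npcd_return_codes` is made up for illustration.

```shell
# Hypothetical helper: tally NPCD "exits with return code" errors from stdin.
# Splitting on single quotes puts the return code in the second field.
# Prints one "<code> <count>" pair per line, sorted by code.
npcd_return_codes() {
    awk -F"'" '/exits with return code/ { count[$2]++ }
        END { for (c in count) print c, count[c] }' | sort -n
}
```

It could be run as `npcd_return_codes < /usr/local/nagios/var/npcd.log`, using the npcd.log location from the directory listing earlier in the thread.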
Box293
Too Basu
Posts: 5126 Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia
by Box293 » Tue Aug 25, 2015 6:30 pm
To get to the bottom of those return code 7 errors I think you need to enable debugging to get more information.
http://support.nagios.com/wiki/index.ph ... leshooting
Don't forget to turn down the log level as per the FAQ when you are done!
CFT6Server
Posts: 506 Joined: Wed Apr 15, 2015 4:21 pm
by CFT6Server » Wed Aug 26, 2015 12:12 pm
I have turned on debug but looks like it is no longer producing the error 7. Perhaps the restart of the service fixed the issue temporarily. I will keep this running to monitor the issue.
hsmith
Agent Smith
Posts: 3539 Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1
by hsmith » Wed Aug 26, 2015 12:14 pm
CFT6Server wrote: I have turned on debug but looks like it is no longer producing the error 7. Perhaps the restart of the service fixed the issue temporarily. I will keep this running to monitor the issue.
Thanks, let us know what happens.
Former Nagios Employee.