All Performance Graphs Blank

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: All Performance Graphs Blank

Post by CFT6Server »

So after restarting, the graphs are still not being produced. I want to figure out the root cause before rebooting the server. Any ideas?

I am also seeing these errors in nagios.log:

Code: Select all

[1440528808] Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1440528808.perfdata.host"
[1440528808] Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1440528808.perfdata.service"
I am seeing these repeat.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: All Performance Graphs Blank

Post by jdalrymple »

CFT6Server wrote: Warning: fork() in my_system_r() failed for command
It sounds like you may be hitting a ulimit or running out of memory.

It would probably be worthwhile to get a rough idea of your nagios open file count, process limits, and memory usage:

Code: Select all

[root@localhost ~]# lsof | grep "^nagios" | wc -l
124
[root@limits ~]# cat /proc/`cat /usr/local/nagios/var/nagios.lock`/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             30385                30385                processes
Max open files            8192                 8192                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       30385                30385                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
[root@localhost ~]# free
             total       used       free     shared    buffers     cached
Mem:       3908740    3195252     713488      28364     153964    2230408
-/+ buffers/cache:     810880    3097860
Swap:      2031612          0    2031612
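
As a rough sketch of how the "Max processes" row above could be compared with what the nagios user is actually running: the helper names `soft_limit` and `nproc_headroom` are hypothetical, and the lock file path and user name are assumed from the commands in this thread. On Linux the limit counts threads for the real user ID, so the sketch counts threads rather than processes.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read a /proc/<pid>/limits table on stdin and print
# the soft limit for the row whose name is given as $1.
soft_limit() {
    local name="$1"
    # Row names contain spaces, so match on the line prefix and take the
    # first whitespace-separated field that follows the name.
    awk -v name="$name" 'index($0, name) == 1 {
        rest = substr($0, length(name) + 1)
        split(rest, f, " ")
        print f[1]
    }'
}

# Hypothetical helper: print the thread count for a user next to the
# "Max processes" soft limit of the running nagios daemon.
# Assumed defaults: lock file path and user name as used in this thread.
nproc_headroom() {
    local lock="${1:-/usr/local/nagios/var/nagios.lock}" user="${2:-nagios}"
    local pid limit count
    pid=$(cat "$lock") || return 1
    limit=$(soft_limit "Max processes" < "/proc/$pid/limits") || return 1
    # -L lists one line per thread, -U selects by real user ID,
    # and "-o pid=" suppresses the header line.
    count=$(ps -L -U "$user" -o pid= | wc -l)
    echo "$user threads: $count, soft limit: $limit"
}
```

Running `nproc_headroom` as root on the monitoring server would print both numbers side by side. If the count is nowhere near the limit, memory exhaustion becomes the more likely explanation for the fork() failures.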
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: All Performance Graphs Blank

Post by tgriep »

Can you run the following and post back the output?

Code: Select all

ls -l /usr/local/nagios/var/
ls -l /usr/local/nagios/var/spool
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: All Performance Graphs Blank

Post by CFT6Server »

Code: Select all

# lsof | grep "^nagios" | wc -l
196

Code: Select all

# cat /proc/`cat /usr/local/nagios/var/nagios.lock`/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             46647                46647                processes
Max open files            4096                 4096                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       46647                46647                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

Code: Select all

# free
             total       used       free     shared    buffers     cached
Mem:       5992380    5649632     342748      34584     194440    2076016
-/+ buffers/cache:    3379176    2613204
Swap:      2064380    2064380          0

Code: Select all

# ls -l /usr/local/nagios/var/
total 442184
drwxrwxr-x 2 nagios nagios     12288 Aug 25 00:00 archives
-rw-r--r-- 1 nagios nagios  28273674 Aug 25 12:15 host-perfdata
-rw-r--r-- 1 nagios nagios    775203 Aug 25 09:33 nagios.configtest
-rw-r--r-- 1 nagios nagios         6 Aug 21 15:05 nagios.lock
-rw-r--r-- 1 nagios nagios   7011869 Aug 25 12:15 nagios.log
-rw------- 1 nagios nagios         0 Jul 13 13:47 nagios.tmp77VgjH
-rw------- 1 nagios nagios     13989 Jul 13 14:58 nagios.tmpfFrDVQ
-rw------- 1 nagios nagios    292502 Jul 13 15:05 nagios.tmprtAOzi
-rw------- 1 nagios nagios      3767 Jul 13 14:16 nagios.tmpZ3NLEl
-rw-r--r-- 1 nagios nagios         5 Aug 21 09:10 ndo2db.lock
-rw-r--r-- 1 nagios nagios         0 Aug 21 15:05 ndomod.tmp
srwxr-xr-x 1 nagios nagios         0 Aug 21 09:10 ndo.sock
-rw-r--r-- 1 nagios nagios   5650754 Aug 25 10:55 npcd.log
-rw-r--r-- 1 nagios nagios  20781011 Aug 21 15:05 objects.cache
-rw-r--r-- 1 nagios nagios  20781011 Aug 25 09:33 objects.precache
-rw-rw-r-- 1 nagios nagios    344839 Aug 25 02:39 perfdata.log
-rw------- 1 nagios nagios  32669787 Aug 25 12:05 retention.dat
drwxrwsr-x 2 nagios nagcmd      4096 Aug 21 15:05 rw
-rw-r--r-- 1 nagios nagios 303525825 Aug 25 12:15 service-perfdata
drwxr-xr-x 5 nagios nagios      4096 Feb 24 09:26 spool
drwxr-xr-x 2 nagios nagios      4096 Aug 25 02:40 stats
-rw-rw-r-- 1 nagios nagios  32479310 Aug 25 12:15 status.dat
-rw-r--r-- 1 root   root      105675 Jul 16 16:58 wmitest.txt

Code: Select all

# ls -l /usr/local/nagios/var/spool
total 340
drwxrwsr-x 2 nagios nagios   4096 Aug 25 12:12 checkresults
drwxr-xr-x 2 nagios nagios 335872 Aug 25 02:40 perfdata
drwxr-xr-x 2 nagios nagios   4096 Aug 25 02:40 xidpe
I think we'll definitely need to bump up the memory on this server, since swap is completely used.
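
For anyone wanting to watch this over time, here is a minimal sketch that turns the `Swap:` row of `free` output into a percentage. The function name `swap_used_pct` is hypothetical, and it assumes the column layout shown in this post (total, used, free).

```shell
#!/usr/bin/env bash

# Hypothetical helper: read `free` output on stdin and print the share of
# swap in use as a whole-number percentage.
# Assumes the row layout "Swap: <total> <used> <free>".
swap_used_pct() {
    awk '$1 == "Swap:" {
        if ($2 > 0) printf "%d\n", ($3 * 100) / $2
        else print 0   # no swap configured
    }'
}

# Example usage on a live system:
#   free | swap_used_pct
```

Fed the `free` output above, where used swap equals total swap, this prints 100. A server in that state has little room left to fork new processes.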
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: All Performance Graphs Blank

Post by tgriep »

The service-perfdata and the host-perfdata files are very large.
Let's try restarting nagios to see if it will start processing them. Run the following:

Code: Select all

service nagios stop
killall -9 nagios
service nagios start
If that doesn't work, you may have to clear those files out and restart nagios.
Can you run the following and post the output?

Code: Select all

grep perfdata /usr/local/nagios/etc/nagios.cfg
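
The large perfdata files mentioned in this post are probably a symptom rather than the cause: the logged fork() failures are for the `/bin/mv` that moves them into the spool, so they would keep growing until that move succeeds again. A minimal sketch for spotting the condition early follows. The function name `large_perfdata` and the 10 MiB default threshold are assumptions, not Nagios XI settings.

```shell
#!/usr/bin/env bash

# Hypothetical helper: list files named *-perfdata in directory $1 whose size
# exceeds $2 bytes (default 10 MiB, an arbitrary threshold chosen for illustration).
# Prints "<size> <path>" for each file over the threshold.
large_perfdata() {
    local dir="$1" max_bytes="${2:-10485760}"
    local f size
    for f in "$dir"/*-perfdata; do
        [ -f "$f" ] || continue
        # Arithmetic expansion strips any padding some wc versions emit.
        size=$(( $(wc -c < "$f") ))
        if [ "$size" -gt "$max_bytes" ]; then
            printf '%s %s\n' "$size" "$f"
        fi
    done
}

# Example usage with the directory from this thread:
#   large_perfdata /usr/local/nagios/var
```

With a processing interval of 15 seconds, these files normally stay small, so any output from this check suggests the bulk processing command is not running.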
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: All Performance Graphs Blank

Post by CFT6Server »

I just restarted, and it looks like the files were cleared. Here is the output.

Code: Select all

# grep perfdata /usr/local/nagios/etc/nagios.cfg
service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk
perfdata_timeout=5

Code: Select all

# ls -l /usr/local/nagios/var/
total 119820
drwxrwxr-x 2 nagios nagios    12288 Aug 25 00:00 archives
-rw-r--r-- 1 nagios nagios        0 Aug 25 13:35 host-perfdata
-rw-r--r-- 1 nagios nagios   775203 Aug 25 13:33 nagios.configtest
-rw-r--r-- 1 nagios nagios        6 Aug 25 13:33 nagios.lock
-rw-r--r-- 1 nagios nagios  9033916 Aug 25 13:35 nagios.log
-rw------- 1 nagios nagios        0 Jul 13 13:47 nagios.tmp77VgjH
-rw------- 1 nagios nagios    13989 Jul 13 14:58 nagios.tmpfFrDVQ
-rw------- 1 nagios nagios   292502 Jul 13 15:05 nagios.tmprtAOzi
-rw------- 1 nagios nagios     3767 Jul 13 14:16 nagios.tmpZ3NLEl
-rw-r--r-- 1 nagios nagios        5 Aug 21 09:10 ndo2db.lock
-rw-r--r-- 1 nagios nagios        0 Aug 25 13:32 ndomod.tmp
srwxr-xr-x 1 nagios nagios        0 Aug 21 09:10 ndo.sock
-rw-r--r-- 1 nagios nagios  5651755 Aug 25 13:34 npcd.log
-rw-r--r-- 1 nagios nagios 20781011 Aug 25 13:33 objects.cache
-rw-r--r-- 1 nagios nagios 20781011 Aug 25 13:33 objects.precache
-rw-rw-r-- 1 nagios nagios   347069 Aug 25 13:34 perfdata.log
-rw------- 1 nagios nagios 32517978 Aug 25 13:33 retention.dat
drwxrwsr-x 2 nagios nagcmd     4096 Aug 25 13:33 rw
-rw-r--r-- 1 nagios nagios        0 Aug 25 13:35 service-perfdata
drwxr-xr-x 5 nagios nagios     4096 Feb 24 09:26 spool
drwxr-xr-x 2 nagios nagios     4096 Aug 25 13:34 stats
-rw-rw-r-- 1 nagios nagios 32321053 Aug 25 13:35 status.dat
-rw-r--r-- 1 root   root     105675 Jul 16 16:58 wmitest.txt

Code: Select all

# ls -l /usr/local/nagios/var/spool
total 340
drwxrwsr-x 2 nagios nagios   4096 Aug 25 13:33 checkresults
drwxr-xr-x 2 nagios nagios 335872 Aug 25 13:35 perfdata
drwxr-xr-x 2 nagios nagios   4096 Aug 25 13:35 xidpe
I am still waiting to see whether the performance graphs come back. So far the gap is still there for anything that was missed during this "stuck" period.
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: All Performance Graphs Blank

Post by CFT6Server »

It looks like I am still seeing some errors, but the performance graphs are coming back.

Code: Select all

==> npcd.log <==
[08-25-2015 13:42:40] NPCD: ERROR: Executed command exits with return code '7'
[08-25-2015 13:42:40] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1440535300.perfdata.service'
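
To see how often errors like these occur, a small sketch that tallies NPCD errors by return code may help. The function name `npcd_return_codes` is hypothetical, and it assumes every error line has the same wording as the one shown above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read npcd.log lines on stdin and print one
# "<return code> <count>" line per distinct return code.
# Assumes error lines end with: ... exits with return code '<n>'
npcd_return_codes() {
    # Split on single quotes so that field 2 is the quoted return code.
    awk -F"'" '/NPCD: ERROR: Executed command exits with return code/ { n[$2]++ }
               END { for (c in n) print c, n[c] }' | sort -n
}

# Example usage with the log path from this thread:
#   npcd_return_codes < /usr/local/nagios/var/npcd.log
```

Comparing the tally before and after a restart shows whether the errors are still accumulating or are left over from the period when processing was stuck.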
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: All Performance Graphs Blank

Post by Box293 »

To get to the bottom of those return code 7 errors, I think you need to enable debugging to get more information.

http://support.nagios.com/wiki/index.ph ... leshooting

Don't forget to turn down the log level as per the FAQ when you are done!
CFT6Server
Posts: 506
Joined: Wed Apr 15, 2015 4:21 pm

Re: All Performance Graphs Blank

Post by CFT6Server »

I have turned on debugging, but it looks like it is no longer producing the return code 7 error. Perhaps restarting the service fixed the issue temporarily. I will keep this running to monitor the issue.
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: All Performance Graphs Blank

Post by hsmith »

CFT6Server wrote: I have turned on debugging, but it looks like it is no longer producing the return code 7 error. Perhaps restarting the service fixed the issue temporarily. I will keep this running to monitor the issue.
Thanks, let us know what happens.