missing bandwidth perf data for network devices

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Post by rkennedy

How's the regular disk usage looking? (not just inodes)
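A rough sketch for checking both block ("regular") and inode usage for the filesystem that holds the performance data. It assumes GNU df. The PERFDATA_DIR default (/usr/local/nagios/share/perfdata) is an assumption about where XI keeps its RRD files; the script falls back to / if that directory does not exist.

```shell
#!/usr/bin/env bash
# Report block ("regular") and inode usage for the filesystem holding a path.
# Assumption: XI keeps its RRD perfdata under PERFDATA_DIR; override if needed.
PERFDATA_DIR="${PERFDATA_DIR:-/usr/local/nagios/share/perfdata}"

# Print the block usage percentage (without "%") for the filesystem containing $1.
block_use_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print the inode usage percentage for the filesystem containing $1.
# Some filesystems (btrfs, for example) report "-" because they have no fixed inode table.
inode_use_pct() {
    df -P -i "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Fall back to / when the perfdata directory does not exist on this host.
target="$PERFDATA_DIR"
[ -d "$target" ] || target=/
echo "$target: blocks $(block_use_pct "$target")% used, inodes $(inode_use_pct "$target")% used"
```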

This plugin should work for monitoring the file - https://exchange.nagios.org/directory/A ... nt/details

From there, you could use event handlers to trigger a bash script that checks all of the relevant limits and writes the results to a file.
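A minimal sketch of such an event handler, assuming it is wired to a command that passes $SERVICESTATE$ (or $HOSTSTATE$) as the first argument. SNAPSHOT_LOG and NAGIOS_USER are hypothetical placeholders. One caveat: if fork() itself is failing, Nagios may not be able to launch the handler either, so this mostly captures the lead-up to the problem rather than the failure itself.

```shell
#!/usr/bin/env bash
# Hypothetical Nagios event handler: on a problem state, append a snapshot of
# the nagios user's resource usage to a file. Assumed to be called with
# $SERVICESTATE$ or $HOSTSTATE$ as its first argument.
SNAPSHOT_LOG="${SNAPSHOT_LOG:-/tmp/nagios_resource_snapshots.log}"  # placeholder path
NAGIOS_USER="${NAGIOS_USER:-nagios}"                                # placeholder user

# Print a timestamped snapshot of task count, open files, limits, memory and disk.
snapshot() {
    local user="$1"
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') user=$user ==="
    # On Linux, threads count toward "max user processes", so count them with -L.
    echo "tasks: $(ps -L -U "$user" --no-headers 2>/dev/null | wc -l)"
    # Skip lsof's header line before counting entries.
    echo "open files (all processes): $(lsof -u "$user" 2>/dev/null | tail -n +2 | wc -l)"
    # A handler inherits its soft limits from the process that launched it.
    echo "inherited soft limits: nproc=$(ulimit -Su) nofile=$(ulimit -Sn)"
    free 2>/dev/null
    df -P / 2>/dev/null
}

main() {
    # Record only problem states; OK (or no argument) is ignored.
    case "${1:-}" in
        WARNING|CRITICAL|UNKNOWN|DOWN|UNREACHABLE)
            snapshot "$NAGIOS_USER" >> "$SNAPSHOT_LOG" ;;
    esac
    return 0
}

main "$@"
```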

Just to clarify - is this happening just for one service, or to your whole system?
Former Nagios Employee
brdr
Posts: 312
Joined: Mon Jun 02, 2014 12:49 pm

Post by brdr

Thanks.

I first noticed it on the bandwidth checks for network devices. The latest occurrence (30-Dec) was system-wide: all service checks and host checks stopped graphing.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Post by ssax

Are you running gearman on this server?

Also, I found this thread here: https://support.nagios.com/forum/viewto ... 57#p150057
jdalrymple wrote:
CFT6Server wrote: Warning: fork() in my_system_r() failed for command
That sounds like you may be hitting a ulimit or running out of memory.

It would probably be worthwhile to get a rough idea of how many files your Nagios processes have open, what limits the Nagios daemon is running under, and your memory usage:

[root@localhost ~]# lsof | grep "^nagios" | wc -l
124
[root@limits ~]# cat /proc/`cat /usr/local/nagios/var/nagios.lock`/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             30385                30385                processes
Max open files            8192                 8192                 files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       30385                30385                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
[root@localhost ~]# free
             total       used       free     shared    buffers     cached
Mem:       3908740    3195252     713488      28364     153964    2230408
-/+ buffers/cache:     810880    3097860
Swap:      2031612          0    2031612
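Two details matter when reading these numbers. lsof also lists memory-mapped libraries, working directories and similar entries, so its count is higher than the number of open file descriptors. And "Max open files" is enforced per process, while "Max processes" covers every process and thread owned by the same real user. A rough sketch that compares the running daemon against its own /proc limits, assuming procps ps and that it is run as root (LOCK_FILE is the path used in the commands above):

```shell
#!/usr/bin/env bash
# Compare the running Nagios daemon's usage with the limits it actually runs under.
LOCK_FILE="${LOCK_FILE:-/usr/local/nagios/var/nagios.lock}"

# Read /proc/<pid>/limits-format text on stdin and print the soft limit for the
# resource named in $1. Columns are separated by runs of two or more spaces.
soft_limit() {
    awk -F '  +' -v name="$1" '$1 == name { print $2 }'
}

check_daemon() {
    local pid ruid fds tasks
    pid=$(cat "$LOCK_FILE") || return 1
    # RLIMIT_NPROC is counted against the real user ID.
    ruid=$(ps -o ruid= -p "$pid" | tr -d ' ') || return 1
    # "Max open files" is enforced per process, so count this PID's descriptors only.
    fds=$(ls "/proc/$pid/fd" | wc -l)
    # "Max processes" covers every task (process or thread) of that real user.
    tasks=$(ps -L -U "$ruid" --no-headers | wc -l)
    echo "pid $pid: $fds open fds, soft limit $(soft_limit 'Max open files' < "/proc/$pid/limits")"
    echo "ruid $ruid: $tasks tasks, soft limit $(soft_limit 'Max processes' < "/proc/$pid/limits")"
}

# Only run the live check on a host that has the Nagios lock file.
if [ -r "$LOCK_FILE" ]; then
    check_daemon
fi
```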
brdr
Posts: 312
Joined: Mon Jun 02, 2014 12:49 pm

Post by brdr

Thanks, ssax. Yes, we run Gearman on the XI server with 2 worker servers. I definitely agree that we are hitting a ulimit, perhaps 'max user processes' or 'open files'. I have included the soft ulimits for the nagios user below. Right now, the process count for the nagios user is around 325, and the open file count is around 2200.

Once I see the pattern show up in nagios.log again, I can check the limits.
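One way to catch that moment is to follow nagios.log and record usage as soon as the warning appears. A rough sketch, assuming GNU tail; the log path default, the OUT file and the --watch flag are hypothetical choices, not part of Nagios XI.

```shell
#!/usr/bin/env bash
# Hypothetical watcher: follow nagios.log and, when the fork() failure from
# my_system_r() appears, record the nagios user's task count and memory.
NAGIOS_LOG="${NAGIOS_LOG:-/usr/local/nagios/var/nagios.log}"  # assumed path
NAGIOS_USER="${NAGIOS_USER:-nagios}"                          # assumed user
OUT="${OUT:-/tmp/fork_failure_snapshots.log}"                 # placeholder path

# Succeed if the line contains the fork() failure warning quoted in this thread.
is_fork_failure() {
    case "$1" in
        *"fork() in my_system_r() failed"*) return 0 ;;
        *) return 1 ;;
    esac
}

watch_log() {
    # -n 0 starts at the end of the file; -F keeps following across log rotation.
    tail -n 0 -F "$NAGIOS_LOG" | while IFS= read -r line; do
        if is_fork_failure "$line"; then
            {
                echo "=== $(date '+%Y-%m-%d %H:%M:%S') $line"
                echo "tasks: $(ps -L -U "$NAGIOS_USER" --no-headers | wc -l)"
                free
            } >> "$OUT"
        fi
    done
}

# Start following only when asked (hypothetical flag), e.g. ./watch_fork.sh --watch
if [ "${1:-}" = "--watch" ]; then
    watch_log
fi
```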

Since the reboot on 31-Dec, we have not seen this issue.
[root@bed-600-124 archives]# grep -l my_system_r *.log
nagios-01-01-2016-00.log
nagios-06-18-2015-00.log
nagios-06-19-2015-00.log
nagios-06-20-2015-00.log
nagios-12-09-2015-00.log
nagios-12-10-2015-00.log
nagios-12-31-2015-00.log
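Since each Nagios log line starts with a Unix timestamp in brackets, the same archives can also show when the failures started and stopped on each day, which helps estimate how long after a reboot the problem comes back. A sketch assuming GNU sed and date; run it from the archives directory as, for example, ./fork_times.sh *.log (the script name is hypothetical).

```shell
#!/usr/bin/env bash
# For each Nagios log file given as an argument, print how many my_system_r()
# fork failures it holds and the first/last failure time.
# Assumes the standard "[epoch] message" log line format and GNU date.

# Read log text on stdin; print the epoch timestamp of each my_system_r line.
failure_epochs() {
    sed -n 's/^\[\([0-9][0-9]*\)\].*my_system_r.*/\1/p'
}

# Print "<file> <count> <first failure> <last failure>" for files with failures.
summarize() {
    local f epochs first last count
    for f in "$@"; do
        epochs=$(failure_epochs < "$f")
        [ -n "$epochs" ] || continue
        count=$(printf '%s\n' "$epochs" | wc -l)
        first=$(printf '%s\n' "$epochs" | head -n 1)
        last=$(printf '%s\n' "$epochs" | tail -n 1)
        echo "$f $count $(date -d "@$first" '+%F %T') $(date -d "@$last" '+%F %T')"
    done
}

summarize "$@"
```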


[root@bed-600-124 archives]# su - nagios
[nagios@bed-600-124 ~]$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 95123
max locked memory (kbytes, -l) 128
max memory size (kbytes, -m) unlimited
open files (-n) 4096 (HARD LIMIT is also 4096)
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 20480
cpu time (seconds, -t) unlimited
max user processes (-u) 1024 (HARD LIMIT is 4096)
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
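Comparing the counts quoted above with these soft limits is simple arithmetic: 325 of 1024 user processes is about 31%, and 2200 open files against 4096 is about 53%. Two caveats: these are the login-shell limits from su - nagios, and a running daemon's limits (as shown in /proc/<pid>/limits) can differ, so it is worth checking those as well; and the 2200 figure is summed across all nagios processes while the open-files limit applies per process, so it overstates the risk for that limit. A small sketch; the 80% threshold is an arbitrary example, not a Nagios default.

```shell
#!/usr/bin/env bash
# Percent of a limit in use (integer, rounded down).
pct_used() {
    echo $(( $1 * 100 / $2 ))
}

# Print OK or WARN depending on whether usage reaches the threshold percentage.
check_headroom() {
    local name="$1" current="$2" limit="$3" threshold="${4:-80}"  # 80% is arbitrary
    local pct
    pct=$(pct_used "$current" "$limit")
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARN $name: $current of $limit ($pct%)"
    else
        echo "OK $name: $current of $limit ($pct%)"
    fi
}

# Figures quoted in this thread (login-shell soft limits for the nagios user).
check_headroom "max user processes" 325 1024   # prints: OK max user processes: 325 of 1024 (31%)
check_headroom "open files" 2200 4096          # prints: OK open files: 2200 of 4096 (53%)
```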
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Post by tmcdonald

Yes, fork() failures definitely point to resource limits being hit. How long do you estimate it will take for this to appear again?
Former Nagios employee