missing bandwidth perf data for network devices

brdr · Post by **brdr** » Thu Dec 17, 2015 4:06 pm

Hi,

We are using XI 5.2.2. When looking at the bandwidth graphs for our network devices, we are missing data for about 5 hours. We have another tool that monitors bandwidth, and it shows a fair amount of traffic on the same port.

All the server checks, such as CPU, memory, and disk, do not have the gap. Could this be MRTG related? If so, where should I look to troubleshoot?

Thx

rkennedy · Post by **rkennedy** » Fri Dec 18, 2015 11:28 am

Just to verify, are these checks running over SNMP?

How many service / host checks are running, and what kind of resources do you have allocated to this machine?

brdr · Post by **brdr** » Fri Dec 18, 2015 1:17 pm

Yes, all SNMP.

We have plenty of spare resources.

1 XI server with 8 CPUs and 12 GB of memory, and 2 Mod Gearman worker servers, each with 4 CPUs and 4 GB of memory.

940 Hosts / 6686 Services. At any one time we may have 10 alerts active.

As I continue to look at the last 7 days, I see gaps of missing bandwidth performance data for ports. Ugh.

brdr · Post by **brdr** » Fri Dec 18, 2015 2:35 pm

Hold on this one plz. Let me do some diggin....

Keep you posted. Thx.

jolson · Post by **jolson** » Fri Dec 18, 2015 2:54 pm

Sounds good - let us know what you find out.

Post by **tgriep** » Fri Dec 18, 2015 2:56 pm

The bandwidth graphs for network devices are driven by the cron daemon.
Take a look in the /var/log/cron log file to see if there are any clues as to why this happened.

brdr · Post by **brdr** » Mon Jan 04, 2016 10:39 am

Hello,

This issue (no perf data/no graphing) happened again on 30-Dec. It is related to my previous post at:

https://support.nagios.com/forum/viewto ... 16&t=36131

I have looked at the cron logs and nothing sticks out as a problem.

When the entry below shows up in nagios.log, perf data stops and graphing goes south. This is the third time in 3 weeks that this issue has happened. The first time we were on 2.7; the last 2 times we were on XI 5.2.2. I rebooted the XI server and things got back to normal.
[Thu Dec 31 00:00:00 2015] Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1451538000.perfdata.service"

Do you know which Nagios program/component is writing this error message to nagios.log? If I increase the Nagios logging level, would the failed fork() call report a return code that might tell us why it couldn't fork to begin with?

Thanks
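On the return-code question: a failed fork() sets errno, and the usual values are EAGAIN (a process or thread limit was reached) or ENOMEM (the kernel could not allocate memory for the new process). A minimal sketch of a probe that can be run as the same user as the Nagios daemon is shown here. It is written in Python for brevity, the function names `describe_fork_error` and `probe_fork` are made up for this example, and it only reports what a fresh process sees at the moment it runs, which may differ from what the long-running daemon hit.

```python
#!/usr/bin/env python3
"""Probe whether the current user can fork, and report errno if not (sketch)."""
import errno
import os
import resource


def describe_fork_error(err: OSError) -> str:
    """Turn the errno of a failed fork() into a readable hint."""
    hints = {
        errno.EAGAIN: "process/thread limit reached (check ulimit -u and system-wide limits)",
        errno.ENOMEM: "kernel could not allocate memory for the new process",
    }
    name = errno.errorcode.get(err.errno, "UNKNOWN")
    detail = hints.get(err.errno) or os.strerror(err.errno)
    return f"fork() failed: {name} ({err.errno}): {detail}"


def probe_fork() -> str:
    """Fork once; the child exits immediately and the parent reaps it."""
    try:
        pid = os.fork()
    except OSError as err:
        return describe_fork_error(err)
    if pid == 0:
        os._exit(0)  # child: leave at once, skipping the parent's cleanup handlers
    os.waitpid(pid, 0)  # parent: reap the child so no zombie is left behind
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return f"fork() ok; RLIMIT_NPROC soft={soft} hard={hard}"


if __name__ == "__main__":
    print(probe_fork())
```

Running it from cron every few minutes and keeping the output would show whether the failure is EAGAIN or ENOMEM the next time the gap appears.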

rkennedy · Post by **rkennedy** » Mon Jan 04, 2016 12:11 pm

Continuing from the old post, are your inodes maxing out? I wonder if other resources are hitting limits as well.

What is the output of df -h and top|head -5 once again?

brdr · Post by **brdr** » Mon Jan 04, 2016 1:27 pm

I thought it might be inodes. So, a couple of weeks back I set up inode checks against the file systems to alert me when inode usage is above 65%. Inode usage was low, so no alerts. I believe you're right about hitting limits.

[root@bed-600-124 var]# df -ih
Filesystem                       Inodes IUsed IFree IUse% Mounted on
/dev/mapper/VolGroup00-LogVol00    256K   47K  210K   19% /
tmpfs                              1.5M     1  1.5M    1% /dev/shm
/dev/sda1                           25K    50   25K    1% /boot
/dev/mapper/VolGroup00-LogVol05    192K   849  192K    1% /home
/dev/mapper/VolGroup00-LogVol02    8.2M  160K  8.1M    2% /usr
/dev/mapper/VolGroup00-LogVol03    384K   21K  364K    6% /var
[root@bed-600-124 var]#

top - 12:50:58 up 4 days, 1:04, 1 user, load average: 1.49, 1.93, 1.86
Tasks: 353 total, 1 running, 351 sleeping, 0 stopped, 1 zombie
Cpu(s): 27.6%us, 5.8%sy, 0.0%ni, 65.8%id, 0.4%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 12197752k total, 9136892k used, 3060860k free, 147212k buffers
Swap: 2064380k total, 70632k used, 1993748k free, 2591220k cached
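
For anyone who wants to reproduce the inode check described above, here is a rough sketch of a Nagios-style plugin using `os.statvfs`. The 65% warning threshold comes from this post; the 90% critical threshold, the function names, and the output wording are assumptions made for the example, and the percentage may differ slightly from what `df -i` prints because of reserved inodes.

```python
#!/usr/bin/env python3
"""Nagios-style inode usage check (sketch, thresholds partly assumed)."""
import os
import sys

OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def inode_use_percent(total: int, free: int) -> float:
    """Percentage of inodes in use; 0.0 for filesystems that report no inodes."""
    if total <= 0:
        return 0.0
    return 100.0 * (total - free) / total


def classify(pct: float, warn: float, crit: float) -> int:
    """Map a usage percentage to a plugin exit code (alert only when above a threshold)."""
    if pct > crit:
        return CRITICAL
    if pct > warn:
        return WARNING
    return OK


def check_inodes(path: str, warn: float = 65.0, crit: float = 90.0):
    """Return (exit_code, message) for the filesystem that contains path."""
    st = os.statvfs(path)
    pct = inode_use_percent(st.f_files, st.f_ffree)
    code = classify(pct, warn, crit)
    message = (
        f"INODES {LABELS[code]} - {path} {pct:.1f}% used"
        f" | inode_pct={pct:.1f}%;{warn:g};{crit:g}"
    )
    return code, message


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    exit_code, text = check_inodes(target)
    print(text)
    sys.exit(exit_code)
```

Since inode usage here tops out at 19%, a check like this would stay OK, which matches the lack of alerts.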

brdr · Post by **brdr** » Mon Jan 04, 2016 1:42 pm

Is there a plugin (or script) already available that checks for the presence of a string (pattern) in /usr/local/nagios/var/nagios.log?

I could set up a check in Nagios to look for the pattern "Warning: fork() in my_system_r" in this file. If the pattern shows up, warn me; then I know I will need to run some commands to try to determine which limit it is hitting. I would also run a simple C program as the nagios user that calls fork() and checks errno on return for more info.
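
A minimal sketch of the log check described above, written as a Nagios-style plugin in Python. The path and pattern are the ones quoted in this thread; the function names and output wording are made up for the example. It rescans the whole file on every run, so once the pattern appears the check stays in WARNING until the log is rotated; remembering the last read offset would be a natural refinement.

```python
#!/usr/bin/env python3
"""Warn when a fixed string appears in the Nagios log (sketch)."""
import sys

OK, WARNING, UNKNOWN = 0, 1, 3
PATTERN = "Warning: fork() in my_system_r"
LOG_PATH = "/usr/local/nagios/var/nagios.log"


def find_matches(lines, pattern=PATTERN):
    """Return every line that contains the pattern as a plain substring."""
    return [line.rstrip("\n") for line in lines if pattern in line]


def check_log(path=LOG_PATH, pattern=PATTERN):
    """Return (exit_code, message) following the usual plugin exit codes."""
    try:
        with open(path, "r", errors="replace") as handle:
            matches = find_matches(handle, pattern)
    except OSError as err:
        return UNKNOWN, f"LOG UNKNOWN - cannot read {path}: {err.strerror}"
    if matches:
        # A pipe in plugin output starts the perfdata section, so mask it.
        latest = matches[-1].replace("|", "/")
        return WARNING, f"LOG WARNING - {len(matches)} match(es), latest: {latest}"
    return OK, f"LOG OK - pattern not found in {path}"


if __name__ == "__main__":
    code, message = check_log()
    print(message)
    sys.exit(code)
```

The user running the check needs read access to nagios.log for this to return anything other than UNKNOWN.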
