Hi,
We are using XI 5.2.2. When looking at the bandwidth graphs for our network devices, we are missing data for about 5 hours. We have another tool that monitors bandwidth, and it shows a fair amount of traffic on the same port during that window.
All the server checks, like CPU, memory, and disk, do not have the gap. Could this be MRTG related? If so, where should I look to troubleshoot?
Thx
missing bandwidth perf data for network devices
Re: missing bandwidth perf data for network devices
Just to verify, are these checks running over SNMP?
How many service / host checks are running, and what kind of resources do you have allocated to this machine?
Former Nagios Employee
Re: missing bandwidth perf data for network devices
Yes, all SNMP.
We have plenty of spare resources
1 XI server w/8 CPU 12 GB Mem, and 2 Mod Gearman Worker Servers each 4 CPU 4 GB Memory.
940 Hosts / 6686 Services. At any one time we may have 10 alerts active.
As I continue to look at the last 7 days, I see more gaps of missing bandwidth performance data for ports. Ugh.
Re: missing bandwidth perf data for network devices
Hold on this one, please. Let me do some digging.
Keep you posted. Thx.
Re: missing bandwidth perf data for network devices
Sounds good - let us know what you find out.
Re: missing bandwidth perf data for network devices
The bandwidth graphs for network devices are run by the cron daemon.
Take a look in the /var/log/cron log file to see if there are any clues as to why this happened.
Re: missing bandwidth perf data for network devices
Hello,
This issue (no perf data/no graphing) happened again on 30-Dec. It is related to my previous post at:
https://support.nagios.com/forum/viewto ... 16&t=36131
I have looked at the cron logs and nothing sticks out as a problem.
Whenever the entry below shows up in nagios.log, perf data stops and graphing goes south. This is the third time in 3 weeks this has happened. The first time we were using 2.7; the last two times it was XI 5.2.2. I rebooted the XI server and things got back to normal.
[Thu Dec 31 00:00:00 2015] Warning: fork() in my_system_r() failed for command "/bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/xidpe/1451538000.perfdata.service"
Do you know which Nagios program/component writes this error message to nagios.log? If I increase the Nagios logging level, would the failed fork() call report a return code that might tell us why it couldn't fork to begin with?
Thanks
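On the question of what a failed fork() reports: the system call itself sets errno, and on Linux the usual values are EAGAIN (a per-user or system-wide process/thread limit was hit) and ENOMEM (the kernel could not allocate memory for the new process). Whether Nagios writes that errno to nagios.log at a higher debug level is not established in this thread, so the following is only a minimal sketch of reproducing the call and reading errno directly. It is written in Python rather than C, and the function names `try_fork` and `explain` are made up for illustration.

```python
import errno
import os
import sys


def try_fork():
    """Attempt one fork(); return None on success or the errno on failure."""
    try:
        pid = os.fork()
    except OSError as exc:
        return exc.errno
    if pid == 0:
        # Child: exit immediately without running cleanup handlers.
        os._exit(0)
    # Parent: reap the child so it does not linger as a zombie.
    os.waitpid(pid, 0)
    return None


def explain(err):
    """Map a fork() errno to a short hint about the likely limit."""
    if err is None:
        return "fork() succeeded"
    hints = {
        errno.EAGAIN: "process/thread limit reached (per-user or system-wide)",
        errno.ENOMEM: "kernel could not allocate memory for the new process",
    }
    name = errno.errorcode.get(err, str(err))
    return "fork() failed: %s (%s)" % (name, hints.get(err, os.strerror(err)))


if __name__ == "__main__":
    result = try_fork()
    print(explain(result))
    # Non-zero exit makes the failure visible to whatever runs the script.
    sys.exit(0 if result is None else 2)
```

Per-user limits only apply to the account that hits them, so a test like this is only meaningful when run as the nagios user while the problem is occurring.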
Re: missing bandwidth perf data for network devices
Continuing from the old post, are your inodes maxing out? I wonder if other resources are hitting limits as well.
What is the output of df -h and top|head -5 once again?
Re: missing bandwidth perf data for network devices
I thought it might be inodes, so a couple of weeks back I set up inode checks against the file systems to alert me when inode usage goes above 65%. Inode usage stayed low and there were no alerts. I believe you're right about hitting some other limit.
[root@bed-600-124 var]# df -ih
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/VolGroup00-LogVol00
                        256K     47K    210K   19% /
tmpfs                   1.5M       1    1.5M    1% /dev/shm
/dev/sda1                25K      50     25K    1% /boot
/dev/mapper/VolGroup00-LogVol05
                        192K     849    192K    1% /home
/dev/mapper/VolGroup00-LogVol02
                        8.2M    160K    8.1M    2% /usr
/dev/mapper/VolGroup00-LogVol03
                        384K     21K    364K    6% /var
[root@bed-600-124 var]#
top - 12:50:58 up 4 days, 1:04, 1 user, load average: 1.49, 1.93, 1.86
Tasks: 353 total, 1 running, 351 sleeping, 0 stopped, 1 zombie
Cpu(s): 27.6%us, 5.8%sy, 0.0%ni, 65.8%id, 0.4%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 12197752k total, 9136892k used, 3060860k free, 147212k buffers
Swap: 2064380k total, 70632k used, 1993748k free, 2591220k cached
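Since inodes look fine, one other limit worth ruling out is the process/thread count, because that is what produces EAGAIN from fork(). Below is a rough sketch that compares the number of tasks owned by the current user with the RLIMIT_NPROC soft limit and prints the kernel-wide pid_max and threads-max values. The helper names are hypothetical, the /proc walk is only an approximation of how the kernel counts tasks, and the limit shown is the one inherited by the shell running the script, which may differ from the limit of the running nagios daemon.

```python
import os
import pwd
import resource


def count_user_tasks(uid, proc_root="/proc"):
    """Count threads (tasks) owned by uid by walking <proc_root>/<pid>/task."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        pid_dir = os.path.join(proc_root, entry)
        try:
            if os.stat(pid_dir).st_uid != uid:
                continue
            total += len(os.listdir(os.path.join(pid_dir, "task")))
        except OSError:
            # The process exited while we were looking at it; skip it.
            continue
    return total


def usage_percent(used, limit):
    """Return used/limit as a percentage, or None when the limit is unlimited."""
    if limit == resource.RLIM_INFINITY or limit <= 0:
        return None
    return 100.0 * used / limit


def read_int(path):
    """Read the first integer from a /proc/sys style file."""
    with open(path) as handle:
        return int(handle.read().split()[0])


if __name__ == "__main__":
    uid = os.getuid()
    user = pwd.getpwuid(uid).pw_name
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    tasks = count_user_tasks(uid)
    pct = usage_percent(tasks, soft)
    print("user=%s tasks=%d nproc_soft_limit=%s" % (user, tasks, soft))
    if pct is not None:
        print("nproc usage: %.1f%%" % pct)
    print("kernel.pid_max=%d" % read_int("/proc/sys/kernel/pid_max"))
    print("kernel.threads-max=%d" % read_int("/proc/sys/kernel/threads-max"))
```

Running it as the nagios user at intervals, and again when the fork() warning appears, would show whether the task count is creeping toward the limit.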
Re: missing bandwidth perf data for network devices
Is there an existing plugin (or script) that checks for the presence of a string (pattern) in /usr/local/nagios/var/nagios.log?
I could set up a check in Nagios that looks for the pattern "Warning: fork() in my_system_r" in this file. If the pattern shows up, it warns me, and I know I need to run some commands to try to determine which limit is being hit. I would also run a simple C program as the nagios user that calls fork() and checks errno on failure for more info.
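The standard plugins package includes a check_log script that may already cover this. If a custom check is preferred, here is a minimal sketch of one, assuming the usual plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names are made up for illustration, and the script simply scans the whole file on every run, so it keeps warning until the log rotates.

```python
import sys

PATTERN = "Warning: fork() in my_system_r"
LOG_PATH = "/usr/local/nagios/var/nagios.log"

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def find_matches(lines, pattern=PATTERN):
    """Return the lines that contain the literal pattern."""
    return [line.rstrip("\n") for line in lines if pattern in line]


def evaluate(matches):
    """Turn the match list into an (exit_code, message) pair."""
    if not matches:
        return OK, "OK - no fork() failures logged"
    return WARNING, "WARNING - %d fork() failure(s), latest: %s" % (
        len(matches), matches[-1])


def main(path=LOG_PATH):
    """Scan the log at path and return (exit_code, message)."""
    try:
        with open(path) as handle:
            return evaluate(find_matches(handle))
    except IOError as exc:
        return UNKNOWN, "UNKNOWN - cannot read %s: %s" % (path, exc)


if __name__ == "__main__":
    exit_code, text = main(sys.argv[1] if len(sys.argv) > 1 else LOG_PATH)
    print(text)
    sys.exit(exit_code)
```

One caveat with this design: if the check is executed on a Mod Gearman worker rather than on the XI server, it will not see the server's nagios.log, so it would need to run locally on the XI host.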