check_rrdtraf sometimes returns 0 network traffic
Posted: Mon Nov 07, 2016 2:42 pm
by WillemDH
Hello,
All of a sudden, our services using check_rrdtraf are sporadically returning 0 Mbps in and 0 Mbps out. Is there any known issue that could cause such behaviour? Please note we have offloaded our mrtg checks to a gearman2 worker node. This has been working for 4 years now and never gave any big issues (except for the memory leak in mod_gearman 1, which we solved by upgrading gearman).
I took some actions to try to resolve this, as I have the feeling mrtg is hitting some limit.
1) Process limit
/etc/security/limits.d/90-nproc.conf
* soft nproc 1024
to
* soft nproc 4096
2) File limit
Before:
ulimit -Sn
1024
/etc/security/limits.conf
Added:
* soft nofile 4096
After:
ulimit -Sn
4096
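Note that ulimit -Sn reports the soft limit of the shell it is run in, not of a process that was already running. As a minimal sketch (the function names are made up for illustration), the limits a given process actually has can be read from its /proc/<pid>/limits file on Linux:

```shell
# Print the soft "Max open files" limit from a file in /proc/<pid>/limits format.
# $1: path to the limits file
soft_nofile_from_limits() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Print the soft "Max processes" limit from the same kind of file.
# $1: path to the limits file
soft_nproc_from_limits() {
    awk '/^Max processes/ { print $3 }' "$1"
}

# Usage (replace <pid> with the PID of a running mrtg or worker process):
#   soft_nofile_from_limits /proc/<pid>/limits
#   soft_nproc_from_limits /proc/<pid>/limits
```

If the values printed for the running process still show the old limits, the change has not reached that process yet.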
Things seem better now, but I'm curious if Nagios support has seen this behaviour before.
Grtz
Willem
Re: check_rrdtraf sometimes returns 0 network traffic
Posted: Mon Nov 07, 2016 2:54 pm
by avandemore
This is a known issue which is fixed in 5.3.2 if you're running RHEL/CentOS 7. Is this your environment?
Re: check_rrdtraf sometimes returns 0 network traffic
Posted: Mon Nov 07, 2016 4:18 pm
by WillemDH
No, both my XI and mrtg worker node are on CentOS 6. I am just seeing that my file and proc limit tuning doesn't really seem to have helped. Any other tips are welcome.
I attached a screenshot of the result in NagVis. I can't really get a grip on what could be causing it. I restarted both my XI and gearman worker, but still the same issue. If I can provide any logfile or config file, please let me know.
Re: check_rrdtraf sometimes returns 0 network traffic
Posted: Mon Nov 07, 2016 4:39 pm
by avandemore
Can you attach /usr/local/nagios/var/npcd.log?
Re: check_rrdtraf sometimes returns 0 network traffic
Posted: Mon Nov 07, 2016 4:44 pm
by WillemDH
These are the only logs of today in npcd.log. We never had any issues before today.
Code: Select all
[11-07-2016 16:24:36] NPCD: Caught Termination Signal - Hasta la vista... baby
[11-07-2016 16:25:35] NPCD: npcd Daemon (0.4.14) started with PID=2634
[11-07-2016 16:25:35] NPCD: Please have a look at 'npcd -V' to get license information
[11-07-2016 16:25:35] NPCD: HINT: load_threshold is enabled - ('20.000000')
The termination signal is from a reboot I did to try to resolve the issue, I think.
Code: Select all
uptime
22:45:09 up 6:20, 1 user, load average: 1.40, 1.40, 1.34
Re: check_rrdtraf sometimes returns 0 network traffic
Posted: Mon Nov 07, 2016 4:51 pm
by tgriep
It could be that the MRTG process that is run from the cron daemon is taking longer than 5 minutes, or that the device being polled is timing out.
Log in as root to the XI server, run the following command, and post the output here. That should tell us if either of those issues is causing the zero bandwidth.
Code: Select all
time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg
Re: check_rrdtraf sometimes returns 0 network traffic
Posted: Mon Nov 07, 2016 5:12 pm
by WillemDH
Hmm interesting command. I had to change it to
Code: Select all
time LANG=C LC_ALL=C /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg.cfg
As mrtg on the gearman worker node is installed in /usr/local/mrtg/bin
But it did take a very long time (I actually aborted it, as it took too long). This is the end of the output:
Code: Select all
LANG=C LC_ALL=C /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg.cfg 93.47s user 5.17s system 61% cpu 2:41.02 total
This could help us clean up some switches that are no longer used, etc. The mrtg run is done with a cron job on the gearman mrtg worker:
Code: Select all
*/1 * * * * root LANG=C LC_ALL=C /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
The fact that the mrtg test run, which I aborted after 2:41 of wall-clock time (the 93 seconds is user CPU time), is taking longer than 1 minute while the cronjob is scheduled to run every minute could cause this issue, I guess? We set it to 1 as we needed one-minute granularity of data. Could it be an idea to split up the cronjob into two or three?
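To make the comparison with the cron schedule concrete, here is a minimal sketch that converts the "total" field of the time output above (2:41.02) into whole seconds and checks it against a 60-second interval. The function names are hypothetical, and the m:ss.cc format is assumed from that single output line.

```shell
# Convert an elapsed time such as "2:41.02" (m:ss.cc) or "1:02:03" (h:mm:ss)
# into whole seconds, dropping any fractional part.
elapsed_to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%d\n", s }'
}

# Succeed (exit 0) when a run with elapsed time $1 fits inside an interval of $2 seconds.
fits_interval() {
    [ "$(elapsed_to_seconds "$1")" -lt "$2" ]
}

# Usage:
#   elapsed_to_seconds 2:41.02      # prints 161
#   fits_interval 2:41.02 60 || echo "run is longer than the cron interval"
```

With the numbers from this thread, the aborted run had already used 161 seconds, well over twice the one-minute schedule.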
Re: check_rrdtraf sometimes returns 0 network traffic
Posted: Mon Nov 07, 2016 5:20 pm
by tgriep
You could split up the cron job into separate ones, but you could also increase the Forks: setting in the /etc/mrtg/mrtg.cfg file.
That would achieve the same thing, get more mrtg processes running so it finishes quicker.
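Before changing the value, it may help to see what is currently configured and how many CPUs are available. The following is only a sketch with made-up function names. It assumes the keyword is written exactly as Forks: at the start of a line in the main config file.

```shell
# Print the value of the Forks: keyword from an MRTG config file.
# $1: path to the config file
mrtg_forks() {
    awk '$1 == "Forks:" { print $2; exit }' "$1"
}

# Print the number of CPUs currently online.
online_cpus() {
    getconf _NPROCESSORS_ONLN
}

# Usage:
#   echo "Forks: $(mrtg_forks /etc/mrtg/mrtg.cfg), CPUs online: $(online_cpus)"
```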
Re: check_rrdtraf sometimes returns 0 network traffic
Posted: Mon Nov 07, 2016 5:20 pm
by avandemore
What are the contents of /etc/mrtg/mrtg.cfg? I think the default value of Forks is 4. You can try to raise that to some value relative to the number of cores on the system. For example, if you have 4 hyperthreaded cores, I would change that value to 10 and restart mrtg.
You may also want to watch the npcd load threshold, as npcd will stop processing perf data at certain values of system load.
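As a rough way to check that, the 1-minute load average can be compared with the load_threshold value npcd logs at startup (20 in the npcd.log excerpt earlier in this thread). This is a minimal sketch. The function name is made up, and the input is assumed to be in /proc/loadavg format.

```shell
# Succeed (exit 0) when the 1-minute load average is below a threshold.
# $1: file in /proc/loadavg format, $2: threshold (e.g. npcd's load_threshold)
load_below() {
    awk -v limit="$2" '{ exit !($1 + 0 < limit + 0) }' "$1"
}

# Usage:
#   load_below /proc/loadavg 20 || echo "load is at or above npcd load_threshold"
```

With the load averages from the uptime output posted above (around 1.4), this check would pass comfortably.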
Re: check_rrdtraf sometimes returns 0 network traffic
Posted: Mon Nov 07, 2016 5:25 pm
by WillemDH
Well, I forgot to mention this, but I already doubled it this afternoon from 4 to 8.
Code: Select all
HtmlDir: /var/www/mrtg
ImageDir: /var/www/mrtg
LogFormat: rrdtool
LogDir: /var/lib/mrtg
ThreshDir: /var/lib/mrtg
WorkDir: /var/lib/mrtg
Include: conf.d/*.cfg
Forks: 8
But I think I will ask our network team to clean up their old switches. It would be nice to be able to monitor the time mrtg needs to retrieve the info, so we can alert when it gets higher than the configured cron schedule.
Any idea how I could achieve this in an easy way?
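One possible approach, sketched here under assumptions rather than as a tested solution: wrap the mrtg call in the cron job so that its wall-clock duration is written to a state file, then have a small Nagios-style check compare that value with warning and critical thresholds. The function names and the state file path are hypothetical. The exit codes follow the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
# Run a command and record its wall-clock duration in whole seconds.
# $1: state file to write, remaining arguments: the command to time
run_timed() {
    rt_state=$1
    shift
    rt_start=$(date +%s)
    rt_rc=0
    "$@" || rt_rc=$?
    rt_end=$(date +%s)
    echo $((rt_end - rt_start)) > "$rt_state"
    return "$rt_rc"
}

# Compare the recorded duration with warning and critical thresholds.
# $1: state file, $2: warning seconds, $3: critical seconds
check_duration() {
    if [ ! -r "$1" ]; then
        echo "UNKNOWN - no duration recorded in $1"
        return 3
    fi
    cd_secs=$(cat "$1")
    if [ "$cd_secs" -ge "$3" ]; then
        echo "CRITICAL - mrtg run took ${cd_secs}s | duration=${cd_secs}s;$2;$3"
        return 2
    elif [ "$cd_secs" -ge "$2" ]; then
        echo "WARNING - mrtg run took ${cd_secs}s | duration=${cd_secs}s;$2;$3"
        return 1
    fi
    echo "OK - mrtg run took ${cd_secs}s | duration=${cd_secs}s;$2;$3"
    return 0
}

# Usage (the state file path is an assumption):
#   run_timed /var/lib/mrtg/last_run_seconds env LANG=C LC_ALL=C \
#       /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg.cfg
#   check_duration /var/lib/mrtg/last_run_seconds 45 60
```

The check would have to run on the node where mrtg runs, since the state file lives there. A run that hangs never updates the file, so checking the age of the file as well would be a sensible extension.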