check_rrdtraf sometimes returns 0 network traffic
Hello,
All of a sudden our services using check_rrdtraf are sporadically returning 0 Mbps in and 0 Mbps out. Is there any known issue which could cause such behaviour? Please note we have offloaded our mrtg checks to a gearman2 worker node. This has been working for 4 years now and never gave any big issues (except for the memory leak in mod gearman 1, which we solved by upgrading gearman).
I took some actions to try to resolve this, as I have the feeling mrtg is hitting some limit.
1) Process limit
/etc/security/limits.d/90-nproc.conf
* soft nproc 1024
to
* soft nproc 4096
2) File limit
Before:
ulimit -Sn
1024
/etc/security/limits.conf
Added:
* soft nofile 4096
After:
ulimit -Sn
4096
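Limits from /etc/security/limits.conf are applied when a session is opened, so a process that was started before the change keeps its old values. A minimal sketch for checking what a running process actually has, assuming a Linux /proc filesystem; the helper name soft_limit is made up for this example:

```shell
#!/bin/sh
# soft_limit NAME: read /proc/<pid>/limits-style text on stdin and print the
# soft limit of the row whose name is NAME (e.g. "Max open files").
# Hypothetical helper, not part of Nagios or MRTG.
soft_limit() {
    awk -v name="$1" '
        index($0, name) == 1 {
            rest = substr($0, length(name) + 1)   # drop the limit name
            split(rest, f, " ")                   # f[1] = soft, f[2] = hard
            print f[1]
        }'
}

# Example: limits of the current shell (replace $$ with the PID to inspect)
#   soft_limit "Max open files" < /proc/$$/limits
#   soft_limit "Max processes"  < /proc/$$/limits
```

Pointing this at the PID of the gearman worker or of a running mrtg process would show whether the new values are in effect there.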
Things seem better now, but I'm curious if Nagios support has seen this behaviour before.
Grtz
Willem
Nagios XI 5.8.1
https://outsideit.net
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: check_rrdtraf sometimes returns 0 network traffic
This is a known issue which is fixed in 5.3.2 if you're running RHEL/CentOS 7. Is this your environment?
Previous Nagios employee
Re: check_rrdtraf sometimes returns 0 network traffic
No, both my XI and mrtg worker node are on CentOS 6. I am just seeing that my file and proc limit tuning doesn't really seem to have helped. Any other tips are welcome.
I attached a screenshot of the result in NagVis. I can't really get a grip on what could be causing it. I restarted both my XI and gearman worker, but still have the same issue. If I can provide any logfile or config file, please let me know.
Last edited by WillemDH on Mon Nov 07, 2016 4:41 pm, edited 1 time in total.
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: check_rrdtraf sometimes returns 0 network traffic
Can you attach /usr/local/nagios/var/npcd.log?
Previous Nagios employee
Re: check_rrdtraf sometimes returns 0 network traffic
These are the only entries from today in npcd.log. We never had any issues before today.
The termination signal is from a reboot I did to try to resolve the issue, I think.
Code:
[11-07-2016 16:24:36] NPCD: Caught Termination Signal - Hasta la vista... baby
[11-07-2016 16:25:35] NPCD: npcd Daemon (0.4.14) started with PID=2634
[11-07-2016 16:25:35] NPCD: Please have a look at 'npcd -V' to get license information
[11-07-2016 16:25:35] NPCD: HINT: load_threshold is enabled - ('20.000000')
Code:
uptime
22:45:09 up 6:20, 1 user, load average: 1.40, 1.40, 1.34
Re: check_rrdtraf sometimes returns 0 network traffic
It could be that the MRTG process that is run from the cron daemon is taking longer than 5 minutes, or that the device being polled is timing out.
Log in as root to the XI server, run the following command and post the output here. That should tell us if either of those issues is causing the zero bandwidth.
Code:
time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: check_rrdtraf sometimes returns 0 network traffic
Hmm, interesting command. I had to change it to
Code:
time LANG=C LC_ALL=C /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg.cfg
as mrtg on the gearman worker node is installed in /usr/local/mrtg/bin.
But it did take a very long time (I actually aborted it, as it took too long). This is the end of the output:
Code:
LANG=C LC_ALL=C /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg.cfg 93.47s user 5.17s system 61% cpu 2:41.02 total
This could help us clean up some switches that are no longer used etc. The mrtg run is done with a cron job on the gearman mrtg worker:
Code:
*/1 * * * * root LANG=C LC_ALL=C /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
The fact that the mrtg test run, which I aborted after 93 seconds, takes longer than 1 minute while the cron job is scheduled to run every minute could cause this issue, I guess? We set it to 1 as we needed one-minute granularity of data. Could it be an idea to split the cron job into two or three?
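For reference, in the time output above the total field is the elapsed wall-clock time, while 93.47s user is CPU time, so the aborted run had already been going for 2 minutes 41 seconds. A small sketch of the comparison with the cron interval; the function name to_seconds is only for illustration:

```shell
#!/bin/sh
# to_seconds M:SS.ss: convert a "minutes:seconds" value to whole seconds
# (the fraction is dropped). Illustrative helper, handles this form only.
to_seconds() {
    echo "$1" | awk -F: '{ printf "%d\n", $1 * 60 + $2 }'
}

elapsed=$(to_seconds "2:41.02")   # wall-clock time of the aborted run
interval=60                       # the cron entry runs mrtg every minute (*/1)

if [ "$elapsed" -gt "$interval" ]; then
    echo "mrtg run (${elapsed}s) exceeds the ${interval}s cron interval"
fi
```

A run that needs well over two minutes on a one-minute schedule means each new cron invocation starts while the previous one is most likely still holding the lock file.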
Re: check_rrdtraf sometimes returns 0 network traffic
You could split the cron job into separate ones, but you could also increase the Forks: setting in the /etc/mrtg/mrtg.cfg file.
That would achieve the same thing, get more mrtg processes running so it finishes quicker.
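As a rough illustration of the first option, the single cron entry could be split into two entries that each read their own config file. The file names mrtg-a.cfg and mrtg-b.cfg and the separate lock and cache file names below are hypothetical; each config would have to include only its own share of the devices:

```
# Sketch only: two system crontab entries with hypothetical file names
*/1 * * * * root LANG=C LC_ALL=C /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg-a.cfg --lock-file /var/lock/mrtg/mrtg_a --confcache-file /var/lib/mrtg/mrtg-a.ok
*/1 * * * * root LANG=C LC_ALL=C /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg-b.cfg --lock-file /var/lock/mrtg/mrtg_b --confcache-file /var/lib/mrtg/mrtg-b.ok
```

Separate lock files are used here on the assumption that the two runs should not block each other.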
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: check_rrdtraf sometimes returns 0 network traffic
What are the contents of /etc/mrtg/mrtg.cfg? I think the default value of Forks is 4. You can try raising it to a value relative to the number of cores on the system. For example, if you have 4 hyperthreaded cores, I would change that value to 10 and restart mrtg.
You may also want to watch the npcd load threshold, as npcd will stop processing perf data at certain values of system load.
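Both points can be checked quickly from a shell on the relevant host. A rough sketch, assuming a Linux system with /proc/loadavg and the nproc utility; the threshold of 20 is the load_threshold value from the npcd.log excerpt earlier in the thread, and the helper name is made up:

```shell
#!/bin/sh
# load_over_threshold LOAD THRESHOLD: succeed (exit 0) when LOAD > THRESHOLD.
# Hypothetical helper; awk is used because sh cannot compare decimals.
load_over_threshold() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !(l + 0 > t + 0) }'
}

threshold=20                          # load_threshold reported in npcd.log
load=$(cut -d' ' -f1 /proc/loadavg)   # 1-minute load average
cores=$(nproc)                        # available CPUs, for sizing Forks

if load_over_threshold "$load" "$threshold"; then
    echo "load $load is above the npcd load_threshold of $threshold"
else
    echo "load $load is below the npcd load_threshold of $threshold ($cores cores)"
fi
```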
Previous Nagios employee
Re: check_rrdtraf sometimes returns 0 network traffic
Well, I forgot to mention this, but I already doubled it this afternoon from 4 to 8.
Code:
HtmlDir: /var/www/mrtg
ImageDir: /var/www/mrtg
LogFormat: rrdtool
LogDir: /var/lib/mrtg
ThreshDir: /var/lib/mrtg
WorkDir: /var/lib/mrtg
Include: conf.d/*.cfg
Forks: 8
But I suggest I ask our network team to clean up their old switches. It would be nice to be able to monitor the time mrtg needs to retrieve the info, so we can alert when it gets higher than the configured cron schedule.
Any idea how I could achieve this in an easy way?
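One possible approach, sketched here under several assumptions: wrap the mrtg call so that each run records how long it took, then let a small Nagios-style check read that value and compare it with thresholds. The state file path, the function names and the threshold values are invented for this example; only the mrtg command line comes from the cron entry earlier in the thread.

```shell
#!/bin/sh
# Sketch: record the duration of each mrtg run and check it Nagios-style.
STATE_FILE="${STATE_FILE:-/var/lib/mrtg/last_run_seconds}"   # hypothetical path

# timed_run CMD [ARGS...]: run the command and store the elapsed whole seconds.
timed_run() {
    start=$(date +%s)
    "$@"
    rc=$?
    end=$(date +%s)
    echo $((end - start)) > "$STATE_FILE"
    return "$rc"
}

# check_duration WARN CRIT: print a plugin-style line with perfdata and
# return 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).
check_duration() {
    warn=$1
    crit=$2
    if [ ! -r "$STATE_FILE" ]; then
        echo "UNKNOWN - no duration recorded in $STATE_FILE"
        return 3
    fi
    secs=$(cat "$STATE_FILE")
    perf="duration=${secs}s;${warn};${crit}"
    if [ "$secs" -ge "$crit" ]; then
        echo "CRITICAL - last mrtg run took ${secs}s | $perf"
        return 2
    elif [ "$secs" -ge "$warn" ]; then
        echo "WARNING - last mrtg run took ${secs}s | $perf"
        return 1
    fi
    echo "OK - last mrtg run took ${secs}s | $perf"
    return 0
}

# In a wrapper script called from cron (instead of calling mrtg directly):
#   timed_run /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg.cfg \
#       --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
# In the check script, for a one-minute schedule:
#   check_duration 45 60
```

With a one-minute cron schedule, a warning threshold somewhat below 60 seconds would flag runs that are about to overlap with the next one.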