check_rrdtraf sometimes returns 0 network traffic

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

check_rrdtraf sometimes returns 0 network traffic

Post by WillemDH »

Hello,

All of a sudden our services using check_rrdtraf are sporadically returning 0 Mbps in and 0 Mbps out. Is there any known issue which could cause such behaviour? Please note we have offloaded our mrtg checks to a gearman2 worker node. This has been working for 4 years now and never gave any big issues (except for the memory leak in mod gearman 1, which we solved by upgrading gearman).

I took some actions to try to resolve this, as I have the feeling mrtg is hitting some limit.

1) Process limit
/etc/security/limits.d/90-nproc.conf
* soft nproc 1024
to
* soft nproc 4096

2) File limit
Before:
ulimit -Sn
1024

/etc/security/limits.conf
Added:
* soft nofile 4096
After:
ulimit -Sn
4096
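
A note on verifying this kind of change: limits from /etc/security/limits.conf apply to newly started sessions, so a process that was already running keeps the values it started with. Below is a minimal sketch for checking what a given process actually has, assuming a Linux /proc filesystem. The helper name `show_limits` is only illustrative.

```shell
#!/bin/bash
# Print the open-file and process limits that a running process actually has.
# Assumes Linux: reads /proc/<pid>/limits.
# "show_limits" is a hypothetical helper name for this sketch.
show_limits() {
    local pid="${1:-$$}"
    grep -E '^Max (open files|processes)' "/proc/${pid}/limits"
}

# Limits of the current shell
show_limits "$$"
```

Passing the PID of the process that actually launches mrtg (instead of `$$`) shows whether the new soft limits reached it.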

Things seem better now, but I'm curious if Nagios support has seen this behaviour before.

Grtz

Willem
Nagios XI 5.8.1
https://outsideit.net
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: check_rrdtraf sometimes returns 0 network traffic

Post by avandemore »

This is a known issue which is fixed in 5.3.2 if you're running RHEL/CentOS 7. Is this your environment?
Previous Nagios employee
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: check_rrdtraf sometimes returns 0 network traffic

Post by WillemDH »

No, both my XI and mrtg worker node are on CentOS 6. I am just seeing that my file and proc limit tuning doesn't really seem to have helped. Any other tips are welcome.

I attached a screenshot of the result in NagVis. I can't really get a grip on what could be causing it. I restarted both my XI and gearman worker, but still the same issue. If I can provide any log file or config file, please let me know.
Last edited by WillemDH on Mon Nov 07, 2016 4:41 pm, edited 1 time in total.
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: check_rrdtraf sometimes returns 0 network traffic

Post by avandemore »

Can you attach /usr/local/nagios/var/npcd.log?
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: check_rrdtraf sometimes returns 0 network traffic

Post by WillemDH »

These are the only logs of today in npcd.log. We never had any issues before today.

Code: Select all

[11-07-2016 16:24:36] NPCD: Caught Termination Signal - Hasta la vista... baby
[11-07-2016 16:25:35] NPCD: npcd Daemon (0.4.14) started with PID=2634
[11-07-2016 16:25:35] NPCD: Please have a look at 'npcd -V' to get license information
[11-07-2016 16:25:35] NPCD: HINT: load_threshold is enabled - ('20.000000')

The termination signal is from a reboot I did to try to resolve the issue, I think.

Code: Select all

uptime
 22:45:09 up  6:20,  1 user,  load average: 1.40, 1.40, 1.34
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: check_rrdtraf sometimes returns 0 network traffic

Post by tgriep »

What it could be is that the MRTG process that is run from the cron daemon is taking longer than 5 minutes, or that the device that is getting polled is timing out.
Log in as root to the XI server, run the following command, and post the output here. That should tell us if either of those issues is causing the zero bandwidth.

Code: Select all

time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg
Be sure to check out our Knowledgebase for helpful articles and solutions!
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: check_rrdtraf sometimes returns 0 network traffic

Post by WillemDH »

Hmm, interesting command. I had to change it to

Code: Select all

time LANG=C LC_ALL=C /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg.cfg
As mrtg on the gearman worker node is installed in /usr/local/mrtg/bin

But it did take a very long time (I actually aborted it, as it took too long). This is the end of the output:

Code: Select all

LANG=C LC_ALL=C /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg.cfg  93.47s user 5.17s system 61% cpu 2:41.02 total

This could help us clean up some switches which are no longer used etc. The mrtg run is done with a cron job on the gearman mrtg worker:

Code: Select all

*/1 * * * * root LANG=C LC_ALL=C /usr/local/mrtg/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok

The mrtg test run, which I aborted after 2:41 of wall-clock time (the 93 seconds is user CPU time), takes longer than 1 minute, and the cron job is scheduled to run every minute, so that could cause this issue, I guess? We set it to 1 as we needed a one-minute granularity of data. Could it be an idea to split up the cron job in two or three?
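
To compare run time against the schedule on an ongoing basis, the cron command could be wrapped so that each run records how long it took. A minimal sketch under assumptions: the name `timed_run` and the state-file argument are made up for illustration and are not part of MRTG or Nagios XI.

```shell
#!/bin/bash
# Run a command, measure its wall-clock duration in whole seconds,
# and write that number to a state file.
# "timed_run" and the state-file argument are hypothetical names for this sketch.
timed_run() {
    local state_file="$1"; shift
    local start end rc
    start=$(date +%s)
    "$@"                      # run the wrapped command with its arguments
    rc=$?
    end=$(date +%s)
    echo $((end - start)) > "$state_file"
    return "$rc"              # pass the wrapped command's exit code through
}
```

It would be called as `timed_run <state-file> <command> [args...]`, with the existing mrtg command line as the wrapped command. The state file then holds the duration of the last run in seconds.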
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: check_rrdtraf sometimes returns 0 network traffic

Post by tgriep »

You could split up the cron job into separate ones, but you could also increase the Forks: setting in the /etc/mrtg/mrtg.cfg file.
That would achieve the same thing, get more mrtg processes running so it finishes quicker.
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: check_rrdtraf sometimes returns 0 network traffic

Post by avandemore »

What are the contents of /etc/mrtg/mrtg.cfg? I think the default value of Forks is 4. You can try to raise that to some value relative to the number of cores on the system. For example, if you have 4 hyperthreaded cores, I would change that value to 10 and restart mrtg.
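
To see the configured value next to what the machine offers, something like the following could be used. It is a rough sketch only: `get_forks` and `logical_cpus` are made-up helper names, and `get_forks` reads only the file it is given (it does not follow `Include:` lines).

```shell
#!/bin/bash
# Print the value of the "Forks:" directive from an MRTG config file.
# "get_forks" is a hypothetical helper; it does not follow Include: lines.
get_forks() {
    local cfg="$1"
    # Split on the colon (plus following whitespace) and print the value
    # of the first line whose key is "forks", compared case-insensitively.
    awk -F':[ \t]*' 'tolower($1) == "forks" { print $2; exit }' "$cfg"
}

# Number of logical CPUs currently online (hyperthreads included).
logical_cpus() {
    getconf _NPROCESSORS_ONLN
}
```

For example, `get_forks /etc/mrtg/mrtg.cfg` and `logical_cpus` can be run side by side before deciding on a new Forks value.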

You may also want to watch the npcd load threshold, as it will stop processing perf data at certain values of system load.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: check_rrdtraf sometimes returns 0 network traffic

Post by WillemDH »

Well, I forgot to mention this, but I already doubled it this afternoon from 4 to 8.

Code: Select all

HtmlDir: /var/www/mrtg
ImageDir: /var/www/mrtg
LogFormat: rrdtool
LogDir: /var/lib/mrtg
ThreshDir: /var/lib/mrtg
WorkDir: /var/lib/mrtg

Include: conf.d/*.cfg
Forks: 8

But I suggest I ask our network team to clean up their old switches. It would be nice to be able to monitor the time that mrtg needs to retrieve the info, so we can alert when it's getting higher than the configured cron schedule.

Any idea how I could achieve this in an easy way?
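
One possible approach, sketched here under assumptions rather than taken from Nagios XI or MRTG: have the cron job record the duration of each run, then let a small plugin compare that number with warning and critical thresholds and exit with the usual plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `check_mrtg_runtime` and its arguments are hypothetical.

```shell
#!/bin/bash
# Compare a measured mrtg run time (whole seconds) with thresholds and
# report in Nagios plugin style. "check_mrtg_runtime" is a hypothetical name.
# Usage: check_mrtg_runtime <elapsed_seconds> <warn_seconds> <crit_seconds>
check_mrtg_runtime() {
    local elapsed="$1" warn="$2" crit="$3"
    # Reject missing or non-integer input
    case "$elapsed" in
        ''|*[!0-9]*) echo "UNKNOWN - no valid run time given"; return 3 ;;
    esac
    # Perfdata in label=value;warn;crit form
    local perf="runtime=${elapsed}s;${warn};${crit}"
    if [ "$elapsed" -ge "$crit" ]; then
        echo "CRITICAL - mrtg run took ${elapsed}s | ${perf}"; return 2
    elif [ "$elapsed" -ge "$warn" ]; then
        echo "WARNING - mrtg run took ${elapsed}s | ${perf}"; return 1
    fi
    echo "OK - mrtg run took ${elapsed}s | ${perf}"; return 0
}
```

With a one-minute schedule, thresholds such as 45 and 60 seconds would flag runs that approach or exceed the interval. A 161-second run like the one timed earlier in this thread would come back CRITICAL.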