
Bandwidth Utilization Data Collection Intermittent

Posted: Fri Mar 08, 2019 6:50 pm
by matt.lilek
Hello Team,

Really not sure how to start with this. Before our first reboot in over a year, about half of our links were showing 0 MB of data. After the reboot, almost all links now show 0. If I go into the performance graph, I can see that it is sometimes collecting data, but for the most part it is not. I have tried removing and reconfiguring the bandwidth monitoring on a few hosts, but the issue remains the same. Please let me know what steps are needed to resolve this issue.

Thank you,

Matt

Re: Bandwidth Utilization Data Collection Intermittent

Posted: Mon Mar 11, 2019 12:38 pm
by tgriep
Can you run this command on the Nagios server and post the output here?

Code:

time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg
The MRTG process that gathers the bandwidth data needs to finish within 5 minutes every time it runs, since cron starts a new run every 5 minutes. The command above will show any errors and how long the run takes.
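If you want to check the result against the 5-minute limit without doing the arithmetic by hand, a small sketch like the following converts the `real` value printed by `time` (for example `9m22.629s`) into seconds and compares it with the 300-second cron interval. The function names `real_to_seconds` and `check_budget` are hypothetical helpers, not part of MRTG or Nagios, and the sketch assumes bash's default `XmY.YYYs` time format.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of MRTG/Nagios): compare one MRTG run time
# against the 5-minute (300 s) cron interval.

# Convert a bash `time` value such as "9m22.629s" into seconds.
# Assumes the default "<minutes>m<seconds>s" format that bash prints.
real_to_seconds() {
  awk -v t="$1" 'BEGIN {
    sub(/s$/, "", t)           # drop the trailing "s"
    split(t, parts, "m")       # parts[1] = minutes, parts[2] = seconds
    printf "%.3f\n", parts[1] * 60 + parts[2]
  }'
}

# Report whether a run fits inside the budget (default 300 s).
check_budget() {
  local secs budget=${2:-300}
  secs=$(real_to_seconds "$1")
  if awk -v s="$secs" -v b="$budget" 'BEGIN { exit !(s + 0 <= b + 0) }'; then
    echo "OK: ${secs}s <= ${budget}s"
  else
    echo "TOO SLOW: ${secs}s > ${budget}s"
  fi
}
```

For example, `check_budget 9m22.629s` reports `TOO SLOW: 562.629s > 300s`, while `check_budget 3m28.139s` reports `OK: 208.139s <= 300s`.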

If you see the command checking devices that are no longer active, remove their config files; that will help speed up the process.
The config files are in the following folder, and each one should be named after the IP address of its device:
/etc/mrtg/conf.d
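As a cautious alternative to deleting the files outright, a sketch like this moves the config for a retired device out of /etc/mrtg/conf.d into a backup folder, so it can be restored if the device turns out to still be in use. It assumes each file is named `<IP>.cfg` and that mrtg.cfg only picks up files inside conf.d; the function name `retire_config` and the backup path `/etc/mrtg/conf.d.retired` are hypothetical, so check your own mrtg.cfg before relying on this.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move (not delete) the MRTG config for a retired device.
# Assumes configs are named "<IP>.cfg" in the conf.d folder.
retire_config() {
  local ip=$1
  local conf_dir=${2:-/etc/mrtg/conf.d}
  local backup_dir=${3:-/etc/mrtg/conf.d.retired}   # hypothetical location
  local file="$conf_dir/$ip.cfg"

  if [ ! -f "$file" ]; then
    echo "no config for $ip in $conf_dir" >&2
    return 1
  fi
  mkdir -p "$backup_dir"
  mv "$file" "$backup_dir/" && echo "retired $ip"
}
```

For example, `retire_config 192.0.2.10` would move `/etc/mrtg/conf.d/192.0.2.10.cfg` aside; moving it back restores monitoring for that device.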

Re: Bandwidth Utilization Data Collection Intermittent

Posted: Thu Mar 14, 2019 2:31 pm
by matt.lilek
Hey Tom,

Thanks for the reply. The output was quite long, but it was basically what I put in the two screenshots. Almost all of the devices look like this, so I need a bulk solution to get them logging again.

Thanks in advance!

Re: Bandwidth Utilization Data Collection Intermittent

Posted: Thu Mar 14, 2019 3:08 pm
by tgriep
One thing I still need is how long the command took to run; the time command at the beginning reports that.
Can you post that information?

FYI, instead of taking screenshots of the output, you can save time by highlighting it in your SSH terminal and copying it as text; most terminals support this.

For the errors you did post, are those devices still active on your network?
If not, remove their MRTG configuration files to speed this up.


You can also try this.
Edit the following file:

Code:

/etc/cron.d/mrtg
Change this line from

Code:

*/5 * * * * root LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok --user=nagios --group=nagios
to

Code:

*/5 * * * * root LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok
Save the file and restart cron by running

Code:

service crond restart
Let the system run for 15 minutes and see if the bandwidth starts to graph again.
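If you prefer to script the cron edit instead of making it in an editor, one way is a `sed` substitution that strips the `--user=nagios --group=nagios` options from the end of the MRTG line, keeping a `.bak` copy of the original file. This sketch assumes the line ends exactly as shown in the "from" version above; the function name `drop_mrtg_user_flags` is hypothetical, and cron still needs to be restarted afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the --user/--group options from the MRTG cron line.
# Keeps the original file as "<file>.bak" so the change can be undone.
drop_mrtg_user_flags() {
  local cron_file=${1:-/etc/cron.d/mrtg}
  sed -i.bak 's/ --user=nagios --group=nagios$//' "$cron_file"
}
```

If the change does not help, copying the `.bak` file back over /etc/cron.d/mrtg and restarting cron returns to the original line.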

Re: Bandwidth Utilization Data Collection Intermittent

Posted: Tue Apr 02, 2019 6:01 pm
by matt.lilek
Hey Tom,

Sorry, I was away from this for a bit. Here are the times you were looking for:
real 9m22.629s
user 0m10.225s
sys 0m0.677s

I changed the line and restarted cron, and it actually killed a bunch of the incoming data from these routers (the opposite of what we expected). Please let me know what the next step is. I'll be away for a bit, but I've got a couple of days now to get this sorted, so let me know.

Thanks.

Matt

Re: Bandwidth Utilization Data Collection Intermittent

Posted: Wed Apr 03, 2019 8:28 am
by tgriep
Hello Matt,

What do you mean by "killed a bunch of the incoming data"?

The MRTG process has to finish within 5 minutes each time it runs so that it gathers the data the plugin needs to calculate the bandwidth. Your run took over 9 minutes, so it is not keeping up.

You can increase the number of forks (parallel processes) that MRTG can spawn, which will help speed up the process.
To do that, edit the /etc/mrtg/mrtg.cfg file and change this line from

Code:

Forks: 4
to

Code:

Forks: 20
Save the file; that allows 5 times as many forks to gather the data.
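The same edit can be made with `sed`. This sketch rewrites the `Forks:` line in mrtg.cfg to the value you pass and keeps a `.bak` copy of the file. The helper name `set_mrtg_forks` is hypothetical, and it assumes the directive sits at the start of its own line as shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set the MRTG "Forks:" directive to a new value.
set_mrtg_forks() {
  local forks=$1
  local cfg=${2:-/etc/mrtg/mrtg.cfg}
  # Only accept a positive integer such as 20.
  if ! [[ $forks =~ ^[1-9][0-9]*$ ]]; then
    echo "forks must be a positive integer" >&2
    return 1
  fi
  # Replace the existing value, keeping the original as "<cfg>.bak".
  sed -i.bak -E "s/^Forks:[[:space:]]*[0-9]+/Forks: $forks/" "$cfg"
}
```

Running `set_mrtg_forks 20` turns `Forks: 4` into `Forks: 20`; the new value takes effect on the next cron-driven MRTG run.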

Also, as I suggested earlier, removing the MRTG config files in the /etc/mrtg/conf.d folder for devices that no longer exist will speed up the process as well.

Re: Bandwidth Utilization Data Collection Intermittent

Posted: Wed Apr 03, 2019 9:40 am
by matt.lilek
Hey Tom,

Done that now. When I said it killed the data coming in, I meant that data stopped coming in for a whole bunch of instances that were OK just before I made that change. There are hundreds of configs in there; how can I easily identify and remove the ones for devices that no longer exist?

Re: Bandwidth Utilization Data Collection Intermittent

Posted: Wed Apr 03, 2019 9:47 am
by matt.lilek
real 3m28.139s
user 0m11.300s
sys 0m1.354s

Those are the new times, by the way.

Re: Bandwidth Utilization Data Collection Intermittent

Posted: Wed Apr 03, 2019 3:08 pm
by tgriep
What could have happened is that before the change, the MRTG process ran as the nagios user, and permission issues prevented some of the checks from running.
After the change, it runs as root, so all of the configs could be read, and the extra time the run then took caused the new issue.

There is no quick way to determine which config files can be removed.
They are named by the IP address of the device.
If you have a list of known devices, you could use it to determine which ones to remove.
Otherwise, you can ping each IP address or run a quick snmpwalk against the device to see if it responds.
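To narrow down hundreds of files, a sketch like the following walks /etc/mrtg/conf.d, takes the IP address from each file name, and prints the ones that do not answer a single ping. It assumes files are named `<IP>.cfg`, uses Linux iputils `ping` options, and assumes ICMP reaches your devices; a device that blocks ping will be listed even though it is alive, so confirm each result with snmpwalk before removing anything. The names `probe_host` and `find_unreachable` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: list MRTG config IPs that do not answer a ping.
# Assumes files in the conf.d folder are named "<IP>.cfg".

# Succeed if the host answers one ping within 2 seconds (Linux iputils ping).
probe_host() {
  ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Print the IP of every config whose device did not respond.
find_unreachable() {
  local conf_dir=${1:-/etc/mrtg/conf.d}
  local file ip
  for file in "$conf_dir"/*.cfg; do
    [ -e "$file" ] || continue        # no matches: the glob stays literal
    ip=$(basename "$file" .cfg)
    if ! probe_host "$ip"; then
      echo "$ip"
    fi
  done
}
```

Running `find_unreachable > /tmp/mrtg-unreachable.txt` gives a candidate list to review, rather than a list that is safe to delete blindly.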

Re: Bandwidth Utilization Data Collection Intermittent

Posted: Wed Apr 03, 2019 3:40 pm
by matt.lilek
Hey Tom,

Things are looking much better now after increasing the forks. As for cleanup, maybe I can go through the hundreds of configs in there one day (sooner if I have any more issues), but for now I think I am good, so thanks for that. You can go ahead and wrap this one up!