Bandwidth Utilization Data Collection Intermittent

matt.lilek
Posts: 137
Joined: Wed Aug 07, 2013 11:53 am


Post by matt.lilek »

Hello Team,

I'm really not sure where to start with this. About half of our links were showing 0 MB of data before our first reboot in over a year. After the reboot, almost all links show 0. If I go into the performance graph, I can see that it sometimes collects data, but for the most part it does not. I have tried removing and reconfiguring the bandwidth checks on a few hosts, but the issue remains the same. Please let me know what steps are needed to resolve this issue.

Thank you,

Matt
Last edited by tgriep on Thu Apr 04, 2019 8:32 am, edited 1 time in total.
Reason: Profile removed and shared with the other Techs
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Bandwidth Utilization Data Collection Intermittent

Post by tgriep »

Can you run this command on the Nagios server and post the output here?

Code: Select all

time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg

The MRTG process that gathers the bandwidth data needs to finish within 5 minutes every time it runs. The command above will show any errors and how long a run takes.
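
If you want the 5-minute check done for you, here is a minimal sketch of a wrapper. `run_timed` and `LIMIT_SECONDS` are hypothetical names of my own, not part of Nagios XI or MRTG, and the limit of 300 seconds is simply the 5 minutes mentioned above.

```shell
# Hypothetical helper, not part of Nagios XI or MRTG.
# Runs the given command, prints the elapsed wall-clock seconds,
# and returns non-zero if the run took longer than the limit.
LIMIT_SECONDS=${LIMIT_SECONDS:-300}   # 5 minutes

run_timed() {
    start=$(date +%s)
    # The command's own exit status is only reported, not returned.
    "$@" || echo "command exited with status $?" >&2
    end=$(date +%s)
    elapsed=$((end - start))
    echo "elapsed: ${elapsed}s (limit: ${LIMIT_SECONDS}s)"
    [ "$elapsed" -le "$LIMIT_SECONDS" ]
}
```

On the Nagios server it could be called as `run_timed env LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg`. The plain `time` command gives the same timing; the wrapper only adds the comparison against the limit.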

If you see the command checking devices that are no longer active, remove their config files; that will help speed up the process.
The config files are in the following folder and should be named by the IP address of the device:
/etc/mrtg/conf.d
Be sure to check out our Knowledgebase for helpful articles and solutions!

Re: Bandwidth Utilization Data Collection Intermittent

Post by matt.lilek »

Hey Tom,

Thanks for the reply. The output was quite long, but it is basically what I put in the two screenshots. Almost all of them look like this, so I need a bulk solution to get them logging again.

Thanks in advance!

Re: Bandwidth Utilization Data Collection Intermittent

Post by tgriep »

One thing I need is how long the command took to run. The time command at the beginning will give you that information.
Can you post it?

FYI, instead of taking screen captures, most SSH terminals let you highlight the output and copy it as text, which saves time.

Are the devices in the errors you posted still active on your network?
If not, remove their MRTG configuration files to speed this up.


You can try this.
Edit the following file:

Code: Select all

/etc/cron.d/mrtg

Change this line from

Code: Select all

*/5 * * * * root LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok --user=nagios --group=nagios

to

Code: Select all

*/5 * * * * root LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok

Save the file and restart cron by running

Code: Select all

service crond restart

Let the system run for 15 minutes and see if the bandwidth starts to graph again.

Re: Bandwidth Utilization Data Collection Intermittent

Post by matt.lilek »

Hey Tom,

Sorry, I was away from this for a bit. Here are the times you were looking for:
real 9m22.629s
user 0m10.225s
sys 0m0.677s

I changed the line and restarted, and it actually killed a bunch of the incoming data from these routers (the opposite of what we expected). Please let me know what the next step is. I'll be away for a bit, but I have a couple of days now to get this sorted.

Thanks.

Matt

Re: Bandwidth Utilization Data Collection Intermittent

Post by tgriep »

Hello Matt,

What do you mean by "killed a bunch of the incoming data"?

The MRTG process has to finish within 5 minutes each time it runs so that it captures the data the plugin needs to calculate the bandwidth.

You can increase the number of forks that MRTG can spawn, which will help speed up the process.
To do that, edit the /etc/mrtg/mrtg.cfg file and change this line from

Code: Select all

Forks: 4

to

Code: Select all

Forks: 20

Save it, and that will allow 5 times as many forks to gather the data.
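
If you would rather not edit the file by hand, the same change can be sketched with sed. `set_forks` is a hypothetical helper name, and the sketch assumes the file already contains a line starting with `Forks:`, as shown above. It keeps a backup copy next to the original so the change can be reverted.

```shell
# Hypothetical helper: set the "Forks:" value in an MRTG config file.
# Assumes the file already has a line that starts with "Forks:".
set_forks() {
    file=$1
    count=$2
    cp -p "$file" "$file.bak"        # keep a backup to revert to
    tmp=$(mktemp)
    sed "s/^Forks:.*/Forks: $count/" "$file" > "$tmp"
    cat "$tmp" > "$file"             # overwrite contents, keeping owner and mode
    rm -f "$tmp"
    grep '^Forks:' "$file"           # print the resulting line; fails if none exists
}
```

For example, `set_forks /etc/mrtg/mrtg.cfg 20` should print `Forks: 20` and leave the previous version in `/etc/mrtg/mrtg.cfg.bak`.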

And, as I suggested earlier, remove the MRTG config files from the /etc/mrtg/conf.d folder for devices that no longer exist; that will speed up the process as well.

Re: Bandwidth Utilization Data Collection Intermittent

Post by matt.lilek »

Hey Tom,

I've done that now. When I said it killed the data coming in, I meant that data stopped coming in for a whole bunch of the instances that were OK just before I made that change. There are hundreds of configs in there; how can I easily identify and remove the ones for devices that no longer exist?

Re: Bandwidth Utilization Data Collection Intermittent

Post by matt.lilek »

By the way, these are the new times:
real 3m28.139s
user 0m11.300s
sys 0m1.354s

Re: Bandwidth Utilization Data Collection Intermittent

Post by tgriep »

What could have happened is that before the change, the MRTG process ran as the nagios user and permissions kept some of the checks from running.
After the change, it runs as root, so the configs could be read, and the extended time it took to run caused the new issue.

There is no quick way to determine which config files can be removed.
They are named by the IP address of the device.
If you have a list of known devices, you could use that to determine which ones to remove.
Or you can ping the IP address or do a quick snmpwalk of the device to see if it responds.
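
The ping approach can be roughly scripted. This is only a sketch under assumptions: it assumes the files are named `<ip>.cfg` (adjust the pattern if yours differ), it uses the Linux `ping` options `-c` and `-W`, and `probe`, `list_unreachable` and `CONF_DIR` are names I made up for illustration. It only lists candidates and does not delete anything.

```shell
# Assumption: config files are named <ip>.cfg inside CONF_DIR.
CONF_DIR=${CONF_DIR:-/etc/mrtg/conf.d}

# Probe one address: a single ping with a 2-second timeout (Linux ping flags).
probe() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Print the config files whose device did not answer the probe.
list_unreachable() {
    for f in "$CONF_DIR"/*.cfg; do
        [ -e "$f" ] || continue          # directory is empty or pattern did not match
        ip=$(basename "$f" .cfg)         # file name without the extension
        probe "$ip" || echo "$f"
    done
}
```

Running `list_unreachable` prints one file path per device that did not answer. A device that blocks ICMP will also show up in that list, so review it before removing anything; `probe` could be swapped for an snmpwalk check if that is more reliable on your network.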

Re: Bandwidth Utilization Data Collection Intermittent

Post by matt.lilek »

Hey Tom,

Things are looking much better now after increasing the forks. As for cleanup, maybe I can go through the hundreds that are in there one day (sooner if I have any more issues), but for now I think I am good, so thanks for that. You can go ahead and wrap this one up!