Why is my MRTG job taking so long?

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
snapon_admin
Posts: 952
Joined: Mon Jun 10, 2013 10:39 am
Location: Kenosha, WI

Why is my MRTG job taking so long?

Post by snapon_admin »

Removed sensitive info.

Code:

[root@lisl-ngos-01-pv conf.d]# time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
SNMP Error:
no response received
SNMPv2c_Session (remote host: "<xxx.xxx.xxx.xxx>" [<xxx.xxx.xxx.xxx>].161)
                   community: "<SNMP STRING>"
                  request ID: 1011561030
                 PDU bufsize: 8000 bytes
                     timeout: 2s
                     retries: 5
                     backoff: 1)
 at /usr/bin/../lib/mrtg2/SNMP_util.pm line 497
SNMPGET Problem for ifInOctets.1 ifOutOctets.1 on <SNMP STRING>@<xxx.xxx.xxx.xxx>:::::2:v4only
 at /usr/bin/mrtg line 2330
2015-01-05 10:29:15: WARNING: skipping because at least the query for ifInOctets.1 on  <xxx.xxx.xxx.xxx> did not succeed
2015-01-05 10:29:15: WARNING: no data for ifInOctets&ifOutOctets:<SNMP STRING>@<xxx.xxx.xxx.xxx>. Skipping further queries for Host <xxx.xxx.xxx.xxx> in this round.
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_1][_IN_] ' $target->[2255]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_1][_OUT_] ' $target->[2255]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_2][_IN_] ' $target->[2256]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_2][_OUT_] ' $target->[2256]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_3][_IN_] ' $target->[2257]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_3][_OUT_] ' $target->[2257]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_4][_IN_] ' $target->[2258]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_4][_OUT_] ' $target->[2258]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_5][_IN_] ' $target->[2259]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_5][_OUT_] ' $target->[2259]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_6][_IN_] ' $target->[2260]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_6][_OUT_] ' $target->[2260]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_7][_IN_] ' $target->[2261]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_7][_OUT_] ' $target->[2261]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_8][_IN_] ' $target->[2262]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_8][_OUT_] ' $target->[2262]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_9][_IN_] ' $target->[2263]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_9][_OUT_] ' $target->[2263]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_10][_IN_] ' $target->[2264]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_10][_OUT_] ' $target->[2264]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_12][_IN_] ' $target->[2265]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_12][_OUT_] ' $target->[2265]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_13][_IN_] ' $target->[2266]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_13][_OUT_] ' $target->[2266]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_14][_IN_] ' $target->[2267]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_14][_OUT_] ' $target->[2267]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_15][_IN_] ' $target->[2268]{$mode} ' did not eval into defined data
2015-01-05 10:34:30: ERROR: Target[<xxx.xxx.xxx.xxx>_15][_OUT_] ' $target->[2268]{$mode} ' did not eval into defined data

real    5m20.462s
user    0m9.069s
sys     0m1.869s
[root@lisl-ngos-01-pv conf.d]# 
It's taking a little over 5 minutes for this job to run, and there's only one device with errors (it's down atm, hence the errors). I can't for the life of me figure out why it's taking so long. I've seen this run with 2-3 devices down and still take less than 3-4 minutes. Any ideas on where else I can look? CPU usage/load is good right now (load is at ~6), and nothing else seems to be amiss. One thing I will point out is that the site where this server is located is currently experiencing pretty heavy bandwidth utilization, but that's on the outside link. Everything on the LAN there is pretty normal.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Why is my MRTG job taking so long?

Post by slansing »

Not accounting for latency or anything like that, you are looking at roughly 4.7 minutes for that job to run through all of those timeouts. The timeout is set to 2s for each check, with 5 retries, which comes out to roughly 10 seconds per failed query. Multiplied by the 28 "did not eval" errors in your output, that comes to 280 seconds, or roughly 4.7 minutes. That is just one-cup-of-coffee math, but I believe that is what you are seeing here. I'm looking into the timeout and retries definitions, but as far as I know they behave as I noted above.
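
The estimate above can be reproduced in a few lines of Python. This is only a sketch of the back-of-the-envelope math, not of what MRTG does internally. It assumes that `retries` counts total attempts, that each attempt waits the timeout scaled by the backoff factor, and that every failed query is retried on its own. The "Skipping further queries for Host ... in this round" warning in the output suggests that last assumption may not hold. The function name is made up for illustration.

```python
def snmp_wait_seconds(timeout=2.0, retries=5, backoff=1.0):
    """Worst-case wait for one unanswered SNMP query.

    Assumption: attempt n waits timeout * backoff**n seconds,
    and `retries` is the total number of attempts.
    """
    return sum(timeout * backoff ** attempt for attempt in range(retries))


per_query = snmp_wait_seconds()      # values from the error output: 2s, 5, 1
failed_queries = 28                  # "did not eval" lines in the output
total = per_query * failed_queries

print(f"{per_query:.0f}s per query, {total:.0f}s total ({total / 60:.1f} min)")
```

With the values from the error output this gives 10 seconds per query and 280 seconds in total. If only the first query to the dead host is retried, the cost would be closer to a single 10 second wait.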

Re: Why is my MRTG job taking so long?

Post by snapon_admin »

I'm somewhat confused as to why this would be an issue today, though. This particular device has been down for roughly a month, probably a bit more, and we haven't had this issue (at least not this consistently) until today. Also, as I mentioned, we've had situations where 3-4 devices were down (when an entire site goes down, for example) and the job didn't take this long to run, and all of our network devices are set to 5 retries. I assume the timeout is set by the plugin? If that's the case, then it would be the same for all devices as well.

Re: Why is my MRTG job taking so long?

Post by snapon_admin »

I should also note that I removed the offending cfg file and only saw about a 15 second improvement in the time. I removed the config (after making a backup, of course), ran that command, then restored the config from the backup file and ran the command again. The result: 2m23s without the config vs 2m38s with it:

Code:

[root@lisl-ngos-01-pv conf.d]# time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok

real    2m23.411s
user    0m7.887s
sys     0m1.238s
[root@lisl-ngos-01-pv conf.d]# 
[root@lisl-ngos-01-pv conf.d]# 
[root@lisl-ngos-01-pv conf.d]# cp <IP ADDRESS>.cfg.bkp <IP ADDRESS>.cfg
[root@lisl-ngos-01-pv conf.d]# rm /var/lock/mrtg/mrtg_l -f
[root@lisl-ngos-01-pv conf.d]# time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok
SNMP Error:
no response received
SNMPv2c_Session (remote host: "<IP ADDRESS>" [<IP ADDRESS>].161)
                   community: "<SNMP COMMUNITY>"
                  request ID: 144962097
                 PDU bufsize: 8000 bytes
                     timeout: 2s
                     retries: 5
                     backoff: 1)
 at /usr/bin/../lib/mrtg2/SNMP_util.pm line 497
SNMPGET Problem for ifInOctets.1 ifOutOctets.1 on <SNMP COMMUNITY>@<IP ADDRESS>:::::2:v4only
 at /usr/bin/mrtg line 2330
2015-01-05 12:35:34: WARNING: skipping because at least the query for ifInOctets.1 on  <IP ADDRESS> did not succeed
2015-01-05 12:35:34: WARNING: no data for ifInOctets&ifOutOctets:<SNMP COMMUNITY>@<IP ADDRESS>. Skipping further queries for Host <IP ADDRESS> in this round.
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_1][_IN_] ' $target->[2255]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_1][_OUT_] ' $target->[2255]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_2][_IN_] ' $target->[2256]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_2][_OUT_] ' $target->[2256]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_3][_IN_] ' $target->[2257]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_3][_OUT_] ' $target->[2257]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_4][_IN_] ' $target->[2258]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_4][_OUT_] ' $target->[2258]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_5][_IN_] ' $target->[2259]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_5][_OUT_] ' $target->[2259]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_6][_IN_] ' $target->[2260]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_6][_OUT_] ' $target->[2260]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_7][_IN_] ' $target->[2261]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_7][_OUT_] ' $target->[2261]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_8][_IN_] ' $target->[2262]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_8][_OUT_] ' $target->[2262]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_9][_IN_] ' $target->[2263]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_9][_OUT_] ' $target->[2263]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_10][_IN_] ' $target->[2264]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_10][_OUT_] ' $target->[2264]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_12][_IN_] ' $target->[2265]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_12][_OUT_] ' $target->[2265]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_13][_IN_] ' $target->[2266]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_13][_OUT_] ' $target->[2266]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_14][_IN_] ' $target->[2267]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_14][_OUT_] ' $target->[2267]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_15][_IN_] ' $target->[2268]{$mode} ' did not eval into defined data
2015-01-05 12:38:08: ERROR: Target[<IP ADDRESS>_15][_OUT_] ' $target->[2268]{$mode} ' did not eval into defined data

real    2m38.361s
user    0m8.240s
sys     0m1.219s
[root@lisl-ngos-01-pv conf.d]# 
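
To put numbers on the runs in this thread, here is a small Python sketch that pulls the `real` line out of the `time` output and compares the runs. The helper name is hypothetical, and the regex assumes the `NmS.SSSs` format shown in the transcripts.

```python
import re


def real_seconds(time_output):
    """Return the 'real' wall-clock time, in seconds, from shell `time` output."""
    match = re.search(r"^real\s+(\d+)m([\d.]+)s", time_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'real' line found")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the three runs posted in this thread.
without_cfg = real_seconds("real    2m23.411s\nuser    0m7.887s\nsys     0m1.238s")
with_cfg = real_seconds("real    2m38.361s\nuser    0m8.240s\nsys     0m1.219s")
morning_run = real_seconds("real    5m20.462s\nuser    0m9.069s\nsys     0m1.869s")

print(f"down device adds about {with_cfg - without_cfg:.2f}s")
print(f"morning run was about {morning_run - with_cfg:.0f}s slower with the same config")
```

The down device accounts for roughly 15 seconds. The morning run took about 162 seconds longer with the same config, so most of that extra time is not explained by the down device.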
sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Why is my MRTG job taking so long?

Post by sreinhardt »

That looks quite a bit more normal. Honestly, aside from network issues or really high load (you seem to keep that down pretty well), there isn't much that alters mrtg behavior. It looks like this was an over-the-weekend/this-morning issue, correct? Do you know if anything has changed or was having issues on the network lately?
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.

Re: Why is my MRTG job taking so long?

Post by snapon_admin »

This was this morning, starting around 8 and lasting until about noon. During that time we had nearly 100% bandwidth usage to our main data center in Illinois, which also happens to be where the Nagios server is. I'm thinking that's probably the cause, but I'm wondering if there's anything I can do that would help if this happens again. I'm not sure whether modifying the timeout or any of that would help in this case.

Re: Why is my MRTG job taking so long?

Post by sreinhardt »

I would tend to agree, mrtg is very network heavy and having that much usage would definitely mess with it a bit! Modifying the timeout or retries (retries being preferable over timeout) would be a good idea if this is a repeat thing, as it would limit the overall delay that comes purely from retries. In this case, though, it seems more related to the high bandwidth usage and the resulting high latency, with queries answering slowly but possibly not quite hitting the timeout.
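
For reference, the timeout and retries live in the SNMP part of each MRTG `Target` line. The Python sketch below builds that string so the field positions are easy to see. It assumes the field order `community@host:port:timeout:retries:backoff:version`, which matches the `<SNMP STRING>@<xxx.xxx.xxx.xxx>:::::2:v4only` target in the error output, but check the MRTG reference for your installed version before editing any cfg files. The function name, the `public` community, the 192.0.2.10 address, and the chosen values are all placeholders.

```python
def mrtg_target(community, host, port="", timeout="", retries="",
                backoff="", version="", options=""):
    """Build the SNMP part of an MRTG Target value.

    Assumed field order: community@host:port:timeout:retries:backoff:version
    An empty field leaves MRTG's default for that setting in place.
    """
    fields = [host, port, timeout, retries, backoff, version]
    spec = f"{community}@" + ":".join(str(field) for field in fields)
    return f"{spec}:{options}" if options else spec


# Same shape as the target in the error output (defaults, SNMP v2c).
print(mrtg_target("public", "192.0.2.10", version=2, options="v4only"))

# Hypothetical tighter setting: keep the 2s timeout, drop to 2 retries.
print(mrtg_target("public", "192.0.2.10", timeout=2, retries=2,
                  version=2, options="v4only"))
```

The first call prints `public@192.0.2.10:::::2:v4only` and the second prints `public@192.0.2.10::2:2::2:v4only`. Fewer retries shorten the wait on a dead host, at the cost of more missed samples when the link is merely slow.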

Re: Why is my MRTG job taking so long?

Post by snapon_admin »

Alrighty, I'll dink around with the retry intervals a bit and see how that affects things. Thanks for the insight, go ahead and close this up.

Re: Why is my MRTG job taking so long?

Post by sreinhardt »

Cool, locking it up