MRTG SNMP timeout

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
DFaught
Posts: 62
Joined: Tue Sep 26, 2017 12:50 pm

MRTG SNMP timeout

Post by DFaught »

I am having issues with certain network switches timing out in response to the MRTG SNMP queries that result from using the Switch-Router wizard. These switches do exist and do eventually respond to SNMP queries; they are just slow at times. Is there a recommended way to increase the SNMP timeout or retry values for particular devices?

Thanks for any help you can provide.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: MRTG SNMP timeout

Post by npolovenko »

Hello, @DFaught.
Take a look at the solution in the following thread and let us know if it works for you:
https://support.nagios.com/forum/viewto ... 16&t=31149

Regards
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
DFaught
Posts: 62
Joined: Tue Sep 26, 2017 12:50 pm

Re: MRTG SNMP timeout

Post by DFaught »

Thank you for the reply. No, that solution will not work. The initial scan of the device in the wizard worked fine and everything is defined properly in the Nagios and MRTG configs. On occasion, when the periodic MRTG task runs through cron, the device times out on the SNMP queries that the task makes. Is there a way to increase the MRTG timeout, ideally for a specific host?
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: MRTG SNMP timeout

Post by npolovenko »

@DFaught, Oh, I see. Please take a look at this article:
https://support.nagios.com/kb/article/n ... hs-29.html
Start reading where it says: MRTG Running Longer Than Five Minutes.
You need to increase the number of forks.

Also, when you delete old or unused services in XI that were using MRTG, they are removed only from XI; MRTG keeps polling them until you delete their entries from the /etc/mrtg/conf.d folder. So if you have deleted a lot of MRTG checks in the past, I suggest going through the /etc/mrtg/conf.d folder and deleting the stale entries. This will speed up MRTG, and you may not even need to increase the forks.
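
As a rough aid for that cleanup, the sketch below lists each config file in a folder together with the number of Target entries it holds, which makes stale files easier to spot. The function name count_targets is made up for illustration, the default path is the one from the post, and nothing is deleted.

```shell
#!/bin/sh
# Hypothetical helper: print "<number of Target entries> <file name>"
# for every .cfg file in the given folder (default: /etc/mrtg/conf.d).
count_targets() {
    conf_dir="${1:-/etc/mrtg/conf.d}"
    for cfg in "$conf_dir"/*.cfg; do
        # Skip the unexpanded pattern when the folder has no .cfg files.
        [ -f "$cfg" ] || continue
        # grep -c prints the count of matching lines; it exits non-zero
        # when the count is 0, so "|| true" keeps the loop going.
        n=$(grep -c '^Target\[' "$cfg" || true)
        printf '%s %s\n' "$n" "$(basename "$cfg")"
    done
}
```

Calling count_targets with no argument reads /etc/mrtg/conf.d; comparing the output against the hosts still defined in XI is left to the reader.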
DFaught
Posts: 62
Joined: Tue Sep 26, 2017 12:50 pm

Re: MRTG SNMP timeout

Post by DFaught »

You are getting closer, but still not quite there. The problem is not with the overall MRTG process taking too long but just with a certain host and its services that MRTG is polling. From what I can find, it looks like I can possibly go into the /etc/mrtg/conf.d directory and manually edit the .cfg file for that particular host and put new timeout values in all of the Target statements for the monitored services. Does that sound like a reasonable thing to do?

References from:
https://support.nagios.com/kb/article.php?id=62
https://oss.oetiker.ch/mrtg/doc/mrtg-reference.en.html
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: MRTG SNMP timeout

Post by npolovenko »

@DFaught,

"The problem is not with the overall MRTG process taking too long but just with a certain host and its services that MRTG is polling."

MRTG polls the data and saves it to the RRD file; Nagios then reads the RRD file and displays the information in the UI. If the MRTG process times out, Nagios can still read the RRD file, and the UI will show 0 (for bandwidth, for example). My point is that Nagios almost never shows an MRTG timeout message in the UI. Can you clarify whether you saw that message in a log file or in the UI itself? What are the service names?
Yes, you could change the Target statements. But you can also change the timeout globally in the mrtg.cfg file. That doesn't mean MRTG will spend more time on every check; it will only wait longer, before killing the process, on checks that take more than "n" minutes to execute.
DFaught
Posts: 62
Joined: Tue Sep 26, 2017 12:50 pm

Re: MRTG SNMP timeout

Post by DFaught »

You are correct that the timeout is not showing in the GUI. The root user is getting emails from the cron process that look like this:

Subject: Cron <root@mlwnag22> LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok



SNMPGET Problem for ifHCInOctets.10620 ifHCOutOctets.10620 on [email protected]:161::::2:v4only: No response from remote host "10.7.196.250" at /usr/bin/../lib/mrtg2/Net_SNMP_util.pm line 594.
Net_SNMP_util::snmpget('[email protected]:161::::2:v4only', 'HASH(0x47892f0)', 'ifHCInOctets.10620', 'ifHCOutOctets.10620') called at /usr/bin/mrtg line 2330
main::getsnmparg('HASH(0x22432e0)', 'HASH(0x478c040)', 'HASH(0x2130678)', 'HASH(0x25e08d0)') called at /usr/bin/mrtg line 2510
main::readtargets('HASH(0x22432e0)', 'ARRAY(0x22cdd48)', 'HASH(0x2130678)') called at /usr/bin/mrtg line 403
main::main called at /usr/bin/mrtg line 143
2018-02-14 11:00:05: WARNING: skipping because at least the query for ifHCInOctets.10620 on 10.7.196.250 did not succeed
2018-02-14 11:00:05: WARNING: no data for ifHCInOctets&ifHCOutOctets:[email protected]. Skipping further queries for Host 10.7.196.250 in this round.
2018-02-14 11:03:09: ERROR: Target[10.7.196.250_10620][_IN_] ' $target->[2370]{$mode} ' did not eval into defined data
2018-02-14 11:03:09: ERROR: Target[10.7.196.250_10620][_OUT_] ' $target->[2370]{$mode} ' did not eval into defined data
2018-02-14 11:03:09: ERROR: Target[10.7.196.250_10621][_IN_] ' $target->[2371]{$mode} ' did not eval into defined data
2018-02-14 11:03:09: ERROR: Target[10.7.196.250_10621][_OUT_] ' $target->[2371]{$mode} ' did not eval into defined data

... (there are more of these last 2 lines for the rest of the interfaces, they all look the same) ...

The error does not consistently occur for the same host, although a few hosts tend to be the ones that get it. The hosts are actually up and functional, but the WAN connections to them are sometimes slow.
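
To see which hosts time out most often, the cron mail can be tallied with standard tools. The sketch below assumes the messages have been saved to a text file and follow the "No response from remote host" line format shown above; failing_hosts is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read MRTG cron mail on stdin and print
# "<host> <count>" for every host named in a
# 'No response from remote host "..."' line.
failing_hosts() {
    # Keep only the quoted host from matching lines, then count duplicates.
    sed -n 's/.*No response from remote host "\([^"]*\)".*/\1/p' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, failing_hosts < /var/spool/mail/root would tally root's mailbox, assuming that is where cron mail is delivered on your system.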
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: MRTG SNMP timeout

Post by npolovenko »

@DFaught, There are two possible ways to go about this:
1. Navigate to the /etc/mrtg/conf.d/ folder. Find the config file for the switch that is timing out and open it. At the top of the config you'll see a comment similar to this:

Code:

# Created by
# /usr/bin/cfgmaker --show-op-down --noreversedns --zero-speed 100000000 [email protected]:161::::2
This means you can recreate the config file by running this command and redirecting its output into the file.

Here's the syntax to include the timeout value:

Code:

community@host:port:timeout:retries:backoff:version
So, you can change the command to include a 20 second timeout:

Code:

/usr/bin/cfgmaker --show-op-down --noreversedns --zero-speed 100000000 [email protected]:161:20:::2
Run the command and swap the old config with the new one. Alternatively, you can edit each Target line in the existing config and add a 20-second timeout there.

2. I haven't tried this myself, but it should probably work: open the /usr/bin/mrtg file and change the default timeout from 5 seconds to 10. It looks like you'd need to change line 1998.

Either way, please back up the files before modifying them, in case something goes wrong.
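
For the "edit each Target line" route in option 1, here is a minimal sketch using sed. It assumes the Target lines use the community@host:port:timeout:retries:backoff:version layout shown above with an empty timeout field, and that hosts are plain names or IPv4 addresses. The function name add_snmp_timeout, the SWITCH.cfg file name, and the backup location are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of an MRTG config in which every
# Target line with an EMPTY timeout field gets the given timeout
# (default 20 seconds). The input file itself is not modified.
add_snmp_timeout() {
    cfg="$1"
    timeout="${2:-20}"
    # On lines starting with "Target[", turn "@host:port::" into
    # "@host:port:TIMEOUT:". Lines that already carry a timeout have no
    # "::" right after the port, so they are left untouched.
    sed "/^Target\[/ s/\(@[^:]*:[0-9]*\)::/\1:${timeout}:/" "$cfg"
}

# Possible usage (paths are placeholders, back up first):
#   cp -p /etc/mrtg/conf.d/SWITCH.cfg /root/SWITCH.cfg.bak
#   add_snmp_timeout /root/SWITCH.cfg.bak 20 > /etc/mrtg/conf.d/SWITCH.cfg
```

Writing the result from the backup copy keeps the original available for a quick rollback if MRTG complains about the new file.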
DFaught
Posts: 62
Joined: Tue Sep 26, 2017 12:50 pm

Re: MRTG SNMP timeout

Post by DFaught »

Thank you for your help with this. I haven't dug too deeply into it yet, but I think there should be a way to change the timeout globally (your option 2) in the /etc/mrtg/mrtg.cfg file rather than by modifying the code.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: MRTG SNMP timeout

Post by npolovenko »

@DFaught, Yeah, let me know how it goes. I looked at some MRTG manuals but couldn't find a setting that defines the timeout globally in mrtg.cfg. So changing the value in the file from my previous post, or in the Net_SNMP_util.pm module, is probably the way to do it.
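
One thing that may be worth checking before patching code: the MRTG reference linked earlier in the thread appears to describe a global SnmpOptions keyword that takes settings such as timeout and retries. Treat that as an assumption to verify against your MRTG version. The sketch below only reports whether such a line is already present; show_snmp_options is a hypothetical helper name, and the example value in the comment is illustrative, not a tested recommendation.

```shell
#!/bin/sh
# Hypothetical helper: print any SnmpOptions line found in an MRTG
# config (default: /etc/mrtg/mrtg.cfg), or a note when there is none.
show_snmp_options() {
    cfg="${1:-/etc/mrtg/mrtg.cfg}"
    grep -i '^SnmpOptions:' "$cfg" || echo "no SnmpOptions line in $cfg"
}

# If the keyword is supported, a global line might look like this
# (assumed syntax, check the MRTG reference before using it):
#   SnmpOptions: timeout => 10
```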