Bandwidth charts show 0/0 for us as well

Posted: Thu Feb 07, 2019 1:37 am
by ganderson
Following very closely with this topic:
https://support.nagios.com/forum/viewto ... 16&t=51709

Ping and status graphs are working, but bandwidth graphs are not. The behaviour is inconsistent. It seems strongly related to running MRTG as the nagios user via command-line arguments, but the issue can also occur as root, especially if a check is missed.

I do not want to run MRTG as root because of the security risks.

General Info:
- SNMPv3 queries
- Currently only running one device for testing
- Running MRTG as root seems to return interface traffic counters
- Running MRTG as nagios user seems to return interface traffic counters
- These counters do not always get written into the RRD file, and despite significant investigation I cannot work out why
- Running MRTG as root with the --user=nagios --group=nagios arguments fails to connect to SNMP devices to retrieve data, again for reasons I cannot determine
- The result is bandwidth charts showing 0/0, or sometimes data for about 15 minutes before dropping off
- snmpwalk returns the requested values without issues. I can see nagios gets them as well when run as root. Even when these values are sent to RRD, sometimes RRD just does not store them... I don't know why. My gut feeling is that it has something to do with the 64-bit values in SNMPv3, though I am not sure why it would be intermittent.

Actions performed:

https://support.nagios.com/kb/print-29.html
(Documentation issue - sections of this such as setting permissions on /var/lib/mrtg actually break things in running environments. Please review.)

chown "apache:nagios" /etc/mrtg -R
chmod 775 /etc/mrtg -R
chown "apache:nagios" /var/lib/mrtg -R
chmod 775 /var/lib/mrtg -R
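
To confirm what those commands actually left behind, something along these lines could be used. It is only a sketch: show_perms is a made-up helper name, and it assumes GNU stat with the -c option, as found on CentOS/RHEL.

```shell
# show_perms is a hypothetical helper, not part of Nagios XI or MRTG.
# Prints "<octal mode> <owner>:<group> <path>" for each path given.
# Assumes GNU coreutils stat (the -c format option).
show_perms() {
    stat -c '%a %U:%G %n' "$@"
}

# Example, using the directories from the commands above:
#   show_perms /etc/mrtg /var/lib/mrtg
```

If the chown and chmod took effect, each directory should report mode 775 and apache:nagios.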

[root@nagiosxi etc]# yum list installed | grep -i rrd
rrdtool.x86_64 1.3.8-10.el6 @cr
rrdtool-perl.x86_64 1.3.8-10.el6 @cr
rrdtool-python.x86_64 1.3.8-10.el6 @cr
[root@nagiosxi etc]# yum list installed | grep -i snmp
net-snmp.x86_64 1:5.5-60.el6 @cr
net-snmp-devel.x86_64 1:5.5-60.el6 @cr
net-snmp-libs.x86_64 1:5.5-60.el6 @cr
net-snmp-perl.x86_64 1:5.5-60.el6 @cr
net-snmp-utils.x86_64 1:5.5-60.el6 @cr
perl-Net-SNMP.noarch 5.2.0-4.el6 @epel
perl-SNMP_Session.noarch 1.12-4.el6 @base
php-snmp.x86_64 5.3.3-49.el6 @cr
snmptt.noarch 1.4-0.9.beta2.el6 @epel
[root@nagiosxi etc]# yum list installed | grep -i mrtg
mrtg-libs.x86_64 2.16.2-9.el6 @base

[root@nagiosxi ~]# cpan -l | grep -i rrd
Unknown option: l
Nothing to install!
[root@nagiosxi ~]# cpan -l | grep -i snmp
Unknown option: l
Nothing to install!

cd /tmp
rm -rf nagiosxi xi*.tar.gz
wget http://assets.nagios.com/downloads/nagi ... est.tar.gz
tar xzf xi-latest.tar.gz
cd /tmp/nagiosxi/subcomponents/mrtg/
tar xzf mrtg*.tar.gz
cd mrtg*
./configure --prefix='/usr'
make all
make install

[root@nagiosxi ~]# LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg -debug=cfg,base,log &> /tmp/mrtg.txt
[root@nagiosxi ~]# LANG=C LC_ALL=C /usr/bin/mrtg &>> /tmp/mrtg.txt
[root@nagiosxi ~]# LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg -debug=cfg,base,log --user=nagios --group=nagios &> /tmp/mrtg.txt
[root@nagiosxi ~]# chown nagios /tmp/mrtg.txt
[root@nagiosxi ~]# su - nagios
[nagios@nagiosxi ~]$ LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg -debug=cfg,base,log &>> /tmp/mrtg.txt

(the command with --user=nagios --group=nagios takes notably longer to complete)
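
Because several of the runs above write to the same /tmp/mrtg.txt, it may be easier to compare them if each run gets its own log and a recorded duration. The following is a rough sketch only; run_logged, the LOG_DIR variable and the mrtg-<label>.txt naming are all made up for illustration.

```shell
# run_logged is a hypothetical helper, not part of MRTG or Nagios XI.
# Runs a command with LANG=C LC_ALL=C, captures stdout and stderr in
# ${LOG_DIR:-/tmp}/mrtg-<label>.txt, then appends exit status and elapsed seconds.
# Usage: run_logged <label> <command> [args...]
run_logged() {
    local label=$1; shift
    local log="${LOG_DIR:-/tmp}/mrtg-${label}.txt"
    local start end status
    start=$(date +%s)
    LANG=C LC_ALL=C "$@" > "$log" 2>&1
    status=$?
    end=$(date +%s)
    echo "exit=${status} elapsed=$((end - start))s" >> "$log"
    return "$status"
}

# Example (as root), mirroring the commands above:
#   run_logged root   /usr/bin/mrtg /etc/mrtg/mrtg.cfg -debug=cfg,base,log
#   run_logged switch /usr/bin/mrtg /etc/mrtg/mrtg.cfg -debug=cfg,base,log --user=nagios --group=nagios
```

The two resulting files can then be compared with diff, and the elapsed line makes the slower run easy to spot.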

I find this line very interesting:

--log: got: ???/???

Often, when it works, I get actual values here in the output.

Also, in the second run you can see that when specifying --user=nagios and --group=nagios, we get undef! values for the counters, so something is broken with SNMP when using these arguments. However, in the last command, running as the nagios user, the counters are collected again. This behaviour is very repeatable.
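
A quick way to see how often those markers turn up in a debug log is to count matching lines. This is a minimal sketch: count_marker is a hypothetical helper, and the only strings it relies on are the ones quoted in this post.

```shell
# count_marker is a hypothetical helper for skimming an MRTG debug log.
# Prints how many lines of <log_file> contain the fixed string <marker>.
# Usage: count_marker <log_file> <marker>
count_marker() {
    grep -F -c -- "$2" "$1"
}

# Example, using the log path and the markers mentioned above:
#   count_marker /tmp/mrtg.txt 'got: ???/???'
#   count_marker /tmp/mrtg.txt 'undef'
```

grep -F treats the marker as a literal string, so the question marks are not interpreted as a pattern.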

I have removed all config files and added a new config with only one interface. No improvement.
I have deleted the .rrd files for the only remaining host. No improvement.
I have tried increasing the default port speed to a very large number. No improvement.

I am struggling to know what to do next...

Re: Bandwidth charts show 0/0 for us as well

Posted: Thu Feb 07, 2019 2:13 pm
by tgriep
When MRTG polls with SNMPv3, it loads the file Net_SNMP_util.pm so it can poll the remote device.
If the nagios user account cannot access that file or the folders it is in, that could be why polling SNMPv3 devices fails under the nagios user account.

Search the drive for that file and check that the permissions allow the nagios user account to read it.
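
One possible way to do that search is sketched below. list_module_copies is a made-up helper name, and it assumes GNU find and stat.

```shell
# list_module_copies is a hypothetical helper, not part of MRTG.
# Finds every file called <file_name> under <search_root> and prints
# "<permissions> <owner>:<group> <path>" for each copy (GNU stat assumed).
# Usage: list_module_copies <search_root> <file_name>
list_module_copies() {
    find "$1" -type f -name "$2" -exec stat -c '%A %U:%G %n' {} + 2>/dev/null
}

# Example (as root):
#   list_module_copies / Net_SNMP_util.pm
```

Read access for the nagios account can then be checked directly with something like sudo -u nagios test -r <path> && echo readable, assuming sudo is set up for that. Each parent directory also needs execute permission for the nagios user.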

Re: Bandwidth charts show 0/0 for us as well

Posted: Sun Feb 10, 2019 10:24 pm
by ganderson
Thanks for the update. It does seem to have read access:

Code:

[root@nagiosxi ~]# ls -la /usr/lib64/mrtg2/Net_SNMP_util.pm
-rw-r--r-- 1 root root 65075 May 11  2016 /usr/lib64/mrtg2/Net_SNMP_util.pm
[root@nagiosxi ~]# ls -la /usr/lib/mrtg2/Net_SNMP_util.pm
-rw-r--r-- 1 root root 66466 Feb  7 16:14 /usr/lib/mrtg2/Net_SNMP_util.pm

Re: Bandwidth charts show 0/0 for us as well

Posted: Sun Feb 10, 2019 11:00 pm
by ganderson
I also find this odd, because the issue does not occur when MRTG is run natively as the nagios user/group, only when the user and group are passed as arguments to MRTG.

Re: Bandwidth charts show 0/0 for us as well

Posted: Mon Feb 11, 2019 9:37 am
by tgriep
Try running this command again and post the full output here. Hopefully it will show something.

Code:

time LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg -debug=cfg,base,log --user=nagios --group=nagios

Re: Bandwidth charts show 0/0 for us as well

Posted: Mon Feb 11, 2019 9:21 pm
by ganderson
I did already do this in the previous file attachment but without the time data. Here it is again though, this time with the...uh...time.

Keep in mind that when running MRTG as root, or after su to nagios, the command completes almost instantly.

Re: Bandwidth charts show 0/0 for us as well

Posted: Tue Feb 12, 2019 11:35 am
by tgriep
The 10 minutes it takes to run the MRTG command is causing the issue.
I think it may be a version issue in the Net_SNMP_util.pm scripts.
Open these 2 files, look for the version number near the top, and rename whichever file is the older version to Net_SNMP_util.pm.bak.

Code:

/usr/lib64/mrtg2/Net_SNMP_util.pm
/usr/lib/mrtg2/Net_SNMP_util.pm

Then run the MRTG command with time and see if it runs quicker and actually finishes polling the devices.
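
Rather than opening each file by hand, the version line could be pulled out with sed. This is a sketch only; module_version is a hypothetical helper, and it assumes the version is declared on a line of the form our $VERSION = ...

```shell
# module_version is a hypothetical helper.
# Prints the value assigned on the first "our $VERSION = ..." line of a Perl file.
# Usage: module_version <file>
module_version() {
    sed -n 's/^[[:space:]]*our[[:space:]]\{1,\}\$VERSION[[:space:]]*=[[:space:]]*\([^;[:space:]]*\).*/\1/p' "$1" | head -n 1
}

# Example, using the two paths listed above:
#   module_version /usr/lib64/mrtg2/Net_SNMP_util.pm
#   module_version /usr/lib/mrtg2/Net_SNMP_util.pm
```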

Re: Bandwidth charts show 0/0 for us as well

Posted: Tue Feb 12, 2019 8:46 pm
by ganderson
It doesn't take 10 minutes, it takes 10 seconds.

The versions of the two files are indeed different, the older being the lib64 copy. Its version line reads: our $VERSION = v1.0.15. Renaming this file made no difference to the running of the command.

For completeness, I put that file back and then tried renaming the lib copy instead. Its version line reads: our $VERSION = v1.0.20. Renaming this file broke SNMPv3 functionality, and MRTG tried to fall back to SNMPv1, which obviously failed as well.

Re: Bandwidth charts show 0/0 for us as well

Posted: Wed Feb 13, 2019 11:37 am
by tgriep
Yep, 10 seconds is correct. Need to get my eyes checked. 8)
Just rename the Net_SNMP_util.pm file that has $VERSION = v1.0.15 so that MRTG does not load it by mistake.

I am thinking that there is an incompatible Perl module on the system that is preventing SNMPv3 polling from working.
The cpan -l command failed on your system, so let's see if we can get it to install a newer version.

Run this command to get into the cpan shell:

Code:

perl -MCPAN -e shell

Then, in cpan, run the following:

Code:

install CPAN
reload cpan

Exit out of cpan, run the following to get the versions of the SNMP Perl modules, and post the output.

Code:

cpan -l | grep -i snmp

Also, run the following 2 commands and post the output.

Code:

cpan --help
ls -l /var/lib

Thanks
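
If cpan -l is still not accepted on that system, the version of an individual module can also be read by asking Perl to load it. A minimal sketch follows; perl_module_version is a hypothetical helper name, and Net::SNMP is used in the example only because perl-Net-SNMP appears in the yum output earlier in the thread.

```shell
# perl_module_version is a hypothetical helper.
# Loads the named Perl module and prints its $VERSION.
# Returns a non-zero status if the module cannot be loaded.
# Usage: perl_module_version <Module::Name>
perl_module_version() {
    perl -M"$1" -e 'print $ARGV[0]->VERSION, "\n"' "$1"
}

# Example:
#   perl_module_version Net::SNMP
```

Running it both as root and as the nagios user would show whether the two accounts resolve the same module.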

Re: Bandwidth charts show 0/0 for us as well

Posted: Thu Feb 14, 2019 7:16 pm
by ganderson
Thanks for the help thus far...

Are we ignoring that MRTG can retrieve the SNMP data when run either as root or as the nagios user? It only fails when the nagios user is passed as an argument to MRTG. I would have thought that implies the underlying components are okay...

When I first ran the commands you requested, some updates were installed, but that didn't change anything.

Code:

[root@nagiosxi ~]# perl -MCPAN -e shell
Terminal does not support AddHistory.

To fix enter>  install Term::ReadLine::Perl


cpan shell -- CPAN exploration and modules installation (v2.22)
Enter 'h' for help.

cpan[1]> install CPAN
CPAN: Storable loaded ok (v2.20)
Reading '/root/.cpan/Metadata'
  Database was generated on Thu, 14 Feb 2019 23:17:06 GMT
CPAN is up to date (2.22).

cpan[2]> reload cpan
(CPAN__unchanged__v2.22)(CPAN::Author__unchanged__v5.5002)(CPAN::CacheMgr__unchanged__v5.5002)(CPAN::Complete__unchanged__v5.5001)(CPAN::Debug__unchanged__v5.5001)(CPAN::DeferredCode__unchanged__v5.50)(CPAN::Distribution__unchanged__v2.22)(CPAN::Distroprefs__unchanged__v6.0001)(CPAN::Distrostatus__unchanged__v5.5)(CPAN::Exception::RecursiveDependency.....v5.5001)(CPAN::Exception::yaml_not_installed..v5.5)(CPAN::FTP__unchanged__v5.5011)(CPAN::FTP::netrc__unchanged__v1.01)(CPAN::HandleConfig__unchanged__v5.5008)(CPAN::Index__unchanged__v2.12)(CPAN::InfoObj__unchanged__v5.5)(CPAN::LWP::UserAgent....v1.9601)(CPAN::Module__unchanged__v5.5003)(CPAN::Prompt__unchanged__v5.5)(CPAN::Queue__unchanged__v5.5002)(CPAN::Shell__unchanged__v5.5008)(CPAN::Tarzip__unchanged__v5.5012)(CPAN::Version__unchanged__v5.5003)
11 subroutines redefined

cpan[4]> exit
Terminal does not support GetHistory.
Lockfile removed.
[root@nagiosxi ~]# cpan -l
Unknown option: l
Nothing to install!
[root@nagiosxi ~]# cpan --help
/usr/bin/cpan version [unknown] calling Getopt::Std::getopts (version 1.06 [paranoid]),
running under Perl version 5.10.1.

Usage: cpan [-OPTIONS [-MORE_OPTIONS]] [--] [PROGRAM_ARG1 ...]

The following single-character options are accepted:
        Boolean (without arguments): -h -v -C -A -D -O -L -a -r -c -f -i -m -t

Options may be merged together.  -- stops processing of options.

For more details run
        perldoc -F /usr/bin/cpan
  [Now continuing due to backward compatibility and excessive paranoia.
   See ``perldoc Getopt::Std'' about $Getopt::Std::STANDARD_HELP_VERSION.]
Nothing to install!
[root@nagiosxi ~]# ls -l /var/lib
total 188
drwxr-xr-x. 2 root        root         4096 Jul  4  2018 alternatives
drwx------. 3 root        root         4096 Mar 31  2015 authconfig
drwx------  2 apache      apache       4096 Jun 20  2018 dav
drwxr-xr-x  2 root        root         4096 Jun 20  2018 dbus
drwxr-xr-x. 2 root        root         4096 Jul 13  2018 dhclient
drwxr-xr-x. 2 root        root         4096 Sep 23  2011 games
-rw-r--r--  1 root        root         1534 Feb 15 03:12 logrotate.status
drwxr-xr-x. 2 root        root         4096 Sep 23  2011 misc
drwxr-x---  2 root        slocate      4096 Feb 15 03:12 mlocate
drwxrwxr-x  4 apache      nagios      90112 Feb  5 14:06 mrtg
drwxr-xr-x  6 mysql       mysql        4096 Feb  1 11:37 mysql
drwxr-xr-x  3 nagios      nagios       4096 Feb  1 11:36 net-snmp
drwxr-xr-x  2 ntp         ntp          4096 Feb 15 09:37 ntp
drwx------  4 postgres    postgres     4096 Nov 27  2017 pgsql
drwxr-xr-x  3 root        root         4096 Mar 22  2017 php
drwxr-xr-x. 2 root        root         4096 Mar 22  2017 plymouth
drwx------  3 root        root         4096 Mar 17  2015 polkit-1
drwx------. 2 postfix     root         4096 Mar 24  2017 postfix
-rw-------. 1 root        root         4096 Feb  1 11:37 random-seed
drwxr-xr-x. 2 root        root         4096 Feb  1 11:40 rpm
drwx------. 2 root        root         4096 Jun 20  2018 rsyslog
drwxr-x---  2 shellinabox shellinabox  4096 Jul 10  2018 shellinabox
drwxr-xr-x. 4 root        root         4096 Jun 20  2018 stateless
drwxr-xr-x. 3 root        root         4096 Sep  7  2016 udev
drwxr-xr-x. 6 root        root         4096 Aug  3  2018 yum
[root@nagiosxi ~]#