MRTG consumes 100% of system resources

Post by **TBT** » Mon Dec 10, 2018 8:18 am

scottwilkerson wrote:I've gone over this repeatedly and cannot re-create the issue. Until I can, I would recommend leaving the user/group off of the mrtg command in the cron and removing write access for the apache user and nagios group to /etc/mrtg/mrtg.cfg, as this is another way to remove possible exploitation of the vulnerability.
Code: Select all
chmod ug-w /etc/mrtg/mrtg.cfg
The only other thing I noted is that the file in your lib directory is RRDp.pm instead of the usual RRDs.pm (which is normally found in the default Perl path, so the LibAdd directive is not needed in the config).
Can you run the following on both servers?
Code: Select all
locate RRDs.pm
Code: Select all
rrdtool -v

Affected Host
$ locate RRDs.pm
/opt/rrdtool-1.4.4/lib/perl/5.10.1/x86_64-linux-thread-multi/RRDs.pm
/usr/lib64/perl5/RRDs.pm.deprecated

$ rrdtool -v
RRDtool 1.4.4

Other Host
$ locate RRDs.pm
/opt/rrdtool-1.4.4/lib/perl/5.10.1/x86_64-linux-thread-multi/RRDs.pm
/usr/lib64/perl5/RRDs.pm.deprecated

$ rrdtool -v
RRDtool 1.4.4

I am stumped too, looking forward to figuring this out.

Post by **tgriep** » Mon Dec 10, 2018 5:10 pm

Is the server that is having the issue running rrdcached?

If you rename this file back to .pm

Code: Select all

/usr/lib64/perl5/RRDs.pm.deprecated

Then comment out this line from the /etc/mrtg/mrtg.cfg file if it exists

Code: Select all

LibAdd: /opt/rrdtool-1.4.4/lib/perl/5.10.1

Does the MRTG process still consume the high resources when using the user and group options?
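The test suggested above could be scripted roughly as follows. This is only a sketch: the helper name `comment_out_libadd` is made up for illustration, and it assumes a `sed` that accepts `-i.bak` (GNU and BSD sed both do). It comments out any `LibAdd:` line and keeps the original file as a `.bak` copy so the change is easy to undo.

```shell
#!/bin/sh
# Hypothetical helper: comment out every LibAdd: line in an MRTG config file.
# The untouched original is kept next to it as FILE.bak.
comment_out_libadd() {
    cfg=$1
    # "&" re-inserts the matched text, so "LibAdd: /path" becomes "#LibAdd: /path".
    # Lines that are already commented out start with "#" and are not matched.
    sed -i.bak 's/^[[:space:]]*LibAdd:/#&/' "$cfg"
}
```

On the affected server this would be run as `comment_out_libadd /etc/mrtg/mrtg.cfg` after renaming RRDs.pm.deprecated back to RRDs.pm. To see which copy of the module Perl actually picks up, `perl -MRRDs -e 'print $INC{"RRDs.pm"}, "\n"'` prints the path of the loaded file.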

Post by **TBT** » Wed Dec 12, 2018 10:07 am

tgriep wrote:Is the server that is having the issue running rrdcached?

Yes, RRDCache is in use. But as I've mentioned, the affected XI Host is configured the same as other Hosts which are working correctly.

We initially tested with RRDCache disabled, both with and without the User/Group in the cron, results documented in previous posts. This was how it narrowed down to an issue with User/Group in the cron.

The RRDs.pm.deprecated was invoked by the system and I will assume it has merit. Is it worth our while to try the suggested change?

Post by **tgriep** » Wed Dec 12, 2018 12:57 pm

Thanks for reporting back your test results. Just trying to eliminate any cause for this issue.

You can try what I suggested if you want to.
Another thing, are the MRTG config files in the /etc/mrtg/conf.d folder all the same on every server?

Post by **TBT** » Wed Dec 12, 2018 2:31 pm

tgriep wrote:Thanks for reporting back your test results. Just trying to eliminate any cause for this issue.

You can try what I suggested if you want to.
Another thing, are the MRTG config files in the /etc/mrtg/conf.d folder all the same on every server?

Files located in /etc/mrtg/conf.d are plentiful as we monitor thousands of hosts. Each XI server is monitoring different network elements so the files within are not the same.

Additionally, the *.cfg files have a mix of permissions and ownership (-rw-r--r-- 1 apache apache or -rwxrwxr-x 1 apache nagios) which I recall us observing and reporting to you sometime back in 2017.

Please advise.
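For comparing servers, a quick tally of the mode/owner/group combinations in the directory may help. The sketch below assumes GNU `stat` (as shipped with RHEL/CentOS); the function name `summarize_perms` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count how many *.cfg files share each
# mode/owner/group combination under a directory.
summarize_perms() {
    dir=$1
    # GNU stat format: %A = mode string, %U = owner name, %G = group name
    find "$dir" -type f -name '*.cfg' -exec stat -c '%A %U %G' {} + |
        sort | uniq -c
}
```

Running `summarize_perms /etc/mrtg/conf.d` on an affected and an unaffected host gives one line per distinct combination, prefixed with the number of files that have it, which makes a mix like the one described above easy to spot.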

Post by **tgriep** » Wed Dec 12, 2018 4:11 pm

The files all should have the same permissions if you ran the commands from the first page of the post.
Run this again to set the permissions again.

Code: Select all

chown apache:nagios /etc/mrtg -R
chmod 775 /etc/mrtg -R
chown apache:nagios /var/lib/mrtg -R
chmod 775 /var/lib/mrtg -R

See if that helps out.

Another thing to try, move the MRTG config files out of that folder and put back half at a time until it fails again, then see if you can narrow it down to one of the config files and if you find it, post it here.
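One way to do the "half at a time" step is sketched below. The function name `park_half` and the holding directory are made up for illustration; the function moves the second half of the *.cfg files (in shell glob order) out of the config directory so the cron can be observed with only the first half in place.

```shell
#!/bin/sh
# Hypothetical helper: move the second half of the *.cfg files in SRC
# into HOLD, leaving the first half (rounded up) in place.
park_half() {
    src=$1
    hold=$2
    mkdir -p "$hold" || return 1
    # Load the matching files into the positional parameters.
    set -- "$src"/*.cfg
    # If the glob matched nothing, "$1" is the literal pattern; nothing to do.
    [ -e "$1" ] || return 0
    keep=$(( ($# + 1) / 2 ))
    i=0
    for f in "$@"; do
        i=$((i + 1))
        if [ "$i" -gt "$keep" ]; then
            mv -- "$f" "$hold"/
        fi
    done
    return 0
}
```

For example, `park_half /etc/mrtg/conf.d /root/mrtg-hold` (the second path is an arbitrary choice) parks half of the files. If the load stays high, run it again on what is left; if it drops, move the parked files back and park the other half. When more than one file is at fault, the halving has to be repeated on each suspect half.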

Post by **TBT** » Thu Dec 13, 2018 12:20 pm

tgriep wrote:The files all should have the same permissions if you ran the commands from the first page of the post.
Run this again to set the permissions again.
Code: Select all
chown apache:nagios /etc/mrtg -R
chmod 775 /etc/mrtg -R
chown apache:nagios /var/lib/mrtg -R
chmod 775 /var/lib/mrtg -R
See if that helps out.

Another thing to try, move the MRTG config files out of that folder and put back half at a time until it fails again, then see if you can narrow it down to one of the config files and if you find it, post it here.

1. I did run the permissions as suggested on the affected XI Host. My comment was in regard to a comparison to a server not affected. Should we run the suggested permission/ownership change on all the servers so that they're kept the same?

2. I've tried your suggestion and discovered the following: out of 56 *.cfg files, there are 6 which cause the CPU spike. I currently have those files excluded from the /etc/mrtg/conf.d dir and the cron is running perfectly. I'll private message you the files shortly (as we do not want them public). Hopefully you'll be able to correlate the issue.

Post by **tgriep** » Thu Dec 13, 2018 2:13 pm

1. Yes, I would run those commands on all of the Nagios servers to make sure the permissions are set the same.
Especially if you upgrade the Network Switch / Router Wizard to the latest version.

2. Thanks for the files. The only thing I see is that this option is set in all of the files:
enablesnmpv3: yes
That option is set in the main mrtg.cfg and is redundant.

I added that option to my MRTG config file that uses SNMPv3 and tested to see if it caused a high load, and it did not, so I don't think that option is causing the issue.
If you like, try removing it and see if it works on that one server.
I would assume that your working configs are using the same SNMPv3 authentication scheme as the bad files so that rules that out.

The only thing left is to verify that the devices still exist and that the ports the MRTG process is polling are still active on the device.
If a port is not active anymore, remove it from the config file and see if the issue still happens.
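To get a list of devices to check, the hosts can be pulled out of the `Target[...]` lines of the suspect files. This is a rough sketch that assumes the common `...:community@host:...` target form; the function name `list_target_hosts` is hypothetical, and bracketed IPv6 addresses or unusual target expressions are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print the unique hosts referenced by Target[...] lines
# in one or more MRTG config files.
list_target_hosts() {
    # Keep whatever follows the last "@" on a Target line, up to the next
    # ":" or whitespace, and drop everything else.
    sed -n 's/^Target\[[^]]*\]:.*@\([^:[:space:]]*\).*/\1/p' "$@" |
        LC_ALL=C sort -u
}
```

The output can then be fed to a reachability check, for example `list_target_hosts /path/to/suspect.cfg | while read -r h; do ping -c 1 "$h" >/dev/null || echo "no reply: $h"; done`, where the file path is a placeholder. A ping reply only shows that the device is up, not that the polled port still exists.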

Post by **TBT** » Thu Dec 13, 2018 4:30 pm

tgriep wrote:1. Yes, I would run those commands on all of the Nagios servers to make sure the permissions are set the same.
Especially if you upgrade the Network Switch / Router Wizard to the latest version.

2. Thanks for the files. The only thing I see is that this option is set in all of the files:
enablesnmpv3: yes
That option is set in the main mrtg.cfg and is redundant.

I added that option to my MRTG config file that uses SNMPv3 and tested to see if it caused a high load, and it did not, so I don't think that option is causing the issue.
If you like, try removing it and see if it works on that one server.
I would assume that your working configs are using the same SNMPv3 authentication scheme as the bad files so that rules that out.

The only thing left is to verify that the devices still exist and that the ports the MRTG process is polling are still active on the device.
If a port is not active anymore, remove it from the config file and see if the issue still happens.

1. After removing "enablesnmpv3: yes" from the .cfg file, our issue remains.
2. Verified an interface, removed it, and re-added it via Network Switch / Router wizard v2.4.1; our issue remains.
3. Another observation: after re-adding the graphed interface, the Network Switch / Router wizard v2.4.1 creates .cfg files with -rw-r--r-- 1 apache apache, which conflicts with the advice of -rwxrwxr-x 1 apache nagios.

Any more ideas?

Post by **tgriep** » Thu Dec 13, 2018 5:40 pm

No, I am out of ideas.
Maybe those devices are slower to respond or are sending the data differently.

Nagios Support Forum
