MRTG consumes 100% of system resources

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
TBT
Posts: 625
Joined: Wed May 18, 2011 1:26 pm

Post by TBT »

scottwilkerson wrote:I've gone over and over this and cannot re-create the issue. Until I can, I would recommend leaving the user/group off of the command in the mrtg cron and removing write access for the apache user and nagios group to /etc/mrtg/mrtg.cfg, as this is another way to remove possible exploitation of the vulnerability.

Code:

chmod ug-w /etc/mrtg/mrtg.cfg

The only other thing I noted is that the file in your lib directory is RRDp.pm instead of the usual RRDs.pm (which is normally found in the default perl path, so adding the LibAdd directive to the config is not necessary).
Can you run the following on both servers?

Code:

locate RRDs.pm

Code:

rrdtool -v

Affected Host
$ locate RRDs.pm
/opt/rrdtool-1.4.4/lib/perl/5.10.1/x86_64-linux-thread-multi/RRDs.pm
/usr/lib64/perl5/RRDs.pm.deprecated

$ rrdtool -v
RRDtool 1.4.4


Other Host
$ locate RRDs.pm
/opt/rrdtool-1.4.4/lib/perl/5.10.1/x86_64-linux-thread-multi/RRDs.pm
/usr/lib64/perl5/RRDs.pm.deprecated

$ rrdtool -v
RRDtool 1.4.4

I am stumped too, looking forward to figuring this out.
Nagios XI 2024R2.2.1 (8 Servers)
Nagios Fusion 2024R1.0.2
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Post by tgriep »

Is the server that is having the issue running rrdcached?

If you rename this file back to .pm

Code:

/usr/lib64/perl5/RRDs.pm.deprecated

and then comment out this line from the /etc/mrtg/mrtg.cfg file, if it exists,

Code:

LibAdd: /opt/rrdtool-1.4.4/lib/perl/5.10.1

does the MRTG process still consume high resources when using the user and group options?
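
For anyone following along, the two steps above could be scripted roughly as follows. This is only a sketch: the function name comment_out_libadd is made up for illustration, GNU sed is assumed (for -i and the I flag), and a .bak copy of the config is kept so the change is easy to revert.

```shell
# Sketch only: comment out any LibAdd line in an MRTG config, keeping a backup.
# The function name is hypothetical; GNU sed is assumed.
comment_out_libadd() {
    cfg="$1"
    # Keep a copy next to the original so the edit can be undone.
    cp -p -- "$cfg" "$cfg.bak" || return 1
    # Prefix the directive with '#', which MRTG treats as a comment.
    # The I flag makes the match case-insensitive.
    sed -i 's/^LibAdd:/#&/I' "$cfg"
}

# Example (as root on the affected server; paths taken from the posts above):
#   mv /usr/lib64/perl5/RRDs.pm.deprecated /usr/lib64/perl5/RRDs.pm
#   comment_out_libadd /etc/mrtg/mrtg.cfg
```

To revert, copy mrtg.cfg.bak back over mrtg.cfg and rename RRDs.pm back to RRDs.pm.deprecated.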
Be sure to check out our Knowledgebase for helpful articles and solutions!

Post by TBT »

tgriep wrote:Is the server that is having the issue running rrdcached?
Yes, RRDCache is in use. But as I've mentioned, the affected XI Host is configured the same as other Hosts which are working correctly.

We initially tested with RRDCache disabled, both with and without the User/Group in the cron; the results are documented in previous posts. This is how we narrowed it down to an issue with User/Group in the cron.

The RRDs.pm.deprecated was invoked by the system and I will assume it has merit. Is it worth our while to try the suggested change?

Post by tgriep »

Thanks for reporting back your test results. Just trying to eliminate any cause for this issue.

You can try what I suggested if you want to.
Another thing, are the MRTG config files in the /etc/mrtg/conf.d folder all the same on every server?

Post by TBT »

Files located in /etc/mrtg/conf.d are plentiful as we monitor thousands of hosts. Each XI server is monitoring different network elements so the files within are not the same.

Additionally, the *.cfg files have a mix of permissions and ownership (-rw-r--r-- 1 apache apache or -rwxrwxr-x 1 apache nagios) which I recall us observing and reporting to you sometime back in 2017.
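
A quick way to see how mixed the permissions and ownership really are is to count the distinct mode/owner/group combinations in the directory. The following is a minimal sketch, assuming GNU find (for -printf); the function name is hypothetical.

```shell
# Sketch only: count the distinct mode/owner/group combinations of *.cfg files.
# The function name is hypothetical; GNU find is assumed for -printf.
summarize_cfg_perms() {
    dir="${1:-/etc/mrtg/conf.d}"
    # %M = symbolic mode (e.g. -rw-r--r--), %u = owner name, %g = group name
    find "$dir" -maxdepth 1 -type f -name '*.cfg' -printf '%M %u %g\n' |
        sort | uniq -c
}

# Example:
#   summarize_cfg_perms /etc/mrtg/conf.d
```

Each output line is a count followed by one combination, so running it on an affected and an unaffected server gives two short summaries that can be compared side by side.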

Please advise.

Post by tgriep »

The files should all have the same permissions if you ran the commands from the first page of this thread.
Run them again to reset the permissions.

Code:

chown -R apache:nagios /etc/mrtg
chmod -R 775 /etc/mrtg
chown -R apache:nagios /var/lib/mrtg
chmod -R 775 /var/lib/mrtg

See if that helps out.

Another thing to try: move the MRTG config files out of that folder and put them back half at a time until it fails again. Then see if you can narrow it down to one of the config files, and if you find it, post it here.
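
The halving approach could be scripted along these lines. This is a rough sketch, not a tested Nagios XI procedure: the function names and the holding directory are hypothetical, GNU find is assumed, and file names are assumed not to contain newlines.

```shell
# Sketch only: park all MRTG configs, then restore them half at a time.
# Function names and the holding directory are hypothetical.

# Move every *.cfg from the conf.d directory into a holding directory.
park_all() {
    conf_dir="$1"; hold_dir="$2"
    mkdir -p "$hold_dir" || return 1
    for f in "$conf_dir"/*.cfg; do
        [ -e "$f" ] || continue          # no matches: the glob stays literal
        mv -- "$f" "$hold_dir"/ || return 1
    done
}

# Move half (rounded up) of the parked files back, in sorted name order.
restore_half() {
    conf_dir="$1"; hold_dir="$2"
    total=$(find "$hold_dir" -maxdepth 1 -type f -name '*.cfg' | wc -l)
    half=$(( (total + 1) / 2 ))
    find "$hold_dir" -maxdepth 1 -type f -name '*.cfg' | sort | head -n "$half" |
    while IFS= read -r f; do
        mv -- "$f" "$conf_dir"/
    done
}

# Example (holding directory is an arbitrary choice):
#   park_all /etc/mrtg/conf.d /root/mrtg-hold
#   restore_half /etc/mrtg/conf.d /root/mrtg-hold   # then watch the next cron run
```

After each restore_half, let the MRTG cron job run and watch the CPU before restoring more. More than one file may be at fault, so the halving may need to be repeated on the remaining files.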

Post by TBT »

1. I did run the permission commands as suggested on the affected XI Host. My comment was in regard to a comparison with a server that is not affected. Should we run the suggested permission/ownership change on all the servers so that they're kept the same?

2. I've tried your suggestion and discovered the following. Out of 56 *.cfg files, there are 6 which cause the CPU spike. I currently have those files excluded from the /etc/mrtg/conf.d dir and the cron is running perfectly. I'll private message you the files shortly (as we do not want them public). Hopefully you'll be able to correlate the issue.

Post by tgriep »

1. Yes, I would run those commands on all of the Nagios servers to make sure the permissions are set the same.
Especially if you upgrade the Network Switch / Router Wizard to the latest version.

2. Thanks for the files. The only thing I see is that this option is set in all of the files:
enablesnmpv3: yes
That option is already set in the main mrtg.cfg, so it is redundant in the individual files.

I added that option to my MRTG config file that uses SNMPv3 and tested to see if it caused a high load, and it did not, so I don't think that option is causing the issue.
If you like, try removing it and see if it works on that one server.
I would assume that your working configs are using the same SNMPv3 authentication scheme as the bad files so that rules that out.
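
To check which config files actually set the option, something like the following could be used. It is a minimal sketch: the function name is hypothetical, and the match is case-insensitive on the assumption that the keyword may also be written as EnableSnmpV3.

```shell
# Sketch only: list the *.cfg files that set the enablesnmpv3 option.
# The function name is hypothetical.
list_cfgs_with_snmpv3_option() {
    dir="${1:-/etc/mrtg/conf.d}"
    # -i: ignore case, -l: print only the names of matching files
    grep -il '^enablesnmpv3:' "$dir"/*.cfg
}

# Example:
#   list_cfgs_with_snmpv3_option /etc/mrtg/conf.d
```

If both the working and the failing files appear in the output, the option is unlikely to be what separates them.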

The only other thing to do is verify that the devices still exist and that the ports the MRTG process is polling are still active on the device.
If a port is not active anymore, remove it from the config file and see if the issue still happens.

Post by TBT »

1. After removing "enablesnmpv3: yes" from the .cfg file, our issue remains.
2. Verified an interface, removed it, and re-added it via the Network Switch / Router wizard v2.4.1; our issue remains.
3. Another observation: after re-adding the graphed interface, the Network Switch / Router wizard v2.4.1 creates .cfg files with -rw-r--r-- 1 apache apache, which conflicts with the advice of -rwxrwxr-x 1 apache nagios.

Any more ideas?

Post by tgriep »

No, I am out of ideas.
Maybe those devices are slower to respond or are sending the data differently.