scottwilkerson wrote:I've gone over this repeatedly and cannot re-create the issue. Until I can, I would recommend leaving the user/group off of the command in the mrtg cron and removing write access to /etc/mrtg/mrtg.cfg for the apache user and nagios group, as this is another way to close off possible exploitation of the vulnerability.
The only other thing I noted is that the file in your lib directory is RRDp.pm instead of the usual RRDs.pm (which is normally found in the default Perl path, so adding the LibAdd directive to the config is not necessary).
Can you run the following on both servers
tgriep wrote:Is the server that is having the issue running rrdcached?
Yes, RRDCache is in use. But as I've mentioned, the affected XI Host is configured the same as other Hosts which are working correctly.
We initially tested with RRDCache disabled, both with and without the User/Group in the cron; the results are documented in previous posts. That is how we narrowed it down to an issue with the User/Group in the cron.
The RRDs.pm.deprecated was invoked by the system and I will assume it has merit. Is it worth our while to try what you suggested?
Nagios XI 2024R2.2.1 (8 Servers)
Nagios Fusion 2024R1.0.2
tgriep wrote:Thanks for reporting back your test results. Just trying to eliminate any cause for this issue.
You can try what I suggested if you want to.
Another thing, are the MRTG config files in the /etc/mrtg/conf.d folder all the same on every server?
Files located in /etc/mrtg/conf.d are plentiful as we monitor thousands of hosts. Each XI server is monitoring different network elements so the files within are not the same.
Additionally, the *.cfg files have a mix of permissions and ownership (-rw-r--r-- 1 apache apache or -rwxrwxr-x 1 apache nagios) which I recall us observing and reporting to you sometime back in 2017.
Please advise.
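For comparing servers, a rough sketch of how the mix of modes and owners could be summarized per directory. The function name `audit_cfg_permissions` is made up for illustration; the directory would presumably be /etc/mrtg/conf.d, and nothing here changes any file.

```python
import grp
import pwd
import stat
from collections import defaultdict
from pathlib import Path


def _name(lookup, ident):
    """Resolve a uid/gid to a name, falling back to the number if unknown."""
    try:
        return lookup(ident)[0]
    except KeyError:
        return str(ident)


def audit_cfg_permissions(conf_dir):
    """Group *.cfg files by (mode string, owner, group). Read-only."""
    groups = defaultdict(list)
    for path in sorted(Path(conf_dir).glob("*.cfg")):
        st = path.stat()
        key = (
            stat.filemode(st.st_mode),          # e.g. "-rw-r--r--"
            _name(pwd.getpwuid, st.st_uid),     # e.g. "apache"
            _name(grp.getgrgid, st.st_gid),     # e.g. "apache" or "nagios"
        )
        groups[key].append(path.name)
    return dict(groups)


if __name__ == "__main__":
    # Assumed path; adjust to the directory actually in use.
    for key, names in audit_cfg_permissions("/etc/mrtg/conf.d").items():
        print(*key, len(names), "file(s)")
```

Running it on an affected and an unaffected server and comparing the output should show whether the permission mix really differs between them.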
Be sure to check out our Knowledgebase for helpful articles and solutions!
tgriep wrote:The files all should have the same permissions if you ran the commands from the first page of the post.
Run this again to set the permissions again.
Another thing to try: move the MRTG config files out of that folder and put them back half at a time until it fails again. Then see if you can narrow it down to one of the config files and, if you find it, post it here.
1. I did run the permissions as suggested on the affected XI Host. My comment was in regard to a comparison to a server not affected. Should we run the suggested permission/ownership change on all the servers so that they're kept the same?
2. I tried your suggestion and discovered the following. Out of 56 *.cfg files, there are 6 which cause the CPU spike. I currently have those files excluded from the /etc/mrtg/conf.d dir and the cron is running perfectly. I'll private message you the files shortly (as we do not want them public). Hopefully you'll be able to correlate the issue.
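For anyone repeating the "half at a time" test, the procedure can be sketched roughly as below. This is only an illustration under an assumption: a set of files fails if and only if it contains at least one bad file. `find_bad_configs` and the `fails` callback are hypothetical names; in practice `fails` would mean staging that subset into conf.d, letting the cron run, and watching the CPU, which is left to the operator.

```python
from typing import Callable, List, Sequence


def find_bad_configs(
    files: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Recursively halve the file list and return every file that fails.

    Assumes failures are independent: a subset fails exactly when it
    contains at least one bad file.
    """
    if not files or not fails(files):
        return []                      # this half is clean, stop here
    if len(files) == 1:
        return [files[0]]              # narrowed down to a single culprit
    mid = len(files) // 2
    return (find_bad_configs(files[:mid], fails)
            + find_bad_configs(files[mid:], fails))


if __name__ == "__main__":
    # Simulated run: file names and the "bad" set are invented for the demo.
    all_files = [f"device{i:02d}.cfg" for i in range(56)]
    bad = {"device03.cfg", "device17.cfg", "device41.cfg"}
    print(find_bad_configs(all_files, lambda subset: any(f in bad for f in subset)))
```

With several bad files the recursion keeps splitting every failing half, so it finds all of them rather than stopping at the first.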
tgriep wrote:1. Yes, I would run those commands on all of the Nagios servers to make sure the permissions are set the same.
Especially if you upgrade the Network Switch / Router Wizard to the latest version.
2. Thanks for the files. The only thing I see is this option is set in all of the files.
enablesnmpv3: yes
That option is set in the main mrtg.cfg and is redundant.
I added that option to my MRTG config file that uses SNMPv3 and tested to see if it caused a high load. It did not, so I don't think that option is causing the issue.
If you like, try removing it and see if it works on that one server.
I would assume that your working configs are using the same SNMPv3 authentication scheme as the bad files so that rules that out.
The only other thing is to verify that the devices still exist and that the ports the MRTG process is polling are still active on the device.
If a port is not active anymore, remove it from the config file and see if the issue still happens.
1. After removing "enablesnmpv3: yes" from the .cfg file, our issue remains.
2. Verified an interface, removed it, and re-added it via Network Switch / Router wizard v2.4.1; our issue remains.
3. Another observation: after re-adding the graphed interface, the Network Switch / Router wizard v2.4.1 creates .cfg files with -rw-r--r-- 1 apache apache, which conflicts with the advice of -rwxrwxr-x 1 apache nagios.
Any more ideas?
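In case it helps others repeating step 1 across many files, here is a small sketch for listing which per-device configs still carry the option. The helper name `files_with_enablesnmpv3` is made up, and treating the keyword as case-insensitive is an assumption on my part; the script only reads files.

```python
import re
from pathlib import Path

# Matches a line consisting only of the option, ignoring case and padding.
_OPTION = re.compile(r"^\s*enablesnmpv3\s*:\s*yes\s*$",
                     re.IGNORECASE | re.MULTILINE)


def files_with_enablesnmpv3(conf_dir):
    """Return the names of *.cfg files that contain 'enablesnmpv3: yes'."""
    hits = []
    for path in sorted(Path(conf_dir).glob("*.cfg")):
        text = path.read_text(errors="replace")  # tolerate odd bytes
        if _OPTION.search(text):
            hits.append(path.name)
    return hits


if __name__ == "__main__":
    # Assumed path; adjust to the directory actually in use.
    for name in files_with_enablesnmpv3("/etc/mrtg/conf.d"):
        print(name)
```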