MRTG consumes 100% of system resources
Posted: Wed Dec 05, 2018 10:01 am
by TBT
It appears that when cron runs MRTG, the process hangs and respawns, eventually consuming 100% of system resources. This started after an upgrade to XI 5.5.7 and, oddly enough, affects only 1 of our 9 XI servers. The timestamps on the RRD files in /var/lib/mrtg are not updating. We also checked file permissions and ownership on /etc/mrtg and /var/lib/mrtg (mentioned in another thread). No errors are present in /var/log/messages.
CentOS 6.10
rrdtool-1.3.8-7.el6.x86_64
glib2-2.28.8-10.el6.x86_64
Any insight?
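A minimal sketch for confirming the symptom described above (RRD files that have stopped updating). The helper name `stale_rrds` and the 10-minute threshold are illustrative assumptions, not part of XI or MRTG.

```shell
# List .rrd files under a directory that have not been modified recently.
# $1 = directory (defaults to /var/lib/mrtg), $2 = age in minutes (defaults to 10).
stale_rrds() {
    find "${1:-/var/lib/mrtg}" -type f -name '*.rrd' -mmin +"${2:-10}"
}

# Example usage (commented out so the block has no side effects when sourced):
#   stale_rrds /var/lib/mrtg 10
#   pgrep -f /usr/bin/mrtg | wc -l    # rough count of running mrtg processes
```

If the list keeps growing while the process count also climbs, that would be consistent with runs hanging rather than finishing.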
Re: MRTG consumes 100% of system resources
Posted: Wed Dec 05, 2018 10:24 am
by scottwilkerson
There was a bug introduced in the Switch Wizard, and it should be updated:
Admin -> Manage Config Wizards -> Check for Updates -> Install updates
Also, running the following commands from the command line will fix a permissions problem that was introduced in this version:
Code:
chown apache:nagios /etc/mrtg -R
chmod 775 /etc/mrtg -R
chown apache:nagios /var/lib/mrtg -R
chmod 775 /var/lib/mrtg -R
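To check whether the ownership part of those commands took effect everywhere, something like the following could be used. This is only a sketch: `find_wrong_owner` is a hypothetical helper name, and it checks owner and group only, not the 775 mode.

```shell
# Print every path under a directory that is NOT owned by the expected user and group.
# $1 = directory, $2 = expected user, $3 = expected group.
find_wrong_owner() {
    find "$1" \( ! -user "$2" -o ! -group "$3" \) -print
}

# Example usage (commented out; empty output means everything matches):
#   find_wrong_owner /etc/mrtg apache nagios
#   find_wrong_owner /var/lib/mrtg apache nagios
```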
Re: MRTG consumes 100% of system resources
Posted: Wed Dec 05, 2018 10:29 am
by TBT
As mentioned previously, we've done this.
Edit: Network Switch / Router wizard is already at v2.4.1
Re: MRTG consumes 100% of system resources
Posted: Wed Dec 05, 2018 10:32 am
by scottwilkerson
Sorry, I read too fast.
Can you run the following and see if you get any errors?
Code:
LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok --user=nagios --group=nagios
Re: MRTG consumes 100% of system resources
Posted: Wed Dec 05, 2018 10:43 am
by TBT
1. That appears to be the same line as in the cron job, which we've run manually as well. It hangs, reproducing the issue.
2. We've also run it with the debug option, resulting in the following:
2018-12-05 10:15:16 -- --fork: Child 0 (31223) waiting to deliver
2018-12-05 10:15:16 -- --fork: Parent reading child 0
3. We also noticed that the /var/lib/mrtg/mrtg.ok file isn't being recreated after we manually removed it.
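Because the command hangs when run by hand, it can help to wrap test runs in a time limit so that a stuck run is killed instead of piling up. The sketch below uses the coreutils `timeout` command; the wrapper name `run_limited` and the 120-second limit are assumptions chosen for illustration.

```shell
# Run a command with a time limit; report on stderr if the limit was hit.
# $1 = limit in seconds, remaining arguments = command to run.
run_limited() {
    rl_limit=$1
    shift
    timeout "$rl_limit" "$@"
    rl_rc=$?
    # timeout exits with status 124 when it had to kill the command
    if [ "$rl_rc" -eq 124 ]; then
        echo "timed out after ${rl_limit}s: $*" >&2
    fi
    return "$rl_rc"
}

# Example usage (commented out), using the command from this thread:
#   run_limited 120 env LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg \
#       --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok \
#       --user=nagios --group=nagios
```

An exit status of 124 indicates the run was cut off by the limit rather than failing on its own.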
Re: MRTG consumes 100% of system resources
Posted: Wed Dec 05, 2018 10:47 am
by scottwilkerson
What are the permissions on this directory?
Re: MRTG consumes 100% of system resources
Posted: Wed Dec 05, 2018 10:49 am
by TBT
drwxrwxr-x. 2 apache nagios 86016 Dec 5 10:45 /var/lib/mrtg
Re: MRTG consumes 100% of system resources
Posted: Wed Dec 05, 2018 11:09 am
by scottwilkerson
This looks correct, and I cannot replicate the issue.
Can you run the mrtg command without the user/group options to see if you get the same result? (The addition of user/group is what changed in 5.5.7.)
Code:
LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok
Re: MRTG consumes 100% of system resources
Posted: Wed Dec 05, 2018 11:50 am
by TBT
Manually running without user and group was successful. The timestamps on the files in /var/lib/mrtg now reflect when it was run, and the mrtg.lock file is present.
Additionally, we've modified the cron job, removing user and group, and let it run on schedule. That was also successful: graphs are updating.
We still don't understand why this affects only 1 of the 9 XI servers in our environment. Should we modify the cron on all servers, and will the user/group be removed from future XI releases?
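For anyone applying the same workaround on several servers, the cron change described above can be sketched as a small filter that drops the two options from a command line. The function name `strip_user_group` is hypothetical, and the cron file path in the usage comment is an assumption, since the thread does not say where the job lives.

```shell
# Read lines on stdin and remove any --user=... and --group=... options.
strip_user_group() {
    sed -e 's/[[:space:]]\{1,\}--user=[^[:space:]]*//g' \
        -e 's/[[:space:]]\{1,\}--group=[^[:space:]]*//g'
}

# Example usage (commented out): preview the result before editing anything.
# /etc/cron.d/mrtg is an assumed location; check where the job is defined on your system.
#   strip_user_group < /etc/cron.d/mrtg
```

Previewing the output first, rather than editing in place, makes it easy to confirm that only the two options are removed.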
Re: MRTG consumes 100% of system resources
Posted: Wed Dec 05, 2018 12:12 pm
by scottwilkerson
Glad to hear that removing it resolved the issue, but frankly I don't know why it did. The user/group was added to the cron job to address a security vulnerability, although upgrading the wizard to the latest version may also mitigate that for future runs.
We will not be removing the user/group in future releases. If the wizard is updated on all servers, I would say it is OK to change the cron on all of them.