MRTG consumes 100% of system resources

Posted: Wed Dec 05, 2018 10:01 am
by TBT
It appears that when cron runs MRTG, the process hangs and re-spawns, eventually consuming 100% of system resources. This started after an upgrade to XI 5.5.7 and, oddly enough, affects only 1 of our 9 XI servers. The timestamps on the rrd files in /var/lib/mrtg are not updating. We also checked file permissions and ownership on /etc/mrtg and /var/lib/mrtg (mentioned in another thread). No errors are present in /var/log/messages.

CentOS 6.10
rrdtool-1.3.8-7.el6.x86_64
glib2-2.28.8-10.el6.x86_64

Any insight?
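
For anyone comparing servers, the "timestamps not updating" symptom can be checked with a small helper. This is only a sketch: the function name stale_rrds and the 15-minute default are assumptions, not part of XI or MRTG.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Nagios XI or MRTG):
# list .rrd files in a directory whose modification time is older than
# the given number of minutes. Both defaults are assumptions.
stale_rrds() {
    dir="${1:-/var/lib/mrtg}"
    minutes="${2:-15}"
    # -mmin +N matches files last modified more than N minutes ago
    find "$dir" -maxdepth 1 -type f -name '*.rrd' -mmin +"$minutes"
}
```

Running `stale_rrds /var/lib/mrtg 15` on each server and comparing the output would show whether only the affected server has stale files. An empty result means every rrd file was written within the window.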

Re: MRTG consumes 100% of system resources

Posted: Wed Dec 05, 2018 10:24 am
by scottwilkerson
A bug was introduced in the Switch Wizard, so the wizard should be updated:
Admin -> Manage Config Wizards -> Check for Updates -> Install updates

Also, running the following commands from the command line will fix a permissions problem that was introduced in this version:

Code: Select all

chown apache:nagios /etc/mrtg -R
chmod 775 /etc/mrtg -R
chown apache:nagios /var/lib/mrtg -R
chmod 775 /var/lib/mrtg -R
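
To confirm the result of the commands above, something like the following could list anything that still has the wrong owner or group. It is a sketch only; the function name wrong_owner is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every entry under a directory that is NOT
# owned by the expected user and group. No output means everything matches.
wrong_owner() {
    dir="$1"
    user="$2"
    group="$3"
    find "$dir" \( ! -user "$user" -o ! -group "$group" \) -print
}
```

For the layout in this thread that would be `wrong_owner /etc/mrtg apache nagios` and `wrong_owner /var/lib/mrtg apache nagios`.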

Re: MRTG consumes 100% of system resources

Posted: Wed Dec 05, 2018 10:29 am
by TBT
scottwilkerson wrote:There was a bug introduced in the Switch Wizard and it should be updated
Admin -> Manage Config Wizards -> Check for Updates -> Install updates

Also, running the following commands from the command line will fix a permissions problem that was introduced in this version

Code: Select all

chown apache:nagios /etc/mrtg -R
chmod 775 /etc/mrtg -R
chown apache:nagios /var/lib/mrtg -R
chmod 775 /var/lib/mrtg -R

As mentioned previously, we've done this.

Edit: Network Switch / Router wizard is already at v2.4.1

Re: MRTG consumes 100% of system resources

Posted: Wed Dec 05, 2018 10:32 am
by scottwilkerson
TBT wrote:As mentioned previously, we've done this.
Sorry, I read too fast.

Can you run the following and see if you get any errors?

Code: Select all

 LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok --user=nagios --group=nagios

Re: MRTG consumes 100% of system resources

Posted: Wed Dec 05, 2018 10:43 am
by TBT
scottwilkerson wrote:
TBT wrote:As mentioned previously, we've done this.
Sorry, I read too fast.

Can you run the following and see if you get any errors

Code: Select all

 LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok --user=nagios --group=nagios

1. That appears to be the same line that is in the cron job, which we've also run manually. It hangs, reproducing the issue.

2. We've also run it with the debug option, resulting in the following:
2018-12-05 10:15:16 -- --fork: Child 0 (31223) waiting to deliver
2018-12-05 10:15:16 -- --fork: Parent reading child 0

3. We also noticed that the /var/lib/mrtg/mrtg.ok file isn't being recreated after we manually removed it.
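
Since the manual run hangs, it may help to bound test runs so they cannot pile up. A minimal sketch using the coreutils timeout command follows; the wrapper name run_with_limit is an assumption, and the time limit is whatever the tester chooses.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command under a time limit (in seconds).
# timeout exits with status 124 when it had to terminate the command.
run_with_limit() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "command still running after ${limit}s; terminated" >&2
    fi
    return "$status"
}
```

Because timeout expects a command rather than shell variable assignments, the locale settings would be passed through env, for example `run_with_limit 120 env LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok --user=nagios --group=nagios`. A return status of 124 would indicate the hang was reproduced.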

Re: MRTG consumes 100% of system resources

Posted: Wed Dec 05, 2018 10:47 am
by scottwilkerson
What are the permissions on this directory?

Code: Select all

ls -ld /var/lib/mrtg

Re: MRTG consumes 100% of system resources

Posted: Wed Dec 05, 2018 10:49 am
by TBT
scottwilkerson wrote:What are the permissions on this directory?

Code: Select all

ls -ld /var/lib/mrtg
drwxrwxr-x. 2 apache nagios 86016 Dec 5 10:45 /var/lib/mrtg

Re: MRTG consumes 100% of system resources

Posted: Wed Dec 05, 2018 11:09 am
by scottwilkerson
This looks correct, and I cannot replicate the issue.

Can you run the mrtg command without the user/group options to see if you get the same result? (The addition of user/group is what changed in 5.5.7.)

Code: Select all

LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lib/mrtg/mrtg.lock --confcache-file /var/lib/mrtg/mrtg.ok

Re: MRTG consumes 100% of system resources

Posted: Wed Dec 05, 2018 11:50 am
by TBT
Manually running without User and Group was successful. The timestamps on the files in /var/lib/mrtg now reflect when the command was run. Also, the mrtg.lock file is present.

Additionally, we've modified the cron job, removing User and Group, allowing it to run as per schedule. Result was also successful as graphs are updating.

We still don't understand why this affects only 1 of the 9 XI Servers in our environment. Should we modify the cron on all servers and will the User/Group be removed from future XI releases?

Re: MRTG consumes 100% of system resources

Posted: Wed Dec 05, 2018 12:12 pm
by scottwilkerson
TBT wrote:Manually running without User and Group was successful. Timestamp on the files (/var/lib/mrtg) now reflects when ran. Also, the mrtg.lock file is present.

Additionally, we've modified the cron job, removing User and Group, allowing it to run as per schedule. Result was also successful as graphs are updating.

We still don't understand why this affects only 1 of the 9 XI Servers in our environment. Should we modify the cron on all servers and will the User/Group be removed from future XI releases?
Glad to hear that removing that resolved the issue, but frankly I don't know why it did. The user/group was added to the cron job to fix a security vulnerability, although upgrading the Wizard to the latest version may also mitigate that for future runs.

We will not be removing the user/group in the future. If the wizard is updated on all servers, I would say it is OK to change the cron on all of them.
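
For applying the cron change across several servers, a filter along these lines could strip the two options. This is a sketch under assumptions: the function name strip_mrtg_user_group is hypothetical, and the location of the cron file is not stated in this thread, so it is left as a placeholder.

```shell
#!/bin/sh
# Hypothetical filter: read cron text on stdin and write it to stdout with
# any --user=... and --group=... options (and the whitespace before them)
# removed. Everything else is passed through unchanged.
strip_mrtg_user_group() {
    sed -e 's/[[:space:]]\{1,\}--user=[^[:space:]]*//g' \
        -e 's/[[:space:]]\{1,\}--group=[^[:space:]]*//g'
}
```

Usage would be something like `strip_mrtg_user_group < "$CRON_FILE" > /tmp/mrtg.cron.new`, where CRON_FILE is a placeholder for wherever the MRTG cron entry lives on the server. Reviewing the output with diff before installing it is a good idea, as is keeping the backup copy outside the cron directory, since a leftover copy inside a cron.d-style directory might itself be picked up and run.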