Applying config takes forever when perfdata stored on NetApp

krutaw
Posts: 60
Joined: Wed Jul 31, 2013 6:30 pm

Post by krutaw »

My setup is a little odd: I use a NetApp to expose an NFS mount shared between multiple servers, so the perfdata is stored without the risk of losing that valuable intellectual property if a Nagios server fails. If you do a stock install of Nagios XI and move the performance data to an NFS volume on a NetApp, odds are good you'll start seeing exorbitant wait times when applying configuration.

This happens because part of the scripting that runs when you apply configuration recursively changes ownership and permissions on the Nagios-controlled directories, which include the perfdata folder. By default (at least here), NetApp volumes have Snapshot copies turned on (the ".snapshot" folder comes from the Snapshot feature itself rather than from SnapMirror replication), which exposes a read-only folder named ".snapshot" in each folder in the volume, holding point-in-time copies of the files. The recursive walk descends into those folders too, so it touches far more files than the live perfdata. Snapshots are great if you end up corrupting a file and need to roll back, but darn annoying if you're having to wait forever when applying configuration.
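
To get a rough feel for how much extra work the snapshot folder adds, something like the following could be run against the mount. This is only a sketch: both function names are made up for illustration and are not part of Nagios XI.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not shipped with Nagios XI).

# Count every entry a plain recursive walk would visit, .snapshot included.
count_all_entries() {
    find "$1" | wc -l | tr -d ' '
}

# Count entries while pruning any directory named .snapshot.
count_entries_without_snapshots() {
    find "$1" -name .snapshot -prune -o -print | wc -l | tr -d ' '
}
```

Calling both with /usr/local/nagios/share/perfdata and comparing the numbers should show whether the permission reset is spending most of its time inside snapshots.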

I found two solutions to this.

I'll start with the better of the two (but not the one I'm using yet). You can hide the snapshot folder from clients, so that the snapshots are still taken but are not visible to the client machine. The steps are pretty simple to follow and can be found here: https://library.netapp.com/ecmdocs/ECMP ... 95F07.html.
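
Once the storage side has been changed, a quick client-side check along these lines might confirm the folder is really hidden. It is a minimal sketch, and `snapshot_dir_visible` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: exits 0 if DIR still exposes a .snapshot folder,
# non-zero otherwise.
snapshot_dir_visible() {
    [ -d "$1/.snapshot" ]
}

# Example call (the path is the stock perfdata location):
#   snapshot_dir_visible /usr/local/nagios/share/perfdata && echo "still visible"
```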

Now for the hackier way. If your storage administrators either refuse to make that change (this is the boat I'm in, thanks storage guys! :( ) or want to test extensively before rolling it to production, you can work around the issue by editing /usr/local/nagiosxi/scripts/export_nagiosql.sh.

Comment out the following lines:
#sudo $BASEDIR/reset_config_perms.sh

#error handling
#ret=$?
#if [ $ret -gt 0 ]; then
# echo "RESETTING CONFIG PERMS FAILED!\n"
# exit 4
#fi

Now, before the forum gods strike me down for heresy, I would HIGHLY recommend that you also set up some sort of cron job to keep the permissions in check, as it really is important that all of the permissions are set properly. Another tidbit: you'll have to redo this every time you upgrade Nagios XI, since this file is included with the upgrades.
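
One way such a cron job could look is sketched below. It applies the same modes the stock script uses for perfdata (775 for directories, 664 for files) but prunes every ".snapshot" folder so the walk stays short. The function name, the script location, and the schedule are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical cron helper (not shipped with Nagios XI).
# Usage: fix_perfdata_perms DIR OWNER GROUP
fix_perfdata_perms() {
    fpp_dir=$1
    fpp_owner=$2
    fpp_group=$3
    # Directories get 775; anything under a .snapshot folder is skipped.
    find "$fpp_dir" -name .snapshot -prune -o -type d -exec chmod 775 -- {} +
    # Regular files (the RRDs) get 664.
    find "$fpp_dir" -name .snapshot -prune -o -type f -exec chmod 664 -- {} +
    # Ownership is reset on everything outside .snapshot.
    find "$fpp_dir" -name .snapshot -prune -o -exec chown "$fpp_owner:$fpp_group" -- {} +
}
```

Saved as a script that ends with a call such as fix_perfdata_perms /usr/local/nagios/share/perfdata nagios nagios (assuming the usual nagios user and group), it could be run from root's crontab at whatever interval suits you.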

While this is not an optimal solution by any means, it does certainly circumvent the problem at hand.

Hope this helps someone!
tacolover101
Posts: 432
Joined: Mon Apr 10, 2017 11:55 am

Re: Applying config takes forever when perfdata stored on Ne

Post by tacolover101 »

i don't have any comment on your code change. from what i'm gathering though, i think this would lead to really poor performance. (or possibly issues down the road with rw for perfdata due to speeds.)

here are a couple ideas to consider that may help to retain that perfdata better:
- rsync data over periodically. (- hacky)
- modify the perfdata command to send elsewhere, or split it to send to both RRD, a SIEM, grafana, or ??? ( + clean, - overhead) - https://assets.nagios.com/downloads/nag ... fdata.html
- parse nagios.log in your SIEM and store metrics this way (+no direct nagios impact, -can't think of any, but i'm sure there is a caveat)
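
for the rsync idea above, a minimal sketch could look like this. `sync_perfdata` is a hypothetical name, and the source and destination are whatever local perfdata folder and remote mount you pick.

```shell
#!/bin/sh
# Hypothetical helper: mirror the contents of SRC into DEST.
# Usage: sync_perfdata SRC DEST
sync_perfdata() {
    # -a recurses and preserves permissions, times and (when run as root)
    # ownership; --delete removes files from DEST that are gone from SRC.
    # The trailing slashes copy the folder contents, not the folder itself.
    rsync -a --delete "$1"/ "$2"/
}
```

run from cron, this keeps a second copy reasonably fresh, at the cost of losing whatever was written since the last run if the nagios box dies.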

regardless, it may be worth implementing a ram disk anywho to speed up processing if you aren't already. https://assets.nagios.com/downloads/nag ... giosXI.pdf
krutaw
Posts: 60
Joined: Wed Jul 31, 2013 6:30 pm

Re: Applying config takes forever when perfdata stored on Ne

Post by krutaw »

I'll tell ya how it goes after a few months, but the performance has been stellar thus far. Realistically, the end goal is to separate the IOPS for the database and the perfdata so that they hit different disks. My point was simply to save a bit of time for anyone who might attempt the same thing with a NetApp NFS mount point. :)

I did find part of your response interesting, as Grafana is a long-term plan for us, and that plan will include interacting with the RRDs until RRD is no longer a thing in Nagios XI. Also, yes, we have employed a ramdisk and numerous other performance tweaks to make it "just run" regardless of how many services happen to fail at once.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Applying config takes forever when perfdata stored on Ne

Post by tgriep »

Thanks for the information @krutaw and @tacolover101.

Another option, instead of disabling the whole reset permissions script, is to comment out just the perfdata folder lines and leave the rest alone.
That makes sure the newly created config files get the correct permissions but bypasses the perfdata files, keeping the speed up.
Be sure to check out our Knowledgebase for helpful articles and solutions!
krutaw
Posts: 60
Joined: Wed Jul 31, 2013 6:30 pm

Re: Applying config takes forever when perfdata stored on Ne

Post by krutaw »

Love the idea. Looks like the proper file is:

/usr/local/nagiosxi/scripts/reset_config_perms.sh

And the lines in question are:

# Set perfdata directory permissions (RRDs get 664)
#/bin/chmod -R 775 /usr/local/nagios/share/perfdata/
#/bin/find /usr/local/nagios/share/perfdata/ -type f -exec chmod 664 -- {} +

#/bin/chown -R $nagiosuser.$nagiosgroup /usr/local/nagios/share/perfdata


I just tested this and it indeed worked. Thanks for pointing that out, it was kind of a duh moment there. :)
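
Since the file gets replaced on upgrade, the edit could be scripted so it is easy to reapply. The sketch below assumes the script still contains the three lines quoted above in that form; `comment_out_perfdata_lines` is a hypothetical name, and it is worth keeping a backup copy of the file before running it.

```shell
#!/bin/sh
# Hypothetical helper: prefix '#' to the /bin/chmod, /bin/find and /bin/chown
# lines that reference the perfdata folder. Already commented lines do not
# match, so running it twice is harmless.
# Usage: comment_out_perfdata_lines /path/to/reset_config_perms.sh
comment_out_perfdata_lines() {
    copl_file=$1
    copl_tmp=$(mktemp) || return 1
    # Write the edited text to a temp file, then copy it back over the
    # original so the file keeps its existing mode and ownership.
    sed -E 's,^([[:space:]]*/bin/(chmod|find|chown) .*/usr/local/nagios/share/perfdata.*)$,#\1,' \
        "$copl_file" > "$copl_tmp" && cat "$copl_tmp" > "$copl_file"
    copl_status=$?
    rm -f "$copl_tmp"
    return "$copl_status"
}
```

Lines that touch other folders, such as the config directories, are left alone, so the rest of the permission reset still runs.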
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Applying config takes forever when perfdata stored on Ne

Post by tgriep »

No problem, glad to help.