Possible Bug or Room for Improvement
Posted: Fri Feb 19, 2021 11:35 am
by J.A.K
We do all our host/service adds via the API. Because we often add hosts without a contact group at first, we use the force parameter to add them and then apply the configuration afterward. One issue: if you force-add a host with a wrong setting (for example, a typo in a selected host group), Apply Config fails but gives very little explanation of why. The "Show Errors" output is blank:
Capture.PNG
Verifying the cfg also comes back successful, and the host appears alongside the other hosts with a status of applied. Normally we just sort by ID and delete the last few hosts, but some explanation of the issue during Apply Config would be very helpful.
Re: Possible Bug or Room for Improvement
Posted: Fri Feb 19, 2021 4:52 pm
by dchurch
When you do an Apply Config from the list, it's really performing four steps:
1. Delete Config
2. Write Config
3. Verify Config
4. Restart Nagios Core (sometimes called the Monitoring Engine)
If you go to Configure (top menu) => Core Config Manager => Tools (left menu) => Config File Management, you can trigger these steps individually and perhaps gain more insight into what went wrong with the configuration.
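For anyone who wants to run the verify step outside the web interface, here is a minimal sketch. It assumes the usual Nagios XI locations for the Core binary and main config file (/usr/local/nagios/bin/nagios and /usr/local/nagios/etc/nagios.cfg). The function name verify_config and the NAGIOS_BIN / NAGIOS_CFG variables are illustrative only, so adjust them for your install.

```shell
#!/bin/sh
# Assumed default paths for a Nagios XI install; override via the environment.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# verify_config (hypothetical helper): run Nagios Core's pre-flight check (-v)
# and print its full output only when the check fails.
verify_config() {
    output=$("$NAGIOS_BIN" -v "$NAGIOS_CFG" 2>&1)
    status=$?
    if [ "$status" -ne 0 ]; then
        printf '%s\n' "$output" >&2
        echo "Config verification failed (exit $status)" >&2
    else
        echo "Config verification passed"
    fi
    return "$status"
}
```

Calling verify_config prints either a pass message or the verifier's own error text, which can be compared against what the web interface reports.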
If you PM me a system profile, I can investigate further as to why the output was blank. It may very well be blank due to a bug we can fix.
Re: Possible Bug or Room for Improvement
Posted: Fri Feb 19, 2021 5:02 pm
by J.A.K
The Config File Management was actually where I noted I was going to verify the cfg file to see if it showed errors. Comes through clean no issues.
Would you like a system profile when it's in a broken state where apply doesn't work? I can create a host with a bogus hostgroup real quick if you like; it's pretty easy to replicate.
Re: Possible Bug or Room for Improvement
Posted: Mon Feb 22, 2021 10:29 am
by dchurch
J.A.K wrote: Would you like a system profile when it's in a broken state where apply doesn't work?
Yes. A profile would have in it the XI database entries it uses to create the cfg files, as well as logs of what went wrong (among other useful things).
Get one by going to Admin (top menu) => System Profile (in the left menu), then clicking the blue button. If you're unable to generate the profile through the web interface, please try generating it from the command line by running these commands as root:
Code:
rm -rf /usr/local/nagiosxi/var/components/profile*
/usr/local/nagiosxi/scripts/components/getprofile.sh SUPPORT
Then send me the resulting /usr/local/nagiosxi/var/components/profile.zip file.
If the profile script fails, please include the ENTIRE output.
J.A.K wrote: The Config File Management was actually where I noted I was going to verify the cfg file to see if it showed errors. Comes through clean no issues.
Weird. It may very well be a bug then, but I won't know until I lab it up using your system profile.
Re: Possible Bug or Room for Improvement
Posted: Mon Feb 22, 2021 10:45 am
by J.A.K
Sent the system profile. Running the command does actually show the verify error (the host has an incorrect template on it), but the "Show Errors" section in Apply Config just comes back blank when you click it after a failed apply. So it's just "Show Errors" that appears to be reporting nothing. (Not actually sure how it's supposed to look, but I assume it should just have the output from the failed verify?)
Re: Possible Bug or Room for Improvement
Posted: Mon Feb 22, 2021 11:32 am
by dchurch
Looks like it's having problems writing to the checkpoint directory. Every time you apply config, it snapshots the current config before replacing it. My guess is that, since permissions prevent it from doing that, it fails to apply the config.
What's the output from the following command?
Code:
ls -la /usr/local/nagiosxi/nom/checkpoints/nagioscore{,/errors}
Everything under those directories is supposed to be owned by nagios:nagios, so you can go ahead and fix that if it's wrong.
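A quick way to spot the offending entries is to list anything in the tree whose owner or group differs from the expected one. This is only a minimal sketch: the function name list_wrong_owner is made up for illustration, and the nagios:nagios default is taken from the note above.

```shell
# list_wrong_owner (hypothetical helper): print every path under DIR whose
# owner or group is not the expected one. Defaults to nagios:nagios.
list_wrong_owner() {
    dir=$1
    owner=${2:-nagios}
    group=${3:-nagios}
    find "$dir" \( ! -user "$owner" -o ! -group "$group" \) -print
}

# Example (run as root on the XI server):
#   list_wrong_owner /usr/local/nagiosxi/nom/checkpoints/nagioscore
```

Empty output means everything under the directory already has the expected ownership. Nothing is changed by this check.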
Re: Possible Bug or Room for Improvement
Posted: Mon Feb 22, 2021 11:48 am
by J.A.K
For some reason it looks like errors alone is set to root:root. Let me change that to nagios:nagios and try again.
Is this folder one of the ones that "File Permissions Check" in the GUI should check? Or is there some way to list which folders should be set to what? Checkpoints itself is set to root:root for me, but I don't want to chown recursively since I know some folders should be owned by Apache instead of nagios.
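One way to see who owns each level without changing anything is to print owner and group for specific paths only, with no recursion. A minimal sketch, assuming GNU stat (the -c format option) as found on typical Linux installs; show_owners is a made-up helper name.

```shell
# show_owners (hypothetical helper): print "owner:group path" for each path
# given, without descending into directories.
show_owners() {
    stat -c '%U:%G %n' "$@"
}

# Example:
#   show_owners /usr/local/nagiosxi/nom/checkpoints \
#               /usr/local/nagiosxi/nom/checkpoints/nagioscore \
#               /usr/local/nagiosxi/nom/checkpoints/nagioscore/errors
```

This only reports ownership; it does not say what the ownership should be for each path.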
Re: Possible Bug or Room for Improvement
Posted: Mon Feb 22, 2021 1:40 pm
by J.A.K
And setting errors from root to nagios fixed it.
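For reference, a targeted fix along these lines changes only the one directory and avoids a recursive chown. This is a sketch, not necessarily the exact command used here; set_owner_single is a hypothetical helper, and the path in the example is the errors directory from the ls command earlier in the thread.

```shell
# set_owner_single (hypothetical helper): chown exactly one path (no -R),
# then show the resulting ownership. Defaults to nagios:nagios.
set_owner_single() {
    target=$1
    owner=${2:-nagios}
    group=${3:-nagios}
    chown "$owner:$group" "$target" && ls -ld "$target"
}

# Example (as root):
#   set_owner_single /usr/local/nagiosxi/nom/checkpoints/nagioscore/errors
```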
MicrosoftTeams-image.png
One recommendation to pass back to the XI team: use black text on the red background instead of white. lol. I tweaked the CSS in the screenshot above just to show it's much more legible.
Re: Possible Bug or Room for Improvement
Posted: Mon Feb 22, 2021 4:53 pm
by dchurch
J.A.K wrote: For some reason it looks like errors alone is set to root:root. Let me change that to nagios:nagios and try again.
Does applying the configuration work without error now?
J.A.K wrote: Is this folder one of the ones that "File Permissions Check" in the GUI should check?
It may be advantageous to check these permissions periodically and flag any problems to the user, so in response to your question: yes, it probably should.
Or at the very least, it should give a better indication as to what went wrong when applying the config.
I can submit a feature request on your behalf if you'd like. Please keep in mind that the decision to implement the enhancement is at the discretion of our development team. I know that the config application step ducks from PHP to shell to C and back, so it might not be easy from a development standpoint to get the error propagated back up to the web interface.
Re: Possible Bug or Room for Improvement
Posted: Mon Feb 22, 2021 9:43 pm
by J.A.K
Yep, as I mentioned in my previous post, fixing the permissions you pointed out resolved it, so I believe you've got me settled.
If you could submit that as feedback, absolutely. I understand it might just go on the back burner, but I think expanding that permission check to a few more areas would really help prevent situations like this for other users in the future.