NagiosBPI notifications odd-ness
Posted: Mon May 07, 2012 6:47 pm
by KevinD
We came across a couple of gotchas with the NagiosBPI component.
Setup (all CentOS 5 x64, manual install):
1 parent & 2 children using DNX.
When running in a distributed setup (DNX), we found that the check defaulted to a child host, which we should have seen coming.
We first tried putting the nagiosbpi component in a shared folder and then symlinking that to the normal install location. We were hoping this would let us use the parent as the normal configuration manager for BPI, with the children simply picking it up. This did not work. In fact, it broke the XI interface, which reported a compilation error in the component; apparently it does not follow symlinks.
We then tried forcing DNX to act like an active check and ONLY run on the parent, but this had no effect, as the only things I can get to run on the parent are dependencies. (I may be missing a solution here.)
I'm not sure checks running from the children will ever really work: when using DNX, the children have no knowledge of the hosts or services. They just report back the output of whatever check the parent asked them to run.
Is there a known way, or any ideas, to make this work without a funky implementation?
SOLUTION: Thanks to the Nagios experts.
Edit /usr/local/nagios/etc/dnxServer.cfg.
Find the directive localCheckPattern = .*local.*
Replace the regular expression with one that matches the name of the script (or a parameter) being run.
Restart Nagios.
NOTE:
I found that a regular expression for the host or service name did not match. The DNX debug output shows that all that is passed through DNX is something similar to:
Posting New Job [1348]: /usr/local/nagios/libexec/check_bpi.php BPI_GROUP.
localCheckPattern = .*check_bpi.* is what worked for me.
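A quick way to sanity-check a candidate localCheckPattern before restarting Nagios is to test it against the job command line from the debug output. This Python sketch assumes, based on the log line above, that DNX applies the pattern to the check's command line and accepts a match anywhere in it; the check_ping job is a made-up comparison, and keeps_local is a hypothetical helper, not part of DNX.

```python
import re

# Job command line as it appears in the DNX debug output quoted above.
BPI_JOB = "/usr/local/nagios/libexec/check_bpi.php BPI_GROUP"
# Hypothetical ordinary check, for comparison only.
PING_JOB = "/usr/local/nagios/libexec/check_ping -H 192.0.2.10 -w 100,20% -c 500,60%"

CANDIDATES = [
    r".*check_bpi.*",    # script name: what worked in this thread
    r".*BPI Process.*",  # service description: not on the command line
    r".*local.*",        # sample value: also matches the /usr/local/ prefix
]


def keeps_local(pattern: str, command_line: str) -> bool:
    """True if the pattern matches anywhere in the command line (assumed DNX behaviour)."""
    return re.search(pattern, command_line) is not None


for pattern in CANDIDATES:
    print(f"{pattern:18} bpi={keeps_local(pattern, BPI_JOB)} ping={keeps_local(pattern, PING_JOB)}")
```

Under those assumptions, the service-name pattern never matches, and the sample `.*local.*` would match every plugin installed under /usr/local/nagios/libexec, keeping all of them on the parent. A pattern built from the script name is narrower.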
Re: NagiosBPI notifications odd-ness
Posted: Tue May 08, 2012 9:25 am
by mguthrie
Maybe I'm not understanding fully, but this doesn't seem like an issue with BPI as much as it does DNX. DNX slaves execute the checks, but then pass the data back up to the parent server, so all status data is viewed by the parent server. I think there may be some confusion here as to what DNX is supposed to do and how it works...
KevinD wrote: "We first tried putting the nagiosbpi component onto a shared folder, and then soft linking that to the normal install location..."
In a DNX setup, the parent host has the status data for all checks being run, so there's no reason to view information on the slave servers. I'm not understanding why this would need to be done.
KevinD wrote: "We then tried forcing DNX to act like an active check, and ONLY run on the parent..."
A check that Nagios runs on a slave server via DNX is still considered an active check. Nagios schedules the active checks, then distributes them to the slave servers so it doesn't have to launch so many child processes itself.
KevinD wrote: "I'm not sure checks running from the children will really ever work..."
The slave servers need the check plugins on them to run the checks needed for their assigned active checks. The slaves also need network / firewall access to the hosts/services that they're monitoring.
DNX documentation with XI
http://library.nagios.com/library/produ ... ith-nagios
Let's take BPI out of the equation for now until we get DNX working the way you want. I think at the moment it's just clouding the underlying issue.
I'll have one of our other techs jump in as well to explain how you can trace the log output to find out where the checks are actually being run.
Re: NagiosBPI notifications odd-ness
Posted: Tue May 08, 2012 10:36 am
by KevinD
Sorry, I should have been clearer about the problem at hand.
So BPI as a component is working without issue.
DNX is working exactly as expected on our system as well.
The problem arises when we try to create a notification against those BPI nodes.
Creating a notification creates a dummy host, with a service "BPI Process:Group Name", which in turn executes check_bpi.php to verify and create a notification if needed.
So when this check is executed with DNX in the mix, a child is selected to run it, but that child has neither the BPI config nor any knowledge of the hosts or services, as all of that is managed from the parent.
The children always report back "Unknown BPI Group Index" from the check.
If we manually copy the config out to the children, they always come back with "CONFIG IS FALSEGroup state is: Ok: 0 Child Problems", since the child does not have any state information, even when the BPI Group is showing warning on the parent.
The api_tool.php in the component (which appears to be all the check runs) looks like it gets host/service information relevant to the BPI Group from the status.dat file, but this file does not exist on the children in a DNX setup.
So it appears that we either need a way to force the check for BPI Notifications to run on the parent, or a way to make sure the component on the children does the XML/status.dat lookup on the parent.
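KevinD's reading is that api_tool.php pulls host/service state from status.dat. How the component actually parses that file isn't covered in this thread, so the following is only a minimal, hypothetical Python sketch of that kind of lookup. parse_status_dat and service_state are illustrative names, not part of the component. It assumes the usual Nagios status.dat layout of `servicestatus {` ... `}` blocks holding key=value lines. The point it illustrates is that the lookup can only succeed where status.dat exists.

```python
from pathlib import Path
from typing import Dict, List, Optional


def parse_status_dat(text: str) -> Dict[str, List[Dict[str, str]]]:
    """Group the 'type {' ... '}' blocks of a status.dat file by block type."""
    blocks: Dict[str, List[Dict[str, str]]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the comment header
        if line.endswith("{") and "=" not in line:
            current = {}
            blocks.setdefault(line[:-1].strip(), []).append(current)
        elif line == "}":
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return blocks


def service_state(status_path: str, host: str, service: str) -> int:
    """Return current_state for one service (hypothetical helper)."""
    # On a DNX child this read raises FileNotFoundError, because
    # status.dat only exists on the parent.
    blocks = parse_status_dat(Path(status_path).read_text())
    for svc in blocks.get("servicestatus", []):
        if svc.get("host_name") == host and svc.get("service_description") == service:
            return int(svc["current_state"])
    raise KeyError(f"{host}/{service} not found in {status_path}")
```

In this sketch a missing file is a hard error. The real component evidently degrades more quietly, given the "Ok: 0 Child Problems" output the children return.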
Re: NagiosBPI notifications odd-ness
Posted: Tue May 08, 2012 10:44 am
by mguthrie
OH, ok. I see what you're saying. Sorry, I totally misunderstood the first message.
To my knowledge, if the child servers don't have the check_bpi.php plugin, DNX will be forced to run it from the parent. Try removing that plugin from the slave servers and see if that forces it to run from the parent.
Re: NagiosBPI notifications odd-ness
Posted: Tue May 08, 2012 10:51 am
by KevinD
This was done, but to no avail.
The status just comes back with "[STDERR]Could not open input file: /usr/local/nagios/libexec/check_bpi.php"
Re: NagiosBPI notifications odd-ness
Posted: Tue May 08, 2012 11:05 am
by KevinD
Just to be thorough, I removed the php binary from the command definition, since the php executable is found but the script is not.
Since the #! line is at the top of the check_bpi.php script, it runs the same way either way.
Yet we see similar results:
[STDERR][EC 127]/bin/sh: /usr/local/nagios/libexec/check_bpi.php: No such file or directory
So it does not look like DNX fails to the parent when the script is missing.
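Assuming "[EC 127]" is DNX reporting the plugin's exit code, it is informative: POSIX shells exit with status 127 when the command they were asked to run cannot be found. So the child really did try to execute the missing script itself rather than handing the job back to the parent. This Python sketch reproduces the same exit status on a Linux box; the path is deliberately nonexistent.

```python
import subprocess

# Ask /bin/sh to run a plugin path that does not exist, as the DNX child did.
result = subprocess.run(
    ["/bin/sh", "-c", "/nonexistent/libexec/check_bpi.php BPI_GROUP"],
    capture_output=True,
    text=True,
)

# POSIX shells report "command not found" with exit status 127.
print(result.returncode)
print(result.stderr.strip())
```

The exact stderr wording depends on which shell /bin/sh is (bash says "No such file or directory", dash says "not found"), but the status is 127 in both cases.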
Re: NagiosBPI notifications odd-ness
Posted: Tue May 08, 2012 11:55 am
by mguthrie
I'll have to do a little more hunting on this one. I could be losing my mind, but I could swear I've seen other cases where checks default to the parent host because the plugin isn't on the children. Also, it wouldn't make sense for checks against "localhost" to be run from another server. I'll do some snooping and see what I can find.
Re: NagiosBPI notifications odd-ness
Posted: Tue May 08, 2012 2:45 pm
by kdavison
Do we need to open a support ticket, or...?
This is the first time we have needed help, and I want to make sure we are "doing the right things".
Thanks,
Kris Davison
Re: NagiosBPI notifications odd-ness
Posted: Tue May 08, 2012 3:20 pm
by scottwilkerson
Kris,
there is a field in the dnxServer.cfg file where you can specify, with a regular expression, which checks should be executed on the local host:
# OPTIONAL: Local service check regular expression.
# This allows you to specify a regular expression which will be used to
# disqualify matching service checks as candidates for remote execution by
# DNX. Use this to make sure your local host checks stay local. There is no
# default value. If this parameter is not specified, then *ALL* Nagios checks
# will be handled by DNX.
localCheckPattern = .*local.*
Once the change is made, you will need to restart Nagios.
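To make the rule in those config comments concrete, here is a minimal Python model of the routing decision. route_check is a hypothetical helper, not DNX code. It assumes the pattern is tested against the check's command line with a match allowed anywhere in it, which is what the debug output earlier in the thread suggests; the check_ping command line is made up for comparison.

```python
import re
from typing import Optional


def route_check(command_line: str, local_check_pattern: Optional[str]) -> str:
    """Return 'local' or 'dnx' following the localCheckPattern comments."""
    if local_check_pattern is None:
        # "If this parameter is not specified, then *ALL* Nagios checks
        # will be handled by DNX."
        return "dnx"
    if re.search(local_check_pattern, command_line):
        # Matching checks are disqualified from remote execution.
        return "local"
    return "dnx"


BPI = "/usr/local/nagios/libexec/check_bpi.php BPI_GROUP"
PING = "/usr/local/nagios/libexec/check_ping -H 192.0.2.10 -w 100,20% -c 500,60%"

for pattern in (None, r".*check_bpi.*"):
    print(pattern, route_check(BPI, pattern), route_check(PING, pattern))
```

With no pattern set, the BPI check is farmed out like everything else, which matches the "Unknown BPI Group Index" symptom from the children; with `.*check_bpi.*`, only the BPI check stays on the parent.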
Re: NagiosBPI notifications odd-ness
Posted: Tue May 08, 2012 5:54 pm
by KevinD
That did it!
Much appreciated. Expertise as expected!
I had searched the interwebs for such functionality in DNX, as I knew it had to exist, but none of the permutations my feeble little mind came up with turned up that directive.