Run Check Command button not working with hostvars

Posted: Fri Apr 16, 2021 4:20 pm
by mlabbepg
Hi,

I'm new to the forums, but I've been using nagios-core and nagrestconf for a couple years already. We're considering migrating to Nagios XI and are evaluating the product.

Couple questions (I'm not sure if I should've opened a different thread for each).

First:
I'm not sure what I'm missing about hyperv (we're all vmware and barely ever used hyperv), but the check_xi_hyperv default command seems like it should've been named "check_ncpa" (I don't see anything particular about hyperv in there).

Code:

check_xi_hyperv	$USER1$/check_ncpa.py -H $HOSTADDRESS$ -t $_HOSTNCPA_TOKEN$ -P $_HOSTNCPA_PORT$ -M $ARG1$ -w $ARG2$ -c $ARG3$
Second:
Is there a reason why the NCPA Configuration wizard (as per the check_xi_ncpa command syntax) puts the NCPA token and port directly inside $ARG1$ for each service? That doesn't seem practical in the long term. If you ever need to change an NCPA token, it makes more sense to change it in one place (the free variable _NCPA_TOKEN on the host).

Third:
Because of #2, I'm trying to configure $_HOSTNCPA_TOKEN$ directly as a free variable on the hosts (as per the check_xi_hyperv syntax).

The service check works when checking using the Service Status Detail page (so my command syntax and variables are properly configured).
However, when editing the service in CCM and trying the Run Check Command button, I get no result unless I replace the $_HOSTNCPA_TOKEN$ and $_HOSTNCPA_PORT$ directly in the command.

e.g.
$USER1$/check_ncpa.py -H $HOSTADDRESS$ -t supersecrettoken -P 5693 $ARG1$ $ARG2$ $ARG3$ $ARG4$
This works both under the Run Check Command button and on the Service Status pages.

$USER1$/check_ncpa.py -H $HOSTADDRESS$ -t $_HOSTNCPA_TOKEN$ -P $_HOSTNCPA_PORT$ $ARG1$ $ARG2$ $ARG3$ $ARG4$
This works on the Service Status pages, but the Run Check Command button seems to loop through some code until it times out and returns no result.

Is it just me or is there an issue with the Run Check Command button when using host variables?

Re: Run Check Command button not working with hostvars

Posted: Fri Apr 16, 2021 4:44 pm
by dchurch
The tricky thing about the Run Check Command dialog is that there isn't necessarily any host assigned to a command. There could also be more than one host assigned. While using host free variables in a host check command is supported and works while running the check "for real," the feature that lets you run it from the CCM necessarily makes some assumptions about the host that it can't make from the Service Management page.

For instance, if you have a service attached to a Host Group, the Run Check Command forces you to input the host address because it cannot make the assumption about which host to send the check (could be any number of hosts in the Host Group). Furthermore, if you at that point input an IP address not in your database, with no host variables -- what should it fill in for the host variables? It doesn't have enough information to fill in host variables data just based on the host address.
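
To make the missing-information problem concrete, here is a minimal Python sketch of macro expansion. It is a toy model and not Nagios XI's actual code: the function names, the `/usr/local/nagios/libexec` plugin path, the `192.0.2.10` address and the `cpu/percent` metric are all assumptions made for illustration. The one rule it relies on is the standard one, that a host custom variable `_NCPA_TOKEN` is referenced in commands as `$_HOSTNCPA_TOKEN$`.

```python
import re

# Toy model of Nagios-style macro expansion (illustration only).
MACRO = re.compile(r"\$([A-Z0-9_]+)\$")


def host_macros(host):
    """Build the macros that one known host object can supply."""
    macros = {"HOSTADDRESS": host["address"]}
    for name, value in host.get("custom_vars", {}).items():
        # _NCPA_TOKEN on the host is referenced as $_HOSTNCPA_TOKEN$
        macros["_HOST" + name[1:].upper()] = value
    return macros


def expand(command, macros):
    """Substitute every known $NAME$; collect the names left unresolved."""
    missing = []

    def substitute(match):
        name = match.group(1)
        if name in macros:
            return macros[name]
        missing.append(name)
        return match.group(0)  # leave the macro text untouched

    return MACRO.sub(substitute, command), missing


COMMAND = ("$USER1$/check_ncpa.py -H $HOSTADDRESS$ -t $_HOSTNCPA_TOKEN$ "
           "-P $_HOSTNCPA_PORT$ -M $ARG1$")
# Assumed values for the resource macro and the service argument.
base = {"USER1": "/usr/local/nagios/libexec", "ARG1": "cpu/percent"}

# Case 1: a host object is known, so its free variables are available.
known_host = {
    "address": "192.0.2.10",
    "custom_vars": {"_NCPA_TOKEN": "supersecrettoken", "_NCPA_PORT": "5693"},
}
line, missing = expand(COMMAND, {**base, **host_macros(known_host)})

# Case 2: only an address was typed in, with no host object behind it.
line2, missing2 = expand(COMMAND, {**base, "HOSTADDRESS": "192.0.2.10"})

print(line)
print(line2, missing2)
```

In the first case every macro resolves and the result is a command you could paste into a shell. In the second case the two `_HOST...` macros have nothing to resolve against, which is the situation the dialog is in when all it has is an address.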

It's a shortcoming of the Run Check Command dialog that's necessary because of how Nagios defines its checks, and one I don't see being patched in the future. One way to get it to work may be to select a host from a dropdown instead of inputting the IP/FQDN (not currently supported). Unless that's implemented, I'd just work around it by running the check command from the command line.
mlabbepg wrote: I'm not sure what I'm missing about hyperv (we're all vmware and barely ever used hyperv), but the check_xi_hyperv default command seems like it should've been named "check_ncpa" (I don't see anything particular about hyperv in there).
The wizard is described as Monitor your Hyper-V server via NCPA and it uses NCPA to run the actual metrics, so you're not wrong there. I assume this was added to the list of wizards to provide an obvious solution to people searching for Hyper-V monitoring.
mlabbepg wrote: Is there a reason why the NCPA Configuration wizard (as per check_xi_ncpa command syntax) applies the NCPA token and ports directly inside $ARG1$ for each service? That doesn't seem practical on the long term. If you ever need to change a NCPA token, it makes more sense to change it in 1 place (free variables _NCPA_TOKEN on the host).
The reason for this is that the wizards are meant to be user-friendly places to set up monitoring. For people who are well-versed in Nagios XI's "define-once, apply-many" functions such as Host Groups and template inheritance (people such as yourself), the wizards end up looking like "training wheels" whose job is to help new users set up monitoring.

They're also meant to add configuration that is easy to understand as well as "complete" in that it doesn't depend on a lot from the service templates. Being able to inspect your service in the CCM and see the exact check command instead of seeing a blank in the check command field because it instead comes from a service template, is helpful to new users. Filling this out is also part of the "training wheels" user-friendly nature of the wizards because it makes it easier to understand and see where everything came from.

Re: Run Check Command button not working with hostvars

Posted: Mon Apr 19, 2021 11:26 am
by mlabbepg
dchurch wrote: Using host free variables in a host check command is supported and works while actually running the check.
Indeed, it's working now. I'm not sure what I was doing wrong last week (maybe using the IP, while the host was configured with its FQDN, so the IP I was checking against wasn't found in the database).

On a side note, a simple error message like "$_HOSTNCPA_TOKEN$ not defined (or returned more than 1 result) in specified host" would help a lot in these cases.
dchurch wrote: The wizard is described as Monitor your Hyper-V server via NCPA and it uses NCPA to run the actual metrics, so you're not wrong there. I assume this was added to the list of wizards to provide an obvious solution to people searching for Hyper-V monitoring.
Couldn't the Monitor your Hyper-V server via NCPA wizard just use the generic check_ncpa command under the hood? The name of the wizard totally makes sense, but the name of the command itself doesn't reflect what it does, which doesn't feel right. Just my two cents.

I've cloned check_xi_hyperv as check_ncpa which I'm using (with host free variables) from now on.
I'm using "ncpa_all", "ncpa_win", "ncpa_lin" service config names, using host free vars and applying to host groups.

So far it's working fine, but I'm not quite sure what the easiest way is to create new hosts the way I want them.

nagrestconf has a quick "copy host" function that duplicates an existing host with all its services to a new hostname in a couple clicks. Same for service(s) you want to copy to a new host. I'm trying to figure out the easiest way to do the same in Nagios XI.

I like the wizard approach, which provides autodetection of drives, service names, etc. But we're not saving time if we need to change every single service definition manually after running the wizard. Creating a custom Nagios XI wizard might be our best long-term option, but at first glance it seems like a lot of effort to change a couple of small things in existing wizards (I'll check that part of the documentation more closely later).

The bulk cloning tool works but requires an Enterprise licence, which seems overkill (and expensive, if we don't really need the other extra features) for a basic "copy host and its services" function like this.

Copying an existing host doesn't copy its services so it's not helping much, besides already filling host free vars with default values, but we can also use host templates for this.

Selecting all services from an existing host and then using COPY requires editing every copy one by one afterward, with a lot of manual changes (change the config name, remove the _copy in the service name, hit the change host button and reassign to the new host, and repeat for every other service). It would be quicker to duplicate the config file via ssh like we're used to in nagios-core (I'm also not sure whether these get overwritten on every Nagios XI "apply changes" like nagrestconf does).

Any advice would be welcome.

Re: Run Check Command button not working with hostvars

Posted: Mon Apr 19, 2021 12:35 pm
by dchurch
If you have a service check that's different only in where it needs to go, you can apply that check to a slew of hosts using a Host Group. That was the "define-once, apply-many" functionality of Nagios XI I mentioned earlier. Check out the following video for an explanation of host groups.

Convert an existing service to a shared service by disassociating it from the host, adding it to the Host Group, then changing the Config Name to something descriptive like "Shared Checks."

Once you have it set up that way, adding a new host is easy:

1. Open Configure => Core Config Manager => Hosts.
2. Add a host.
3. Fill out Host Name, Address, Template: `xiwizard_ncpa_host`, Host Group: [your host group].
4. Click Save.
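
For readers coming from nagios-core, the fields in those steps map onto an ordinary host object definition. The Python sketch below only renders that definition as text so the mapping is visible. It is an illustration under assumptions: `render_host`, the host name `web01.example.com`, the address and the host group name `ncpa-servers` are hypothetical, and nothing here says how the CCM stores or writes its configuration, or whether hand-written files survive an "apply changes."

```python
def render_host(host_name, address, template, hostgroup, custom_vars):
    """Return a Nagios host object definition as text (illustration only)."""
    rows = [
        ("use", template),          # Template field in the CCM
        ("host_name", host_name),   # Host Name field
        ("address", address),       # Address field
        ("hostgroups", hostgroup),  # Host Group field
    ]
    # Free variables start with an underscore, e.g. _NCPA_TOKEN.
    rows += list(custom_vars.items())
    body = "\n".join(f"    {key:<16}{value}" for key, value in rows)
    return "define host {\n" + body + "\n}\n"


text = render_host(
    host_name="web01.example.com",   # hypothetical host
    address="192.0.2.10",            # documentation-range address
    template="xiwizard_ncpa_host",
    hostgroup="ncpa-servers",        # hypothetical host group name
    custom_vars={"_NCPA_TOKEN": "supersecrettoken", "_NCPA_PORT": "5693"},
)
print(text)
```

Because the shared services are attached to the host group, the host definition carries only what is specific to the host: its name, its address and its free variables.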