Nagios Support Forum

Posted: **Mon May 02, 2016 7:21 am**

Hi guys,

Looking for input on how I might best achieve this, preferably without getting into customising XI too much under the hood (don't want to affect future upgrades etc.) and without creating too much admin overhead when needing to add more checks in future.

As briefly as I can

- We use NagiosXI to monitor ~100 hosts, primarily externally (so HTTP/ping type checks). Most of these hosts are based in Europe at the moment, so the NagiosXI server is in AWS EU West. However we now have a few to monitor in the likes of Australia, Malaysia etc. and are experiencing false-positive host down alerts etc. - I'd guess this is caused by latency issues, lost packets over such a long route as we do not believe users in these local regions actually experience issues accessing the services. So I would like to be able to run these checks closer to where the monitored servers are located, AWS Australia say.

The ideal situation would be to have a Hostgroup called APAC_Hosts, and when a host is dropped into that, XI would automatically run all services/host-checks assigned to that host remotely on a CentOS host I'd set up in the AWS APAC region. I've looked at these possibilities so far and have outlined why they're not ideal:

Use check_by_ssh to run the checks on the remote monitoring host - this seems to be the best option, but on its own I would need to set up a copy of all the custom commands and prepend "check_by_ssh -H xxxxxx" etc. to them, then create copies of the services that use these commands and apply these services to the APAC host group. Creating the copies isn't really the issue, it could be scripted easily enough, but maintaining them and conveying an understanding of this setup to other teams would be challenging. In essence it violates the reusability ethos and causes all the usual issues duplicating code tends to.
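The scripted duplication mentioned above could look roughly like the following. This is a minimal sketch, not a tested setup: the remote host name `apac-monitor.example.com`, the `nagios` SSH user, the `apac_` command prefix and both function names are assumptions made for illustration. `-H`, `-l`, `-t` and `-C` are standard check_by_ssh options.

```python
import shlex

# Hypothetical values for illustration only; not from the original post.
REMOTE_HOST = "apac-monitor.example.com"
REMOTE_USER = "nagios"


def wrap_with_check_by_ssh(command_line, remote_host=REMOTE_HOST,
                           remote_user=REMOTE_USER, timeout=30):
    """Return command_line wrapped so it runs on remote_host via check_by_ssh."""
    return (
        "$USER1$/check_by_ssh"
        f" -H {remote_host} -l {remote_user} -t {timeout}"
        f" -C {shlex.quote(command_line)}"
    )


def remote_command_definition(name, command_line, prefix="apac_"):
    """Build a Nagios command definition for the remote copy of a command."""
    return (
        "define command {\n"
        f"    command_name    {prefix}{name}\n"
        f"    command_line    {wrap_with_check_by_ssh(command_line)}\n"
        "}\n"
    )


if __name__ == "__main__":
    print(remote_command_definition(
        "check_http", "$USER1$/check_http -H $HOSTADDRESS$"))
```

Note that Nagios expands macros such as `$USER1$` on the XI server before the command is run, so this sketch assumes the plugins live at the same path on the remote host, and that macro values contain no single quotes.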

NRPE - Same duplication of code/commands but in an even less manageable way than check_by_ssh because they're defined on a remote host outside the XI UI.

DNX - Overcomplicates the solution, I think.

Fusion/MNTOS - Again puts the definition of commands etc. on remote nodes that then need to be managed. Overcomplicates things again.

Mod_Gearman - This looks very promising, however the fact it cannot feed back performance data is unfortunate for my use case. I'd guess I'd either not have any performance data for those APAC hosts or the performance check would run from the EU-based monitoring host, neither of which would be ideal. It's also geared (:roll:) at being a load-balancing solution, so it introduces a fairly considerable amount of complexity that we have no need for, and it appears this would be shown in the XI UI - so same issue of educating other teams on something fairly complex.

Write my own NEB - This would seem to be the best approach at the moment I think - write a NEB that intercepts all external command calls and, if they're for a host in the hostgroup APAC_Hosts, prepends the command with a call to check_by_ssh to run it on the APAC monitoring host instead. It would basically be a much more basic version of mod_gearman with no load-balancing, but I don't think I'd lose the performance data. However, having not done low-level C programming in many years, I'm not sure I could pull this off. I think this would be a really handy NEB for others as well, kinda surprised it doesn't already exist.

Any thoughts, anything I've missed?

Thanks!

Posted: **Mon May 02, 2016 11:12 am**

1) If you use ModGearman, you will have performance data available on XI - I'm not sure where you got the idea that it wouldn't?
2) ModGearman has some additional modules for distributed perfdata processing, but it has never seemed efficient to me (since it still has to be processed on the XI side) - I've never personally used this in production, and don't see a need.
3) This is coming from someone who used ModGearman to distribute 2,000 host checks and 40,000 service checks.
4) ModGearman doesn't add that much complexity once you become familiar with the documentation (it's pretty well documented).
5) Writing a custom NEB is fun and challenging, but I think ModGearman in a very basic configuration would accomplish your needs.

Posted: **Thu May 05, 2016 12:52 pm**

Thanks for the reply bheden, I missed the notification or I would have come back sooner.

It's distinctly possible I misinterpreted the Mod-Gearman documentation. What I read was "Note: processing of perfdata is not part of mod_gearman. You will need additional worker for handling performance data. For example: PNP4Nagios. Performance data is just written to the gearman queue." So from what you're saying, the performance data will come back to the XI server, and what this note means is that any additional processing of that data will have to be done by XI, not by your worker nodes. This would be totally fine in my case, the XI server is by no means busy.
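For context, performance data travels inside the plugin's own output, after a `|` separator, so it comes back with the check result wherever the plugin ran. A rough sketch of that split, assuming simple `label=value` tokens with no quoted labels; the function name and the sample output line are made up for illustration:

```python
def split_plugin_output(line):
    """Split the first line of plugin output into (status text, perfdata dict)."""
    text, _, perf = line.partition("|")
    metrics = {}
    for token in perf.split():
        label, sep, value = token.partition("=")
        if sep:
            # Keep only the value; warn/crit/min/max follow after ';'.
            metrics[label] = value.split(";")[0]
    return text.strip(), metrics


if __name__ == "__main__":
    sample = "HTTP OK: HTTP/1.1 200 OK | time=0.412s;;;0.000 size=5120B;;;0"
    print(split_plugin_output(sample))
```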

It would seem Mod-Gearman is probably the way to go for me at this point. Out of interest, are there any good examples of custom NEBs out there that you know of? I found very little when Googling. I think if the framework of the code was already there to grab the external command events I could probably scrape together the code to make it do what I need, whereas writing it from scratch would take me quite a considerable amount of time...

Posted: **Thu May 05, 2016 5:07 pm**

I found this and it's pretty extensive, did you see this one already?

http://nagios.sourceforge.net/download/contrib/documentation/misc/NEB%202x%20Module%20API.pdf

Posted: **Fri May 06, 2016 2:06 am**

That's perfect, thanks ssax. I assume the fact that document was last updated in 2006 means writing your own NEBs is not too common?

I don't suppose you know which callback routine would be most suitable to catch a command before it is executed, or is there even a callback for that? NEBCALLBACK_EXTERNAL_COMMAND_DATA seems the most likely, but it is unclear to me whether you get the callback before or after the command is executed, and also whether this callback is called for all commands or not. Any thoughts?

Posted: **Fri May 06, 2016 9:15 am**

First of all, let me apologize for the lack of documentation. There are a few good resources available if you dig deep enough. The ones that stand out are:

1) The source code itself. In nagios-core source, there is a directory called "module." This compiles into a usable barebones NEB (it does nothing but print to the log files). Still good for learning the basic set up.
2) The headers! Any header that starts with neb in the /include/ directory of the nagios-core directory has all of the data structures you need to know about, and all of the definitions to check against.
3) Read the source code of other NEBs (ModGearman, NDO, mk_livestatus, DNX [although defunct now, still has source available])

Also, the callback types you're looking for to catch commands before they are executed are NEBCALLBACK_SERVICE_CHECK_DATA and NEBCALLBACK_HOST_CHECK_DATA, but only for certain event types (NEBTYPE_SERVICECHECK_ASYNC_PRECHECK and NEBTYPE_HOSTCHECK_ASYNC_PRECHECK).

And then you potentially need a callback for NEBCALLBACK_TIMED_EVENT_DATA to inject your results back into Nagios!
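The flow above can be sketched as a rough Python model of the decision a precheck callback would make. It is not a loadable NEB module (those are written in C against the Nagios headers). The string constants only mirror the macro names mentioned above, and `handle_check_event`, `pending` and the `"override"`/`"ok"` return values are hypothetical stand-ins rather than anything from the Nagios API.

```python
from collections import deque

# Names mirror the NEBTYPE_* macros above; the values are placeholders,
# not the real integers from the Nagios headers.
PRECHECK_EVENTS = {
    "NEBTYPE_SERVICECHECK_ASYNC_PRECHECK",
    "NEBTYPE_HOSTCHECK_ASYNC_PRECHECK",
}
REMOTE_HOSTGROUP = "APAC_Hosts"

# Checks taken over by the module; a timed-event handler would later
# run them remotely and hand the results back to Nagios.
pending = deque()


def handle_check_event(event_type, host_name, host_groups, command_line):
    """Return "override" if the module takes over this check, else "ok"."""
    if event_type not in PRECHECK_EVENTS:
        return "ok"  # other event types: let Nagios carry on
    if REMOTE_HOSTGROUP not in host_groups:
        return "ok"  # local host: Nagios runs the check as usual
    pending.append((host_name, command_line))
    return "override"
```

In this model a call for a host in APAC_Hosts during a precheck event queues the check and returns `"override"`; everything else falls through to normal execution.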

I'd like to see this thread closed, and if you need any further assistance with building your module - open a new thread in General > Core > Development, and I'll be happy to offer assistance where available.

Good luck!

Execute commands on remote host based on host groups
