Monitoring via Direct TCP/IP (No External Commands/Plugins)

Post by **janaka** » Tue Nov 04, 2014 2:40 am

I am currently working on a Nagios-based JMX monitoring solution for a cluster of remote Java applications.

Using a command-based approach (like check_jmx: http://exchange.nagios.org/directory/Pl ... mx/details) is not very feasible due to the overhead of establishing secure JMX RMI connections with the remote MBean servers each time the command is run.

Passive monitoring (http://nagios.sourceforge.net/docs/3_0/ ... hecks.html) (with the remote applications writing performance results back to the central monitoring server) is also not applicable since the remote application should not share any responsibilities in the monitoring process. (Monitoring is an optional feature of the overall system.)

The ideal solution would be a direct monitoring query from Nagios (not from a periodically run external plugin or command) over TCP/IP, which can be captured, processed and replied to, by a daemon agent running under the remote application. (This will eliminate both the overhead of JMX connection establishment and the overhead of command execution on the monitoring server.)
I have observed a similar TCP/IP-based query interface using the JMX Zabbix Bridge (http://www.kjkoster.org/zapcat/Zapcat_J ... ridge.html) where the direct TCP/IP query is intercepted and processed in the remote application (via a custom agent).

Is such an approach (direct TCP/IP based querying from core modules) possible via Nagios as well?

If so, how can it be implemented? (Is there a definite place, e.g. the query interface feature, that I should be looking at?)

Post by **tmcdonald** » Tue Nov 04, 2014 3:07 pm

Aside from directly altering the Nagios Core code and recompiling, there is not currently a way to directly connect without using a plugin. The Nagios architecture uses plugins to run checks, but I don't see how that is too far off from what you're trying to ultimately achieve. Regardless of whether Nagios itself runs the check or calls a plugin, the connection still needs to be made so the overhead is a moot point.
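To make the plugin model concrete, here is a minimal Python sketch (not code from this thread) of a plugin that queries a plain-TCP agent instead of opening a JMX RMI connection. The newline-terminated query/reply protocol, the script name `check_agent.py`, and the argument order are assumptions for illustration. The only parts taken from the Nagios plugin interface are the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the one-line `text | perfdata` output.

```python
"""Sketch of a Nagios plugin that queries a plain-TCP agent (hypothetical protocol)."""
import socket
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def query_agent(host, port, query, timeout=5.0):
    """Send one newline-terminated query and return the one-line reply.

    The wire format is an assumption; adapt it to whatever the agent speaks.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(query.encode("utf-8") + b"\n")
        with sock.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()


def evaluate(value, warn, crit):
    """Map a numeric value to a Nagios state (higher is worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def main(argv):
    # Illustrative usage: check_agent.py HOST PORT QUERY WARN CRIT
    if len(argv) != 6:
        print("UNKNOWN - usage: check_agent.py HOST PORT QUERY WARN CRIT")
        return UNKNOWN
    try:
        host, port, query = argv[1], int(argv[2]), argv[3]
        warn, crit = float(argv[4]), float(argv[5])
        value = float(query_agent(host, port, query))
    except (OSError, ValueError) as exc:
        # Connection problems and non-numeric replies both end up here.
        print(f"UNKNOWN - agent query failed: {exc}")
        return UNKNOWN
    state = evaluate(value, warn, crit)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[state]
    # Everything after '|' is treated as performance data.
    print(f"{label} - {query} = {value} | value={value}")
    return state

# A real plugin script would finish with: sys.exit(main(sys.argv))
```

Each run still opens one short-lived TCP connection, but there is no RMI or SSL handshake on the Nagios side unless the agent itself requires one.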

Post by **eloyd** » Tue Nov 04, 2014 3:10 pm

Another potentially ideal solution is to have a bean in your Java deployment that can return things of interest via HTTP protocols rather than check_jmx. Not sure where your development effort wants to go, but this is an approach we used at a previous, very large deployment of lots of Java code.

Post by **slansing** » Tue Nov 04, 2014 5:44 pm

Thanks for the tip eloyd! @janaka let us know your thoughts.

Post by **janaka** » Wed Nov 05, 2014 1:36 am

@tmcdonald Thanks! The main issue is that the remote JMX connection must be secure (over SSL) and hence the overhead of repeatedly connecting and disconnecting would be substantially higher than when using a TCP/IP (socket) connection... In case I decide to dig into the Nagios Core code, any suggestions regarding where I can start digging?

@eloyd Thanks! In fact, I already have integrated a socket-based agent into the Java application which can accept, process and reply to JMX query strings sent over TCP/IP. What I am looking for right now is a way to connect to the agent directly from Nagios Core, without invoking an external plugin or command. In your case, did you have to use plugins to generate the HTTP requests, or did you find a way to do it directly with Nagios Core?

Post by **tmcdonald** » Wed Nov 05, 2014 10:29 am

You would want a big shovel because that would be a lot of digging.

Not being a developer I am only somewhat familiar with the Nagios Core codebase, but from working on other projects I know how substantial of a change this would be. Essentially you would need to find whatever portion is responsible for scheduling checks, and add in an exception for the Java services you want to monitor with an always-open TCP stream. You would essentially need to re-write all the plugin logic to work with your new method, as well as handling the communication with the rest of the Nagios system.

I think this would be more work than would be worth it for the benefit of keeping TCP open, but it is an open source project and you're free to poke around. I'll defer to a developer from here on out if you have specific questions, but at this point the modifications being made would be a bit outside of our support realm.

Post by **eloyd** » Wed Nov 05, 2014 10:46 am

@janaka, that is not how Nagios is designed. It is supposed to be firing off a process that does the check and reports the result, not doing the check itself. As Trevor said, that is going to be a lot of digging.

What about this:

Write a daemon that does your checking by maintaining an established connection. Run this thing on your Nagios host. Give IT the ability to respond to HTTP requests which you can then query from Nagios in the usual manner. So in essence, you're monitoring a monitoring process.
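A rough Python sketch of that idea, under stated assumptions: the remote agent speaks a hypothetical newline-terminated query/reply protocol over plain TCP, and the class name `AgentPoller`, the port `8099`, and the `max_age` freshness window are all made up for illustration. The daemon connects once, reuses the connection for every polling round, caches the latest replies, and serves them over HTTP.

```python
import socket
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import unquote


class AgentPoller(threading.Thread):
    """Keeps one TCP connection to the agent open and caches the latest replies."""

    def __init__(self, host, port, queries, interval=30.0):
        super().__init__(daemon=True)
        self.addr = (host, port)
        self.queries = list(queries)
        self.interval = interval
        self.cache = {}  # query -> (reply, unix timestamp)
        self.lock = threading.Lock()

    def poll_once(self, reader, writer):
        """Send every query over the already-open connection and cache the replies."""
        for query in self.queries:
            writer.write(query + "\n")
            writer.flush()
            reply = reader.readline()
            if not reply:
                raise ConnectionError("agent closed the connection")
            with self.lock:
                self.cache[query] = (reply.strip(), time.time())

    def run(self):
        while True:
            try:
                # Connect once, then reuse the connection for every round.
                with socket.create_connection(self.addr, timeout=10) as sock:
                    reader = sock.makefile("r", encoding="utf-8")
                    writer = sock.makefile("w", encoding="utf-8")
                    while True:
                        self.poll_once(reader, writer)
                        time.sleep(self.interval)
            except OSError:
                time.sleep(5)  # agent unreachable: back off, then reconnect

    def get(self, query):
        with self.lock:
            return self.cache.get(query)


def make_handler(poller, max_age=120.0):
    """Build an HTTP handler that answers GET /<query> from the poller's cache."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            entry = poller.get(unquote(self.path.lstrip("/")))
            if entry is None or time.time() - entry[1] > max_age:
                status, body = 503, "no fresh value\n"  # unknown or stale
            else:
                status, body = 200, entry[0] + "\n"
            data = body.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return Handler


def serve(agent_host, agent_port, queries, http_port=8099):
    """Start the poller and block serving cached values on localhost."""
    poller = AgentPoller(agent_host, agent_port, queries)
    poller.start()
    ThreadingHTTPServer(("127.0.0.1", http_port), make_handler(poller)).serve_forever()
```

Nagios could then check it with the stock check_http plugin, for example `check_http -H 127.0.0.1 -p 8099 -u /some.query`, where `some.query` stands in for whatever query string the agent understands. Returning 503 for stale entries means a dead agent connection shows up as a failed check rather than as an old value.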

Post by **abrist** » Wed Nov 05, 2014 10:53 am

eloyd wrote: Write a daemon that does your checking by maintaining an established connection. Run this thing on your Nagios host. Give IT the ability to respond to HTTP requests which you can then query from Nagios in the usual manner. So in essence, you're monitoring a monitoring process.

Solid suggestion, though it requires writing a daemon, a JMX client, and a GET API. It is a big task just to avoid the core worker forks.

Post by **eloyd** » Wed Nov 05, 2014 12:06 pm

It all depends on where you want to spend your development effort. I think writing a standalone daemon that I could query is easier than finding all the touchpoints in Nagios Core that need to be changed to integrate persistent TCP connections. But that's me.

Post by **abrist** » Wed Nov 05, 2014 4:19 pm

I guess I did not mean to imply that patching core was a better option. I just think it is a lot of work to redesign the current wheel. Now with workers in place in Core 4 (and/or using gearman), you can distribute the load to reduce resource usage instead of making persistent connections with a custom script/core patch.

Nagios Support Forum
