what does this mean?

Post by **eloyd** » Thu Jun 18, 2015 8:14 am

Am I missing something? Doesn't NLS have alerting built in?

[Attachment: alerts.png]

This is a real question, not me being snarky: How hard would it be to trigger an alert if the logstash/elasticsearch checks produced negative results?

This topic is actually a core part of my 2015 Nagios World Conference presentation...

Post by **tmcdonald** » Thu Jun 18, 2015 9:10 am

Well, NLS does have alerting built-in but it alerts on log messages, not check results. In a really perverse way you could have a plugin run by cron, and have it log a message on failure then alert on that.
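For anyone who wants to try the cron workaround, here is a minimal sketch, not anything NLS ships. It assumes `service <name> status` exits with 0 when the service is up, that local syslog messages are forwarded into NLS, and that an NLS alert is set up to match the message text. The names `run_status`, `check_and_log`, and the `NLS_SELF_CHECK` message are made up for illustration.

```python
import subprocess
import syslog

# Assumption: the wording is arbitrary. It only has to match whatever
# query the NLS alert is configured to search for.
FAILURE_MESSAGE = "NLS_SELF_CHECK: {service} is not running"


def run_status(service):
    """Run `service <name> status` quietly and return its exit code."""
    return subprocess.call(
        ["service", service, "status"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def check_and_log(service, runner=run_status, log=syslog.syslog):
    """Log a failure message if the service is down; return True if it is up."""
    running = runner(service) == 0
    if not running:
        log(syslog.LOG_ERR, FAILURE_MESSAGE.format(service=service))
    return running
```

A cron job would call `check_and_log("logstash")` and `check_and_log("elasticsearch")` every few minutes. The `runner` and `log` parameters exist only so the logic can be exercised without touching real services.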

Post by **eloyd** » Thu Jun 18, 2015 9:15 am

See, here's the thing...

Of course I can do it many other ways. Nagios, cron, all sorts of stuff. But I just figure, since NLS has built-in alerting capabilities, why not allow the two system checks to generate alerts just like anything else within NLS can? I mean, I'm assuming there's a piece of internal API that says "function sendAlert() {}" that's used by the threshold alerting system. Can't the same function be called by the system daemon check system?

Post by **tmcdonald** » Thu Jun 18, 2015 12:22 pm

I think that would just be a matter of a feature request. The irony is that if you are using LS to check if ES is running (by checking logs and alerting if "ES is not running!" is found in them) then by definition if ES goes down you can't check for this like you can with other things :)

So this would need to be hard-coded and shouldn't be too terribly difficult. Shall I feature request it?

Post by **eloyd** » Thu Jun 18, 2015 12:30 pm

Wait. I'm mobile right now but we may be talking about different things. I'll be in an office in a couple hours and will write more.

Post by **eloyd** » Thu Jun 18, 2015 2:48 pm

Okay, so. Here is what I thought we were talking about.

NLS does some sort of check using sudo to check the output of a "service logstash status" command. This is what it uses to make the red/green light on the dashboard. Why not make it so that if it fails, in addition to making the light red, it can also trigger an alert? Say, a built-in query (which you would have to program for us) called Logstash Failure or something like that. So if that failure condition arises, we could use the built-in alerting capabilities to alert us that it failed.

Does that make sense?

I mean, yes, I could use NRPE to make sure it's running and even restart it if it's not (which, is what we do) but that's not the point.

Post by **jolson** » Thu Jun 18, 2015 2:55 pm

eloyd,

That definitely makes sense.

The API calls are run to check on the status of the processes:

http://192.168.x.x/nagioslogserver/index.php/api/system/status?subsystem=elasticsearch
http://192.168.x.x/nagioslogserver/index.php/api/system/status?subsystem=logstash

{"status":"running","pid":"23957","message":"Search engine (elasticsearch) is running."}
{"status":"running","pid":"24026","message":"Log collector (logstash) is running."}

Those API calls still return a usable result when logstash/elasticsearch is down; the status is simply reported as stopped:

http://192.168.x.x/nagioslogserver/index.php/api/system/status?subsystem=elasticsearch
http://192.168.x.x/nagioslogserver/index.php/api/system/status?subsystem=logstash

{"status":"stopped","message":"Search engine (elasticsearch) is stopped."}
{"status":"stopped","message":"Log collector (logstash) is stopped."}

The above is secured using an authorization token supplied by the user logged into the system.

Using the above as a reference, I understand what you mean - when a service is detected as down, why can't we send alerts based on that behavior?

The answer is "We likely can, but what would we do, exactly?" I suppose that's the discussion you're trying to open here. What do you think?
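As a starting point for that discussion, here is a rough sketch of how an outside script could poll the status calls shown earlier in this post and decide whether something is down. It is only an illustration: the `token` query parameter name is an assumption (the thread says only that a token is required), and `build_status_url`, `parse_status`, and `check_subsystem` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL copied from the examples in this post; replace the host with your own.
BASE_URL = "http://192.168.x.x/nagioslogserver/index.php/api/system/status"


def build_status_url(subsystem, token, base_url=BASE_URL):
    """Build the status URL for one subsystem."""
    # Assumption: the authorization token is sent as a `token` query parameter.
    return base_url + "?" + urlencode({"subsystem": subsystem, "token": token})


def parse_status(body):
    """Return (is_running, message) from a status response body."""
    data = json.loads(body)
    return data.get("status") == "running", data.get("message", "")


def check_subsystem(subsystem, token, fetch=None):
    """Fetch and parse the status of one subsystem."""
    if fetch is None:
        # Default fetcher: a plain HTTP GET with a 10 second timeout.
        fetch = lambda url: urlopen(url, timeout=10).read().decode("utf-8")
    return parse_status(fetch(build_status_url(subsystem, token)))
```

Whatever consumes the `(is_running, message)` pair then decides what "alert" means: an email, a Nagios passive check result, or something else.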

Post by **eloyd** » Thu Jun 18, 2015 2:59 pm

Thanks for that info. That makes life a lot easier, actually.

I guess the question is, if NLS is being sold as a standalone product (and I believe it is positioned as such, currently) then it should be able to do something when it detects that it has failed. I mean, if it has detected a failure state, it should be able to do something with that information (and it currently does - it changes the dashboard green/red light).

If NLS is being used by a Nagios suite customer, then it's a non-issue. So my question to Nagios Enterprises is, how much do you want to present NLS as a standalone product, capable of notifying people when it's broken, or even maybe proactively correcting itself when it is?

Matters not to me, but we'll need to know what to tell our potential customers when we sell them NLS in the future.

Post by **tmcdonald** » Thu Jun 18, 2015 3:03 pm

Wouldn't be hard at all, really. Should definitely be configurable whether it alerts/self-fixes though.

Post by **eloyd** » Thu Jun 18, 2015 3:06 pm

Of course, if it self-fixes, then you need to teach it how many times to try before it gives up, or whether it keeps trying to self-fix forever. Like, maybe it's out of disk space, and that's why it's failing.
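To make the "how many times before it gives up" point concrete, here is a small sketch of a bounded self-fix loop. Nothing here is NLS code: `self_fix`, `max_attempts`, and `delay_seconds` are made-up names, and the caller supplies its own `is_running` and `restart` functions.

```python
import time


def self_fix(is_running, restart, max_attempts=3, delay_seconds=30, sleep=time.sleep):
    """Try to restart a service a bounded number of times.

    Returns the number of restart attempts used (0 if it was already up),
    or None if the service is still down after max_attempts, in which
    case the caller should stop retrying and alert a human.
    """
    if is_running():
        return 0
    for attempt in range(1, max_attempts + 1):
        restart()
        sleep(delay_seconds)  # give the service time to come up
        if is_running():
            return attempt
    return None
```

A `None` result is the "out of disk space" case: restarting will never help, so the loop stops and hands the problem to whatever does the alerting.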

Sounds like a job for Nagios!!

Nagios Support Forum
