what does this mean?
Re: what does this mean?
Am I missing something? Doesn't NLS have alerting built in?
This is a real question, not me being snarky: How hard would it be to trigger an alert if the logstash/elasticsearch checks produced negative results?
This topic is actually a core part of my 2015 Nagios World Conference presentation.....
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
I'm a Nagios Fanatic! • Join our public Nagios Discord Server!
Re: what does this mean?
Well, NLS does have alerting built-in but it alerts on log messages, not check results. In a really perverse way you could have a plugin run by cron, and have it log a message on failure then alert on that.
Former Nagios employee
Re: what does this mean?
See, here's the thing...
Of course I can do it many other ways. Nagios, cron, all sorts of stuff. But I just figure, since NLS has built-in alerting capabilities, why not allow the two system checks to generate alerts just like anything else within NLS can? I mean, I'm assuming there's a piece of internal API that says "function sendAlert() {}" that's used by the threshold alerting system. Can't the same function be called by the system daemon check system?
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
Re: what does this mean?
I think that would just be a matter of a feature request. The irony is that if you are using LS to check if ES is running (by checking logs and alerting if "ES is not running!" is found in them) then by definition if ES goes down you can't check for this like you can with other things :)
So this would need to be hard-coded and shouldn't be too terribly difficult. Shall I feature request it?
Former Nagios employee
Re: what does this mean?
Wait. I'm mobile right now but we may be talking about different things. I'll be in an office in a couple hours and will write more.
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
Re: what does this mean?
Okay, so. Here is what I thought we were talking about.
NLS does some sort of check using sudo to check the output of a "service logstash status" command. This is what it uses to make the red/green light on the dashboard. Why not make it so that if it fails, in addition to making the light red, it can also trigger an alert? Say, a built-in query (which you would have to program for us) called Logstash Failure or something like that. So if that failure condition arises, we could use the built-in alerting capabilities to alert us that it failed.
Does that make sense?
I mean, yes, I could use NRPE to make sure it's running and even restart it if it's not (which is what we do) but that's not the point.
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
Re: what does this mean?
eloyd,
That definitely makes sense.
The API calls are run to check on the status of the processes:
Code: Select all
http://192.168.x.x/nagioslogserver/index.php/api/system/status?subsystem=elasticsearch
http://192.168.x.x/nagioslogserver/index.php/api/system/status?subsystem=logstash
Those API calls will return good results if logstash/elasticsearch is running:
Code: Select all
{"status":"running","pid":"23957","message":"Search engine (elasticsearch) is running."}
{"status":"running","pid":"24026","message":"Log collector (logstash) is running."}
And results like these if logstash/elasticsearch is down:
Code: Select all
{"status":"stopped","message":"Search engine (elasticsearch) is stopped."}
{"status":"stopped","message":"Log collector (logstash) is stopped."}
The above is secured using an authorization token supplied by the user logged into the system.
Using the above as a reference, I understand what you mean - when a service is detected as down, why can't we send alerts based on that behavior?
The answer is "We likely can, but what would we do, exactly?" I suppose that's the discussion you're trying to open here. What do you think?
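For anyone who wants to alert on this from outside NLS in the meantime, here is a minimal sketch of a Nagios-style check built on the status call described above. The endpoint path and the "status"/"message" fields come from the post; the function names and the idea of passing the authorization token as a `token` query parameter are assumptions for illustration, so adjust them to however your install expects the token.

```python
import json
import urllib.parse
import urllib.request

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def classify(body):
    """Map a status response body to (exit_code, message)."""
    try:
        data = json.loads(body)
    except ValueError:
        return UNKNOWN, "UNKNOWN - response was not valid JSON"
    if not isinstance(data, dict):
        return UNKNOWN, "UNKNOWN - unexpected response shape"
    message = data.get("message", "no message in response")
    if data.get("status") == "running":
        return OK, "OK - " + message
    # Anything other than "running" (e.g. "stopped") is treated as a failure.
    return CRITICAL, "CRITICAL - " + message


def fetch_status(base_url, subsystem, token, timeout=10):
    """Fetch the raw status body for one subsystem.

    ASSUMPTION: the token is accepted as a 'token' query parameter.
    """
    query = urllib.parse.urlencode({"subsystem": subsystem, "token": token})
    url = base_url.rstrip("/") + "/index.php/api/system/status?" + query
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def main(argv):
    """argv: [prog, BASE_URL, SUBSYSTEM, TOKEN]; returns the exit code."""
    if len(argv) != 4:
        print("UNKNOWN - usage: %s BASE_URL SUBSYSTEM TOKEN" % argv[0])
        return UNKNOWN
    try:
        code, text = classify(fetch_status(argv[1], argv[2], argv[3]))
    except OSError as exc:  # covers URLError, HTTPError, and timeouts
        code, text = UNKNOWN, "UNKNOWN - request failed: %s" % exc
    print(text)
    return code
```

To run it as a plugin, finish the script with `import sys` and `sys.exit(main(sys.argv))`, with BASE_URL set to something like http://192.168.x.x/nagioslogserver and SUBSYSTEM set to elasticsearch or logstash.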
Re: what does this mean?
Thanks for that info. That makes life a lot easier, actually.
I guess the question is, if NLS is being sold as a standalone product (and I believe it is positioned as such, currently) then it should be able to do something when it detects that it has failed. I mean, if it has detected a failure state, it should be able to do something with that information (and it currently does - it changes the dashboard green/red light).
If NLS is being used by a Nagios suite customer, then it's a non-issue. So my question to Nagios Enterprises is, how much do you want to present NLS as a standalone product, capable of notifying people when it's broken, or even maybe proactively correcting itself when it is?
Matters not to me, but we'll need to know what to tell our potential customers when we sell them NLS in the future.
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd
Re: what does this mean?
Wouldn't be hard at all, really. Should definitely be configurable whether it alerts/self-fixes though.
Former Nagios employee
Re: what does this mean?
Of course, if it self-fixes, then you need to teach it how many times to try before it gives up. Or does it keep trying to self-fix forever? Like, maybe it's out of disk space, and that's why it's failing.
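A rough sketch of what "try a few times, then give up" could look like, just to make the idea concrete. Nothing here is NLS code: `attempt_self_fix`, `is_running`, and `restart` are hypothetical names, and the two callables stand in for whatever actually checks and restarts the service.

```python
import time


def attempt_self_fix(is_running, restart, max_attempts=3, wait_seconds=30,
                     sleep=time.sleep):
    """Restart a service at most max_attempts times.

    is_running and restart are caller-supplied callables (hypothetical).
    Returns True if the service is running at the end, False if every
    attempt failed and a human (or an alert) needs to take over.
    """
    if is_running():
        return True
    for _ in range(max_attempts):
        restart()
        sleep(wait_seconds)  # give the service time to come up
        if is_running():
            return True
    # Bounded retries exhausted: stop here instead of looping forever,
    # since the cause may be something a restart cannot fix (full disk).
    return False
```

A False return is the point where an alert should go out rather than another restart.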
Sounds like a job for Nagios!!
Eric Loyd • http://everwatch.global • 844.240.EVER • @EricLoyd