Measuring Website Reliability

Posted: Tue May 15, 2018 1:24 pm
by joeynovak
Hey Guys,

We recently had an issue where a webserver was denying approx 25% of requests. This was very bad, but Nagios didn't alert us, since it requires 4 consecutive failures. Even requiring only 2 consecutive failures would give it just a 1 in 16 chance of alerting us.
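The arithmetic behind that 1 in 16 figure can be checked with a few lines. This is only a rough sketch, and it assumes each check fails independently with the same 25% probability, which a real outage may not follow. The function name chance_of_alert is made up for illustration.

```python
from fractions import Fraction


def chance_of_alert(failure_rate: Fraction, consecutive_failures: int) -> Fraction:
    """Probability that N checks in a row all fail, assuming independent checks."""
    return failure_rate ** consecutive_failures


rate = Fraction(1, 4)  # roughly 25% of requests denied
for attempts in (4, 2, 1):
    print(f"{attempts} consecutive failures required: {chance_of_alert(rate, attempts)}")
```

Under that assumption, requiring 4 consecutive failures gives a 1 in 256 chance per run of checks, 2 gives 1 in 16, and 1 gives 1 in 4.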

So, I'd like a plugin that will detect low reliability. I was thinking of one that, if the website was down (couldn't connect), would then do a series of checks to see whether the website's reliability is too low. For example, it could repeatedly load a URL, say 1k times over a 60-second period, and alert us if the failure rate is above 0.1%.

There are lots of "gotchas" to this: the URL would need to put little or no load on the server, and 1k requests over a 60-second period would be ideal but is probably not reachable without forking or multiple threads, etc. But that is the idea I'm looking for.
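For what it's worth, the "multiple threads" part could look something like the sketch below. It is not an existing Nagios plugin, just an illustration in Python: http_probe and failure_rate are hypothetical names, the worker count of 20 is an arbitrary assumption, and nothing here spreads the requests evenly over 60 seconds.

```python
import http.client
import random
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL loads, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        return False


def failure_rate(probe: Callable[[], bool], attempts: int, workers: int = 20) -> float:
    """Run `probe` `attempts` times across a thread pool; return the failed fraction."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: probe(), range(attempts)))
    return results.count(False) / attempts


# Demo with a simulated probe that fails roughly 25% of the time,
# so the example runs without touching a real server.
def flaky_probe() -> bool:
    return random.random() >= 0.25


print(f"observed failure rate: {failure_rate(flaky_probe, 1000):.1%}")
```

To aim it at a real site, the probe would be something like `lambda: http_probe("https://example.com/health")`, where the URL is a placeholder for a page that is cheap for the server to serve.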

I searched the forums and the Exchange with no luck. Is anyone aware of something that will do this?

Thanks!

Joey

Re: Measuring Website Reliability

Posted: Tue May 15, 2018 2:35 pm
by scottwilkerson
I don't know of such a plugin, but I would strongly suggest starting by just changing the max_check_attempts setting for your service in Nagios so it alerts on every failure.

Additionally, I will mention you could probably make a plugin like you desire, HOWEVER, I really want to caution you on what you are proposing.

You are proposing that a plugin run 1000 URL loads a minute, which will cause a traffic increase to your web server of 1,440,000 page loads per day (this is over and above the current load), for every page you run this plugin on.
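The daily figure is simply 1000 loads a minute times the 1440 minutes in a day. A quick sketch of that arithmetic, with extra_requests_per_day as a made-up helper name and assuming the burst runs every minute around the clock:

```python
MINUTES_PER_DAY = 24 * 60  # 1440


def extra_requests_per_day(requests_per_minute: int, monitored_urls: int = 1) -> int:
    """Added page loads per day if the burst runs every minute on each monitored URL."""
    return requests_per_minute * MINUTES_PER_DAY * monitored_urls


print(extra_requests_per_day(1000))      # 1440000
print(extra_requests_per_day(1000, 5))   # 7200000
```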

Re: Measuring Website Reliability

Posted: Wed May 16, 2018 1:07 am
by martinQL
Never done this before - but probably you can use check_logfiles to check your webserver logs and count the lines with (real) denied accesses?

https://labs.consol.de/nagios/check_logfiles/
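To illustrate the general idea of counting denied requests in the logs, here is a rough Python sketch. It is not check_logfiles and does not use its configuration. It assumes the access log is in Common or Combined Log Format, and it assumes that "denied" means a 403 or any 5xx status, which may not match what your server actually logs. The names is_denied and denied_ratio are made up.

```python
import re

# Matches the status code that follows the quoted request line in
# Common/Combined Log Format, e.g. ... "GET / HTTP/1.1" 403 199
STATUS_RE = re.compile(r'"\s(\d{3})\s')


def is_denied(status: int) -> bool:
    # Assumption: treat 403 and any 5xx response as a denied request.
    return status == 403 or 500 <= status <= 599


def denied_ratio(log_lines) -> float:
    """Fraction of recognisable access-log lines whose status counts as denied."""
    total = denied = 0
    for line in log_lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        total += 1
        if is_denied(int(match.group(1))):
            denied += 1
    return denied / total if total else 0.0


sample = [
    '203.0.113.5 - - [15/May/2018:13:24:00 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.5 - - [15/May/2018:13:24:01 +0000] "GET / HTTP/1.1" 403 199',
    '203.0.113.6 - - [15/May/2018:13:24:02 +0000] "GET / HTTP/1.1" 200 512',
    '203.0.113.7 - - [15/May/2018:13:24:03 +0000] "GET / HTTP/1.1" 503 0',
]
print(denied_ratio(sample))  # 0.5
```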

Re: Measuring Website Reliability

Posted: Wed May 16, 2018 7:50 am
by scottwilkerson
martinQL wrote:Never done this before - but probably you can use check_logfiles to check your webserver logs and count the lines with (real) denied accesses?

https://labs.consol.de/nagios/check_logfiles/

This would work if the server was rejecting and logging the requests, but not if it wasn't reachable or the service wasn't running.

Re: Measuring Website Reliability

Posted: Thu May 17, 2018 5:31 pm
by joeynovak
scottwilkerson wrote:I don't know of such a plugin, but I would strongly suggest starting by just changing the max_check_attempts setting for your service in Nagios so it alerts on every failure.

Additionally, I will mention you could probably make a plugin like you desire, HOWEVER, I really want to caution you on what you are proposing.

You are proposing that a plugin run 1000 URL loads a minute, which will cause a traffic increase to your web server of 1,440,000 page loads per day (this is over and above the current load), for every page you run this plugin on.

I am only proposing to run the 1000 checks after an initial check fails, in order to prevent false alarms when a single failure occurs.
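Roughly, the two-stage logic could be sketched like this. It is an illustration only, not an existing plugin: two_stage_check and the probe callable are hypothetical, the burst runs sequentially for simplicity, and the 0.1% figure from the first post is read here as a maximum acceptable failure rate, which is an assumption. The return codes follow the usual Nagios plugin convention of 0 for OK and 2 for CRITICAL.

```python
from typing import Callable, Tuple

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def two_stage_check(
    probe: Callable[[], bool],
    burst_size: int = 1000,
    max_failure_rate: float = 0.001,  # assumption: 0.1% read as a failure-rate ceiling
) -> Tuple[int, str]:
    """Probe once; only if that fails, run a burst and judge the failure rate."""
    if probe():
        return OK, "OK - single probe succeeded"

    # The first probe failed, so sample more heavily before deciding to alert.
    failures = sum(1 for _ in range(burst_size) if not probe())
    rate = failures / burst_size
    if rate > max_failure_rate:
        return CRITICAL, f"CRITICAL - {rate:.1%} of {burst_size} follow-up probes failed"
    return OK, f"OK - first probe failed, follow-up failure rate {rate:.1%}"


# Simulated server that is always up, then one that is always down.
print(two_stage_check(lambda: True))
print(two_stage_check(lambda: False, burst_size=10))
```

In normal operation this costs one request per check interval, and the heavy burst only happens after a failure has already been seen.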

Re: Measuring Website Reliability

Posted: Fri May 18, 2018 9:03 am
by scottwilkerson
I have no suggestions on how to set this up, sorry.