Measuring Website Reliability

joeynovak
Posts: 2
Joined: Tue May 15, 2018 1:19 pm

Measuring Website Reliability

Post by joeynovak »

Hey Guys,

We recently had an issue where a web server was denying approximately 25% of requests. This was very bad, but Nagios didn't alert us, since it requires 4 consecutive failures. Even requiring only 2 consecutive failures would give it just a 1 in 16 chance of alerting us.
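To make the arithmetic behind that "1 in 16" figure explicit, here is a small sketch. It assumes each check fails independently with the same 25% probability, which is a simplification; the function name is just illustrative.

```python
from fractions import Fraction


def prob_consecutive_failures(failure_rate: Fraction, attempts: int) -> Fraction:
    """Chance that `attempts` checks in a row all fail, assuming independence."""
    return failure_rate ** attempts


rate = Fraction(1, 4)  # 25% of requests denied
for attempts in (1, 2, 4):
    p = prob_consecutive_failures(rate, attempts)
    print(f"{attempts} consecutive failure(s): {p} ({float(p):.2%})")
```

Under that assumption, two consecutive failures happen 1 time in 16, and four consecutive failures only 1 time in 256.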

So, I'd like a plugin that will detect low reliability. I was thinking of one that, when a check finds the website down (couldn't connect), runs a series of follow-up checks to see whether the website's reliability is too low. For example, it could repeatedly load a URL, say 1k times over a 60-second period, and alert us if the failure rate is above 0.1% (that is, reliability below 99.9%).

There are lots of "gotchas" to this: the URL would need to put little or no load on the server, and 1k requests over a 60-second period would be ideal but is probably not reachable without forking or multiple threads, etc. But that's the idea I'm looking for.
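A minimal sketch of the idea described above, not an existing plugin: probe once, and only if that probe fails, fire a burst of requests on a thread pool and compare the failure rate with a threshold. The names `http_probe`, `failure_rate`, `check`, and `main`, the default of 20 worker threads, and the 5-second timeout are all assumptions made for illustration. The exit codes 0 (OK) and 2 (CRITICAL) and the `| label=value` performance data suffix follow the usual Nagios plugin conventions.

```python
import sys
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if one request to url succeeds, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:  # connection refused, timeout, HTTP 4xx/5xx, ...
        return False


def failure_rate(probe: Callable[[], bool], attempts: int, workers: int = 20) -> float:
    """Run probe `attempts` times on a thread pool; return the failed fraction."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: probe(), range(attempts)))
    return results.count(False) / attempts


def check(probe, burst=1000, max_failure_rate=0.001, workers=20) -> Tuple[int, str]:
    """Probe once; only if that fails, run the burst and judge the failure rate."""
    if probe():
        return OK, "OK - first request succeeded"
    rate = failure_rate(probe, burst, workers)
    status, label = (CRITICAL, "CRITICAL") if rate > max_failure_rate else (OK, "OK")
    return status, f"{label} - {rate:.1%} of {burst} requests failed | failure_rate={rate:.4f}"


def main(argv):
    url = argv[0]  # ideally a cheap static URL, per the "gotchas" above
    status, message = check(lambda: http_probe(url))
    print(message)
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

This sketch sends the burst as fast as the pool allows rather than spreading it over 60 seconds, and it would need to finish within the plugin timeout configured in Nagios. Note also that with a 25% failure rate the initial probe succeeds three times out of four, so the burst would only run on roughly a quarter of the checks.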

I searched the forums and the Nagios Exchange with no luck. Is anyone aware of something that will do this?

Thanks!

Joey
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Measuring Website Reliability

Post by scottwilkerson »

I don't know of such a plugin, but I would strongly suggest starting by just changing the max_check_attempts setting for your service in Nagios (a value of 1) so it alerts on every failure.

Additionally, you could probably write a plugin like the one you describe. HOWEVER, I really want to caution you about what you are proposing.

You are proposing that a plugin run 1,000 URL loads a minute, which would increase traffic to your web server by 1,440,000 page loads per day (over and above the current load) for every page you run this plugin on.
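The daily figure follows from simple multiplication, assuming the burst runs every minute around the clock:

```python
REQUESTS_PER_MINUTE = 1000
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day

extra_loads_per_day = REQUESTS_PER_MINUTE * MINUTES_PER_DAY
print(f"{extra_loads_per_day:,} extra page loads per day per monitored page")
```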
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com
martinQL
Posts: 10
Joined: Wed Apr 25, 2018 4:22 am

Re: Measuring Website Reliability

Post by martinQL »

I've never done this before, but you could probably use check_logfiles to check your web server logs and count the lines with (real) denied accesses.

https://labs.consol.de/nagios/check_logfiles/
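To illustrate the log-counting idea in general terms, here is a rough Python sketch. It is not check_logfiles and does not use its configuration. It assumes the access log is in Common or Combined Log Format, and the set of status codes treated as "denied" is a guess that would need adjusting for the actual server.

```python
import re
from typing import Iterable

# In Common/Combined Log Format the status code follows the quoted request line.
STATUS_RE = re.compile(r'"\s(\d{3})\s')


def denied_fraction(lines: Iterable[str],
                    denied=frozenset({403, 500, 502, 503})) -> float:
    """Return the fraction of parseable log lines whose status is in `denied`."""
    total = denied_count = 0
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        total += 1
        if int(match.group(1)) in denied:
            denied_count += 1
    return denied_count / total if total else 0.0


# Made-up sample lines, for illustration only.
sample = [
    '10.0.0.1 - - [15/May/2018:13:19:00 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [15/May/2018:13:19:01 +0000] "GET / HTTP/1.1" 403 199',
    '10.0.0.3 - - [15/May/2018:13:19:02 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.4 - - [15/May/2018:13:19:03 +0000] "GET / HTTP/1.1" 200 512',
]
print(f"{denied_fraction(sample):.0%} of logged requests were denied")
```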
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Measuring Website Reliability

Post by scottwilkerson »

martinQL wrote:Never done this before - but probably you can use check_logfiles to check your webserver logs and count the lines with (real) denied accesses?

https://labs.consol.de/nagios/check_logfiles/
This would work if the server was rejecting requests and logging them, but not if it wasn't reachable or the service wasn't running.
joeynovak
Posts: 2
Joined: Tue May 15, 2018 1:19 pm

Re: Measuring Website Reliability

Post by joeynovak »

scottwilkerson wrote:I don't know of such a plugin but would strongly suggest starting by just change the max_check_attempts setting for your service in nagios so it alerts on every failure.

Additionally, I will mention you could probably make a plugin like you desire, HOWEVER, I really want to caution you on what you are proposing.

You are proposing a plugin run 1000 url loads a minute, which will cause a traffic increase to your web server of 1,440,000 page loads per day (this is over and above the current load), for every page you run this plugin on.
I am only proposing running the 1,000-request check after an initial check fails, in order to prevent false alarms when a single failure occurs.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Measuring Website Reliability

Post by scottwilkerson »

I have no suggestions on how to set this up, sorry.