Hey Guys,
We recently had an issue where a webserver was denying approximately 25% of requests. This was very bad, but Nagios didn't alert us, since it requires 4 consecutive failures. Even if it only required 2 consecutive failures, any given pair of checks would have just a 1 in 16 chance of alerting us.
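The 1 in 16 figure comes from treating each check as an independent trial that fails with probability 0.25. That is an assumption, since real failures may cluster. Here is a minimal Python sketch of the arithmetic using exact fractions. The function name is hypothetical:

```python
from fractions import Fraction


def chance_all_fail(failure_rate: Fraction, consecutive: int) -> Fraction:
    """Probability that `consecutive` independent checks in a row all fail."""
    return failure_rate ** consecutive


# 25% of requests denied, as in the incident described above.
for attempts in (2, 4):
    print(attempts, chance_all_fail(Fraction(1, 4), attempts))
# prints "2 1/16" then "4 1/256"
```

With the 4-failure requirement, any particular window of 4 checks had only a 1 in 256 chance of alerting.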
So, I'd like a plugin that will detect low reliability. I was thinking of one that, if the website appeared down (couldn't connect), would then run a series of checks to see whether the website's reliability is too low. For example, it could repeatedly load a URL, say 1,000 times over a 60-second period, and alert us if the failure rate is above 0.1%.
There are lots of "gotchas" to this. The URL would need to put little or no load on the server, and while 1,000 loads over a 60-second period would be ideal, that probably isn't reachable without forking or multiple threads, etc. But the idea is what I'm looking for.
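Here is a minimal sketch of the kind of plugin described above, written in Python with only the standard library. It follows the usual Nagios plugin conventions: exit codes 0/1/2/3 and one line of output with optional perfdata after `|`. The script name check_reliability.py, the function names, the thread-pool size, and the thresholds are all hypothetical choices, not an existing plugin. The warning threshold is above 0.1%, taken from the post, and the critical threshold is above 1%. The plugin does one cheap probe first and only runs the burst of requests if that probe fails.

```python
import concurrent.futures
import http.client
import urllib.error
import urllib.request
from typing import Callable, List, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ["OK", "WARNING", "CRITICAL", "UNKNOWN"]


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if a single GET of `url` ends in a 2xx/3xx response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts, resets and 4xx/5xx (HTTPError) count as failures.
        return False


def sample_failure_rate(check: Callable[[], bool], samples: int, workers: int) -> float:
    """Call `check` `samples` times on a thread pool and return the fraction that failed."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: check(), range(samples)))
    return results.count(False) / samples


def classify(failure_rate: float, warn: float, crit: float) -> int:
    """Map a failure rate (a fraction, e.g. 0.001 = 0.1%) to a Nagios exit code."""
    if failure_rate > crit:
        return CRITICAL
    if failure_rate > warn:
        return WARNING
    return OK


def run(check: Callable[[], bool], samples: int = 1000, workers: int = 20,
        warn: float = 0.001, crit: float = 0.01) -> Tuple[int, str]:
    """One cheap probe; only if it fails, burst-sample to estimate the failure rate."""
    if check():
        return OK, "OK - single probe succeeded"
    rate = sample_failure_rate(check, samples, workers)
    code = classify(rate, warn, crit)
    return code, (f"{LABELS[code]} - {rate:.2%} of {samples} requests failed"
                  f" | failure_rate={rate:.4f};{warn};{crit}")


def main(argv: List[str]) -> int:
    """Entry point for `check_reliability.py URL`; prints status, returns exit code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_reliability.py URL")
        return UNKNOWN
    code, message = run(lambda: probe(argv[1]))
    print(message)
    return code
```

To run it from Nagios, end the file with `import sys` and `sys.exit(main(sys.argv))`. Keep in mind that the burst also has to finish within Nagios' own plugin timeout. Passing the probe in as a function keeps the sampling and threshold logic testable without a live web server.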
I searched the forums and the Nagios Exchange with no luck. Is anyone aware of something that will do this?
Thanks!
Joey
Measuring Website Reliability
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: Measuring Website Reliability
I don't know of such a plugin, but I would strongly suggest starting by simply changing the max_check_attempts setting for your service in Nagios to 1, so it alerts on every failure.
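Another way to see the effect of max_check_attempts is to compute the expected number of checks before k failures occur in a row. This again assumes independent checks that each fail with probability p. It uses the standard waiting-time formula for a run of k successes, (p^-k - 1)/(1 - p). The function name is hypothetical. The result counts checks, not minutes, because Nagios re-checks at retry_interval while the service is in a soft state.

```python
from fractions import Fraction


def expected_checks_until_alert(failure_rate: Fraction, max_check_attempts: int) -> Fraction:
    """Mean number of checks until `max_check_attempts` failures occur in a row.

    Waiting-time result for a run of k successes in independent
    Bernoulli trials: (p**-k - 1) / (1 - p).
    """
    p, k = failure_rate, max_check_attempts
    return (1 / p**k - 1) / (1 - p)


for attempts in (1, 2, 4):
    print(attempts, expected_checks_until_alert(Fraction(1, 4), attempts))
# prints "1 4", "2 20", "4 340"
```

At a 25% failure rate, requiring 4 consecutive failures means waiting about 340 checks on average. With max_check_attempts set to 1, the wait drops to about 4 checks.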
I will also mention that you could probably write a plugin like the one you describe. HOWEVER, I really want to caution you about what you are proposing.
You are proposing a plugin that runs 1,000 URL loads a minute, which will increase traffic to your web server by 1,440,000 page loads per day (over and above the current load) for every page you run this plugin on.
Re: Measuring Website Reliability
I've never done this before, but could you use check_logfiles to scan your webserver logs and count the lines with (real) denied accesses?
https://labs.consol.de/nagios/check_logfiles/
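check_logfiles has its own configuration format, which is not shown here. The snippet below only illustrates the same idea in Python: count how many access-log lines are denials, out of all request lines. It assumes Apache/nginx common or combined log format, where the status code follows the quoted request. Treating 403 and any 5xx as "denied" is a hypothetical choice, and the helper names are hypothetical too.

```python
import re
from typing import Iterable, Tuple

# In common/combined log format the status code follows the quoted request line.
STATUS_RE = re.compile(r'"[^"]*" (\d{3})\b')


def is_denied(status: int) -> bool:
    """Hypothetical definition of a 'denied' request: 403 or any 5xx."""
    return status == 403 or status >= 500


def denied_ratio(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (denied, total) over lines that look like access-log entries."""
    denied = total = 0
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue  # skip lines that are not access-log entries
        total += 1
        if is_denied(int(match.group(1))):
            denied += 1
    return denied, total


sample = [
    '10.0.0.1 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 503 0',
    '10.0.0.3 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.4 - - [01/Jan/2024:00:00:03 +0000] "GET / HTTP/1.1" 403 199 "-" "curl/8.0"',
]
print(denied_ratio(sample))  # (2, 4)
```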
Re: Measuring Website Reliability
martinQL wrote:
Never done this before, but you could probably use check_logfiles to check your webserver logs and count the lines with (real) denied accesses?
https://labs.consol.de/nagios/check_logfiles/
This would work if the server was rejecting requests and logging them, but not if it wasn't reachable or the service wasn't running.
Re: Measuring Website Reliability
scottwilkerson wrote:
I don't know of such a plugin, but I would strongly suggest starting by simply changing the max_check_attempts setting for your service in Nagios so it alerts on every failure.
I will also mention that you could probably write a plugin like the one you describe. HOWEVER, I really want to caution you about what you are proposing.
You are proposing a plugin that runs 1,000 URL loads a minute, which will increase traffic to your web server by 1,440,000 page loads per day (over and above the current load) for every page you run this plugin on.
I am only proposing running the 1,000 checks if the initial check fails, in order to prevent false alarms when a single failure occurs.
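Here is a rough estimate of the added page loads per day for one URL, putting the two positions side by side. It assumes a hypothetical 5-minute check interval. It also assumes the 1,000-request burst runs only when the initial probe fails, so on average a fraction `failure_rate` of checks trigger it. Both helper names are hypothetical, and the normal per-check probe is not counted as extra load.

```python
MINUTES_PER_DAY = 24 * 60


def continuous_extra_loads(loads_per_minute: int) -> int:
    """Extra page loads per day if the burst runs every minute regardless of state."""
    return loads_per_minute * MINUTES_PER_DAY


def conditional_extra_loads(burst: int, check_interval_min: int, failure_rate: float) -> float:
    """Expected extra page loads per day if the burst only follows a failed probe."""
    checks_per_day = MINUTES_PER_DAY / check_interval_min
    return checks_per_day * failure_rate * burst


print(continuous_extra_loads(1000))            # 1440000
print(conditional_extra_loads(1000, 5, 0.25))  # 72000.0 during a 25% outage
print(conditional_extra_loads(1000, 5, 0.0))   # 0.0 while healthy
```

The conditional load is concentrated in the periods when the server is already failing requests.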
Re: Measuring Website Reliability
I have no suggestions on how to set this up, sorry.