Measuring Website Reliability

An open discussion forum for obtaining help with Nagios Core. Nagios Core users of all experience levels are welcome here. Subforums have been created for the discussion of Nagios Core and Nagios Plugin development.

NOTE: The SourceForge.net mailing lists have been deprecated in favor of this forum in order to expedite support and provide additional features not available on the old mailing list.

Postby joeynovak » Tue May 15, 2018 1:24 pm

Hey Guys,

We recently had an issue where a webserver was denying approximately 25% of requests. This was very bad, but Nagios didn't alert us, since it requires 4 consecutive failures. Even requiring only 2 consecutive failures would mean it would only have a 1 in 16 chance of alerting us.
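
To put numbers on this: assuming each check fails independently with probability 0.25 (an idealization, since real failures tend to cluster), the chance that a given run of k checks all fail is 0.25 to the power k. A quick sketch of that arithmetic follows; the function name is made up for illustration.

```python
from fractions import Fraction


def prob_consecutive_failures(failure_rate: Fraction, attempts: int) -> Fraction:
    """Probability that `attempts` independent checks in a row all fail."""
    return failure_rate ** attempts


# Roughly 25% of requests denied, as in the incident described above.
rate = Fraction(1, 4)

for attempts in (1, 2, 4):
    p = prob_consecutive_failures(rate, attempts)
    print(f"{attempts} consecutive failure(s): {p} (about {float(p):.2%})")
```

Under that independence assumption, 2 consecutive failures come out at 1/16 and 4 consecutive failures at 1/256, which is why the default retry logic rarely fires on a partial outage like this one.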

So, I'd like a plugin that will detect low reliability. I was thinking of one that, if the website appeared to be down (couldn't connect), would then run a series of checks to see whether the website's reliability is too low. For example, it could repeatedly load a URL, say 1k times over a 60-second period, and alert us if the failure rate is above 0.1%.

There are lots of "gotchas" to this: the URL would need to put little or no load on the server, and 1k requests over a 60-second period would be ideal but is probably not reachable without forking or multiple threads, etc. But the idea is what I'm looking for.
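
For illustration, the idea could be sketched roughly as below. This is not an existing plugin: the function names, the 20-thread pool, the 5-second timeout, and the reading of "0.1%" as a maximum failure rate are all assumptions. It uses only the Python standard library and the usual Nagios plugin exit codes (0 for OK, 2 for CRITICAL).

```python
import sys
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx status, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # Connection refused, timeout, HTTP 4xx/5xx, bad URL: all count as failures.
        return False


def failure_rate(probe: Callable[[], bool], samples: int, workers: int) -> float:
    """Run `probe` `samples` times on a thread pool; return the failed fraction."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: probe(), range(samples)))
    return results.count(False) / samples


def check(probe: Callable[[], bool], samples: int = 1000, workers: int = 20,
          max_failure_rate: float = 0.001) -> Tuple[int, str]:
    """Two-stage check: one cheap probe, then a burst only if that probe fails."""
    if probe():
        return OK, "OK - single probe succeeded"
    rate = failure_rate(probe, samples, workers)
    if rate > max_failure_rate:
        return CRITICAL, f"CRITICAL - {rate:.1%} of {samples} requests failed"
    return OK, f"OK - first probe failed, but only {rate:.1%} of {samples} requests failed"


# Hypothetical usage: python check_reliability.py http://example.com/health
if __name__ == "__main__" and len(sys.argv) == 2:
    code, message = check(lambda: http_probe(sys.argv[1]))
    print(message)
    sys.exit(code)
```

Passing the probe in as a callable keeps the burst logic testable without a live server. Note that the burst is not paced over 60 seconds here; it simply runs as fast as the thread pool allows, so the target URL should be a cheap one.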

I searched the forums and the Exchange with no luck. Is anyone aware of something that will do this?

Thanks!

Joey
joeynovak
 
Posts: 2
Joined: Tue May 15, 2018 1:19 pm

Re: Measuring Website Reliability

Postby scottwilkerson » Tue May 15, 2018 2:35 pm

I don't know of such a plugin but would strongly suggest starting by just changing the max_check_attempts setting for your service in Nagios so it alerts on every failure.
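
For reference, that setting lives in the service definition. A minimal sketch, where the host name, service description, and template are placeholders rather than anything from this thread:

```
define service {
    use                   generic-service   ; template name assumed from the sample config
    host_name             webserver01       ; hypothetical host
    service_description   HTTP
    check_command         check_http
    max_check_attempts    1                 ; first failed check is already a HARD state
}
```

With max_check_attempts set to 1 there are no retries, so a single failed check can trigger a notification (subject to the usual notification settings).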

Additionally, I will mention you could probably make a plugin like you desire, HOWEVER, I really want to caution you on what you are proposing.

You are proposing a plugin run 1000 url loads a minute, which will cause a traffic increase to your web server of 1,440,000 page loads per day (this is over and above the current load), for every page you run this plugin on.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
scottwilkerson
DevOps Engineer
 
Posts: 12635
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Measuring Website Reliability

Postby martinQL » Wed May 16, 2018 1:07 am

Never done this before - but probably you can use check_logfiles to check your webserver logs and count the lines with (real) denied accesses?

https://labs.consol.de/nagios/check_logfiles/
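
The underlying idea (counting denied requests in the access log) can be sketched in plain Python. This is not check_logfiles itself, only an illustration of the counting step; the log format (Common/Combined Log Format) and the set of status codes treated as "denied" are assumptions.

```python
import re
from collections import Counter
from typing import FrozenSet, Iterable

# Matches the quoted request and the status code of a Common Log Format line, e.g.
# 203.0.113.7 - - [15/May/2018:13:24:00 +0000] "GET / HTTP/1.1" 403 199
STATUS_RE = re.compile(r'"[^"]*" (\d{3})\b')

# Which codes count as "denied" is an assumption; adjust to the server's behaviour.
DENIED = frozenset({403, 429, 503})


def denied_fraction(lines: Iterable[str], denied: FrozenSet[int] = DENIED) -> float:
    """Fraction of parseable log lines whose status code is in `denied`."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[int(match.group(1)) in denied] += 1
    total = counts[True] + counts[False]
    return counts[True] / total if total else 0.0
```

A wrapper could then compare the returned fraction against a threshold and exit with the matching Nagios status code.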
martinQL
 
Posts: 10
Joined: Wed Apr 25, 2018 4:22 am

Re: Measuring Website Reliability

Postby scottwilkerson » Wed May 16, 2018 7:50 am

martinQL wrote:Never done this before - but probably you can use check_logfiles to check your webserver logs and count the lines with (real) denied accesses?

https://labs.consol.de/nagios/check_logfiles/


This would work if the server was rejecting and logging requests, but not if it wasn't reachable or the service wasn't running.

Re: Measuring Website Reliability

Postby joeynovak » Thu May 17, 2018 5:31 pm

scottwilkerson wrote:I don't know of such a plugin but would strongly suggest starting by just changing the max_check_attempts setting for your service in Nagios so it alerts on every failure.

Additionally, I will mention you could probably make a plugin like you desire, HOWEVER, I really want to caution you on what you are proposing.

You are proposing a plugin run 1000 url loads a minute, which will cause a traffic increase to your web server of 1,440,000 page loads per day (this is over and above the current load), for every page you run this plugin on.


I am only proposing running the 1000-request check after a regular check fails, in order to prevent false alarms when a single failure occurs.

Re: Measuring Website Reliability

Postby scottwilkerson » Fri May 18, 2018 9:03 am

I have no suggestions on how to set this up, sorry.