I am new to monitoring with Nagios. I have about three or four websites that I am monitoring, and I get this alert for just one of them a lot:
***** Nagios XI Alert *****
Nagios has detected a problem with this host.
Notification Type: PROBLEM
Host: Website - chasebrexton.myezyaccess.com
State: DOWN
Address: chasebrexton.myezyaccess.com
Info: CRITICAL - Socket timeout
But the website is not down, and there never seems to be an issue. Currently, the configuration is set to check one time and alert right away in case of any downtime. Obviously, I could change that, but then if the website actually went down we would not know right away.
PROBLEM Host Alert - Website
Re: PROBLEM Host Alert - Website
How often do you check to make sure the site is up? The first thing that comes to my mind is something like an aggressive IDS/IPS that is seeing the Nagios check as a possible attack on the site, and then blocking Nagios from connecting to the site.
When you get an alert, how long does it stay critical before going OK again?
-
- Posts: 64
- Joined: Thu Aug 22, 2019 1:58 pm
Re: PROBLEM Host Alert - Website
The check settings are:
Check interval: 5 min
Retry interval: 1 min
Max check attempts: 1 attempts
Re: PROBLEM Host Alert - Website
Nagios will stay critical for about five to ten mins, it is not long.
-
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: PROBLEM Host Alert - Website
ecolgroveMOT wrote: Nagios will stay critical for about five to ten mins, it is not long.
It sounds like Nagios is doing what it is supposed to do: alerting you when this server cannot be reached. This could be because of a networking outage or something else, but from the Nagios server's perspective it is getting a socket timeout when it runs the plugin you have set up for this check.
If this also happens when multiple hosts/services show similar errors, or when SNMP checks do not receive a response (like your other post https://support.nagios.com/forum/viewto ... =6&t=56658 ), it could be that you have a recurring networking issue, and Nagios is going to have the same problem until it is resolved.
Re: PROBLEM Host Alert - Website
Is it better to use the SNMP option or the NSClient option? Is there a way to resolve this?
Re: PROBLEM Host Alert - Website
ecolgroveMOT wrote: is it better to use the snmp option or ns client option? Is there a way to resolve this?
It depends. SNMP runs over UDP, so if a packet is dropped or there is an error connecting, you are going to get the same error, which makes things difficult.
If it is in fact caused by a network problem, as I suspect, tracking that down and fixing it would resolve the issue. There is no way for Nagios to compensate other than to use a higher "Max check attempts" value, which gives the service time to recover before a notification is sent. This is the primary reason this setting exists.
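To illustrate how a higher "Max check attempts" value can absorb a brief outage, here is a minimal Python sketch of the retry logic. It is a simplified model, not Nagios code: the function name `first_hard_failure` and the list-of-booleans input are assumptions made for illustration, and it ignores details such as flap detection and notification delays.

```python
from typing import Optional, Sequence


def first_hard_failure(results: Sequence[bool], max_check_attempts: int) -> Optional[int]:
    """Return the index of the check that would trigger a notification, or None.

    results: True for an OK check, False for a non-OK check, in time order.
    Simplified model (an assumption, not Nagios source): a problem only
    notifies after max_check_attempts consecutive non-OK results, and any
    OK result resets the counter.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    consecutive_failures = 0
    for index, ok in enumerate(results):
        if ok:
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= max_check_attempts:
            return index
    return None


# Hypothetical data: two timed-out checks in a row, then recovery.
blip = [True, False, False, True, True]
print(first_hard_failure(blip, max_check_attempts=1))  # 1: alerts on the first failure
print(first_hard_failure(blip, max_check_attempts=5))  # None: the blip is absorbed
```

With one attempt, every single timeout becomes an alert; with five, only a failure that persists across all retries does.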
Re: PROBLEM Host Alert - Website
Do you suggest setting the max attempts higher?
I know when setting up a new alert Nagios has it as default set to 5, should this be what is recommended?
Re: PROBLEM Host Alert - Website
ecolgroveMOT wrote: Do you suggest setting the max attempts higher?
I know when setting up a new alert Nagios has it as default set to 5, should this be what is recommended?
Personally I like the following defaults:
Check interval: 5 min
Retry interval: 1 min
Max check attempts: 5 attempts
With these, once something comes back as non-OK, you get 5 attempts at 1 minute intervals for the service, network, etc. to come back around before a notification is sent.
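As a rough worked example of what these settings mean for alert latency, the sketch below computes the delay between the first failed check and the notification, plus a worst case that also includes the wait for the next scheduled check. The function name is hypothetical, and it assumes the first failed check counts as attempt 1, so treat the numbers as an approximation.

```python
from typing import Tuple


def notification_delay_minutes(
    check_interval: float, retry_interval: float, max_check_attempts: int
) -> Tuple[float, float]:
    """Estimate alert latency in minutes for the given check settings.

    Returns (retry_delay, worst_case):
      retry_delay: time from the first failed check to the final attempt,
                   assuming the first failure counts as attempt 1.
      worst_case:  retry_delay plus one full check interval, for an outage
                   that starts just after a successful check.
    """
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    retry_delay = (max_check_attempts - 1) * retry_interval
    worst_case = check_interval + retry_delay
    return retry_delay, worst_case


# Settings suggested above: 5 min check, 1 min retry, 5 attempts.
print(notification_delay_minutes(5, 1, 5))  # (4, 9)
# Settings from the original post: 5 min check, 1 min retry, 1 attempt.
print(notification_delay_minutes(5, 1, 1))  # (0, 5)
```

Under these assumptions, moving from 1 to 5 attempts adds about four minutes before a real outage is reported, in exchange for ignoring timeouts that clear within that window.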