Alert going to HARD state in 1st attempt
Posted: Mon Jul 26, 2021 7:25 am
by amane
Hi,
We have configured process monitoring in Nagios XI for Windows boxes. However, the process status goes to a HARD state on the 1st attempt even though 'Max Check Attempts' is set to 4.
2021-07-23 04:17:02 rpabotprd18.lowes.com Windows Service:Windows Service Status:Blue Prism Login Agent Service WARNING HARD 1 of 4 WARNING: , delayed (LoginAgent=starting (auto))
2021-07-23 03:00:43 rpabotprd57.lowes.com Windows Service:Windows Service Status:Blue Prism Login Agent Service WARNING HARD 1 of 4 WARNING: , delayed (LoginAgent=stopped (delayed))
2021-07-23 02:45:44 rpabotprd57.lowes.com Windows Service:Windows Service Status:Blue Prism Login Agent Service WARNING HARD 1 of 4 WARNING: , delayed (LoginAgent=stopped (delayed))
Above are some example events.
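For anyone wanting to pick such events out of a longer list, here is a minimal Python sketch that parses one event line in the layout shown above and flags a HARD state reached on attempt 1. The field layout is inferred only from these three sample lines, and the names parse_event and hard_on_first_attempt are made up for illustration; they are not part of Nagios.

```python
import re

# Layout assumed from the sample events above:
# <date> <time> <host> <service description> <state> <state type> <n> of <max> <output>
EVENT_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<service>.+?) "
    r"(?P<state>OK|WARNING|CRITICAL|UNKNOWN) "
    r"(?P<state_type>HARD|SOFT) "
    r"(?P<attempt>\d+) of (?P<max_attempts>\d+) "
    r"(?P<output>.*)$"
)

def parse_event(line):
    """Split one event line into named fields; return None if it does not match."""
    match = EVENT_RE.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["attempt"] = int(event["attempt"])
    event["max_attempts"] = int(event["max_attempts"])
    return event

def hard_on_first_attempt(event):
    """True when the event is HARD on attempt 1 although more attempts are allowed."""
    return (
        event["state_type"] == "HARD"
        and event["attempt"] == 1
        and event["max_attempts"] > 1
    )

line = (
    "2021-07-23 04:17:02 rpabotprd18.lowes.com "
    "Windows Service:Windows Service Status:Blue Prism Login Agent Service "
    "WARNING HARD 1 of 4 WARNING: , delayed (LoginAgent=starting (auto))"
)
event = parse_event(line)
print(event["host"], event["state_type"], event["attempt"], event["max_attempts"])
print(hard_on_first_attempt(event))
```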
Re: Alert going to HARD state in 1st attempt
Posted: Mon Jul 26, 2021 12:37 pm
by ssax
Is the host in a problem state? When a host is in a problem state (hard or soft), service problems on it are automatically set to a hard state (even on the first attempt). The only way to get around that would be to set host_down_disable_service_checks=1 in your /usr/local/nagios/etc/nagios.cfg and then restart the nagios service or apply config.
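A quick way to confirm what the file currently contains is a small script like the following Python sketch. It reads key=value lines and skips blank lines and lines starting with #. The helper name read_directive and the sample text are made up for illustration, and letting the last assignment win is an assumption of this sketch rather than documented Nagios behavior.

```python
def read_directive(cfg_text, name):
    """Return the value assigned to `name` in key=value config text, or None.

    Assumption of this sketch: if the directive appears more than once,
    the last uncommented assignment is the one reported.
    """
    value = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            value = val.strip()
    return value

# Illustrative fragment only, not a real configuration file.
sample = """\
# sample fragment
#host_down_disable_service_checks=0
host_down_disable_service_checks=1
"""

print(read_directive(sample, "host_down_disable_service_checks"))
```

To check the real file, pass in the text of /usr/local/nagios/etc/nagios.cfg, for example via open(path).read().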
If that is not what is occurring:
Please go to Reports > State History:
- Adjust the Period to include the time this occurred (go back far enough to include the host state)
- Select the host from the Limit To dropdown
--- Don't limit on the Services, we want to see host and service states
- For Type, select Both
- For State Type, select Both
- For State, select Any
- Click Run
Please PM me the report, you can either download it as a PDF or CSV.
Please PM me a copy of your profile.zip as well, you can download it from Admin > System Profile by clicking the Download Profile button.
Re: Alert going to HARD state in 1st attempt
Posted: Tue Jul 27, 2021 5:07 am
by amane
Hi ssax,
I verified the /usr/local/nagios/etc/nagios.cfg file and found host_down_disable_service_checks already set to 1:
host_down_disable_service_checks=1
Also, this is happening on multiple servers where service monitoring is configured, and the host was not down when the events went to HARD on the 1st attempt.
The report and profile have been sent by PM.
Re: Alert going to HARD state in 1st attempt
Posted: Tue Jul 27, 2021 12:41 pm
by ssax
Please do this again but run it for 2021-07-23:
Please go to Reports > State History:
- Adjust the Period to include the time this occurred (go back far enough to include the host state)
- Select the host from the Limit To dropdown
--- Don't limit on the Services, we want to see host and service states
- For Type, select Both
- For State Type, select Both
- For State, select Any
- Click Run
Please PM me the report, you can either download it as a PDF or CSV.
Re: Alert going to HARD state in 1st attempt
Posted: Wed Jul 28, 2021 10:19 am
by amane
Hi ssax,
Sent the report for 2021-07-23 in PM.
Re: Alert going to HARD state in 1st attempt
Posted: Wed Jul 28, 2021 6:18 pm
by ssax
Was that option added after the 23rd?
The host was down, that's why they went into a hard state immediately.
But if host_down_disable_service_checks was set to 1, it should have stopped them from checking.
It looks like the host and service checks both occurred at the same time if you look at the timestamps in the CSV, and that's likely why. You should increase the check interval on the services to be a little longer than the host check interval. For example, if the host check runs every 5 minutes, set the services to 6 minutes so that the checks occur sequentially. That way host_down_disable_service_checks would function properly.
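To get a rough feel for how the two intervals interact, here is a small Python sketch. It uses a deliberately simplified model: both checks start at minute 0 and run on exact fixed intervals. Real Nagios scheduling spreads and delays checks, so treat this as an illustration only, not a prediction of actual check times. The function names are made up for this example.

```python
from math import gcd

def coincidence_period(host_interval, service_interval):
    """Minutes between moments when both checks land on the same minute (the LCM)."""
    return host_interval * service_interval // gcd(host_interval, service_interval)

def shared_minutes(host_interval, service_interval, horizon):
    """Minutes in [0, horizon] on which both checks run, assuming both start at 0."""
    host_times = set(range(0, horizon + 1, host_interval))
    return [t for t in range(0, horizon + 1, service_interval) if t in host_times]

print(coincidence_period(5, 5))   # 5: with equal intervals every check coincides
print(coincidence_period(5, 6))   # 30
print(shared_minutes(5, 6, 60))   # [0, 30, 60]
```

Under this model, moving the services from 5 to 6 minutes reduces how often the two checks land together from every cycle to once every 30 minutes.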
Re: Alert going to HARD state in 1st attempt
Posted: Thu Jul 29, 2021 5:57 am
by amane
Thanks for the update ssax.
"Was that option added after the 23rd?"
No, it was there from the start.
As suggested, I increased the service check interval to 6 minutes. We will monitor for a few more days and let you know if it occurs again.
Re: Alert going to HARD state in 1st attempt
Posted: Thu Jul 29, 2021 7:00 pm
by ssax
Sounds good, we'll keep an eye out for your update.