Alert going to HARD state in 1st attempt

amane
Posts: 77
Joined: Thu Jan 18, 2018 9:53 am

Alert going to HARD state in 1st attempt

Post by amane »

Hi,

We have configured process monitoring in Nagios XI for Windows boxes. However, the process status goes to a HARD state on the 1st attempt even though 'Max Check Attempts' is set to 4.

2021-07-23 04:17:02 rpabotprd18.lowes.com Windows Service:Windows Service Status:Blue Prism Login Agent Service WARNING HARD 1 of 4 WARNING: , delayed (LoginAgent=starting (auto))
2021-07-23 03:00:43 rpabotprd57.lowes.com Windows Service:Windows Service Status:Blue Prism Login Agent Service WARNING HARD 1 of 4 WARNING: , delayed (LoginAgent=stopped (delayed))
2021-07-23 02:45:44 rpabotprd57.lowes.com Windows Service:Windows Service Status:Blue Prism Login Agent Service WARNING HARD 1 of 4 WARNING: , delayed (LoginAgent=stopped (delayed))

Above are some example events.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Alert going to HARD state in 1st attempt

Post by ssax »

Is the host in a problem state? When a host is in a problem state (hard or soft), Nagios automatically sets service problems to a hard state, even on the first attempt. The only way around that is to set host_down_disable_service_checks=1 in your /usr/local/nagios/etc/nagios.cfg and then restart the nagios service or apply the config.
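To make that rule concrete, here is a minimal Python sketch of it. This is a simplified model for illustration, not the Nagios source: the function name problem_state_type and its parameters are hypothetical, and it only covers non-OK service results.

```python
def problem_state_type(attempt: int, max_attempts: int, host_up: bool) -> str:
    """Classify a non-OK service result as "SOFT" or "HARD" (simplified model)."""
    if not host_up:
        # Host is in a problem state: the service problem is HARD immediately,
        # regardless of how many attempts remain.
        return "HARD"
    if attempt >= max_attempts:
        # All retries have been used up.
        return "HARD"
    # Host is up and retries remain, so the problem is still SOFT.
    return "SOFT"


# Attempt 1 of 4 with the host in a problem state, as in the events above.
print(problem_state_type(1, 4, host_up=False))  # HARD
# Attempt 1 of 4 with the host up.
print(problem_state_type(1, 4, host_up=True))   # SOFT
# Attempt 4 of 4 with the host up.
print(problem_state_type(4, 4, host_up=True))   # HARD
```

Under this model, a "HARD 1 of 4" event is only expected when the host itself was not UP at the time of the service check.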

If that is not what is occurring:

Please go to Reports > State History:
- Adjust the Period to include the time this occurred (go back far enough to include the host state)
- Select the host from the Limit To dropdown
--- Don't limit on the Services, we want to see host and service states
- For Type, select Both
- For State Type, select Both
- For State, select Any
- Click Run

Please PM me the report. You can download it as either a PDF or a CSV.

Please PM me a copy of your profile.zip as well. You can download it from Admin > System Profile by clicking the Download Profile button.
amane
Posts: 77
Joined: Thu Jan 18, 2018 9:53 am

Re: Alert going to HARD state in 1st attempt

Post by amane »

Hi ssax,


I verified the /usr/local/nagios/etc/nagios.cfg file and found host_down_disable_service_checks already set to 1.

host_down_disable_service_checks=1

Also, this is happening on multiple servers where service monitoring is configured, and the host was not down when the service went to a HARD state on the 1st attempt.

The report and profile have been sent via PM.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Alert going to HARD state in 1st attempt

Post by ssax »

Please do this again but run it for 2021-07-23:

Please go to Reports > State History:
- Adjust the Period to include the time this occurred (go back far enough to include the host state)
- Select the host from the Limit To dropdown
--- Don't limit on the Services, we want to see host and service states
- For Type, select Both
- For State Type, select Both
- For State, select Any
- Click Run

Please PM me the report. You can download it as either a PDF or a CSV.
amane
Posts: 77
Joined: Thu Jan 18, 2018 9:53 am

Re: Alert going to HARD state in 1st attempt

Post by amane »

Hi ssax,

I sent the report for 2021-07-23 via PM.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Alert going to HARD state in 1st attempt

Post by ssax »

Was that option added after the 23rd?

The host was down, which is why the services went into a hard state immediately.

But if host_down_disable_service_checks was set to 1, it should have stopped them from checking.

It looks like both checks occurred at the same time if you look at the timestamps in the CSV, and that is likely why. You should increase the check interval on the services to be a little longer than the host check interval. For example, if the host check runs every 5 minutes, set the services to 6 minutes so that the checks run sequentially. That way host_down_disable_service_checks can function properly.
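As a rough illustration of the interval arithmetic, the Python sketch below lists the minutes at which a host check and a service check would be scheduled together. It assumes both schedules start at minute 0 and run on exact multiples of their intervals, which ignores how Nagios actually spreads and delays checks; the function name coinciding_minutes is hypothetical.

```python
def coinciding_minutes(host_interval: int, service_interval: int, horizon: int) -> list[int]:
    """Minutes in [0, horizon] at which both checks are scheduled together."""
    return [
        minute
        for minute in range(0, horizon + 1)
        if minute % host_interval == 0 and minute % service_interval == 0
    ]


# Same 5-minute interval: every run coincides.
print(len(coinciding_minutes(5, 5, 60)))  # 13
# Host every 5 minutes, service every 6 minutes: far fewer overlaps.
print(coinciding_minutes(5, 6, 60))       # [0, 30, 60]
```

Under these assumptions, the 6-minute service interval reduces how often the two checks land on the same minute but does not remove the overlap entirely, so it is worth watching whether the issue comes back.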
amane
Posts: 77
Joined: Thu Jan 18, 2018 9:53 am

Re: Alert going to HARD state in 1st attempt

Post by amane »

Thanks for the update ssax.

Was that option added after the 23rd?
No, it was there from the start.

As suggested, we increased the service check interval to 6 minutes. We will monitor for a few more days and let you know if it occurs again.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Alert going to HARD state in 1st attempt

Post by ssax »

Sounds good, we'll keep an eye out for your update.