Missing notifications for certain alarms

benjaminsmith · Post by **benjaminsmith** » Fri Sep 11, 2020 5:02 pm

Hi,

Thanks for sending that over. I'm not seeing the state change in the log. Can you send over nagios-09-10-2020-00.log as well?

It would be very helpful to force the service to generate a notification and then recover while we tail the logs. We can achieve this by sending passive check results directly to the service while watching the logs.

Open up a shell on the server and run the following tail command:

Code: Select all

tail -F /var/log/maillog /usr/local/nagiosxi/tmp/phpmailer.log /usr/local/nagiosxi/var/eventman.log /usr/local/nagios/var/nagios.log

Then go to Home > Service Status:
1. Find the Service and click on it
2. Click the + tab
3. Note these two rows:

State Type: Hard
Current Check: 1 of 4

Those rows show the current State Type and the Current Check number. To generate a notification for a service you will need to submit MULTIPLE problem check results. The number you need to submit is the last number in the Current Check row, which is the service's max_check_attempts setting.

For services, each passive check result that you submit will be a SOFT state until you have submitted enough to reach the Max Check Attempts setting defined on the service. Only then will the service enter a HARD problem state, which will generate the notification (just remember, notifications are only sent on HARD states).
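To make the SOFT/HARD counting concrete, here is a minimal Python sketch of the behaviour described above. It is a simplified model, not Nagios Core's actual logic: the `ServiceState` class and its fields are hypothetical names, and it ignores re-notifications, flapping, downtime and host state.

```python
from dataclasses import dataclass

OK = "OK"


@dataclass
class ServiceState:
    """Simplified model of one service (hypothetical, for illustration only)."""
    max_check_attempts: int
    state: str = OK
    state_type: str = "HARD"
    attempt: int = 1

    def submit(self, result: str) -> bool:
        """Apply one check result; return True if it would trigger a notification."""
        if result == OK:
            # A recovery only notifies when leaving a HARD problem state.
            notify = self.state != OK and self.state_type == "HARD"
            self.state, self.state_type, self.attempt = OK, "HARD", 1
            return notify
        if self.state == OK:
            self.attempt = 1          # first problem result
        elif self.state_type == "SOFT":
            self.attempt += 1         # still retrying
        else:
            # Already in a HARD problem state; this model sends nothing further.
            self.state = result
            return False
        self.state = result
        if self.attempt >= self.max_check_attempts:
            self.state_type = "HARD"  # enough problem results: notify
            return True
        self.state_type = "SOFT"
        return False
```

With `max_check_attempts=4`, the first three CRITICAL results stay SOFT and only the fourth returns True; a single OK after that returns True for the recovery.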

passive-checks.png

- Click the "Submit passive check result" link
- Select the Check Result and type in some text for the Check Output
- Click the Submit button
- Submit as many as you need, one right after another, until the service enters the HARD state so that a notification will be sent
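If clicking through the form repeatedly is tedious, the same passive results can in principle be written to the Nagios external command file instead. The sketch below builds `PROCESS_SERVICE_CHECK_RESULT` lines in Python. The command file path is an assumption (check `command_file` in your nagios.cfg), and the helper names are hypothetical.

```python
import time

# Assumed location of the external command file on a Nagios XI server;
# verify against the command_file setting in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

# Standard plugin return codes.
RETURN_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_result_line(host, service, state, output, now=None):
    """Build one external command line for a passive service check result."""
    ts = int(time.time() if now is None else now)
    code = RETURN_CODES[state]
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"


def submit(lines, command_file=COMMAND_FILE):
    """Write the command lines to the command file (needs write permission)."""
    with open(command_file, "w") as fh:
        fh.writelines(lines)
```

For example, `submit([passive_result_line("myhost", "My Service", "CRITICAL", "forced test")])` run once per required attempt, followed by one call with `"OK"`, should mirror the manual steps. The host and service names here are placeholders.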

When coming from a HARD problem state (whether we are talking about hosts or services), submitting an OK passive result should fire off a recovery notification after a single passive result.

Also, if you have flap detection enabled and flapping is detected because of the repeated state changes, notifications will be suppressed.
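As a rough illustration of why rapid test submissions can trip flap detection, the sketch below computes an unweighted percent state change over a list of recent states. This is a simplification: Nagios Core weights recent changes more heavily, and the `low` and `high` values here are illustrative parameters, not necessarily your configured low_flap_threshold and high_flap_threshold.

```python
def percent_state_change(states):
    """Unweighted percentage of transitions between consecutive states."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(states, states[1:]) if a != b)
    return 100.0 * changes / (len(states) - 1)


def is_flapping(states, was_flapping=False, low=5.0, high=20.0):
    """Start flapping above `high`; stop only once the value drops below `low`."""
    pct = percent_state_change(states)
    if was_flapping:
        return pct >= low
    return pct > high
```

Forcing a service CRITICAL and back to OK several times in a row raises this percentage quickly, so it is worth checking the flapping status of the service before concluding that notifications are missing.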

Let me know if you are able to generate a notification for the non-OK state and the corresponding recovery notification. Then you can cross-check this with Opsgenie.

If not, please post the full output of the tail command. Thanks, Benjamin

nms · Post by **nms** » Tue Sep 15, 2020 7:32 am

Hi
Thanks for the reply.
Attaching the other log file here.

nagios-09-10-2020-00.7z

For the action you requested, I did it but it never goes past check 1 of 5 for some reason. I did it a bunch of times but it just didn't seem to react past the first time.

Code: Select all

Service State:	Critical
Duration:	3m 15s
State Type:	Soft
Current Check:	1 of 5

Attaching also the tail output from the command you specified.

troubleshoot_tail_output.7z

Thanks

ssax · Post by **ssax** » Tue Sep 15, 2020 5:44 pm

I think you may be hitting this:

https://github.com/NagiosEnterprises/na ... issues/759

Or possibly this:

https://github.com/NagiosEnterprises/na ... issues/788

Please go to Reports > State History:
- Adjust the Period to include the time this occurred
- Select the host from the Limit To dropdown
--- Don't limit on the Services, we want to see host and service states
- For Type, select Both
- For State Type, select Both
- For State, select Any
- Click Run

Please send me the report, you can either download it as a PDF or CSV.

We want to see the host state during that time as well, please use the same host as the last screenshot.

nms · Post by **nms** » Wed Sep 16, 2020 2:03 am

Hi
Looking at that page, it seems the issue yesterday was the display not updating or something; the alarms show up on this report (but clear quickly).

nms · Post by **nms** » Wed Sep 16, 2020 10:06 am

I did it again, but quicker. Still no change on that page, but when I look at the service state history it went through fine.

ServiceStateHistory.png

Wondering if there's something particular about the 'out of range' alarms which we aren't reproducing with this test?

nms · Post by **nms** » Thu Sep 17, 2020 7:08 am

Hi again

We're still getting these cases. It does seem that when we get a CRITICAL because of connection issues, the clear is SOFT, meaning no notification is sent.

Recent case state history:

2020-09-17 14_05_48-Service State History · Nagios XI.png

Notifications:

2020-09-17 14_06_11-Notifications · Nagios XI.png

ssax · Post by **ssax** » Thu Sep 17, 2020 2:34 pm

There is a bug in Nagios Core where, if the host is in a down state (hard or soft), it immediately sets the services into a HARD state. When those services recover, they show as SOFT recoveries. You are very likely hitting this bug (which is still unresolved):

https://github.com/NagiosEnterprises/na ... issues/759

We need to see the host state at the time of these occurrences as the host state impacts the service state and what occurs.
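One way to spot these occurrences yourself is to scan nagios.log for service recoveries logged as SOFT directly after a HARD problem, together with the most recent host alert at the time of the problem. The Python sketch below assumes the usual `HOST ALERT` / `SERVICE ALERT` line layout; the function names are hypothetical and this is an illustration, not a supported tool.

```python
import re

# Matches lines such as:
#   [1600322410] SERVICE ALERT: host;service;CRITICAL;HARD;1;output
#   [1600322400] HOST ALERT: host;DOWN;SOFT;1;output
ALERT = re.compile(r"^\[(\d+)\] (HOST|SERVICE) ALERT: (.*)$")


def parse_alert(line):
    """Return a dict for an alert line, or None for any other line."""
    m = ALERT.match(line.strip())
    if not m:
        return None
    ts, kind, rest = int(m.group(1)), m.group(2), m.group(3)
    try:
        if kind == "HOST":
            host, state, state_type, attempt, output = rest.split(";", 4)
            service = None
        else:
            host, service, state, state_type, attempt, output = rest.split(";", 5)
    except ValueError:
        return None  # unexpected field count
    return {"time": ts, "host": host, "service": service, "state": state,
            "state_type": state_type, "attempt": int(attempt), "output": output}


def soft_recoveries_after_hard(lines):
    """Return (problem, recovery) pairs: HARD problem followed by a SOFT OK."""
    last_service, last_host, hits = {}, {}, []
    for line in lines:
        alert = parse_alert(line)
        if alert is None:
            continue
        if alert["service"] is None:
            last_host[alert["host"]] = alert
            continue
        # Remember what the host last reported when this service alert fired.
        alert["host_alert"] = last_host.get(alert["host"])
        key = (alert["host"], alert["service"])
        prev = last_service.get(key)
        if (prev and prev["state"] != "OK" and prev["state_type"] == "HARD"
                and alert["state"] == "OK" and alert["state_type"] == "SOFT"):
            hits.append((prev, alert))
        last_service[key] = alert
    return hits
```

Passing the lines of /usr/local/nagios/var/nagios.log (or an archived log) to `soft_recoveries_after_hard` returns the suspicious pairs, and `problem["host_alert"]` shows the host state logged just before the service went HARD, if there was one.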

nms · Post by **nms** » Mon Sep 21, 2020 2:21 am

Hi @ssax
Thanks for the reply, good to know the issue is already being looked at in any case.
Could you clarify please in what format you would need that state info, & how best I could get it for you? Through the GUI or in some log file?
We have the case quite often so should be able to provide whatever information would be useful.

Thanks

ssax · Post by **ssax** » Mon Sep 21, 2020 5:07 pm

Please get the report with these exact steps. You'll just need to go further back with the Period dropdown, far enough that we can see what state the host was in as well:

Please go to Reports > State History:
- Adjust the Period to include the time this occurred (that includes the host state as well)
- Select the host from the Limit To dropdown
- For Type, select Both
- For State Type, select Both
- For State, select Any
- Click Run

CSV or PDF is fine.

nms · Post by **nms** » Tue Sep 22, 2020 2:51 am

Thanks @ssax
Attaching the report as requested.
Please let me know if there's anything else I can provide to help, or if you would like any other examples of this behaviour (like I said we have it quite regularly).

1600761020-statehistory.csv

Nagios Support Forum
