
Missing alerts

Posted: Mon Aug 10, 2015 1:51 pm
by tonyleatwork
Hi -

We have a check_wmi check that scans the system for all available HDDs and checks each one against WARNING/CRITICAL thresholds.

One of the drives covered by that service is in a WARNING state, but since it is a low-priority drive we ACKNOWLEDGED the WARNING. A different drive covered by the same service then went CRITICAL, but we never received an email alert. This is odd because Nagios Core did register the CRITICAL.
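Because one service covers every drive, Nagios only ever sees a single state for it. The Python sketch below illustrates that, assuming the plugin reports the worst per-drive result as the service state and treating the `-w '90' -c '95'` arguments from the posted check_command as percent-used thresholds (both are assumptions, not confirmed plugin internals). The drive letters and usage figures are made up.

```python
from typing import Dict

OK, WARNING, CRITICAL = 0, 1, 2
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def drive_state(used_pct: float, warn: float = 90.0, crit: float = 95.0) -> int:
    """Classify one drive; a plain threshold N alerts when the value exceeds N."""
    if used_pct > crit:
        return CRITICAL
    if used_pct > warn:
        return WARNING
    return OK


def service_state(drives: Dict[str, float]) -> int:
    """Assumed roll-up: the single service reports the worst state of any drive."""
    return max((drive_state(pct) for pct in drives.values()), default=OK)


# Hypothetical figures: D: is the low-priority drive whose WARNING was acknowledged.
before = {"C:": 60.0, "D:": 92.0, "E:": 70.0}
after = {"C:": 60.0, "D:": 92.0, "E:": 97.0}
print(STATE_NAMES[service_state(before)])
print(STATE_NAMES[service_state(after)])
```

Under these assumptions, E: crossing the critical threshold shows up to Nagios as the same "All Disk Usage" service moving from WARNING to CRITICAL, not as a new problem on a separate service.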

Is this expected behavior or did we bump into a bug or a system issue?

As the Nagios Core screenshots show, the CRITICALs were registered by the system, but we never got the email.
Attachments: alerthistogram.JPG, alerthistory.JPG

Re: Missing alerts

Posted: Mon Aug 10, 2015 3:00 pm
by tgriep
Did the notifications for that service get disabled by mistake?
Could you post how the service check is configured? Also, please search the objects.cache file and post the settings for that service:


/usr/local/nagios/var/objects.cache
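Pulling one service's entry out of objects.cache by hand can be tedious on a large install. A minimal Python sketch, assuming the `define service {` ... `}` block layout that appears in objects.cache (one directive name, whitespace, then the value per line):

```python
import re
from typing import Dict, Iterator, Optional

# Matches "define service {" exactly, so servicegroup/servicedependency blocks are skipped.
SERVICE_START = re.compile(r"define\s+service\s*\{")


def iter_services(text: str) -> Iterator[Dict[str, str]]:
    """Yield each 'define service { ... }' block in objects.cache as a dict."""
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if SERVICE_START.fullmatch(line):
            current = {}
        elif current is not None and line == "}":
            yield current
            current = None
        elif current is not None and line:
            # Directive name, whitespace, then the rest of the line as the value.
            parts = line.split(None, 1)
            current[parts[0]] = parts[1] if len(parts) > 1 else ""


def find_service(text: str, host: str, description: str) -> Optional[Dict[str, str]]:
    """Return the directives of the first service matching host and description."""
    for svc in iter_services(text):
        if svc.get("host_name") == host and svc.get("service_description") == description:
            return svc
    return None
```

Usage would be something like `find_service(open("/usr/local/nagios/var/objects.cache").read(), "nwd2clst11.ad.analog.com", "All Disk Usage")`; adjust the path if your install keeps the file elsewhere.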

Re: Missing alerts

Posted: Tue Aug 11, 2015 1:21 pm
by tonyleatwork
Hi -

As far as I can tell, the objects.cache entry below contains all the settings, but at a high level:

The custom command breaks down to:

$USER1$/check_wmi_plus.pl -H $HOSTADDRESS$ -u $USER10$ -p $USER11$ -m $ARG3$ $ARG4$ $ARG5$ $ARG6$ -t 110

We stored the login and password in resource.cfg (as $USER10$ and $USER11$) because they are sensitive: WMI requires Windows admin privileges to work.

The service then notifies two contact groups, WSG_WARNINGS and WSG_ALERTS:

WSG_WARNINGS only receives WARNING alerts
WSG_ALERTS only receives CRITICAL alerts
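The routing described above can be sketched as a lookup from contact group to the states it receives. This is a hypothetical Python model: the thread does not say how the filtering is configured, so the option letters below (Nagios-style `w` for WARNING, `c` for CRITICAL, `r` for recovery) are assumptions.

```python
from typing import Dict, List, Set

# Hypothetical: the states each contact group receives, as Nagios-style option letters.
GROUP_OPTIONS: Dict[str, Set[str]] = {
    "WSG_WARNINGS": {"w"},
    "WSG_ALERTS": {"c"},
}
STATE_LETTER = {"WARNING": "w", "CRITICAL": "c", "UNKNOWN": "u", "OK": "r"}


def recipients(state: str, groups: Dict[str, Set[str]] = GROUP_OPTIONS) -> List[str]:
    """Return the contact groups that would receive a notification for this state."""
    letter = STATE_LETTER[state]
    return sorted(name for name, options in groups.items() if letter in options)


print(recipients("WARNING"))
print(recipients("CRITICAL"))
```

Note that this kind of filtering only decides who receives a notification once Nagios has decided to send one; it cannot account for a notification that was never generated in the first place.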

During this time, both email contact groups did receive other alerts, so I don't think this is an email or configuration problem.


define service {
        host_name       nwd2clst11.ad.analog.com
        service_description     All Disk Usage
        check_command   check_xi_service_wmiplus_secure!!!!checkdrivesize!-a '[c-z]' -w '90' -c '95' -y 2 -t 25!!!
        contact_groups  WSG_WARNINGS,WSG_ALERTS
        notification_period     24x7
        initial_state   o
        importance      0
        check_interval  5.000000
        retry_interval  1.000000
        max_check_attempts      3
        is_volatile     0
        parallelize_check       1
        active_checks_enabled   1
        passive_checks_enabled  1
        obsess  1
        event_handler_enabled   1
        low_flap_threshold      0.000000
        high_flap_threshold     0.000000
        flap_detection_enabled  1
        flap_detection_options  a
        freshness_threshold     0
        check_freshness 0
        notification_options    a
        notifications_enabled   1
        notification_interval   60.000000
        first_notification_delay        0.000000
        stalking_options        n
        process_perf_data       1
        retain_status_information       1
        retain_nonstatus_information    1
        }
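To see what actually runs, the command definition and the `!`-separated check_command above can be combined: Nagios substitutes `$ARG1$`, `$ARG2$`, ... from the fields after the command name, and `$USERn$` from resource.cfg. A minimal Python sketch of that substitution; the macro values are hypothetical stand-ins, and the empty-string fallback for unknown macros is this sketch's choice:

```python
import re
from typing import Dict

COMMAND_LINE = ("$USER1$/check_wmi_plus.pl -H $HOSTADDRESS$ -u $USER10$ -p $USER11$ "
                "-m $ARG3$ $ARG4$ $ARG5$ $ARG6$ -t 110")
CHECK_COMMAND = ("check_xi_service_wmiplus_secure!!!!checkdrivesize!"
                 "-a '[c-z]' -w '90' -c '95' -y 2 -t 25!!!")


def expand(command_line: str, check_command: str, macros: Dict[str, str]) -> str:
    """Substitute $NAME$ macros, taking $ARGn$ from the '!'-separated check_command."""
    _name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg  # empty fields between '!'s become empty arguments
    # This sketch turns any macro it has no value for into an empty string.
    return re.sub(r"\$([A-Z0-9]+)\$", lambda m: values.get(m.group(1), ""), command_line)


# Hypothetical stand-ins for resource.cfg and host values; real ones stay private.
EXAMPLE_MACROS = {
    "USER1": "/usr/local/nagios/libexec",
    "USER10": "EXAMPLE\\wmi-user",
    "USER11": "example-password",
    "HOSTADDRESS": "192.0.2.10",
}

expanded = expand(COMMAND_LINE, CHECK_COMMAND, EXAMPLE_MACROS)
print(" ".join(expanded.split()))
```

With these inputs, $ARG3$ is empty, so `-m` ends up followed by `checkdrivesize`, with the drive filter and thresholds from $ARG5$ after it.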


Re: Missing alerts

Posted: Tue Aug 11, 2015 4:10 pm
by jdalrymple
This all depends on whether you selected a sticky acknowledgement. Unfortunately, that option is poorly documented (and it is the default):
Acknowledge command from Core interface wrote:
This command is used to acknowledge a service problem. When a service problem is acknowledged, future notifications about problems are temporarily disabled until the service changes from its current state. If you want acknowledgement to disable notifications until the service recovers, check the 'Sticky Acknowledgement' checkbox. Contacts for this service will receive a notification about the acknowledgement, so they are aware that someone is working on the problem. Additionally, a comment will also be added to the service. Make sure to enter your name and fill in a brief description of what you are doing in the comment field. If you would like the service comment to remain once the acknowledgement is removed, check the 'Persistent Comment' checkbox. If you do not want an acknowledgement notification sent out to the appropriate contacts, uncheck the 'Send Notification' checkbox.
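To make the quoted behaviour concrete for this thread, here is a toy Python model (not Nagios source code) of a single service whose WARNING is acknowledged before it moves to CRITICAL. It assumes only what the quoted text says: a normal acknowledgement is cleared when the service changes state, while a sticky one holds until the service recovers. It deliberately ignores notification_options, re-notification intervals, and contact filtering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServiceModel:
    """Toy model of one service's acknowledgement state (simplified, not Nagios code)."""
    state: str = "OK"
    acknowledged: bool = False
    sticky: bool = False

    def acknowledge(self, sticky: bool) -> None:
        if self.state == "OK":
            raise ValueError("only a problem state can be acknowledged")
        self.acknowledged, self.sticky = True, sticky

    def hard_state_change(self, new_state: str) -> Optional[str]:
        """Apply a hard state change; return the notification type sent, if any."""
        if new_state == self.state:
            return None  # not a state change (re-notification intervals ignored here)
        self.state = new_state
        if new_state == "OK":
            self.acknowledged = self.sticky = False  # every acknowledgement ends on recovery
            return "RECOVERY"
        if self.acknowledged and not self.sticky:
            self.acknowledged = False  # a normal ack is cleared by any state change
        return None if self.acknowledged else "PROBLEM"


def drive_goes_critical(sticky: bool) -> Optional[str]:
    """The thread's scenario: the WARNING is acknowledged, then the service turns CRITICAL."""
    svc = ServiceModel()
    svc.hard_state_change("WARNING")
    svc.acknowledge(sticky=sticky)
    return svc.hard_state_change("CRITICAL")


print(drive_goes_critical(sticky=False))
print(drive_goes_critical(sticky=True))
```

In this model the normal acknowledgement yields a PROBLEM notification for the CRITICAL, while the sticky one yields none. Since every drive shares one service, the CRITICAL on a different drive is just a WARNING to CRITICAL change of that service, which a sticky acknowledgement keeps quiet.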

Re: Missing alerts

Posted: Tue Aug 11, 2015 4:38 pm
by tgriep
Can you post the host information from the objects.cache file?
That service doesn't have a template applied to it, so the service will inherit its settings from the host.
If the host's notification options aren't set the way you want, that would explain why the notification wasn't sent.
Can you check the Notifications log and verify that the notification didn't happen?
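On the template question: a Nagios object takes values from the templates named in its `use` directive, its own directives override inherited ones, and with several templates the first one listed takes precedence. A simplified Python sketch of that resolution; the template contents and the parent name `hypothetical-base-service` are invented for illustration:

```python
from typing import Dict

# Directives that are not passed down from a template.
NOT_INHERITED = {"use", "name", "register"}


def resolve(obj: Dict[str, str], templates: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge an object with the (possibly nested) templates named in its 'use' directive."""
    resolved: Dict[str, str] = {}
    names = [n.strip() for n in obj.get("use", "").split(",") if n.strip()]
    # Apply later templates first so earlier (higher-precedence) ones overwrite them.
    for name in reversed(names):
        inherited = resolve(templates[name], templates)
        resolved.update({k: v for k, v in inherited.items() if k not in NOT_INHERITED})
    # The object's own directives always win over inherited ones.
    resolved.update({k: v for k, v in obj.items() if k != "use"})
    return resolved


# Hypothetical template contents, for illustration only.
TEMPLATES = {
    "hypothetical-base-service": {
        "name": "hypothetical-base-service", "register": "0",
        "notifications_enabled": "1", "notification_options": "w,u,c,r",
        "max_check_attempts": "3",
    },
    "xiwizard_windowswmi_service": {
        "name": "xiwizard_windowswmi_service", "use": "hypothetical-base-service",
        "register": "0", "check_interval": "5",
    },
}
SERVICE = {
    "use": "xiwizard_windowswmi_service",
    "host_name": "nwd2clst11.ad.analog.com",
    "service_description": "All Disk Usage",
    "notification_options": "c,r",
}

print(resolve(SERVICE, TEMPLATES))
```

objects.cache holds objects after this kind of resolution has already been done, which is consistent with the posted block showing final values and no `use` line.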

Re: Missing alerts

Posted: Wed Aug 12, 2015 9:52 am
by tonyleatwork
Where can I find the mail log? /var/log/mail only shows the TO: field.

And it looks like there was a template:

xiwizard_windowswmi_service

Is that not sufficient?

Another key point: this system DID alert back on 07/30. While it's possible something changed between then and now, I want to understand what's going on in case this is affecting other systems.

Re: Missing alerts

Posted: Wed Aug 12, 2015 10:00 am
by tgriep
On the XI Home screen, under Incident Management, click Notifications and check whether that service sent a notification at that time.
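The same check can also be made against the Nagios Core log. A minimal Python sketch, assuming the usual `SERVICE ALERT` and `SERVICE NOTIFICATION` line layouts shown in the comments (check a few lines of your own log to confirm the field order), that lists HARD problem alerts for one service with no notification for that state before the service's next alert:

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumed log layouts, e.g.
#   [1439229600] SERVICE ALERT: host;service;CRITICAL;HARD;3;plugin output
#   [1439229600] SERVICE NOTIFICATION: contact;host;service;CRITICAL;command;plugin output
LOG_LINE = re.compile(r"^\[(\d+)\] (SERVICE ALERT|SERVICE NOTIFICATION): (.*)$")


def missing_notifications(lines: Iterable[str], host: str, service: str) -> List[Tuple[int, str]]:
    """Return (timestamp, state) for HARD problem alerts that got no matching notification."""
    missing: List[Tuple[int, str]] = []
    pending: Optional[Tuple[int, str]] = None  # last HARD problem still awaiting a notification
    for raw in lines:
        m = LOG_LINE.match(raw.strip())
        if not m:
            continue
        ts, kind, fields = int(m.group(1)), m.group(2), m.group(3).split(";")
        if len(fields) < 4:
            continue
        if kind == "SERVICE ALERT":
            if fields[0] != host or fields[1] != service or fields[3] != "HARD":
                continue
            if pending:
                missing.append(pending)  # the state moved on before any notification
            pending = (ts, fields[2]) if fields[2] != "OK" else None
        elif fields[1] == host and fields[2] == service and pending and fields[3] == pending[1]:
            pending = None  # a notification went out for this problem state
    if pending:
        missing.append(pending)
    return missing
```

Usage might look like `missing_notifications(open("/usr/local/nagios/var/nagios.log"), "nwd2clst11.ad.analog.com", "All Disk Usage")`; that log path is an assumption based on a default install.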

Also, in Core Config Manager, edit that service, click the Alert Settings tab, set the notification options the way you want, save, and see whether that resolves it for you.

Re: Missing alerts

Posted: Wed Aug 12, 2015 11:44 am
by tonyleatwork
You're right, it looks like this did not send out an alert, so it must have been the acknowledgement. What I'll do is remove the host, re-add it, and perform some user training.

Re: Missing alerts

Posted: Wed Aug 12, 2015 12:02 pm
by hsmith
tonyleatwork wrote:
You're right, it looks like this did not send out an alert, so it must have been the acknowledgement. What I'll do is remove the host, re-add it, and perform some user training.
Is there anything else you need help with, or am I all right to close this topic?