Different warning and critical parameters on a single config

Posted: Wed Feb 12, 2020 11:44 pm
by bramon
Hi,

Is there a way to define a single host or service check whose warning and critical states use different notification settings?

For example, an NCPA service check for free disk space:

Warning
Threshold: 10GB
Alert Frequency: Every 1 hour
Check/Notification period: 8x5
Recipient: [email protected], [email protected]

Critical
Threshold: 1GB
Alert Frequency: Every 10 minutes
Check/Notification period: 24x7
Recipient: [email protected], [email protected], [email protected]


My objective is to limit sending of Critical alerts to a large set of recipients unless it is really urgent

Re: Different warning and critical parameters on a single co

Posted: Thu Feb 13, 2020 12:37 pm
by lmiltchev
You could probably achieve this by using notification escalations. I haven't really tested it, but you may be able to use something like this:

Code:

define serviceescalation {
    # config_name            warning
    host_name                somehost
    service_description      someservice
    contacts                 contact1,contact2
    first_notification       0
    last_notification        0
    notification_interval    60
    escalation_period        8x5
    escalation_options       w,
}

define serviceescalation {
    # config_name            critical
    host_name                somehost
    service_description      someservice
    contacts                 contact1,contact2,contact3
    first_notification       0
    last_notification        0
    notification_interval    10
    escalation_period        24x7
    escalation_options       c,
}
Read more about notification escalations here:

https://assets.nagios.com/downloads/nag ... tions.html
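
Note that the escalation_period values above have to match time periods that exist in your configuration. As a minimal sketch, an "8x5" time period could look something like this. The name 8x5 and the 09:00-17:00, Monday to Friday hours are assumptions on my part, so adjust them to your actual office hours:

```
define timeperiod {
    # Assumed name and hours; change these to match your office hours.
    timeperiod_name          8x5
    alias                    Office hours Monday to Friday
    monday                   09:00-17:00
    tuesday                  09:00-17:00
    wednesday                09:00-17:00
    thursday                 09:00-17:00
    friday                   09:00-17:00
    # Saturday and Sunday are left out, so they are not part of the period.
}
```

Keep in mind that outside this period the warning escalation is not valid, so, as far as I understand it, WARNING notifications would fall back to the service's own contacts and notification_period.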

Re: Different warning and critical parameters on a single co

Posted: Thu Feb 13, 2020 8:39 pm
by bramon
Hi,

Please correct me if I'm wrong, but my understanding of escalation is that Nagios can send notifications using a different config (e.g. a different recipient list, frequency, etc.) if there is no action/acknowledgement or the state has not recovered after X number of alerts.

https://assets.nagios.com/downloads/nag ... ations.pdf


My requirement is to have different alert config for warning and critical notifications.

Re: Different warning and critical parameters on a single co

Posted: Fri Feb 14, 2020 10:18 am
by lmiltchev
bramon wrote: "My objective is to limit sending of Critical alerts to a large set of recipients unless it is really urgent"
bramon wrote: "Please correct me if I'm wrong, but my understanding of escalation is that Nagios can send notifications using a different config (e.g. a different recipient list, frequency, etc.) if there is no action/acknowledgement or the state has not recovered after X number of alerts."
In your first post you didn't say anything about recoveries: who is supposed to receive them, and when. You could add recoveries to your escalation config if you wish.
Serviceescalation - escalation options

This directive is used to define the criteria that determine when this service escalation is used. The escalation is used only if the service is in one of the states specified in this directive. If this directive is not specified in a service escalation, the escalation is considered to be valid during all service states. Valid options are a combination of one or more of the following:
r = escalate on an OK (recovery) state,
w = escalate on a WARNING state,
u = escalate on an UNKNOWN state, and
c = escalate on a CRITICAL state.

Example: If you specify w in this field, the escalation will only be used if the service is in a WARNING state.

Parameter name: escalation_options
Required: no
To answer your second question:
bramon wrote: "My requirement is to have different alert config for warning and critical notifications."
You define your warning (10GB) and critical (1GB) thresholds at the service level, so whenever free space drops below these thresholds, the service's contacts receive a notification. You could have zero, one, or more "normal" notifications before you escalate them. It's a matter of preference. My understanding was that you didn't want to send many notifications to contacts "unless it is really urgent". That's why I suggested that you use:

Code:

first_notification       0
This way, the notifications will be escalated right away, and no "regular" notifications will be sent out (limiting the number of alerts sent to your contacts). You can also adjust the "last_notification" value if you want contacts to be re-notified only a certain number of times.
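
As a rough illustration of the "last_notification" idea (untested, and the value 3 is just a hypothetical number), the warning escalation from my earlier example could be limited to the first three notifications like this:

```
define serviceescalation {
    # config_name            warning, first three notifications only
    host_name                somehost
    service_description      someservice
    contacts                 contact1,contact2
    first_notification       0
    # Hypothetical value: the escalation applies up to the 3rd notification.
    last_notification        3
    notification_interval    60
    escalation_period        8x5
    escalation_options       w
}
```

Once that range has passed, the escalation no longer applies, and as far as I understand it any further notifications go to the service's regular contacts, so you would want to keep that contact list small as well.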

Re: Different warning and critical parameters on a single co

Posted: Mon Feb 17, 2020 4:09 am
by bramon
Apologies, I should have provided a more detailed scenario.

Our team manages servers and is divided into sub-teams. Each sub-team is the primary administrator of a specific set of servers, but any member, regardless of sub-team, can act as secondary administrator for the other teams' servers. This happens when we are on a skeleton crew, i.e. outside office hours.

For example:

TeamA
- Members are user1, user2, user3
- Handles ServerA, ServerB, ServerC

TeamB
- Members are user4, user5, user6
- Handles ServerE, ServerF, ServerG


Here are example scenarios for a disk space problem:

1. If free disk space on ServerA falls below 10GB, only users 1 to 3 should receive a warning notification, every hour.
2. If free disk space on ServerE falls below 10GB, only users 4 to 6 should receive a warning notification, every hour.
3. If free disk space on any server A to G falls below 1GB, a critical notification should be sent to users 1 to 6 plus the manager, every 10 minutes.


Can this be done using escalation?

I am still reading about escalation and analyzing the solutions that were provided

Re: Different warning and critical parameters on a single co

Posted: Mon Feb 17, 2020 11:47 am
by lmiltchev
Yes, this could be done with escalations.

Here's an example of a disk usage service on ServerA:

Code:

define service {
    host_name                 ServerA
    service_description       Disk Usage
    use                       xiwizard_ncpa_service
    check_command             check_xi_ncpa_agent!-t 'mytoken' -P 5693 -M 'disk/logical/C:|/free' -w 10: -c 1: -u Gi!!!!!!!
    max_check_attempts        5
    check_interval            5
    retry_interval            1
    check_period              xi_timeperiod_24x7
    notification_interval     60
    notification_period       xi_timeperiod_24x7
    notifications_enabled     1
    contacts                  nagiosadmin
    _xiwizard                 ncpa
    register                  1
}
You could use the same config for ServerB through ServerG. The escalations would look like this:

Code:

define serviceescalation {
    # config_name            warning -  ServerA
    host_name                ServerA
    service_description      Disk Usage
    contacts                 user1,user2,user3
    first_notification       0
    last_notification        0
    notification_interval    60
    escalation_period        24x7
    escalation_options       w,
}

define serviceescalation {
    # config_name            warning -  ServerE
    host_name                ServerE
    service_description      Disk Usage
    contacts                 user4,user5,user6
    first_notification       0
    last_notification        0
    notification_interval    60
    escalation_period        24x7
    escalation_options       w,
}

define serviceescalation {
    # config_name            critical - ServerA (all users)
    host_name                ServerA
    service_description      Disk Usage
    # The manager's contact would need to be added to this list as well.
    contacts                 user1,user2,user3,user4,user5,user6
    first_notification       0
    last_notification        0
    notification_interval    10
    escalation_period        24x7
    escalation_options       c,
}
You could use similar configs for the rest of the servers.

I would like to point out that there are multiple ways to set up your service and escalations. This is only one of them (adding all of the individual hosts/services to escalations).

If you wanted to have fewer configs, you could add your service to multiple hosts, e.g. you could have "Disk Usage A-C" added to ServerA, ServerB, and ServerC, and another service "Disk Usage E-G" added to ServerE, ServerF, and ServerG. Then you can set up escalations for both services.

You could also create 2 hostgroups - "A-C" and "E-G", and add your services to each hostgroup.
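
Here is a rough, untested sketch of the hostgroup approach. It assumes a "Disk Usage" service exists on every member host, and "manager" is a hypothetical contact name standing in for whatever your manager's contact is called:

```
define hostgroup {
    hostgroup_name           A-C
    alias                    TeamA servers
    members                  ServerA,ServerB,ServerC
}

define hostgroup {
    hostgroup_name           E-G
    alias                    TeamB servers
    members                  ServerE,ServerF,ServerG
}

define serviceescalation {
    # config_name            warning - TeamA servers
    hostgroup_name           A-C
    service_description      Disk Usage
    contacts                 user1,user2,user3
    first_notification       0
    last_notification        0
    notification_interval    60
    escalation_period        24x7
    escalation_options       w
}

define serviceescalation {
    # config_name            critical - all servers
    # "manager" is a hypothetical contact name.
    hostgroup_name           A-C,E-G
    service_description      Disk Usage
    contacts                 user1,user2,user3,user4,user5,user6,manager
    first_notification       0
    last_notification        0
    notification_interval    10
    escalation_period        24x7
    escalation_options       c
}
```

The warning escalation for the E-G hostgroup would look the same as the A-C one, with user4 through user6 as contacts.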

In any case, when the service goes into a WARNING state, escalated notifications should be sent to either user1 through user3 or user4 through user6. The escalated CRITICAL notifications will be sent to all users.

Note: I used the 24x7 time period in my examples, but you could use custom time periods if needed.