service escalation not working
Posted: Mon Mar 15, 2021 8:14 pm
by jh129666
I have a service escalation called oracle-dba-cert that isn't being executed for some reason. When the monitor goes to a WARNING state, an email notification is sent successfully, but when it goes CRITICAL, the service escalation doesn't appear to be used.
Thanks,
Jeff
Re: service escalation not working
Posted: Tue Mar 16, 2021 4:53 pm
by vtrac
Hi,
I searched the profile.zip and did not see "oracle-dba-cert" defined in the objects.cache file.
However, I noticed the errors below in your phpmailer.log.
Just a guess, but maybe the email was sent to the wrong (recipient) address shown below:
Code:
SMTP Error: Could not connect to SMTP host. (method=smtp;host=cin.mp-emaxx.com;port=25;security=none)
SMTP Error: The following recipients failed: [email protected]
SMTP server error: 5.1.3 Bad recipient address syntax (method=smtp;host=localhost;port=25;security=none)
Regards,
Vinh
Re: service escalation not working
Posted: Tue Mar 16, 2021 9:28 pm
by jh129666
Hi Vinh,
That phpmailer.log file is from 2016 so that's not the issue.
Below is the definition for oracle-dba-cert that's in serviceescalations.cfg. Is the config name supposed to be commented out?
Further down are the definitions for "QDC Oracle DBA Cert" and "TDC Oracle DBA Cert".
define serviceescalation {
# config_name oracle-dba-cert
servicegroup_name QDC Oracle DBA Cert,TDC Oracle DBA Cert
contacts wipro-oracle-alarmpoint-24x7-cert,wipro-oracle-moogsoft-24x7-cert
first_notification 1
last_notification 2
notification_interval 0
escalation_period 24x7-Cert
escalation_options c,
}
define servicegroup {
servicegroup_name QDC Oracle DBA Cert
alias QDC Oracle DBA Cert
members chubdb1201.cert.cin.mp-emaxx.com,Hub Data Guard Sync
}
define servicegroup {
servicegroup_name TDC Oracle DBA Cert
alias TDC Oracle DBA Cert
members dchubdb1001.cert.len.mp-emaxx.com,DB cpu load: 16 cpu cores,dchubdb1001.cert.len.mp-emaxx.com,db servers: crond,dchubdb1001.cert.len.mp-emaxx.com,db servers: sshd,dchubdb1001.cert.len.mp-emaxx.com,Hub DB account status,dchubdb1001.cert.len.mp-emaxx.com,Hub DB jobs,dchubdb1001.cert.len.mp-emaxx.com,Hub DB locks,dchubdb1001.cert.len.mp-emaxx.com,Hub DB tablespace,dchubdb1001.cert.len.mp-emaxx.com,DB disk usage,dchubdb1001.cert.len.mp-emaxx.com,DB Memory: Physical-99-100,dchubdb1001.cert.len.mp-emaxx.com,DB Memory: Swap-90-80,dchubdb1001.cert.len.mp-emaxx.com,Hub Data Guard Sync,dchubdb1001.cert.len.mp-emaxx.com,SSV db device owner: /dev/raw,dchubdb1001.cert.len.mp-emaxx.com,SSV db device perms: /dev/raw,dchubdb1001.cert.len.mp-emaxx.com,SSV RAC interconnect,dchubdb1001.cert.len.mp-emaxx.com,SSV db: /opt/grid/product/12.2.0/grid/dbs,dchubdb1001.cert.len.mp-emaxx.com,SSV db: /opt/oracle/product/12.2.0/dbhome/dbs,dchubdb1001.cert.len.mp-emaxx.com,SSV db: sysaux,dchubdb1001.cert.len.mp-emaxx.com,SSV db: system,dchubdb1001.cert.len.mp-emaxx.com,SSV db: ssv_data,dchubdb1001.cert.len.mp-emaxx.com,SSV db: ssv_idx,dchubdb1002.cert.len.mp-emaxx.com,DB cpu load: 16 cpu cores,dchubdb1002.cert.len.mp-emaxx.com,db servers: crond,dchubdb1002.cert.len.mp-emaxx.com,db servers: sshd,dchubdb1002.cert.len.mp-emaxx.com,DB disk usage,dchubdb1002.cert.len.mp-emaxx.com,DB Memory: Physical-99-100,dchubdb1002.cert.len.mp-emaxx.com,DB Memory: Swap-90-80,dchubdb1002.cert.len.mp-emaxx.com,SSV db device owner: /dev/raw,dchubdb1002.cert.len.mp-emaxx.com,SSV db device perms: /dev/raw,dchubdb1002.cert.len.mp-emaxx.com,SSV RAC interconnect,dchubdb1002.cert.len.mp-emaxx.com,SSV db: /opt/grid/product/12.2.0/grid/dbs,dchubdb1002.cert.len.mp-emaxx.com,SSV db: /opt/oracle/product/12.2.0/dbhome/dbs
}
None of the service escalation names below that I created show up in objects.cache. There's an asterisk (*) for all of them in the service description field in Nagios XI, and I assume that's because they're all set up to use a service group. Since I'm using service groups, it looks like the individual hosts/services are what get written to objects.cache instead of the service escalation names I defined.
dbe-dba-prod
dbe-dba-stage
oracle-dba-cert
oracle-dba-prod
oracle-dba-stage
sql-dba-prod
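As a sketch of how to check what Nagios actually loaded, the Python below lists the serviceescalation entries found in objects.cache, where each escalation shows up per host/service rather than under its CCM name. The default path, the assumed "directive value" line layout, and the helper names parse_blocks and main are assumptions for illustration, not part of Nagios XI.

```python
"""Hypothetical helper: print the serviceescalation entries Nagios wrote to
objects.cache. Assumes the usual 'define <type> {' ... '}' block layout with
one 'directive<whitespace>value' per line; a sketch, not an official tool."""
import sys

# Assumed default location on a Nagios XI server; pass a path to override.
DEFAULT_PATH = "/usr/local/nagios/var/objects.cache"


def parse_blocks(text, object_type):
    """Yield one dict of directives per 'define <object_type> {' block."""
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if current is None:
            # Outside a block, only an opening 'define <object_type>' matters.
            words = line.rstrip("{").split()
            if len(words) == 2 and words[0] == "define" and words[1] == object_type:
                current = {}
            continue
        if line == "}":
            yield current
            current = None
            continue
        # Split on the first run of whitespace so values such as
        # 'DB cpu load: 16 cpu cores' keep their internal spaces.
        parts = line.split(None, 1)
        current[parts[0]] = parts[1] if len(parts) > 1 else ""


def main(path):
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            text = fh.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return
    for esc in parse_blocks(text, "serviceescalation"):
        print(esc.get("host_name", "?"), "|",
              esc.get("service_description", "?"), "|",
              "notifications", esc.get("first_notification", "?"),
              "to", esc.get("last_notification", "?"))


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PATH)
```

Each printed line corresponds to one expanded host/service pair, which is why the CCM escalation names themselves never appear in the file.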
Re: service escalation not working
Posted: Wed Mar 17, 2021 3:21 pm
by vtrac
Hi,
Looking at your settings, you should be getting the "escalation" email on your first CRITICAL notification.
I tested it on my VM and I do get the email notification (sample below):
Code:
***** Nagios XI Alert *****
** This is an escalated notification **
Nagios has detected a problem with this service.
Notification Type: PROBLEM
Please note that the escalation email will only be sent to the contacts defined in the service escalation (wipro-oracle-alarmpoint-24x7-cert and wipro-oracle-moogsoft-24x7-cert), and only for the services in these two service groups:
QDC Oracle DBA Cert, TDC Oracle DBA Cert
Please check with those escalation contacts and see if they received the notification. Based on your settings, the first notification will ONLY be sent to the escalation contacts, not to the normal contacts/users defined on those services.
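To illustrate why the normal contacts are skipped once an escalation applies, here is a simplified Python sketch of recipient selection: if any escalation is valid for a notification, its contacts are used instead of the service's own. The dictionary layout, the function names, and the placeholder contact "normal-service-contact" are hypothetical; escalation_period, per-contact notification_options, and contact groups are deliberately ignored.

```python
"""Simplified sketch (assumed semantics, not Nagios source code) of choosing
notification recipients when service escalations are defined."""


def escalation_is_valid(esc, number, state):
    """Check the notification-number window and the state letter (w/u/c)."""
    upper_ok = esc["last"] == 0 or number <= esc["last"]  # 0 = no upper bound
    return number >= esc["first"] and upper_ok and state[0].lower() in esc["options"]


def notification_recipients(number, state, service_contacts, escalations):
    """Return the set of contacts for notification `number` sent in `state`."""
    valid = [e for e in escalations if escalation_is_valid(e, number, state)]
    if not valid:
        return set(service_contacts)  # no escalation applies: normal contacts
    recipients = set()
    for esc in valid:
        recipients.update(esc["contacts"])  # contacts from every valid escalation
    return recipients


# Window, options, and contacts copied from the oracle-dba-cert escalation
# posted in this thread.
ORACLE_DBA_CERT = {
    "first": 1,
    "last": 2,
    "options": {"c"},
    "contacts": ["wipro-oracle-alarmpoint-24x7-cert",
                 "wipro-oracle-moogsoft-24x7-cert"],
}

if __name__ == "__main__":
    print(sorted(notification_recipients(
        1, "CRITICAL", ["normal-service-contact"], [ORACLE_DBA_CERT])))
```

In this model a first CRITICAL notification goes only to the two wipro-oracle contacts, while a WARNING notification (not in escalation_options c) falls back to the service's normal contacts.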
Here's the article on "Understanding Notification Escalations":
https://assets.nagios.com/downloads/nag ... ations.pdf
Regards,
Vinh
Re: service escalation not working
Posted: Fri Mar 19, 2021 8:06 am
by jh129666
It looks like the escalation is working when the service monitor goes directly to CRITICAL, but the escalation isn't working when the service monitor goes to WARNING and then to CRITICAL.
I'll be doing some testing next week and will let you know what I find.
Thanks,
Jeff
Re: service escalation not working
Posted: Fri Mar 19, 2021 9:27 am
by jh129666
The service I used to test with is set up to send email notifications for WARNING (it uses the wipro_oracle_oraclequery_service_warning template) and then use the oracle-dba-prod service escalation for CRITICAL. When the service goes to a WARNING state, email notifications are sent. When the service goes directly to a CRITICAL state, the service escalation works. When the service goes to WARNING and then to CRITICAL, the service escalation does not work.
I've attached a document with screenshots showing the service in a WARNING state with email notifications being sent; then the service goes CRITICAL and no notification is sent, even though it should use the configured service escalation.
Thanks,
Jeff
Re: service escalation not working
Posted: Fri Mar 19, 2021 5:05 pm
by vtrac
Hi,
I also asked my teammate Sean to help test this and here's what he said:
Code:
Worked on mine going from OK (hard) to CRITICAL and from WARNING (hard) to CRITICAL.
Based on the last doc he sent, it isn't on the 1st or 2nd notification (the prior WARNING notifications are included in the notification count), so it would not have been escalated.
If he wants it escalated any time it goes CRITICAL, set last_notification to 0.
So, it is working for us.
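Sean's explanation can be checked with a small Python model of the notification counter: every problem notification increments the number, a WARNING to CRITICAL change does not reset it (only a recovery to OK does), and the escalation only applies while the number is inside the first_notification/last_notification window. The class and function names and the WARNING, WARNING, CRITICAL sequence are hypothetical; the real counter also depends on notification intervals and other settings this sketch ignores.

```python
"""Simplified model (an assumption for illustration, not Nagios source code)
of a service notification number checked against an escalation window."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    first_notification: int
    last_notification: int  # 0 means no upper bound
    options: frozenset      # state letters, e.g. frozenset("c") = CRITICAL only

    def applies(self, number, state):
        """True if notification `number`, sent in `state`, is escalated."""
        in_window = number >= self.first_notification and (
            self.last_notification == 0 or number <= self.last_notification)
        return in_window and state[0].lower() in self.options


def escalated_notifications(states, esc):
    """Treat each non-OK entry as one problem notification and return
    (number, state, escalated) tuples. Only OK resets the counter."""
    results = []
    number = 0
    for state in states:
        if state == "OK":
            number = 0  # recovery starts a new problem
            continue
        number += 1
        results.append((number, state, esc.applies(number, state)))
    return results


if __name__ == "__main__":
    # Hypothetical sequence: two WARNING notifications, then CRITICAL.
    seq = ["WARNING", "WARNING", "CRITICAL"]
    print(escalated_notifications(seq, Escalation(1, 2, frozenset("c"))))
    print(escalated_notifications(seq, Escalation(1, 0, frozenset("c"))))
```

With last_notification 2, the CRITICAL notification is number 3 and falls outside the window, so it is not escalated; with last_notification 0 the same notification is escalated, while the WARNING notifications stay unescalated because escalation_options only includes c.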
Regards,
Vinh
Re: service escalation not working
Posted: Mon Mar 22, 2021 9:27 am
by jh129666
Thanks for the update. I will change my configuration and do some testing, and I'll let you know the result. Thanks!!
Re: service escalation not working
Posted: Mon Mar 22, 2021 11:26 am
by jh129666
After making the configuration change, this is now working. Thanks for your help!! You can lock the thread.
Re: service escalation not working
Posted: Mon Mar 22, 2021 4:23 pm
by scottwilkerson
jh129666 wrote: After making the configuration change, this is now working. Thanks for your help!! You can lock the thread.
Great!
Locking thread