No notification after an error

Posted: Mon Feb 09, 2015 6:51 am
by arenist
Hi supporters,

Nagios found an error in a log file but didn't notify me.

I'm using Nagios 3.5.0. The service is:

Code:

define service{
        use generic-service
        hostgroup_name                  iAS
        service_description             ElsaMarke errors
        contact_groups                  admins,ias_log
        max_check_attempts              1
        notification_options            w,u,c
#       normal_check_interval           60
#       notification_interval           240
        check_command                   check_nrpe!check_elsaMarke
}
On the remote machine:

Code:

command[check_elsaMarke]=/usr/local/nagios/libexec/check_iaslog -F /opt/jboss/jboss-eap-5.1/jboss-as/server/elsaMarke/log/marke.debug.log -O /usr/local/nagios/libexec/elsaMarke.error.log -q "error"
The script check_iaslog is a modified variant of check_log.

I found this message in an archived Nagios log:

Code:

[1423060750] SERVICE ALERT: majb-vp-11;ElsaMarke errors;CRITICAL;HARD;1;< at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded < 2015-02-04 15:34:50,394 [-0.0.0.0-8009-1] [elheller(m=312) ] ERROR interceptor.TxInterceptor - Serverfehler : app_inst bin boot dev etc home lib lib64 lost+found media misc mnt net opt proc root sbin selinux shared_data srv sys test_pdf tftpboot tmp usr var java.lang.OutOfMemoryError - GC overhead limit exceeded < at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded < 2015-02-04 15:34:50,395 [0.0.0.0-8009-18] [anblumst ] ERROR s.common.CommonWebService - Systemfehler : null < at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) < 2015-02-04 15:34:50,417 [orkManager(2)-7] [ankammac(m=7635) ] ERROR e.CommonStatelessBeanBase - java.io.IOException: Invalid HTTP server response [408] - Request Time-out < 2015-02-04 15:34:50,418 [orkManager(2
[1423061350] SERVICE ALERT: majb-vp-11;ElsaMarke errors;OK;HARD;1;Log check ok - 0 pattern matches found
I don't understand why there was no notification about an OutOfMemoryError in a JBoss log file.

Can you help me please?

Regards,
arenist

Re: No notification after an error

Posted: Mon Feb 09, 2015 4:12 pm
by tgriep
Are the contacts in the contact groups "admins" and "ias_log" set up to receive notifications?
Could you post the settings for those groups and the users assigned to them?
Also, are notifications enabled on the server?
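
For reference, these are the kinds of settings in question. This is only a sketch: the contact name, alias, and email address are placeholders rather than anything from your configuration, and the "24x7" time period and the notify-*-by-email commands assume the stock sample configuration files are in use.

```
# contacts.cfg (hypothetical contact; substitute your own)
define contact{
        contact_name                    example_admin
        alias                           Example Admin
        email                           example_admin@example.com
        service_notifications_enabled   1
        service_notification_period     24x7
        service_notification_options    w,u,c
        service_notification_commands   notify-service-by-email
        host_notifications_enabled      1
        host_notification_period        24x7
        host_notification_options       d,u
        host_notification_commands      notify-host-by-email
}

define contactgroup{
        contactgroup_name               ias_log
        alias                           iAS log contacts
        members                         example_admin
}

# nagios.cfg: the global switch has to be on as well
enable_notifications=1
```

If any of the "enabled" values is 0, or the contact's service_notification_options do not include "c", a CRITICAL alert will not reach that contact.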

Re: No notification after an error

Posted: Tue Feb 10, 2015 8:16 am
by arenist
Hi tgriep,

My contacts.cfg is configured correctly. I think I found out myself why Nagios didn't send a notification for the incident. The service "ElsaMarke errors" on server majb-vp-11 was flapping:

Code:

[1422946150] SERVICE FLAPPING ALERT: majb-vp-11;ElsaMarke errors;STARTED; Service appears to have started flapping (22.6% change >= 20.0% threshold)
...
[1423060750] SERVICE ALERT: majb-vp-11;ElsaMarke errors;CRITICAL;HARD;1;< at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded < 2015-02-04 15:34:50,394 [-0.0.0.0-8009-1] [elheller(m=312) ] ERROR interceptor.TxInterceptor - Serverfehler : app_inst bin boot dev etc home lib lib64 lost+found media misc mnt net opt proc root sbin selinux shared_data srv sys test_pdf tftpboot tmp usr var java.lang.OutOfMemoryError - GC overhead limit exceeded < at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded < 2015-02-04 15:34:50,395 [0.0.0.0-8009-18] [anblumst ] ERROR s.common.CommonWebService - Systemfehler : null < at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) < 2015-02-04 15:34:50,417 [orkManager(2)-7] [ankammac(m=7635) ] ERROR e.CommonStatelessBeanBase - java.io.IOException: Invalid HTTP server response [408] - Request Time-out < 2015-02-04 15:34:50,418 [orkManager(2
...
[1423079350] SERVICE FLAPPING ALERT: majb-vp-11;ElsaMarke errors;STOPPED; Service appears to have stopped flapping (3.8% change < 5.0% threshold)
If you agree with my conclusion, you can close this thread.

Thanks for looking into this,
arenist

Re: No notification after an error

Posted: Tue Feb 10, 2015 10:11 am
by tmcdonald
Yes, that would make sense if you have flap detection enabled. Nagios holds back problem notifications for a service while it is considered to be flapping.
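
If you want this check to alert every time, one option is to turn off flap detection for just this service. A log check with max_check_attempts 1 tends to bounce between CRITICAL and OK by design, so it crosses the flapping threshold easily. Below is a minimal sketch based on the definition you posted, assuming nothing in your generic-service template overrides it; treat it as a starting point rather than a tested configuration.

```
define service{
        use                             generic-service
        hostgroup_name                  iAS
        service_description             ElsaMarke errors
        contact_groups                  admins,ias_log
        max_check_attempts              1
        notification_options            w,u,c
# Alternative: keep flap detection on and add "f" to be notified when
# flapping starts and stops (problem alerts are still held back meanwhile):
#       notification_options            w,u,c,f
        flap_detection_enabled          0
        check_command                   check_nrpe!check_elsaMarke
}
```

Reload Nagios after the change. With flap_detection_enabled set to 0, the SERVICE FLAPPING ALERT entries for this service should no longer appear in the log.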