arenist
Posts: 27 Joined: Fri Nov 29, 2013 9:29 am
by arenist » Mon Feb 09, 2015 6:51 am
Hi supporters,
Nagios detected an error in a log file but did not send me a notification.
I'm using Nagios 3.5.0. The service definition is:
define service{
    use                     generic-service
    hostgroup_name          iAS
    service_description     ElsaMarke errors
    contact_groups          admins,ias_log
    max_check_attempts      1
    notification_options    w,u,c
#   normal_check_interval   60
#   notification_interval   240
    check_command           check_nrpe!check_elsaMarke
}
On the remote machine, the NRPE command is defined as:
command[check_elsaMarke]=/usr/local/nagios/libexec/check_iaslog -F /opt/jboss/jboss-eap-5.1/jboss-as/server/elsaMarke/log/marke.debug.log -O /usr/local/nagios/libexec/elsaMarke.error.log -q "error"
The script check_iaslog is a modified variant of check_log.
I found these entries in an archived Nagios log:
[1423060750] SERVICE ALERT: majb-vp-11;ElsaMarke errors;CRITICAL;HARD;1;< at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded < 2015-02-04 15:34:50,394 [-0.0.0.0-8009-1] [elheller(m=312) ] ERROR interceptor.TxInterceptor - Serverfehler : app_inst bin boot dev etc home lib lib64 lost+found media misc mnt net opt proc root sbin selinux shared_data srv sys test_pdf tftpboot tmp usr var java.lang.OutOfMemoryError - GC overhead limit exceeded < at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded < 2015-02-04 15:34:50,395 [0.0.0.0-8009-18] [anblumst ] ERROR s.common.CommonWebService - Systemfehler : null < at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) < 2015-02-04 15:34:50,417 [orkManager(2)-7] [ankammac(m=7635) ] ERROR e.CommonStatelessBeanBase - java.io.IOException: Invalid HTTP server response [408] - Request Time-out < 2015-02-04 15:34:50,418 [orkManager(2
[1423061350] SERVICE ALERT: majb-vp-11;ElsaMarke errors;OK;HARD;1;Log check ok - 0 pattern matches found
I don't understand why there was no notification about an OutOfMemoryError in a JBoss log file.
Can you help me please?
Regards,
arenist
tgriep
Madmin
Posts: 9190 Joined: Thu Oct 30, 2014 9:02 am
by tgriep » Mon Feb 09, 2015 4:12 pm
Are the contacts in the contact groups "admins" and "ias_log" set up to receive notifications?
Could you post the settings for those groups and the users assigned to them?
Also, are notifications enabled globally on the Nagios server?
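For reference, a minimal sketch of the settings in question. The contact name jdoe, its alias, and the e-mail address are hypothetical placeholders. The 24x7 time period and the notify-*-by-email commands are assumed to come from the stock sample configuration and may be named differently on your system.

```cfg
# nagios.cfg: global switch; must be 1 for any notification to be sent
enable_notifications=1

# contacts.cfg: a contact that accepts service problem notifications
define contact{
    contact_name                    jdoe                    ; hypothetical name
    alias                           Example Admin           ; hypothetical alias
    host_notifications_enabled      1
    service_notifications_enabled   1
    host_notification_period        24x7                    ; assumed sample time period
    service_notification_period     24x7
    host_notification_options       d,u,r
    service_notification_options    w,u,c,r                 ; must include c for CRITICAL alerts
    host_notification_commands      notify-host-by-email    ; assumed sample command
    service_notification_commands   notify-service-by-email ; assumed sample command
    email                           jdoe@example.com        ; hypothetical address
}

# The contact has to be a member of a group named in the service's contact_groups
define contactgroup{
    contactgroup_name               ias_log
    alias                           iAS log watchers        ; hypothetical alias
    members                         jdoe
}
```

If any of the enabled flags is 0, or the service notification options omit c, a CRITICAL result on the service would not reach that contact.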
arenist
Posts: 27 Joined: Fri Nov 29, 2013 9:29 am
by arenist » Tue Feb 10, 2015 8:16 am
Hi tgriep,
my contacts.cfg is configured correctly. I think I have found out myself why Nagios didn't send a notification for the incident: the service "ElsaMarke errors" on server majb-vp-11 was flapping at the time:
[1422946150] SERVICE FLAPPING ALERT: majb-vp-11;ElsaMarke errors;STARTED; Service appears to have started flapping (22.6% change >= 20.0% threshold)
...
[1423060750] SERVICE ALERT: majb-vp-11;ElsaMarke errors;CRITICAL;HARD;1;< at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded < 2015-02-04 15:34:50,394 [-0.0.0.0-8009-1] [elheller(m=312) ] ERROR interceptor.TxInterceptor - Serverfehler : app_inst bin boot dev etc home lib lib64 lost+found media misc mnt net opt proc root sbin selinux shared_data srv sys test_pdf tftpboot tmp usr var java.lang.OutOfMemoryError - GC overhead limit exceeded < at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) < Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded < 2015-02-04 15:34:50,395 [0.0.0.0-8009-18] [anblumst ] ERROR s.common.CommonWebService - Systemfehler : null < at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) < 2015-02-04 15:34:50,417 [orkManager(2)-7] [ankammac(m=7635) ] ERROR e.CommonStatelessBeanBase - java.io.IOException: Invalid HTTP server response [408] - Request Time-out < 2015-02-04 15:34:50,418 [orkManager(2
...
[1423079350] SERVICE FLAPPING ALERT: majb-vp-11;ElsaMarke errors;STOPPED; Service appears to have stopped flapping (3.8% change < 5.0% threshold)
If you agree with my reasoning, you can close this thread.
Thanks for looking into this,
arenist
tmcdonald
Posts: 9117 Joined: Mon Sep 23, 2013 8:40 am
by tmcdonald » Tue Feb 10, 2015 10:11 am
Yes, that would make sense if you have flap detection enabled.
Former Nagios employee
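A short sketch for readers with the same symptom, based on the service definition from the first post. While a service is in the flapping state, Nagios suppresses its problem and recovery notifications. That matches the log excerpt: the CRITICAL alert at 1423060750 falls between the flapping STARTED entry at 1422946150 and the STOPPED entry at 1423079350. The two options below are alternatives and either one alone should be enough. Whether the generic-service template already sets any of these directives is not known from the thread, so treat this as an assumption-laden example, not a verified fix.

```cfg
define service{
    use                     generic-service
    hostgroup_name          iAS
    service_description     ElsaMarke errors
    contact_groups          admins,ias_log
    max_check_attempts      1
    ; Option A: add f so that flapping start/stop events are notified
    notification_options    w,u,c,f
    ; Option B: turn off flap detection for this service only
    flap_detection_enabled  0
    check_command           check_nrpe!check_elsaMarke
}
```

Option A keeps flap detection but at least tells the contacts when the service enters and leaves the flapping state. Option B is the more common choice for log checks, since a check that returns CRITICAL on new matches and OK on the next run changes state often by design, and with max_check_attempts set to 1 every such change is a hard state change that counts toward the flapping threshold.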