
Missed critical alert notification

Posted: Wed Feb 15, 2017 12:23 pm
by tonyleatwork
Hi -

We had a service go from WARNING to CRITICAL, and Nagios logged the state change. However, no notification was sent for the CRITICAL state, and I want to understand why.

The service stayed CRITICAL for over an hour with no alert, although we did receive the WARNING and RECOVERY notifications. How should we troubleshoot this so it doesn't happen again?

Regards,

Tony Le

Code: Select all



System Profile
A system profile makes it easier for our support techs to understand the system that you are running on. Including a downloaded system profile with your support ticket is always a good idea.
Nagios XI Installation Profile

System:

Nagios XI Version : 5.3.0
nwd2ng01.corp.analog.com 2.6.32-504.3.3.el6.x86_64 x86_64
CentOS release 6.6 (Final)
Gnome is not installed
Apache Information

PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36
Server Name: nwd2ng01.corp.analog.com
Server Address: 10.64.52.120
Server Port: 80
Date/Time

PHP Timezone: America/New_York 
PHP Time: Wed, 15 Feb 2017 12:22:59 -0500
System Time: Wed, 15 Feb 2017 12:22:59 -0500
Nagios XI Data

License ends in: MSTNQS

nagios (pid 2379) is running...
NPCD running (pid 13499).
ndo2db (pid 2093) is running...
CPU Load 15: 11.28 
Total Hosts: 619 
Total Services: 4910 
Function 'get_base_uri' returns: http://nwd2ng01.corp.analog.com/nagiosxi/
Function 'get_base_url' returns: http://nwd2ng01.corp.analog.com/nagiosxi/
Function 'get_backend_url(internal_call=false)' returns: http://nwd2ng01.corp.analog.com/nagiosxi/includes/components/profile/profile.php
Function 'get_backend_url(internal_call=true)' returns: http://localhost/nagiosxi/backend/
Ping Test localhost

Running:
/bin/ping -c 3 localhost 2>&1 
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.015 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.013 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.014 ms

--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.013/0.014/0.015/0.000 ms
Test wget To localhost

WGET From URL: http://localhost/nagiosxi/includes/components/ccm/ 
Running:
/usr/bin/wget http://localhost/nagiosxi/includes/components/ccm/ 
--2017-02-15 12:23:02-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... ::1, 127.0.0.1
Connecting to localhost|::1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: "/usr/local/nagiosxi/tmp/ccm_index.tmp"

0K .......... ....... 187M=0s

2017-02-15 12:23:04 (187 MB/s) - "/usr/local/nagiosxi/tmp/ccm_index.tmp" saved [18221]

Network Settings

1: lo:  mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:9f:52:ef brd ff:ff:ff:ff:ff:ff
    inet 10.64.52.120/24 brd 10.64.52.255 scope global eth0
    inet6 fe80::250:56ff:fe9f:52ef/64 scope link 
       valid_lft forever preferred_lft forever

10.64.52.0/24 dev eth0  proto kernel  scope link  src 10.64.52.120 
169.254.0.0/16 dev eth0  scope link  metric 1002 
default via 10.64.52.1 dev eth0 


Re: Missed critical alert notification

Posted: Wed Feb 15, 2017 12:48 pm
by mcapra
There is a known issue in 5.3.0 with alerts that contain \ and similar characters. I'm going to hazard a guess that the alert was for disk space on a Windows server?

At any rate, I would recommend upgrading to the latest version (or at least 5.3.2). From the changelog under 5.3.2:

Code: Select all

- Fixed event_meta base64 encoding when storing event_meta in the database -JO, BH
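
To narrow down where the CRITICAL notification was lost, it may help to check whether Nagios Core logged a notification for that state at all. The following is only a rough sketch, not an official Nagios tool: it assumes the standard Core log line layout ("[epoch] SERVICE ALERT: host;service;state;type;attempt;output" and "[epoch] SERVICE NOTIFICATION: contact;host;service;state;command;output"), and the function name find_unnotified_criticals is made up for illustration. It reports each HARD CRITICAL alert that had no CRITICAL notification logged before the service's next state change.

```python
import re

# Assumed Nagios Core log layout: "[epoch] TYPE: field;field;..."
LINE_RE = re.compile(r"^\[(\d+)\] (SERVICE ALERT|SERVICE NOTIFICATION): (.*)$")


def find_unnotified_criticals(lines):
    """Return sorted (timestamp, host, service) tuples for HARD CRITICAL alerts
    with no CRITICAL notification logged before the next state change.

    Hypothetical helper for illustration; not part of Nagios itself.
    """
    pending = {}  # (host, service) -> timestamp of a not-yet-notified HARD CRITICAL
    missed = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        timestamp = int(match.group(1))
        kind = match.group(2)
        fields = match.group(3).split(";")
        if len(fields) < 4:
            continue
        if kind == "SERVICE ALERT":
            host, service, state, state_type = fields[:4]
            key = (host, service)
            # The state changed again while a CRITICAL was still unnotified.
            if key in pending:
                missed.append((pending.pop(key), host, service))
            if state == "CRITICAL" and state_type == "HARD":
                pending[key] = timestamp
        else:  # SERVICE NOTIFICATION
            _contact, host, service, state = fields[:4]
            if state == "CRITICAL":
                pending.pop((host, service), None)
    # Anything still pending at the end of the log was never notified.
    missed.extend((ts, host, service) for (host, service), ts in pending.items())
    return sorted(missed)
```

On a typical XI install the log is usually at /usr/local/nagios/var/nagios.log, so something like find_unnotified_criticals(open("/usr/local/nagios/var/nagios.log")) would be the starting point (adjust the path if yours differs). If the service shows up in the result, Core never attempted the notification, which points at notification settings, time periods, or downtime. If it does not show up but no email arrived, Core did hand the notification off and the problem is more likely on the handler side, which would be consistent with the 5.3.0 issue above.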