Re: Some alerts not firing

Posted: Thu Oct 01, 2015 1:20 pm
by Jklre
Also I want to mention that other alerts were firing during this time period.

Re: Some alerts not firing

Posted: Thu Oct 01, 2015 3:13 pm
by jolson
I want you to make the following change.

Edit one of our PHP files (this one helps the alert subsystem out):

Code: Select all

vi /var/www/html/nagioslogserver/application/helpers/data_helper.php

Change this (line 44):

Code: Select all

        $range[] = "logstash-" . date('Y.m.d', $start);

To:

Code: Select all

        $range[] = "logstash-" . gmdate('Y.m.d', $start);

After the change, you will not need to restart anything. Let me know if your alert consistency improves after performing the above.
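
For readers wondering why swapping date() for gmdate() could matter, here is a minimal sketch in Python (not the product's PHP) of the difference between naming a daily index by local date and by UTC date. The function names are hypothetical, the fixed -07:00 offset is an assumption read from the audit log timestamps posted later in this thread, and the epoch value is the `created` field of one of those entries with the milliseconds dropped. It also assumes the daily logstash indices are named by UTC date, which is what the gmdate() change implies.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server runs at UTC-7, matching the "-07:00" suffix
# on the audit log timestamps in this thread.
SERVER_TZ = timezone(timedelta(hours=-7))


def index_name_local(epoch_seconds, tz=SERVER_TZ):
    """Analogue of PHP date('Y.m.d', $start): formats in the server's local zone."""
    return "logstash-" + datetime.fromtimestamp(epoch_seconds, tz).strftime("%Y.%m.%d")


def index_name_utc(epoch_seconds):
    """Analogue of PHP gmdate('Y.m.d', $start): always formats in UTC."""
    return "logstash-" + datetime.fromtimestamp(epoch_seconds, timezone.utc).strftime("%Y.%m.%d")


# 2015-10-05T17:04:22-07:00, which is already 2015-10-06T00:04:22 in UTC.
start = 1444089862
print(index_name_local(start))  # logstash-2015.10.05
print(index_name_utc(start))    # logstash-2015.10.06
```

Under those assumptions, any check that runs after 5:00 pm local time would compute the previous day's index name when formatting by local date.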

Re: Some alerts not firing

Posted: Thu Oct 01, 2015 4:17 pm
by Jklre
I went ahead and made this change. Any ideas about what could be happening, or anything else we can check or look at? A 6-hour gap without alerts is a major showstopper and has brought our implementation of this product to a complete stop until this is resolved.

Re: Some alerts not firing

Posted: Fri Oct 02, 2015 10:41 am
by jolson
This is a high priority issue, and you're not the only person experiencing it - with that said, we're certainly working on resolving it. I appreciate your patience.

After making the above change, did the behavior of your cluster change at all? In a few cases the above change resolved the alert problem entirely, but in some cases it did nothing. I would like to know your experience so that we can further track this bug down.

Re: Some alerts not firing

Posted: Fri Oct 02, 2015 10:54 am
by Jklre
I'm in the process of validating the alerts from last night after the change was made. I'll let you know what we find. If you guys need any other information or want us to test something, let us know. My Nagios Log Server is your Nagios Log Server.

Re: Some alerts not firing

Posted: Fri Oct 02, 2015 11:19 am
by jolson
I'm looking forward to whatever you find out! I will let you know if we need you to do any testing. I have had trouble reproducing this bug. Is there any chance you have reliable steps to do so?

Re: Some alerts not firing

Posted: Thu Oct 08, 2015 2:33 pm
by Jklre
I've been having issues replicating this also. I have noticed that more of these happen in the afternoon, around 4:00 pm to 6:00 pm.

It looks like we are still having these issues after the change. I found another skipped alert on 10/6/15, shown in the audit logs here, and I'm still sorting through the week's data. Have you guys made any progress on this? We were planning on rolling this tool out into production, but we can't continue until this is resolved.

Audit logs

Code: Select all

2015-10-06T06:01:03.839-07:00	ALERT	Alert ID YPMyn56UTsSPNi9toCW23A returned OK: 0 matching entries found |logs=0;0;0
Field    Value
_id      AVA9Pa3gXZbcqN-U9p3K
_index   nagioslogserver_log
_type    ALERT
created  1444136463839
message  Alert ID YPMyn56UTsSPNi9toCW23A returned OK: 0 matching entries found |logs=0;0;0
source   Nagios Log Server
type     ALERT

2015-10-06T05:45:23.707-07:00	ALERT	Alert ID YPMyn56UTsSPNi9toCW23A returned OK: 0 matching entries found |logs=0;0;0
Field    Value
_id      AVA9L1V7XZbcqN-U9pa5
_index   nagioslogserver_log
_type    ALERT
created  1444135523707
message  Alert ID YPMyn56UTsSPNi9toCW23A returned OK: 0 matching entries found |logs=0;0;0
source   Nagios Log Server
type     ALERT

Dashboard of alert:
error.jpg

Re: Some alerts not firing

Posted: Thu Oct 08, 2015 3:52 pm
by Jklre
Here's another example, from 10/5 around 5:00 pm.

Audit logs

Code: Select all

2015-10-05T17:04:22.643-07:00	ALERT	Alert ID AU7VqkIgosxmGFOd5nSZ returned OK: 0 matching entries found |logs=0;0;0
Field    Value
_id      AVA6dpnzXZbcqN-U9f04
_index   nagioslogserver_log
_type    ALERT
created  1444089862643
message  Alert ID AU7VqkIgosxmGFOd5nSZ returned OK: 0 matching entries found |logs=0;0;0
source   Nagios Log Server
type     ALERT

2015-10-05T16:48:45.377-07:00	ALERT	Alert ID AU7VqkIgosxmGFOd5nSZ returned OK: 0 matching entries found |logs=0;0;0
Field    Value
_id      AVA6aEzCm6Hshcn6i6Yt
_index   nagioslogserver_log
_type    ALERT
created  1444088925377
message  Alert ID AU7VqkIgosxmGFOd5nSZ returned OK: 0 matching entries found |logs=0;0;0
source   Nagios Log Server
type     ALERT

Dashboard of alert:
error2.jpg

Re: Some alerts not firing

Posted: Thu Oct 08, 2015 4:36 pm
by jolson
A new version of Nagios Log Server was released today that could very well deal with the problem you're experiencing.

You can download it here:
https://assets.nagios.com/downloads/nag ... 3.0.tar.gz

Upgrade instructions:
https://assets.nagios.com/downloads/nag ... Server.pdf

Please let me know if your problems persist after the upgrade. In addition to making alerting system fixes, we've refined the backup process further.

Changelog:
- Added ability to re-order table view -SW
- Added "Inspect" icon when using quick search -SW
- Change Audit Log to report Alert Name instead of ID -SW
- Fixed some missing translations -SW
- Fixed problem where index didn't exist before adding it to a query -SW
- Fixed bug where maintenance jobs were not run sequentially, possibly causing indexes to be deleted or closed before being backed up -SW
- Fixed bug where IE was not redirecting window.location properly -SW
- Fixed bug where backup and maintenance process would not always complete all steps by re-ordering steps -SW
- Fixed bug causing incorrect index to be selected for alerts, specifically a problem when server timezone is offset from UTC -SW
- Fixed issue where logrotate had Windows line endings and was giving errors -JO

Re: Some alerts not firing

Posted: Thu Oct 08, 2015 4:44 pm
by Jklre
Woo-hoo! I'll install this right away and let you guys know what I find.