Re: Lookback period issue regression in 1.4

Posted: Fri Jan 22, 2016 3:18 pm
by hsmith
I've set up my server to match. I'll see what kind of alerts I get over the weekend and post back Monday. Jesse will also be back in on Monday, so we can discuss this with him as well. It seems you two are getting to know each other pretty well!

Re: Lookback period issue regression in 1.4

Posted: Fri Jan 22, 2016 3:56 pm
by weveland
He's a bright kid. Destined for great things.

I look forward to hearing from you on Monday.

Re: Lookback period issue regression in 1.4

Posted: Mon Jan 25, 2016 2:25 pm
by jolson
Hey Wayne, long time no see!

I spoke with the developers regarding this issue, and they'll need the following information to diagnose it:

-A screenshot of one of the problem alerts, including all of the settings that you have specifically set.

-The following files: /var/www/html/nagioslogserver/application/helpers/data_helper.php and /var/www/html/nagioslogserver/html/application/helpers/data_helper.php

-Do these alerts always trigger at a particular hour/minute, or does it seem random? I'm guessing rather random based on your output in this thread so far.

-I'm wondering if the missed triggers relate to a timezone offset in some way?

Let us know the results of the above and we'll get back to you. Thanks!

Re: Lookback period issue regression in 1.4

Posted: Tue Jan 26, 2016 10:48 am
by weveland
Mr. Olson & Agent Smith,

Sorry I didn't get back to you yesterday; I was out of the office. The decryption phrase will be in your PMs, Mr. Olson.
jolson wrote: -A screenshot of one of the problem alerts, including all of the settings that you have specifically set.
You will find the screenshots in the attached archive.
jolson wrote: -The following files: /var/www/html/nagioslogserver/application/helpers/data_helper.php and /var/www/html/nagioslogserver/html/application/helpers/data_helper.php
The first file exists and is in the archive; the second file does not exist.
jolson wrote: -Do these alerts always trigger at a particular hour/minute, or does it seem random? I'm guessing rather random based on your output in this thread so far.
It appears to occur in the time leading up to 7:00 AM EST. Strangely, this is also when the daily backup runs (at 7:00 AM). I've included a chronological list of the alerts in the archive so you can compare times.
jolson wrote: -I'm wondering if the missed triggers relate to a timezone offset in some way?
Anything is possible.

-W
nagsupport.zip

Re: Lookback period issue regression in 1.4

Posted: Tue Jan 26, 2016 12:49 pm
by weveland
Another side note, and a more urgent one: the admin panel is now just a blank page. I restarted the whole server after restarting the individual components (httpd, logstash, elasticsearch) didn't help.
Still the same result: Administration => blank page

Re: Lookback period issue regression in 1.4

Posted: Tue Jan 26, 2016 2:43 pm
by jolson
The kibana-int database controls the loading of the Administration panel, and it's possible that it has an unassigned shard or similar. Try the following command:

curl -s 'localhost:9200/_cluster/health?level=indices&pretty'| grep kibana -A9
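
If only the status value is of interest, that output could be narrowed further. A minimal sketch, assuming the pretty-printed health output puts "status" on the line directly after the index name; `index_status` is a hypothetical helper name and not part of Nagios Log Server:

```shell
# Hypothetical helper: print the "status" value for one index.
# Reads the pretty-printed _cluster/health?level=indices output from stdin.
# Assumption: the "status" key is on the line right after the index name.
index_status() {
  # $1 = index name, e.g. kibana-int
  grep -A1 "\"$1\" *: *{" | sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# Usage against a live node (assumes Elasticsearch listens on localhost:9200):
# curl -s 'localhost:9200/_cluster/health?level=indices&pretty' | index_status kibana-int
```

The function prints green, yellow, or red for the named index, or nothing if the index is not found in the input.
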
If the kibana-int database is healthy, check the Apache logs after clicking 'Administration'. Is there anything relevant in them?

Also, there are known problems with Internet Explorer and Nagios Log Server, so be sure you're using Firefox or Google Chrome.

Thanks Wayne! I'm still working on the original problem from this thread. I thought we had reproduced it in our lab, but unfortunately it was just a misfiring script that caused our alert process to fail. I'll come back with more information once I've discussed the problem further with our developers.

Re: Lookback period issue regression in 1.4

Posted: Tue Jan 26, 2016 3:27 pm
by weveland
People still use Internet Explorer/Microsoft Edge?

"kibana-int" : {
  "status" : "yellow",
  "number_of_shards" : 5,
  "number_of_replicas" : 1,
  "active_primary_shards" : 5,
  "active_shards" : 5,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 5
},

Status of system has been yellow since day 1 because it's a single host and not a cluster.
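
The shard counts above are consistent with that: Elasticsearch does not place a replica on the same node as its primary, so on a single host every replica copy stays unassigned. A small worked check of the arithmetic (the function name `expected_shards` is only for illustration):

```shell
# expected_shards PRIMARIES REPLICAS
# Prints how many shard copies the index wants in total.
expected_shards() {
  echo $(( $1 * (1 + $2) ))
}

total=$(expected_shards 5 1)   # number_of_shards=5, number_of_replicas=1
active=5                       # active_shards from the output above
echo "wanted=$total unassigned=$(( total - active ))"
```

With the values reported above, this gives 10 wanted copies and 5 unassigned, which matches the "unassigned_shards" figure.
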

Re: Lookback period issue regression in 1.4

Posted: Tue Jan 26, 2016 3:56 pm
by jolson
weveland wrote: People still use Internet Explorer/Microsoft Edge?
I am of the opinion that everyone should use elinks. :ugeek:
weveland wrote: Status of system has been yellow since day 1 because it's a single host and not a cluster.
Yup, the kibana-int database seems fine. Anything notable in the httpd logs?

tail -n50 /var/log/httpd/*log
Does the behavior change between HTTP/HTTPS?

Re: Lookback period issue regression in 1.4

Posted: Tue Jan 26, 2016 4:02 pm
by weveland
Nobody uses elinks!

==> /var/log/httpd/ssl_access_log <==
172.16.140.254 - - [26/Jan/2016:16:01:44 -0500] "GET /nagioslogserver/admin HTTP/1.1" 500 -

==> /var/log/httpd/ssl_request_log <==
[26/Jan/2016:16:01:44 -0500] 172.16.140.254 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "GET /nagioslogserver/admin HTTP/1.1" -

==> /var/log/httpd/access_log <==
172.16.140.254 - - [26/Jan/2016:16:01:49 -0500] "GET /nagioslogserver/admin HTTP/1.1" 500 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0"
172.16.140.254 - - [26/Jan/2016:16:01:56 -0500] "GET /nagioslogserver/admin HTTP/1.1" 500 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0"
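
To summarize output like this on a busier server, the 500 responses could be tallied per request path. A rough sketch, assuming the common/combined log format shown above (request path in field 7, status code in field 9); `count_500s` is a made-up helper name:

```shell
# Hypothetical helper: count HTTP 500 responses per request path.
# Reads common/combined-format access log lines from stdin.
count_500s() {
  awk '$9 == 500 { hits[$7]++ } END { for (p in hits) print hits[p], p }'
}

# Usage (log path as seen in the tail output above):
# count_500s < /var/log/httpd/access_log
```

The access logs only record the status code. The reason for the 500 would more likely show up in the httpd error log, assuming PHP errors are being written there.
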

Re: Lookback period issue regression in 1.4

Posted: Tue Jan 26, 2016 4:05 pm
by hsmith
weveland wrote: Nobody uses elinks!
Stallman uses elinks.

This is probably an obvious question, but you didn't run out of disk space, did you?
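
A quick way to check, sketched here on the assumption that a POSIX `df` is available; `disk_used_pct` is a hypothetical helper, and the path to pass in is whichever filesystem holds your Elasticsearch data (the root filesystem is used below only as an example):

```shell
# Hypothetical helper: print the usage percentage (digits only) of the
# filesystem that holds the given path. Defaults to / if no path is given.
disk_used_pct() {
  # df -P prints one header line, then one line per filesystem;
  # field 5 is the capacity, e.g. "42%".
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

pct=$(disk_used_pct /)
echo "root filesystem is ${pct}% full"
```
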