Lookback period issue regression in 1.4

hsmith · Post by **hsmith** » Fri Jan 22, 2016 3:18 pm

I've set up my server to match. I'll see what kind of alerts I get over the weekend and post back Monday. Jesse will also be back in on Monday, so we can discuss this with him as well. It seems you two are getting to know each other pretty well!

weveland · Post by **weveland** » Fri Jan 22, 2016 3:56 pm

He's a bright kid. Destined for great things.

I look forward to hearing from you on Monday.

jolson · Post by **jolson** » Mon Jan 25, 2016 2:25 pm

Hey Wayne, long time no see!

I spoke with the developers regarding this issue, and they'll need the following information to diagnose it:

-A screenshot of one of the problem alerts, including all of the settings that you have specifically set.

-The following files: /var/www/html/nagioslogserver/application/helpers/data_helper.php and /var/www/html/nagioslogserver/html/application/helpers/data_helper.php

-Do these alerts always trigger at a particular hour/minute, or does it seem random? I'm guessing rather random based on your output in this thread so far.

-I'm wondering if the missed triggers relate to a timezone offset in some way?

Let us know the results of the above and we'll get back to you. Thanks!

weveland · Post by **weveland** » Tue Jan 26, 2016 10:48 am

Mr. Olson & Agent Smith,

Sorry I didn't get back to you yesterday; I was out of the office. The decryption phrase will be in your PMs, Mr. Olson.

jolson wrote: -A screenshot of one of the problem alerts, including all of the settings that you have specifically set.

You will find the screenshots in the attached archive.

jolson wrote: -The following files: /var/www/html/nagioslogserver/application/helpers/data_helper.php and /var/www/html/nagioslogserver/html/application/helpers/data_helper.php

The first file exists and is in the archive; the second file does not exist.

jolson wrote: -Do these alerts always trigger at a particular hour/minute, or does it seem random? I'm guessing rather random based on your output in this thread so far.

It appears to occur in the time leading up to 7:00 AM EST. Strangely, this is also when the daily backup runs (at 7:00 AM). I've included a chronological list of the alerts so you can compare times; it is also in the archive.

jolson wrote: -I'm wondering if the missed triggers relate to a timezone offset in some way?

Anything is possible.

-W

nagsupport.zip

weveland · Post by **weveland** » Tue Jan 26, 2016 12:49 pm

Another side note, and a more urgent one: the admin panel is now just a blank page. I restarted the whole server after individual component restarts (httpd, logstash, elasticsearch) didn't help.
Still the same effect. Administration => Blank page

jolson · Post by **jolson** » Tue Jan 26, 2016 2:43 pm

The kibana-int database is in control of the loading of the Administration panel, and it's possible that it has an unassigned shard or similar - try the following command out:

Code:

curl -s 'localhost:9200/_cluster/health?level=indices&pretty' | grep kibana -A9

If the kibana-int database is healthy, check the apache logs after clicking 'Administration' - anything relevant in those logs?

Also, there are noted problems with Internet Explorer + Nagios Log Server - be sure you're using Firefox or Google Chrome.

Thanks Wayne! I'm still working on the original problem as related to this thread. I thought we had reproduced it in our lab, but unfortunately it was just a mis-firing script that caused our alert process to fail. I'll be coming back with more information when I've discussed the problem further with our developers.

weveland · Post by **weveland** » Tue Jan 26, 2016 3:27 pm

People still use Internet Explorer/Microsoft Edge?

"kibana-int" : {
"status" : "yellow",
"number_of_shards" : 5,
"number_of_replicas" : 1,
"active_primary_shards" : 5,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 5
},

Status of system has been yellow since day 1 because it's a single host and not a cluster.
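
For context on that output: a yellow status with every primary shard active means only the replica copies are unassigned, which is expected when there is a single host and nowhere to place them. A minimal sketch of that check, assuming the pretty-printed layout shown above; `primaries_ok` is a hypothetical helper name, not part of Nagios Log Server or Elasticsearch:

```shell
#!/bin/sh
# Hypothetical helper: reads one index's block from the pretty-printed
# _cluster/health?level=indices output on stdin and reports whether
# every primary shard is active.
primaries_ok() {
    awk -F'[:,]' '
        /"number_of_shards"/      { gsub(/[^0-9]/, "", $2); total = $2 }
        /"active_primary_shards"/ { gsub(/[^0-9]/, "", $2); active = $2 }
        END {
            if (total != "" && total == active) print "primaries ok"
            else print "primaries missing"
        }'
}

# Intended use against a live node (not run here):
#   curl -s 'localhost:9200/_cluster/health?level=indices&pretty' \
#       | grep kibana -A9 | primaries_ok
```

With the numbers posted above (5 shards, 5 active primaries), this would report the primaries as fine; the 5 unassigned shards are the replicas.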

jolson · Post by **jolson** » Tue Jan 26, 2016 3:56 pm

weveland wrote: People still use Internet Explorer/Microsoft Edge?

I am of the opinion that everyone should use elinks.

weveland wrote: Status of system has been yellow since day 1 because it's a single host and not a cluster.

Yup, the kibana-int database seems fine. Anything notable in the httpd logs?

Code:

tail -n50 /var/log/httpd/*log

Does the behavior change between HTTP/HTTPS?

weveland · Post by **weveland** » Tue Jan 26, 2016 4:02 pm

Nobody uses elinks!

==> /var/log/httpd/ssl_access_log <==
172.16.140.254 - - [26/Jan/2016:16:01:44 -0500] "GET /nagioslogserver/admin HTTP/1.1" 500 -

==> /var/log/httpd/ssl_request_log <==
[26/Jan/2016:16:01:44 -0500] 172.16.140.254 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "GET /nagioslogserver/admin HTTP/1.1" -

==> /var/log/httpd/access_log <==
172.16.140.254 - - [26/Jan/2016:16:01:49 -0500] "GET /nagioslogserver/admin HTTP/1.1" 500 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0"
172.16.140.254 - - [26/Jan/2016:16:01:56 -0500] "GET /nagioslogserver/admin HTTP/1.1" 500 - "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:43.0) Gecko/20100101 Firefox/43.0"
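
The 500 status in these entries points at a server-side failure rather than a browser issue. A rough sketch for counting such responses, assuming the Apache log layout shown above (request path in field 7, status in field 9); `count_admin_500` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: counts HTTP 500 responses for the admin page.
# Reads the named log files, or stdin when no file is given.
count_admin_500() {
    awk '$7 == "/nagioslogserver/admin" && $9 == 500 { n++ }
         END { print n + 0 }' "$@"
}

# Intended use (not run here):
#   count_admin_500 /var/log/httpd/access_log
```

If an error log in the same directory has entries at matching timestamps, those are likely the more useful lines to post.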

hsmith · Post by **hsmith** » Tue Jan 26, 2016 4:05 pm

weveland wrote: Nobody uses elinks!

Stallman uses elinks.

This is probably an obvious question, but you didn't run out of disk space, did you?
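
A quick way to answer that, sketched under the assumption that POSIX `df -P` output is available and that mount points contain no spaces; `over_threshold` is a hypothetical helper name and the 90 percent limit is only an example:

```shell
#!/bin/sh
# Hypothetical helper: reads `df -P` output on stdin and prints every
# mount point whose usage is at or above the given percentage.
over_threshold() {
    awk -v limit="$1" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # strip the trailing percent sign
            if (use + 0 >= limit + 0) print $6 " " use "%"
        }'
}

# Intended use (not run here):
#   df -P | over_threshold 90
```

No output means no filesystem is at or above the limit.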
