Logserver Alerts below threshold not working

weveland · Post by **weveland** » Thu Sep 17, 2015 3:36 pm

Currently running Log Server 2015R2.2

I created some new alerts that check logs for events. If no events occur, the system sends an alert through NRDP. In the alert setup I set the check interval to 5 minutes and the lookback period to 12 hours.
For both the warning and critical levels I set the threshold to "1:". This worked just fine until the middle of the night, when the alert went off and said there were no events.

I clicked the little monitor icon to view this alert on a dashboard and could clearly see there were events, but the alert showed 0. I manually re-ran the alert and it still did not work. It stayed this way from around 6:00 PM until midnight, then just started working again.

Any ideas what could be going on?
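
For context, a threshold written as "1:" follows the Nagios range convention: the check breaches when the value falls below 1. Below is a rough sketch of that comparison. The helper name below_min is hypothetical (it is not part of Nagios Log Server), and it handles only the simple "MIN:" form used here.

```shell
#!/bin/sh
# Sketch only: evaluates a Nagios-style "MIN:" range, where a value
# below MIN counts as a breach. below_min is a hypothetical helper.
below_min() {
    threshold=$1          # range string, e.g. "1:"
    count=$2              # matching log events found in the lookback window
    min=${threshold%:}    # strip the trailing colon to get the minimum
    if [ "$count" -lt "$min" ]; then
        echo "ALERT"
    else
        echo "OK"
    fi
}

below_min "1:" 0     # no events in the window
below_min "1:" 37    # events present
```

Run as written, this prints ALERT for the empty window and OK for the window with 37 events, which is the behavior the alert described above is expected to have.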

jolson · Post by **jolson** » Thu Sep 17, 2015 3:49 pm

I'd like to ask a question about your timezone - is it a positive UTC offset?

If so, there is a small change that we need to make regarding your alerts system that will get you back up and running. This glitch was just discovered a few weeks ago, and will be patched in our next release.

Make the following change:

vi /var/www/html/nagioslogserver/application/helpers/data_helper.php

Change:

        $range[] = "logstash-" . date('Y.m.d', $start);

To:

        $range[] = "logstash-" . gmdate('Y.m.d', $start);
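
To illustrate why the switch from date() to gmdate() matters: Logstash names its daily indices by UTC date, so an index name built from the local date can point at the wrong day on a host with a positive UTC offset. The sketch below assumes GNU date; the timestamp and the UTC+9 zone are illustrative values, not taken from any system in this thread.

```shell
#!/bin/sh
# Sketch only, assumes GNU date. Timestamp and zone are illustrative.
ts=1442520000   # 2015-09-17 20:00:00 UTC

# Index name built from the local date, as PHP date() would do,
# on a host in a UTC+9 zone (POSIX TZ string "JST-9").
local_index="logstash-$(TZ=JST-9 date -d "@$ts" +%Y.%m.%d)"

# Index name built from the UTC date, as PHP gmdate() would do.
utc_index="logstash-$(date -u -d "@$ts" +%Y.%m.%d)"

echo "local: $local_index"
echo "utc:   $utc_index"
```

Here the local name comes out as logstash-2015.09.18 while the UTC name is logstash-2015.09.17, so a query built from the local date would look in an index that does not hold the events yet.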

weveland · Post by **weveland** » Thu Sep 17, 2015 4:35 pm

Unfortunately no, my servers are currently on EDT (UTC-4:00).

This might be indicative of a larger problem, though. Other dashboard queries have not been reporting things correctly since last night. For instance, I have an Apache server that sends its logs in, so I search the ssl_access_logs and ssl_error_logs types and then the specific host. I can see error 500 response codes, which I've got indexed as response 500. If I click the magnifying glass next to the 500 to find more errors like it, it only shows me entries from yesterday, not even the one I was just looking at.

jolson · Post by **jolson** » Thu Sep 17, 2015 4:38 pm

Interesting. What timezone is the computer you're using in? Also, be sure that the dates are set properly on your Nagios Log Server instances:

date
grep timezone /etc/php.ini

You might try resetting the timezone manually:

cd /usr/local/nagioslogserver/scripts/
./change_timezone.sh -z America/Chicago

Be sure that your dates are proper among all of your instances.

weveland · Post by **weveland** » Thu Sep 17, 2015 5:21 pm

I'm pretty certain this is not a timezone problem. There's something up with the indexes or the database.

Thu Sep 17 18:19:45 EDT 2015

grep timezone /etc/php.ini
; Defines the default timezone used by the date functions
; http://www.php.net/manual/en/datetime.c ... e.timezone
date.timezone = US/Eastern

weveland · Post by **weveland** » Fri Sep 18, 2015 9:46 am

Here is an example of what I'm talking about. It's almost as if the indexing of certain fields has stopped. They're still parsed and still show up in the results, just not searchable. These queries were run on the same system, one right after the other.

With response field query

response_field_not_indexed.png

Without response field query

response_field_filter_removed.png

jdalrymple · Post by **jdalrymple** » Fri Sep 18, 2015 2:20 pm

weveland wrote:It's almost as if the indexing of certain fields has stopped.

Weird.

Presumably nobody was meddling about with your filters at the time it broke?
Are your indexes rotating properly still?

Is the response code field the only one exhibiting that behavior or are there others?

tmcdonald · Post by **tmcdonald** » Fri Sep 18, 2015 2:20 pm

What sort of document/index usage are we looking at? Go to Admin, and take screenshots of Cluster Status and Instance Status.

weveland · Post by **weveland** » Fri Sep 18, 2015 2:24 pm

No. This system just went live into semi-production for me the other day and I'm pretty much the only one doing the meddling. Also the indexes appear to still be rotating daily and the response code field is the only one that seems to be having this issue that I can tell so far.

The particular filter I'm using to parse these is as follows (there is more for other files but I've excluded it.):
if [type] == 'apache-access' {
    if [file] == '/var/log/httpd/access_log' {
        grok {
            match => [ 'message', '%{COMBINEDAPACHELOG}']
        }
        date {
            match => [ 'timestamp', 'dd/MMM/yyyy:HH:mm:ss Z' ]
        }
        mutate {
            replace => [ 'type', 'apache_access' ]
            convert => [ 'bytes', 'integer' ]
            convert => [ 'response', 'integer' ]
        }
        if( "_grokparsefailure" not in [tags]) {
            mutate { remove_field => "message" }
        }
    }
}

weveland · Post by **weveland** » Fri Sep 18, 2015 2:28 pm

Here are screenshots. Currently running a single server/instance as that's what I'm licensed for.

Cluster Status

cluster status.png

Instance Status

instance status.png

Nagios Support Forum