Logserver Alerts below threshold not working
Posted: Thu Sep 17, 2015 3:36 pm
by weveland
Currently running Log Server 2015R2.2
I created some new alerts that check logs for events. If no events occur, the system sends an alert through NRDP. In the alert setup I specified the interval at 5 minutes and the lookback period at 12 hours.
For the warning and critical levels I set the thresholds to "1:" so that the alert fires when the count drops below 1. This worked just fine until the middle of the night, when the alert went off and said there were no events.
I clicked the little monitor icon to view this alert on a dashboard and could clearly see there were events, but the alert showed 0. I manually re-ran the alert and it still did not work. It stayed this way from around 6:00 PM until midnight, then just started working again.
Any ideas what could be going on?
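For readers unfamiliar with the "1:" notation, here is a minimal sketch of how a threshold range like this is usually evaluated. It assumes Log Server follows the standard Nagios plugin range syntax; the function name `should_alert` is made up for illustration and is not part of the product.

```python
def should_alert(value, spec):
    """Return True if value triggers an alert for a Nagios-style range spec.

    Assumed syntax: "10" means 0..10, "1:" means 1..infinity,
    "~:10" means -infinity..10, and a leading "@" inverts the check.
    """
    inside_alert = spec.startswith("@")
    if inside_alert:
        spec = spec[1:]
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
    else:
        low_text, high_text = "0", spec
    low = float("-inf") if low_text == "~" else float(low_text or 0)
    high = float("inf") if high_text == "" else float(high_text)
    in_range = low <= value <= high
    # Normally the alert fires outside the range; "@" fires inside it.
    return in_range if inside_alert else not in_range


no_events = should_alert(0, "1:")     # zero matching events in the lookback
some_events = should_alert(42, "1:")  # events were found
```

With "1:" the alert only fires when the query returns zero events, so an alert reporting 0 while the dashboard shows events suggests the alert query and the dashboard query are not looking at the same data.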
Re: Logserver Alerts below threshold not working
Posted: Thu Sep 17, 2015 3:49 pm
by jolson
I'd like to ask a question about your timezone - is it a positive UTC offset?
If so, there is a small change that we need to make regarding your alerts system that will get you back up and running. This glitch was just discovered a few weeks ago, and will be patched in our next release.
Make the following change:
Code:
vi /var/www/html/nagioslogserver/application/helpers/data_helper.php
Change:
Code:
$range[] = "logstash-" . date('Y.m.d', $start);
To:
Code:
$range[] = "logstash-" . gmdate('Y.m.d', $start);
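To illustrate what this change affects: PHP's date() formats a timestamp in the server's local timezone, while gmdate() formats it in UTC, and Logstash names its daily indexes by UTC date. The sketch below is Python rather than the actual helper code, and `index_names` is a hypothetical stand-in for the range-building logic. It shows how a lookback window can resolve to different index lists depending on which timezone is used to pick the days.

```python
from datetime import datetime, timedelta, timezone


def index_names(start, end, tz):
    """Return daily index names covering start..end, with days taken in tz."""
    day = start.astimezone(tz).date()
    last = end.astimezone(tz).date()
    names = []
    while day <= last:
        names.append("logstash-" + day.strftime("%Y.%m.%d"))
        day += timedelta(days=1)
    return names


# Example values only: a 12 hour lookback ending at 9 PM on a UTC-4 server.
EDT = timezone(timedelta(hours=-4))
end = datetime(2015, 9, 16, 21, 0, tzinfo=EDT)  # 01:00 UTC on Sep 17
start = end - timedelta(hours=12)               # 13:00 UTC on Sep 16

local_names = index_names(start, end, EDT)           # date()-style lookup
utc_names = index_names(start, end, timezone.utc)    # gmdate()-style lookup
```

In this example the local-date lookup lists only logstash-2015.09.16, while the UTC lookup also includes logstash-2015.09.17, which is where events after 00:00 UTC are stored.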
Re: Logserver Alerts below threshold not working
Posted: Thu Sep 17, 2015 4:35 pm
by weveland
Unfortunately no, my servers are currently on EDT (UTC-4:00).
Although this might be indicative of a larger problem. It appears that other dashboard queries have not been reporting things correctly since last night. For instance, I have an Apache server that sends its logs in, so I'm searching the ssl_access_logs and ssl_error_logs types and then the specific host. I can see there are error 500 response codes, which I've got indexed as response 500. If I select the magnifying glass next to the 500 to find more errors like it, it only shows me entries from yesterday, not even the one I was just looking at.
Re: Logserver Alerts below threshold not working
Posted: Thu Sep 17, 2015 4:38 pm
by jolson
Interesting. What timezone is the computer you're using in? Also, be sure that the dates are set properly on your Nagios Log Server instances.
You might try resetting the timezone manually:
Code:
cd /usr/local/nagioslogserver/scripts/
./change_timezone.sh -z America/Chicago
Be sure that your dates are proper among all of your instances.
Re: Logserver Alerts below threshold not working
Posted: Thu Sep 17, 2015 5:21 pm
by weveland
I'm pretty certain this is not a timezone problem. There's something up with the indexes or the database.
Thu Sep 17 18:19:45 EDT 2015
timezone /etc/php.ini
; Defines the default timezone used by the date functions
; http://www.php.net/manual/en/datetime.c ... e.timezone
date.timezone = US/Eastern
Re: Logserver Alerts below threshold not working
Posted: Fri Sep 18, 2015 9:46 am
by weveland
Here is an example of what I'm talking about. It's almost as if the indexing of certain fields has stopped. But they're still in the results and still parsed. Just not searchable. These queries were run on the same system one right after the other.
With response field query
response_field_not_indexed.png
Without response field query
response_field_filter_removed.png
Re: Logserver Alerts below threshold not working
Posted: Fri Sep 18, 2015 2:20 pm
by jdalrymple
weveland wrote: It's almost as if the indexing of certain fields has stopped.
Weird.
Presumably nobody was meddling about with your filters at the time it broke?
Are your indexes rotating properly still?
Is the response code field the only one exhibiting that behavior or are there others?
Re: Logserver Alerts below threshold not working
Posted: Fri Sep 18, 2015 2:20 pm
by tmcdonald
What sort of document/index usage are we looking at? Go to Admin, and take screenshots of Cluster Status and Instance Status.
Re: Logserver Alerts below threshold not working
Posted: Fri Sep 18, 2015 2:24 pm
by weveland
No. This system just went live into semi-production for me the other day and I'm pretty much the only one doing the meddling. Also, the indexes appear to still be rotating daily, and the response code field is the only one having this issue as far as I can tell.
The particular filter I'm using to parse these is as follows (there is more for other files, but I've excluded it):
if [type] == 'apache-access' {
    if [file] == '/var/log/httpd/access_log' {
        grok {
            match => [ 'message', '%{COMBINEDAPACHELOG}']
        }
        date {
            match => [ 'timestamp', 'dd/MMM/yyyy:HH:mm:ss Z' ]
        }
        mutate {
            replace => [ 'type', 'apache_access' ]
            convert => [ 'bytes', 'integer' ]
            convert => [ 'response', 'integer' ]
        }
        if( "_grokparsefailure" not in [tags]) {
            mutate { remove_field => "message" }
        }
    }
}
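For anyone following along who doesn't read Logstash config, here is a rough Python sketch of what the filter above does to a single access log line. It is only an approximation: the regex is a simplified stand-in for the COMBINEDAPACHELOG grok pattern, and `parse_line` and the sample line are invented for illustration.

```python
import re
from datetime import datetime

# Simplified approximation of the combined Apache log format.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Mimic the grok, date and mutate steps for one log line."""
    m = COMBINED.match(line)
    if m is None:
        # On a parse failure the original message is kept and tagged.
        return {"message": line, "tags": ["_grokparsefailure"]}
    event = m.groupdict()
    event["type"] = "apache_access"             # mutate: replace
    event["response"] = int(event["response"])  # mutate: convert to integer
    if event["bytes"] != "-":
        event["bytes"] = int(event["bytes"])    # mutate: convert to integer
    # date: parse dd/MMM/yyyy:HH:mm:ss Z into a real timestamp
    event["@timestamp"] = datetime.strptime(
        event["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return event


sample = ('10.0.0.5 - - [17/Sep/2015:18:19:45 -0400] '
          '"GET /index.html HTTP/1.1" 500 1234 "-" "curl/7.29.0"')
event = parse_line(sample)
```

The point to notice is that response ends up as an integer (500) rather than the string "500" once the convert step has run.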
Re: Logserver Alerts below threshold not working
Posted: Fri Sep 18, 2015 2:28 pm
by weveland
Here are screenshots. Currently running a single server/instance as that's what I'm licensed for.
Cluster Status
cluster status.png
Instance Status
instance status.png