Logserver Alerts below threshold not working
Currently running Log Server 2015R2.2
I created some new alerts that check logs for events. If no events occur, the system sends an alert through NRDP. In the alert setup I specified the interval at 5 minutes and the lookback period at 12 hours.
For the warning and critical levels I set the thresholds to "1:". This worked just fine until the middle of the night, when the alert went off and said there were no events.
I clicked the little monitor icon to view this alert on a dashboard and could clearly see there were events, but the alert showed 0. I manually re-ran the alert and it still did not work. It stayed this way from around 6:00 PM until midnight, then just started working again.
Any ideas what could be going on?
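For readers unfamiliar with the notation: a threshold of "1:" reads like standard Nagios range syntax, where "1:" means "alert when the value is below 1". Whether Log Server evaluates alert thresholds exactly this way is an assumption here, not something confirmed in this thread. A minimal Python sketch of that rule, with made-up function names (parse_range and alerts are illustrative, not part of any Nagios API):

```python
def parse_range(spec):
    """Split a Nagios-style range string into (low, high, inside).

    Hypothetical helper for illustration only.
    "10"    -> alert outside 0..10
    "10:"   -> alert below 10
    "~:10"  -> alert above 10
    "@1:10" -> alert inside 1..10
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_s, high_s = spec.split(":", 1)
    else:
        low_s, high_s = "0", spec
    if low_s == "~":
        low = float("-inf")
    elif low_s:
        low = float(low_s)
    else:
        low = 0.0
    high = float(high_s) if high_s else float("inf")
    return low, high, inside


def alerts(value, spec):
    """Return True when value should raise an alert for this range."""
    low, high, inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range


# With "1:", zero matching events alerts and one or more does not.
print(alerts(0, "1:"), alerts(1, "1:"))
```

Under that reading, an alert that counts 0 events will fire even when the dashboard shows events, so the question becomes why the alert query returned 0.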
Re: Logserver Alerts below threshold not working
I'd like to ask a question about your timezone - is it a positive UTC offset?
If so, there is a small change that we need to make regarding your alerts system that will get you back up and running. This glitch was just discovered a few weeks ago, and will be patched in our next release.
Make the following change:
Code:
vi /var/www/html/nagioslogserver/application/helpers/data_helper.php
Change:
Code:
$range[] = "logstash-" . date('Y.m.d', $start);
To:
Code:
$range[] = "logstash-" . gmdate('Y.m.d', $start);
Re: Logserver Alerts below threshold not working
Unfortunately no, my servers are currently on UTC-4:00 (EDT).
This might be indicative of a larger problem, though. It appears that other dashboard queries have not been reporting things correctly since last night. For instance, I have an Apache server that sends its logs in, so I'm searching the ssl_access_logs and ssl_error_logs types and then the specific host. I can see there are error 500 response codes, which I've got indexed as response 500. If I select the magnifying glass next to the 500 to find more errors like it, it only shows me entries from yesterday, not even the one I was just looking at.
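On the date() versus gmdate() change suggested earlier in the thread: Logstash names its daily indices by UTC date by default, so any code that builds index names from the local date can pick a different set of indices for part of each day, whatever the sign of the offset. The Python sketch below illustrates the idea for a 12-hour lookback. It is not the actual data_helper.php logic; daily_indices is a hypothetical helper, and a fixed UTC-4 offset stands in for US/Eastern in September.

```python
from datetime import datetime, timedelta, timezone


def daily_indices(start_epoch, end_epoch, tz):
    """List 'logstash-YYYY.MM.DD' for every calendar day the window touches,
    with the calendar day taken in the given timezone (hypothetical helper)."""
    day = datetime.fromtimestamp(start_epoch, tz).date()
    last = datetime.fromtimestamp(end_epoch, tz).date()
    names = []
    while day <= last:
        names.append("logstash-" + day.strftime("%Y.%m.%d"))
        day += timedelta(days=1)
    return names


UTC = timezone.utc
EDT = timezone(timedelta(hours=-4))  # assumption: fixed offset, no DST handling

end = 1442538000         # 2015-09-18 01:00 UTC, which is 2015-09-17 21:00 EDT
start = end - 12 * 3600  # 12-hour lookback

utc_names = daily_indices(start, end, UTC)    # analogous to gmdate()
local_names = daily_indices(start, end, EDT)  # analogous to date()
print(utc_names)
print(local_names)
```

In this example the UTC-based list covers logstash-2015.09.17 and logstash-2015.09.18, while the local-date list covers only logstash-2015.09.17, so events already written to the new day's index would be missed. Whether that is what happened here is not established in this thread.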
Re: Logserver Alerts below threshold not working
Interesting. What timezone is the computer you're using in? Also, be sure that the dates are set properly on your Nagios Log Server instances:
Code:
date
grep timezone /etc/php.ini
You might try resetting the timezone manually:
Code:
cd /usr/local/nagioslogserver/scripts/
./change_timezone.sh -z America/Chicago
Be sure that your dates are proper among all of your instances.
Re: Logserver Alerts below threshold not working
I'm pretty certain this is not a timezone problem. There's something up with the indexes or the database.
Thu Sep 17 18:19:45 EDT 2015
timezone /etc/php.ini
; Defines the default timezone used by the date functions
; http://www.php.net/manual/en/datetime.c ... e.timezone
date.timezone = US/Eastern
Re: Logserver Alerts below threshold not working
Here is an example of what I'm talking about. It's almost as if the indexing of certain fields has stopped. But they're still in the results and still parsed. Just not searchable. These queries were run on the same system one right after the other.
[Attached screenshots: With response field query, Without response field query]
Re: Logserver Alerts below threshold not working
weveland wrote:
> It's almost as if the indexing of certain fields has stopped.
Weird.
Presumably nobody was meddling about with your filters at the time it broke?
Are your indexes rotating properly still?
Is the response code field the only one exhibiting that behavior or are there others?
Re: Logserver Alerts below threshold not working
What sort of document/index usage are we looking at? Go to Admin, and take screenshots of Cluster Status and Instance Status.
Re: Logserver Alerts below threshold not working
jdalrymple wrote:
> weveland wrote: It's almost as if the indexing of certain fields has stopped.
> Weird.
> Presumably nobody was meddling about with your filters at the time it broke?
> Are your indexes rotating properly still?
> Is the response code field the only one exhibiting that behavior or are there others?
No. This system just went live into semi-production for me the other day and I'm pretty much the only one doing the meddling. Also the indexes appear to still be rotating daily and the response code field is the only one that seems to be having this issue that I can tell so far.
The particular filter I'm using to parse these is as follows (there is more for other files but I've excluded it.):
if [type] == 'apache-access' {
    if [file] == '/var/log/httpd/access_log' {
        grok {
            match => [ 'message', '%{COMBINEDAPACHELOG}']
        }
        date {
            match => [ 'timestamp', 'dd/MMM/yyyy:HH:mm:ss Z' ]
        }
        mutate {
            replace => [ 'type', 'apache_access' ]
            convert => [ 'bytes', 'integer' ]
            convert => [ 'response', 'integer' ]
        }
        if( "_grokparsefailure" not in [tags]) {
            mutate { remove_field => "message" }
        }
    }
}
Re: Logserver Alerts below threshold not working
tmcdonald wrote:
> What sort of document/index usage are we looking at? Go to Admin, and take screenshots of Cluster Status and Instance Status.
Here are screenshots. Currently running a single server/instance as that's what I'm licensed for.
[Attached screenshots: Cluster Status, Instance Status]