Logserver Alerts below threshold not working

weveland
Posts: 125
Joined: Tue Aug 11, 2015 4:10 pm
Location: cat /dev/urandom > /dev/sda

Logserver Alerts below threshold not working

Post by weveland »

Currently running Log Server 2015R2.2.

I created some new alerts that check logs for events; if no events occur, the system sends an alert through NRDP. In the alert setup I set the interval to 5 minutes and the lookback period to 12 hours.
For both the warning and critical levels I set the threshold to "1:". This worked fine until, in the middle of the night, the alert went off and reported that there were no events.
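For reference, "1:" uses the standard Nagios plugin range notation, in which "N:" means "alert when the value is below N", so a count of 0 events trips the alert. The sketch below shows how that notation is usually evaluated, assuming Log Server interprets thresholds the same way standard Nagios plugins do. It is not Log Server's own code, and the function name outside_range is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of standard Nagios plugin threshold-range evaluation
# (assumes Log Server follows the same rules). Integer values only:
#   N      alert if value < 0 or value > N
#   N:     alert if value < N        (so "1:" alerts on a count of 0)
#   ~:N    alert if value > N
#   N:M    alert if value < N or value > M
#   @N:M   alert if N <= value <= M  (inverted range)
# Returns exit status 0 when the value should raise an alert.
outside_range() {
    local value=$1 range=$2 invert=0 lo hi alert=0
    # A leading "@" inverts the range.
    if [ "${range#@}" != "$range" ]; then
        invert=1
        range=${range#@}
    fi
    case $range in
        *:*) lo=${range%%:*}; hi=${range#*:} ;;
        *)   lo=0; hi=$range ;;
    esac
    lo=${lo:-0}   # an omitted start means 0
    # "~" means negative infinity, i.e. no lower bound.
    if [ "$lo" != "~" ] && [ "$value" -lt "$lo" ]; then
        alert=1
    fi
    # An empty end means positive infinity, i.e. no upper bound.
    if [ -n "$hi" ] && [ "$value" -gt "$hi" ]; then
        alert=1
    fi
    if [ "$invert" -eq 1 ]; then
        alert=$((1 - alert))
    fi
    [ "$alert" -eq 1 ]
}
```

For example, outside_range 0 "1:" succeeds (alert), while outside_range 42 "1:" does not, which matches the intent of alerting only when no events are found in the lookback window.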

I clicked the little monitor icon to view this alert on a dashboard and could clearly see there were events, but the alert showed 0. Manually re-running the alert didn't help. It stayed this way from around 6:00 PM until midnight, then just started working again.

Any ideas what could be going on?
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Logserver Alerts below threshold not working

Post by jolson »

A question about your timezone: does it have a positive UTC offset?

If so, there is a small change we need to make to your alerting system that will get you back up and running. This bug was discovered just a few weeks ago and will be patched in our next release.

Make the following change:

Code:

vi /var/www/html/nagioslogserver/application/helpers/data_helper.php
Change:

Code:

        $range[] = "logstash-" . date('Y.m.d', $start);
To:

Code:

        $range[] = "logstash-" . gmdate('Y.m.d', $start);
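Logstash names its daily indices after the UTC date of each event, so building the index list with PHP's date() (server local time) instead of gmdate() (UTC) can yield the wrong set of index names whenever the local and UTC calendar dates differ. The following is a hypothetical shell illustration of that effect, not the actual data_helper.php logic; the function name index_range and its arguments are invented for the example, and it assumes GNU date (for "-d @EPOCH").

```shell
#!/usr/bin/env bash
# Hypothetical illustration (not the real data_helper.php code):
# print the daily "logstash-YYYY.MM.DD" index names touched by the
# window START..END (Unix epoch seconds), using either the local
# timezone (like PHP's date()) or UTC (like PHP's gmdate()).
# Assumes GNU date, which accepts "-d @EPOCH".
index_range() {
    local start=$1 end=$2 mode=$3
    local flag="" t name prev=""
    if [ "$mode" = "utc" ]; then
        flag="-u"
    fi
    t=$start
    while :; do
        # Never step past the end of the window.
        if [ "$t" -gt "$end" ]; then
            t=$end
        fi
        name="logstash-$(date $flag -d "@$t" +%Y.%m.%d)"
        # Print each day's index name only once.
        if [ "$name" != "$prev" ]; then
            echo "$name"
            prev=$name
        fi
        if [ "$t" -ge "$end" ]; then
            break
        fi
        t=$((t + 3600))   # sample once per hour of the window
    done
}
```

For a 12-hour lookback ending at 22:00 EDT on 2015-09-16 (02:00 UTC on 2015-09-17), TZ=America/New_York index_range 1442412000 1442455200 local prints only logstash-2015.09.16, while index_range 1442412000 1442455200 utc prints both logstash-2015.09.16 and logstash-2015.09.17. This only demonstrates how local and UTC dates can diverge; it does not reproduce the exact behavior of the Log Server helper.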
Twits Blog
Show me a man who lives alone and has a perpetually clean kitchen, and 8 times out of 9 I'll show you a man with detestable spiritual qualities.

Re: Logserver Alerts below threshold not working

Post by weveland »

Unfortunately, no. My servers are currently on UTC-4:00 (EDT).

This might, however, point to a larger problem. Since last night, other dashboard queries also don't seem to be reporting correctly. For instance, I have an Apache server that sends its logs in. I search the ssl_access_logs and ssl_error_logs types, then narrow to the specific host. I can see HTTP 500 errors, which I have indexed with the response field set to 500. If I click the magnifying glass next to the 500 to find more errors like it, it only shows me entries from yesterday, not even the one I was just looking at.

Re: Logserver Alerts below threshold not working

Post by jolson »

Interesting. What timezone is the computer you're browsing from in? Also, make sure the dates are set correctly on your Nagios Log Server instances:

Code:

date
grep timezone /etc/php.ini
You might try resetting the timezone manually:

Code:

cd /usr/local/nagioslogserver/scripts/
./change_timezone.sh -z America/Chicago
Make sure the dates are correct and consistent across all of your instances.
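When comparing several instances, it can help to print the local clock, the UTC clock, the PHP timezone and the UTC-dated index name side by side. This is a minimal sketch assuming GNU date, the default /etc/php.ini path used in the commands above, and the default daily logstash-YYYY.MM.DD index naming; the function names php_timezone and report_dates are hypothetical.

```shell
#!/usr/bin/env bash
# Print the active date.timezone value from a php.ini file,
# skipping commented-out (";") lines. Defaults to /etc/php.ini.
# Quotes or trailing comments on the value are printed as-is.
php_timezone() {
    local ini=${1:-/etc/php.ini}
    sed -n 's/^[[:space:]]*date\.timezone[[:space:]]*=[[:space:]]*//p' "$ini"
}

# Show the local clock, the UTC clock, the PHP timezone and the
# index name for today's UTC date, so instances can be compared.
report_dates() {
    local ini=${1:-/etc/php.ini}
    echo "local time:        $(date '+%Y-%m-%d %H:%M:%S %Z')"
    echo "UTC time:          $(date -u '+%Y-%m-%d %H:%M:%S %Z')"
    echo "php date.timezone: $(php_timezone "$ini")"
    echo "today's index:     logstash-$(date -u +%Y.%m.%d)"
}
```

Running report_dates on each instance and comparing the output makes clock skew or a mismatched date.timezone easy to spot; unlike a plain grep, php_timezone ignores the commented-out lines in php.ini.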

Re: Logserver Alerts below threshold not working

Post by weveland »

I'm pretty certain this is not a timezone problem. There's something up with the indexes or the database.

Thu Sep 17 18:19:45 EDT 2015

grep timezone /etc/php.ini
; Defines the default timezone used by the date functions
; http://www.php.net/manual/en/datetime.c ... e.timezone
date.timezone = US/Eastern

Re: Logserver Alerts below threshold not working

Post by weveland »

Here is an example of what I'm talking about. It's almost as if the indexing of certain fields has stopped. The fields still appear in the results and are still parsed; they're just not searchable. These queries were run on the same system, one right after the other.

With the response field in the query:
response_field_not_indexed.png
Same query with the response field filter removed:
response_field_filter_removed.png
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Logserver Alerts below threshold not working

Post by jdalrymple »

weveland wrote:It's almost as if the indexing of certain fields has stopped.
Weird.

Presumably nobody was meddling about with your filters at the time it broke?
Are your indexes rotating properly still?

Is the response code field the only one exhibiting that behavior or are there others?
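One way to answer the rotation question is to compare the index names Elasticsearch reports against one expected name per UTC day. This is a sketch under stated assumptions: the default daily logstash-YYYY.MM.DD naming, GNU date, and a hypothetical function missing_indices that reads index names on stdin so it can be checked offline.

```shell
#!/usr/bin/env bash
# Read index names (one per line) on stdin and print every
# logstash-YYYY.MM.DD name for the last DAYS UTC days that is absent.
# Assumes the default daily "logstash-" naming and GNU date.
missing_indices() {
    local days=$1 now present i name
    now=$(date -u +%s)
    # Trim surrounding whitespace, since tabular output may pad columns.
    present=$(sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    for ((i = 0; i < days; i++)); do
        name="logstash-$(date -u -d "@$((now - i * 86400))" +%Y.%m.%d)"
        # Exact, fixed-string, whole-line match against the list.
        if ! grep -qxF "$name" <<< "$present"; then
            echo "$name"
        fi
    done
}
```

On the Log Server host the list could come from Elasticsearch's cat API, assuming it listens on localhost:9200 (an assumption about this install): curl -s 'http://localhost:9200/_cat/indices?h=index' | missing_indices 7. Empty output would mean an index exists for each of the last seven UTC days.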
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Logserver Alerts below threshold not working

Post by tmcdonald »

What sort of document/index usage are we looking at? Go to Admin and take screenshots of Cluster Status and Instance Status.
Former Nagios employee

Re: Logserver Alerts below threshold not working

Post by weveland »

No. This system only went into semi-production the other day, and I'm pretty much the only one doing any meddling. The indexes still appear to be rotating daily, and as far as I can tell so far, the response code field is the only one with this issue.

This is the filter I'm using to parse these logs (there is more for other files, but I've left it out):
if [type] == 'apache-access' {
    if [file] == '/var/log/httpd/access_log' {
        grok {
            match => [ 'message', '%{COMBINEDAPACHELOG}' ]
        }
        date {
            match => [ 'timestamp', 'dd/MMM/yyyy:HH:mm:ss Z' ]
        }
        mutate {
            replace => [ 'type', 'apache_access' ]
            convert => [ 'bytes', 'integer' ]
            convert => [ 'response', 'integer' ]
        }
        if ( "_grokparsefailure" not in [tags] ) {
            mutate { remove_field => "message" }
        }
    }
}


Re: Logserver Alerts below threshold not working

Post by weveland »

Here are the screenshots. I'm currently running a single server/instance, since that's what I'm licensed for.

Cluster Status
cluster status.png
Instance Status
instance status.png