
False alerts on nothing being found

Posted: Wed Feb 17, 2021 7:25 am
by connected
We have an alert configured that triggers when a certain text no longer appears in a log file for a certain period.

This was working fine before, but it suddenly keeps reporting that the text is no longer found, although it is clearly there.
When we look at the alert history, everything looks fine:
Run Time                         Status  Alert Output                                        Interval  Lookback  Warning  Critical
Wed, 17 Feb 2021 13:00:19 +0100  OK      OK: 4641 matching entries found |logs=4641;1:;1:    30m       30m       1:       1:
Wed, 17 Feb 2021 12:48:42 +0100  OK      OK: 11507 matching entries found |logs=11507;1:;1:  30m       30m       1:       1:
Wed, 17 Feb 2021 12:18:26 +0100  OK      OK: 11591 matching entries found |logs=11591;1:;1:  30m       30m       1:       1:
Wed, 17 Feb 2021 11:48:12 +0100  OK      OK: 43092 matching entries found |logs=43092;1:;1:  30m       30m       1:       1:
Wed, 17 Feb 2021 11:18:10 +0100  OK      OK: 3984 matching entries found |logs=3984;1:;1:    30m       30m       1:       1:
Wed, 17 Feb 2021 11:08:03 +0100  OK      OK: 11802 matching entries found |logs=11802;1:;1:  30m       30m       1:       1:
Wed, 17 Feb 2021 10:38:02 +0100  OK      OK: 507 matching entries found |logs=507;1:;1:      30m       30m       1:       1:
Wed, 17 Feb 2021 10:36:35 +0100  OK      OK: 62 matching entries found |logs=62;1:;1:        30m       30m       1:       1:
Wed, 17 Feb 2021 10:36:26 +0100  OK      OK: 42079 matching entries found |logs=42079;1:;1:  30m       30m       1:       1:

But the alert we get via e-mail states the opposite.
Here is the full alert output:
CRITICAL: 0 matching entries found |logs=0;1:;1:
The last log from the alert query:
No matching logs found.
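The Warning and Critical values `1:` in the history (and the `;1:;1:` part of the perfdata in the e-mail) use the Nagios plugin threshold-range syntax, where `N:` means "alert when the value is below N". Any count of 1 or more is therefore OK and only a count of 0 is CRITICAL, which suggests the e-mails come from runs that really counted 0 entries rather than from a different reading of the threshold. The bash sketch below evaluates such ranges for integer counts. `range_alerts` is a hypothetical helper written for illustration, not code from Nagios Log Server, and it only handles the integer forms listed in its comments.

```bash
#!/usr/bin/env bash
# Sketch: evaluate an integer against a Nagios plugin threshold range.
# Supported forms: "N" (alert outside 0..N), "N:" (alert below N),
# "~:N" (alert above N), "N:M" (alert outside N..M), "@N:M" (alert inside N..M).
range_alerts() {
  local value=$1 range=$2 invert=0 start end outside=0
  # A leading "@" inverts the test: alert when the value is INSIDE the range.
  if [[ $range == @* ]]; then
    invert=1
    range=${range#@}
  fi
  if [[ $range == *:* ]]; then
    start=${range%%:*}
    end=${range#*:}
  else
    start=0
    end=$range
  fi
  start=${start:-0}   # an omitted start means 0
  # "~" as the start means negative infinity: no lower bound.
  if [[ $start != "~" ]] && (( value < start )); then outside=1; fi
  # An empty end means positive infinity: no upper bound.
  if [[ -n $end ]] && (( value > end )); then outside=1; fi
  if (( outside != invert )); then echo ALERT; else echo OK; fi
}

# Perfdata from the e-mail, in the form "label=value;warn;crit".
perf='logs=0;1:;1:'
IFS=';' read -r label_value warn crit <<< "$perf"
count=${label_value#*=}
echo "count=$count warning=$(range_alerts "$count" "$warn") critical=$(range_alerts "$count" "$crit")"
```

Run as-is, it prints `count=0 warning=ALERT critical=ALERT`; with 4641 from the first history row, both thresholds return OK.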


We've rebooted Nagios Log Server in case a process was hung somewhere, but this did not solve the issue.
Any idea how to get to the root cause of this?

Re: False alerts on nothing being found

Posted: Wed Feb 17, 2021 1:08 pm
by connected
OK, now this is getting weirder.
I've deleted the alert and created a new one.

It keeps giving false alarms.
I've now deactivated the alert, and it still keeps mailing false alarms every 30 minutes...

Re: False alerts on nothing being found

Posted: Wed Feb 17, 2021 3:28 pm
by connected
It seems the e-mail references an alert that no longer exists: AWSD132lSptOOhacSd9u.
I can no longer open it, though.

http://nagiosls.mydomain.lan/nagioslogs ... T19:07:27Z

Re: False alerts on nothing being found

Posted: Wed Feb 17, 2021 5:06 pm
by cdienger
Is this alert in the configuration that was sent for the other issue? If so, what is the name?

I'd also like to get a profile from the NLS system. It can be gathered under Admin > System > System Status > Download System Profile or from the command line with:

/usr/local/nagioslogserver/scripts/profile.sh

This will create /tmp/system-profile.tar.gz.

Note that this file can be very large and may be too big to upload through a private message. You can split it into smaller files with the split command on the NLS (or another Linux machine) command line:

Code:

split -b 5000000 /tmp/system-profile.tar.gz system-profile- -d
The above command splits system-profile.tar.gz into 5MB segments and saves them to files named system-profile-nn.
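On the receiving side, the pieces have to be concatenated back in order before the archive can be opened. A minimal bash sketch, assuming GNU `split -d` numeric suffixes as in the command above; `rejoin_parts` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: rejoin pieces produced by a command such as
#   split -b 5000000 /tmp/system-profile.tar.gz system-profile- -d
# split -d names them <prefix>00, <prefix>01, ...; the glob below expands
# in sorted order, so cat writes the pieces back in their original sequence.
rejoin_parts() {
  local prefix=$1 output=$2
  cat "$prefix"[0-9]* > "$output"
}

# Example, run in the directory holding the pieces (paths are illustrative):
#   rejoin_parts system-profile- system-profile.tar.gz
#   tar -tzf system-profile.tar.gz > /dev/null && echo "archive is readable"
# Comparing sha256sum of the original and the rejoined file on both machines
# confirms that nothing was lost in transfer.
```

Because all suffixes have the same length, sorted glob order matches the order split wrote them in.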

Re: False alerts on nothing being found

Posted: Thu Feb 18, 2021 5:55 am
by connected
Yes, this alert is in the same Nagios Log Server installation, but it's a different alert.

I'll send the System Profile shortly.

Re: False alerts on nothing being found

Posted: Fri Feb 19, 2021 5:13 am
by connected
FYI
I tried to delete the alert again by constructing the URL with the alert ID, but this didn't help either.
http://nagiosls.mydomain.lan/nagioslogs ... tOOhacSd9u

Hope you can find something in the logs.

Re: False alerts on nothing being found

Posted: Fri Feb 19, 2021 10:29 am
by cdienger
Run the following to check whether the alert still exists in the backend:

Code:

curl -XGET 'localhost:9200/nagioslogserver/alert/_search?q=_id:AWSD132lSptOOhacSd9u&pretty'
Delete it if it is found:

Code:

curl -XDELETE 'localhost:9200/nagioslogserver/alert/AWSD132lSptOOhacSd9u?pretty'
Also delete the alert history:

Code:

curl -XDELETE 'localhost:9200/nagioslogserver_history'
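The check-then-delete steps can be wrapped into one helper. This is a sketch under several assumptions: `ES_URL`, `hits_total` and `delete_alert_if_present` are hypothetical names; the parsing relies on the pretty-printed `?pretty` response layout with a plain numeric `hits.total` (as in the search output later in this thread), using `sed` so that `jq` is not required; and the delete uses the document-level `DELETE /<index>/<type>/<id>` endpoint rather than a query.

```bash
#!/usr/bin/env bash
# Sketch: delete an alert document only if a search finds it.
# Assumes a pretty-printed response where "hits" : { "total" : N, ... }
# holds a plain number; jq is deliberately not used.
ES_URL=${ES_URL:-http://localhost:9200}

# Print hits.total from a pretty-printed _search response read on stdin.
# The sed range starts at the top-level "hits" object, so the "total"
# inside "_shards" is skipped.
hits_total() {
  sed -n '/"hits" : {/,/}/ s/.*"total" : \([0-9][0-9]*\).*/\1/p' | head -n 1
}

delete_alert_if_present() {
  local id=$1 total
  total=$(curl -s "$ES_URL/nagioslogserver/alert/_search?q=_id:$id&pretty" | hits_total)
  if [ "${total:-0}" -gt 0 ]; then
    # Document-level delete: DELETE /<index>/<type>/<id>
    curl -s -XDELETE "$ES_URL/nagioslogserver/alert/$id?pretty"
  else
    echo "alert $id not found"
  fi
}

# Usage on the Nagios Log Server host:
#   delete_alert_if_present AWSD132lSptOOhacSd9u
```

If `jq` is installed, reading `.hits.total` with it would be more robust than the `sed` pattern, which depends on the exact pretty-print layout.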

Re: False alerts on nothing being found

Posted: Fri Feb 19, 2021 11:45 am
by connected
Thanks for the commands.
The alert seems to be gone already:

Code:

# curl -XGET 'localhost:9200/nagioslogserver/alert/_search?q=_id:AWSD132lSptOOhacSd9u&pretty'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}
I deleted the history too.
The alerts are still being mailed.

These really are new alerts, not old e-mails stuck in the Exchange server.
What we do notice is that the e-mail says the alert returned a CRITICAL state at Fri, 19 Feb 2021 10:21:26 -0600.

This is strange, because our system is configured for +0100, not -0600.
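The two offsets only change how the same instant is written: 10:21:26 -0600 is 16:21:26 UTC, i.e. 17:21:26 +0100. The -0600 stamp therefore suggests that whatever generated the e-mail was formatting times in a -0600 zone, not that the time itself is wrong. A quick check with GNU `date` (assumed available, as on most Linux hosts); `to_offset` is a hypothetical helper, and the POSIX TZ string `UTC-1` stands in for the poster's actual zone (POSIX TZ strings invert the sign, so `UTC-1` means UTC+01:00):

```bash
#!/usr/bin/env bash
# Sketch: re-express an RFC 2822-style timestamp from an alert e-mail
# in another time zone, using GNU date's free-form -d parser.
to_offset() {
  local timestamp=$1 tz=$2
  # LC_ALL=C keeps English day/month names regardless of the host locale.
  LC_ALL=C TZ=$tz date -d "$timestamp" '+%a, %d %b %Y %H:%M:%S %z'
}

# "UTC-1" is POSIX notation for UTC+01:00.
to_offset 'Fri, 19 Feb 2021 10:21:26 -0600' 'UTC-1'
```

This prints `Fri, 19 Feb 2021 17:21:26 +0100`.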

Re: False alerts on nothing being found

Posted: Fri Feb 19, 2021 5:45 pm
by cdienger
And the link in the e-mails still points to AWSD132lSptOOhacSd9u?

It is very odd. I'd like to get a fresh copy of the nagioslogserver index as well as nagioslogserver_history:

Code:

curl -XPOST http://localhost:9200/nagioslogserver/_export?path=/tmp/nagioslogserver.tar.gz
curl -XPOST http://localhost:9200/nagioslogserver_history/_export?path=/tmp/nagioslogserver_history.tar.gz

Re: False alerts on nothing being found

Posted: Sat Feb 20, 2021 3:10 am
by connected
We received a few more alerts after running the commands, so they did not stop right away.
But now they seem to have stopped!
Last alert was Fri, 19 Feb 2021 16:52:21 -0600

So let's see how it goes over the weekend :-)