Wrong Alerts from Alerting function in NLS 1.4.4
Posted: Wed Mar 29, 2017 10:00 pm
by comfone
Hi
We have around 44 alerts configured in our NLS. The alerts trigger an alarm via NRDP to our Nagios XI instance. From there we send either email or text message notifications.
We have the problem that NLS triggers wrong alerts. The status of the alert switches to critical, but when I show the dashboard of this alert, the condition is not given and when I click on "run the alert now" it switches immediately back to "OK".
A wrong alert was triggered on each of the last 2 nights, at almost the same time (around 3 a.m. CEST).
This specific alert has a lookback period of 90 min and a check interval of 30 min. The warning and critical thresholds are both set to "1:".
Our Command Subsystem looks like:
cleanup_cmdsubsys 1 hour
backups 1 day
backup_maintenance 1 day
run_all_alerts 1 minute
run_update_check 1 day
Are there known issues with the Alerting function? Is there anything I can improve or configure to keep this from happening again?
Best regards,
Philipp
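
For readers unfamiliar with the threshold notation: "1:" is a Nagios-style range meaning "alert when the value is below 1", so an alert run that counts 0 matching log entries goes CRITICAL. The following is a minimal sketch of that range check, based on the Nagios plugin threshold conventions. The function names `parse_range` and `alerts` are made up for illustration; this is not the code Nagios Log Server itself runs.

```python
import math


def parse_range(spec):
    """Parse a Nagios-style range string into (low, high, inside).

    Illustrative helper, not taken from Nagios Log Server.
    'inside' is True for the '@' form, which alerts INSIDE the range.
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
        # "~" means negative infinity; an omitted start means 0.
        low = -math.inf if low_text == "~" else float(low_text or 0)
        # An omitted end means positive infinity.
        high = math.inf if high_text == "" else float(high_text)
    else:
        # A bare number N is shorthand for the range 0..N.
        low, high = 0.0, float(spec)
    return low, high, inside


def alerts(spec, value):
    """Return True if 'value' violates the threshold range 'spec'."""
    low, high, inside = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range


print(alerts("1:", 0))  # True: 0 matching entries is below the minimum of 1
print(alerts("1:", 5))  # False: 5 matching entries is within 1..infinity
```

With both thresholds set to "1:", a single alert run that finds no entries in its lookback window is enough to report CRITICAL.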
Re: Wrong Alerts from Alerting function in NLS 1.4.4
Posted: Thu Mar 30, 2017 9:52 am
by mcapra
comfone wrote:when I show the dashboard of this alert, the condition is not given
Could you elaborate on this? When you say the condition is not given, do you mean that the Nagios Log Server Alerts page does not display anything?
Can you also share the output of this command and tell me the name of the alert that originally produced this issue:
Code:
curl -XGET 'http://localhost:9200/nagioslogserver/alert/_search?size=100'
Feel free to PM the results of that curl command as it may contain sensitive information.
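
If the output of that command is large, it can be filtered locally before sharing. The sketch below relies only on the standard Elasticsearch search response layout (`hits.hits[]._source`); the helper names `sources` and `mentioning` are made up for illustration, and no assumption is made about the field names inside an alert document.

```python
import json


def sources(response_text):
    """Return the _source document of every hit in a search response."""
    response = json.loads(response_text)
    return [hit["_source"] for hit in response["hits"]["hits"]]


def mentioning(response_text, needle):
    """Return the hits whose serialized _source contains 'needle'.

    A plain substring match is used so that no particular field name
    has to be assumed.
    """
    return [doc for doc in sources(response_text) if needle in json.dumps(doc)]
```

For example, saving the curl output to a file and calling `mentioning(open("alerts.json").read(), "SSG-ETDR-ApplicationAlive")` would return only the documents that mention that alert (the file name is a placeholder).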
Re: Wrong Alerts from Alerting function in NLS 1.4.4
Posted: Fri Mar 31, 2017 3:57 am
by comfone
Hi,
thanks for your answer and sorry for my unclear explanation.
With "condition is not given" I meant that NLS gets the expected log information regularly and there is no (obvious) reason to switch the alert to critical. The alerts page works and when I click on the "Show alert in dashboard" button, the dashboard shows me the expected log entries for the defined lookback time.
Attached you can find a text file which contains the curl output. During the past few days, two alerts triggered false alarms: SSG-STATISTICS-ApplicationAlive and SSG-ETDR-ApplicationAlive.
Br,
Philipp
Re: Wrong Alerts from Alerting function in NLS 1.4.4
Posted: Fri Mar 31, 2017 11:15 am
by mcapra
The alerts and their queries look reasonable enough. Let's check the audit log. Can you share the output of the following commands executed from the CLI of your Nagios Log Server machine:
Code:
curl -XGET 'http://localhost:9200/nagioslogserver_log/ALERT/_search?size=100' -d '{"query":{"filtered":{"filter":{"range":{"created":{"from":1490717756000,"to":1490976686000}}},"query":{"query_string":{"query":"SSG-STATISTICS-ApplicationAlive"}}}}}'
curl -XGET 'http://localhost:9200/nagioslogserver_log/ALERT/_search?size=100' -d '{"query":{"filtered":{"filter":{"range":{"created":{"from":1490717756000,"to":1490976686000}}},"query":{"query_string":{"query":"SSG-ETDR-ApplicationAlive"}}}}}'
If I could also get the logs from the destination Nagios XI machine, that might also be helpful. They can typically be found here:
Code:
/usr/local/nagios/var/archives/nagios-03-31-2017-00.log
/usr/local/nagios/var/archives/nagios-03-30-2017-00.log
/usr/local/nagios/var/archives/nagios-03-29-2017-00.log
/usr/local/nagios/var/archives/nagios-03-28-2017-00.log
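
The `from` and `to` values in the audit-log queries above are epoch timestamps in milliseconds. As a rough sketch, the same request body can be built from readable dates as shown below; the function names `to_epoch_ms` and `audit_log_query` are made up for illustration, and the body simply mirrors the `filtered` query used in the curl commands.

```python
import json
from datetime import datetime, timezone


def to_epoch_ms(moment):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def audit_log_query(alert_name, start, end):
    """Build the request body used above for one alert name and time range."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "range": {
                        "created": {
                            "from": to_epoch_ms(start),
                            "to": to_epoch_ms(end),
                        }
                    }
                },
                "query": {"query_string": {"query": alert_name}},
            }
        }
    }


# The range used in the commands above, expressed in UTC.
start = datetime(2017, 3, 28, 16, 15, 56, tzinfo=timezone.utc)
end = datetime(2017, 3, 31, 16, 11, 26, tzinfo=timezone.utc)
print(json.dumps(audit_log_query("SSG-ETDR-ApplicationAlive", start, end)))
```

The printed JSON can be passed to curl with `-d`, exactly as in the commands above.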
Re: Wrong Alerts from Alerting function in NLS 1.4.4
Posted: Mon Apr 03, 2017 10:32 am
by comfone
Please find the files attached.
Re: Wrong Alerts from Alerting function in NLS 1.4.4
Posted: Mon Apr 03, 2017 12:22 pm
by mcapra
I see one CRITICAL from right around the time your indices would have been rotating. It may or may not be related, but it's definitely noteworthy:
Code:
{
  "_index": "nagioslogserver_log",
  "_type": "ALERT",
  "_id": "AVshufVRl8iePEtZwRgh",
  "_score": 1.4803797,
  "_source": {
    "created": 1490919486800,
    "created_by": "System",
    "type": "ALERT",
    "message": "Alert Name SSG-ETDR-ApplicationAlive returned CRITICAL: 0 matching entries found |logs=0;1:;1:",
    "source": "Nagios Log Server"
  }
}
Can you share the output of the following command executed from the CLI of one of your Nagios Log Server instances:
Code:
curl -XGET 'http://localhost:9200/logstash-*/_search?size=100' -d '{"query":{"filtered":{"query":{"bool":{"should":[{"query_string":{"query":"*"}}]}},"filter":{"bool":{"must":[{"range":{"@timestamp":{"from":1490918286000,"to":1490919486800}}},{"fquery":{"query":{"query_string":{"query":"type:(\"SSG - ETDRS\")"}},"_cache":true}},{"terms":{"logsource":["csvdb008"]}},{"terms":{"Data.raw":["OK"]}}]}}}}}'
Can you also share a screenshot of your "Administration -> Backup & Maintenance" page?
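
To see why index rotation is a plausible suspect, it helps to decode the time window in the query above. The sketch below lists the daily indices such a window touches. It assumes the indices are named `logstash-YYYY.MM.DD` and roll over at midnight UTC (Logstash's default naming); the thread itself only shows the `logstash-*` pattern, and the function name `daily_indices` is made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def daily_indices(from_ms, to_ms, prefix="logstash-"):
    """List the daily index names a millisecond time window overlaps.

    Assumes one index per UTC day, named <prefix>YYYY.MM.DD.
    """
    start = datetime.fromtimestamp(from_ms / 1000, tz=timezone.utc).date()
    end = datetime.fromtimestamp(to_ms / 1000, tz=timezone.utc).date()
    names = []
    day = start
    while day <= end:
        names.append(prefix + day.strftime("%Y.%m.%d"))
        day += timedelta(days=1)
    return names


# The window from the query above: 2017-03-30 23:58:06 to 2017-03-31 00:18:06 UTC.
print(daily_indices(1490918286000, 1490919486800))
# ['logstash-2017.03.30', 'logstash-2017.03.31']
```

Under that naming assumption, the window straddles midnight UTC, so the alert query ran against both the outgoing and the newly created index.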
Re: Wrong Alerts from Alerting function in NLS 1.4.4
Posted: Fri Apr 07, 2017 6:50 am
by comfone
Hi,
please find the CURL output attached.
Thanks a lot for your support!
Br,
Philipp
Re: Wrong Alerts from Alerting function in NLS 1.4.4
Posted: Fri Apr 07, 2017 6:53 am
by comfone
...and the screenshot.
Re: Wrong Alerts from Alerting function in NLS 1.4.4
Posted: Fri Apr 07, 2017 12:48 pm
by mcapra
This looks to be some sort of false positive caused by the indices rotating in/out or the indices being optimized around that time. I might be wrong, but that's my initial assessment.
If you want to open an email ticket with [email protected] and do a remote session, we can do that too. Reproducing it consistently and proving the above is going to be pretty difficult, though.
The other option is to adjust your max check attempts for the passive services in Nagios XI to help eliminate some of those false positives. This is likely an issue with Elasticsearch and not something that's going to be immediately solvable with simple modifications to the alerting logic.
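
As a rough illustration of why raising max check attempts helps: a service only reaches a HARD problem state, and so only notifies, after that many consecutive non-OK results. The sketch below is a deliberately simplified model of that soft/hard logic, not the actual Nagios XI implementation, and the function name `hard_problem_notifications` is made up for illustration.

```python
def hard_problem_notifications(results, max_check_attempts):
    """Return the positions in 'results' at which a notification would fire.

    Simplified model: a notification fires once, when the number of
    consecutive non-OK results first reaches max_check_attempts.
    Any OK result resets the counter.
    """
    notified_at = []
    attempt = 0           # consecutive non-OK results seen so far
    hard_problem = False  # True once the problem has become HARD
    for index, state in enumerate(results):
        if state == "OK":
            attempt = 0
            hard_problem = False
            continue
        attempt += 1
        if attempt >= max_check_attempts and not hard_problem:
            hard_problem = True
            notified_at.append(index)
    return notified_at


# One stray CRITICAL between two OK results, as seen in this thread.
history = ["OK", "CRITICAL", "OK"]
print(hard_problem_notifications(history, 1))  # [1]: notifies immediately
print(hard_problem_notifications(history, 2))  # []: the single CRITICAL stays soft
```

The trade-off is delay: with a 30-minute check interval and 2 attempts, a genuine outage would only notify on the second consecutive CRITICAL result.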
Re: Wrong Alerts from Alerting function in NLS 1.4.4
Posted: Mon Apr 10, 2017 10:42 am
by comfone
Hi,
thanks a lot for your efforts!
I will consider adjusting the "check attempts" in Nagios XI.
Br,
Philipp