NCPA Windows EventLog

onegative
Posts: 175
Joined: Tue Feb 17, 2015 12:06 pm

NCPA Windows EventLog

Post by onegative »

G'day Nagios Support,

I appear to have stumbled on a bug in NCPA version 2.0.3 when using the API endpoint for logs. It seems that if the timeframe defined in the "Log Timeframe" field exceeds the amount of time actually covered by the log, the query craters and the web server hangs.

For example, the Security log contains approx. 71 minutes of logging. I issue an API query against the log with logged_after=80m, which seems to place the search in an endless loop looking for entries past the 71 minutes the log actually holds.

I let the web GUI continue to search and it never times out. I tried a new browser session, but it does not respond either. The only way to get the NCPA listener to respond is to restart it from services.msc. This is obviously a bad situation: if your logs get cleared and a query from an Active or Passive monitor exceeds the amount of history available to search, the web server hangs and becomes unresponsive. This completely breaks the agent and forces a restart to recover.
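For anyone reproducing this outside a browser, here is a minimal Python sketch of issuing the same kind of query with a client-side timeout, so the test client gives up rather than waiting indefinitely. The host name, token value, port 5693, and the helper names are placeholders and assumptions, not details from my setup. Note that a client-side timeout only protects the caller; it does nothing to recover a hung listener.

```python
import ssl
import urllib.request
from urllib.parse import urlencode


def build_logs_url(host, token, name, logged_after, port=5693):
    """Build a logs-endpoint URL. host, token and port are placeholders."""
    query = urlencode({"token": token, "name": name, "logged_after": logged_after})
    return f"https://{host}:{port}/api/logs/?{query}"


def fetch_logs(url, timeout_seconds=30):
    """Fetch the URL, raising an error if the agent stops responding."""
    # Assumption: the agent presents a self-signed certificate,
    # so certificate verification is disabled for this test client.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, timeout=timeout_seconds, context=context) as response:
        return response.read().decode("utf-8")


# Hypothetical host and token, using the 80m timeframe from the example above.
url = build_logs_url("ncpa-host.example", "mytoken", "Security", "80m")
print(url)
```

Calling fetch_logs(url) against an affected agent would be expected to raise a timeout error instead of blocking forever, which makes the hang easier to detect in a scripted test.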

Please attempt to replicate and confirm my assertion concerning the API endpoint for logs on a Windows NCPA. If you cannot replicate this, please let me know and suggest what the actual problem might be. That said, I have tested this multiple times with the same results, so I would be hard pressed not to believe my own experience.

Let me know and thanks,
Danny
Last edited by onegative on Tue Jun 06, 2017 4:12 pm, edited 1 time in total.
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: NCPA Windows EventLog

Post by avandemore »

Can you provide the full query you are using? Does the ncpa log reveal anything if verbose is set to debug?
onegative
Posts: 175
Joined: Tue Feb 17, 2015 12:06 pm

Re: NCPA Windows EventLog

Post by onegative »

In my NCPA passive check configuration (cfg) I have the following:

#############################
# Windows Event Log Metrics #
#############################

# Parameter Name - WinEvent_Security_AUDIT_FAILURE
# This parameter is used to monitor the Audit Failures for Event ID 4776
# WinEvent_Security_AUDIT_FAILURE will generate an event if the Warning and/or Critical threshold values are exceeded
# Log Timeframe is defined as the amount of time back in history to search for the specific event log events
%HOSTNAME%|WinEvent_Security_AUDIT_FAILURE = /logs --name Security --logged_after 5m --severity AUDIT_FAILURE --event_id 4776 --check true --warning 3 --critical 5

When I modify --logged_after to exceed the amount of time in the log (5h in the run below), ncpa_passive.log stops after the following entries and none of the passive objects for the host within Nagios XI ever get updated again.

=======================================================DEBUG===========================================================
2017-06-06 11:10:49,263:INFO:ncpacheck:Running check: /logs --name Security --logged_after 5h --severity AUDIT_FAILURE --event_id 4776 --check true --warning 3 --critical 5
2017-06-06 11:10:49,263:DEBUG:ncpacheck:Getting API url for instruction /logs --name Security --logged_after 5h --severity AUDIT_FAILURE --event_id 4776 --check true --warning 3 --critical 5
2017-06-06 11:10:49,263:DEBUG:ncpacheck:Parsing command line style instruction: /logs --name Security --logged_after 5h --severity AUDIT_FAILURE --event_id 4776 --check true --warning 3 --critical 5
2017-06-06 11:10:49,263:DEBUG:ncpacheck:Determined instruction to be: /logs --name Security --logged_after 5h --severity AUDIT_FAILURE --event_id 4776 --check true --warning 3 --critical 5
2017-06-06 11:10:49,263:DEBUG:ncpacheck:Access the API with /api/logs/?severity=AUDIT_FAILURE&event_id=4776&critical=5&warning=3&logged_after=5h&check=1&name=Security
2017-06-06 11:10:49,263:DEBUG:psapi:Imported windowscounters into the API tree.
2017-06-06 11:10:49,263:DEBUG:psapi:Imported windowslogs into the API tree.

no more entries occur...
=======================================================DEBUG=============================================================
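To make the mapping in the debug lines above easier to follow, here is a small Python sketch that turns a command line style instruction into the API path and query string seen in the "Access the API with" entry. This is only an illustration of the mapping visible in the log, not NCPA's actual parser; the function name is hypothetical, and the conversion of "true" to "1" is inferred from the log line where --check true appears as check=1. The parameter order differs from the log but should not matter for a query string.

```python
import shlex
from urllib.parse import urlencode


def instruction_to_url(instruction):
    """Convert '/logs --name Security ...' into '/api/logs/?name=Security&...'."""
    tokens = shlex.split(instruction)
    path, args = tokens[0], tokens[1:]
    if len(args) % 2 != 0:
        raise ValueError("expected --flag value pairs")
    params = {}
    for flag, value in zip(args[0::2], args[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"unexpected token: {flag}")
        # Inferred from the debug log: '--check true' is sent as 'check=1'.
        params[flag[2:]] = "1" if value.lower() == "true" else value
    return f"/api{path}/?{urlencode(params)}"


instruction = (
    "/logs --name Security --logged_after 5h --severity AUDIT_FAILURE "
    "--event_id 4776 --check true --warning 3 --critical 5"
)
print(instruction_to_url(instruction))
```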


When I set --logged_after to a value less than the total amount of time covered by the log, I get the following debug output and all objects get updated:

#############################
# Windows Event Log Metrics #
#############################

# Parameter Name - WinEvent_Security_AUDIT_FAILURE
# This parameter is used to monitor the Audit Failures for Event ID 4776
# WinEvent_Security_AUDIT_FAILURE will generate an event if the Warning and/or Critical threshold values are exceeded
# Log Timeframe is defined as the amount of time back in history to search for the specific event log events
%HOSTNAME%|WinEvent_Security_AUDIT_FAILURE = /logs --name Security --logged_after 5m --severity AUDIT_FAILURE --event_id 4776 --check true --warning 3 --critical 5

=======================================================DEBUG===========================================================

2017-06-06 11:13:17,453:INFO:ncpacheck:Running check: /logs --name Security --logged_after 5m --severity AUDIT_FAILURE --event_id 4776 --check true --warning 3 --critical 5
2017-06-06 11:13:17,453:DEBUG:ncpacheck:Getting API url for instruction /logs --name Security --logged_after 5m --severity AUDIT_FAILURE --event_id 4776 --check true --warning 3 --critical 5
2017-06-06 11:13:17,453:DEBUG:ncpacheck:Parsing command line style instruction: /logs --name Security --logged_after 5m --severity AUDIT_FAILURE --event_id 4776 --check true --warning 3 --critical 5
2017-06-06 11:13:17,453:DEBUG:ncpacheck:Determined instruction to be: /logs --name Security --logged_after 5m --severity AUDIT_FAILURE --event_id 4776 --check true --warning 3 --critical 5
2017-06-06 11:13:17,453:DEBUG:ncpacheck:Access the API with /api/logs/?severity=AUDIT_FAILURE&event_id=4776&critical=5&warning=3&logged_after=5m&check=1&name=Security
2017-06-06 11:13:17,453:DEBUG:psapi:Imported windowscounters into the API tree.
2017-06-06 11:13:17,453:DEBUG:psapi:Imported windowslogs into the API tree.
2017-06-06 11:13:17,453:DEBUG:ncpacheck:Handling JSON response: {
"returncode": 0,
"stdout": "OK: Security has 0 logs, Total Count has 0 logs (Time range - last 5 minutes) | 'Security'=0;3;5; 'Total Count'=0;3;5;"
}
2017-06-06 11:13:17,453:DEBUG:ncpacheck:JSON response handled found stdout='OK: Security has 0 logs, Total Count has 0 logs (Time range - last 5 minutes) | 'Security'=0;3;5; 'Total Count'=0;3;5;', returncode=0
2017-06-06 11:13:17,453:DEBUG:ncpacheck:Next run is 1496773095.83
2017-06-06 11:13:17,453:DEBUG:ncpacheck:Next run set to be at 0

=======================================================DEBUG===========================================================

As stated previously, the web interface also craters and will not time out or recover. This is easily duplicated on my end and should be child's play for your team.

Thanks,
Danny
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: NCPA Windows EventLog

Post by lmiltchev »

Danny, I was able to recreate the problem, and posted a new issue on GitHub: https://github.com/NagiosEnterprises/ncpa/issues/350. Our developers have been made aware of the problem. Most probably, this will be fixed in the next release of NCPA. Thanks!
onegative
Posts: 175
Joined: Tue Feb 17, 2015 12:06 pm

Re: NCPA Windows EventLog

Post by onegative »

That is great news...I am glad you were able to replicate.

Thanks for your confirmation and effort in helping resolve the issue for future releases...

Danny