So here's what is going on...
I'm using NSClient++ to check event logs on Windows servers. Here is the command I've defined in my commands.cfg file:
define command{
    command_name check_log
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckEventLog -t 30 -a file="system" filter=new filter=out MaxWarn=1 MaxCrit=1 filter-generated=\>1h filter+severity==error filter-severity==success filter-severity==informational filter=in filter=all truncate=1023 unique descriptions "syntax=%severity%: %source%: %message% (%count%)"
}
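When debugging a check like this, it can help to run the exact command by hand from the Nagios server and compare the raw plugin output with what the Event Viewer shows. The sketch below only expands the two macros in the command line above so it can be pasted into a shell. The plugin directory used for $USER1$ is an assumption (the real value is set in resource.cfg), and the address is the one that appears in the notification quoted later in this post.

```python
# Hypothetical macro values: $USER1$ comes from resource.cfg and commonly
# points at the plugin directory; adjust both to match your installation.
MACROS = {
    "$USER1$": "/usr/local/nagios/libexec",
    "$HOSTADDRESS$": "192.168.152.14",
}

# The command_line from commands.cfg, copied verbatim (raw strings keep the
# backslash in front of ">" so the shell does not treat it as a redirect).
COMMAND_LINE = (
    r'$USER1$/check_nrpe -H $HOSTADDRESS$ -p 5666 -c CheckEventLog -t 30 '
    r'-a file="system" filter=new filter=out MaxWarn=1 MaxCrit=1 '
    r'filter-generated=\>1h filter+severity==error '
    r'filter-severity==success filter-severity==informational '
    r'filter=in filter=all truncate=1023 unique descriptions '
    r'"syntax=%severity%: %source%: %message% (%count%)"'
)


def expand_macros(command_line, macros):
    """Replace each Nagios macro with its value, as Nagios would before running the check."""
    for name, value in macros.items():
        command_line = command_line.replace(name, value)
    return command_line


expanded = expand_macros(COMMAND_LINE, MACROS)
print(expanded)
```

Running the printed command from a shell on the Nagios server shows exactly what the plugin returns, without the notification logic in between.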
And here is how I execute the check in my windows.cfg file:
define service{
    use generic-service
    host_name s-cdc-01.corp.liveops.com
    service_description System-Event-Log
    check_command check_log
    notifications_enabled 1 ; Service notifications are enabled
    event_handler_enabled 1 ; Service event handler is enabled
    flap_detection_enabled 1 ; Flap detection is enabled
    failure_prediction_enabled 1 ; Failure prediction is enabled
    process_perf_data 1 ; Process performance data
    retain_status_information 1 ; Retain status information across program restarts
    retain_nonstatus_information 1 ; Retain non-status information across program restarts
    is_volatile 0 ; The service is not volatile
    check_period 24x7 ; The service can be checked at any time of the day
    max_check_attempts 3 ; Re-check the service up to 3 times in order to determine its final (hard) state
    normal_check_interval 5 ; Check the service every 5 minutes under normal conditions
    retry_check_interval 2 ; Re-check the service every two minutes until a hard state can be determined
    contact_groups admins ; Notifications get sent out to everyone in the 'admins' group
    notification_options w,u,c,r,f,s ; Send notifications about warning, unknown, critical, recovery, flapping, and scheduled downtime events
    notification_interval 5 ; Re-notify about service problems every 5 minutes
    notification_period 24x7 ; Notifications can be sent out at any time
    register 1
}
The problem is that I don't always get notified when an error shows up in the Event Log, or I'll get notified for an error that I don't see. For example, here's a message Nagios now generates and sends me:
***** Nagios *****
Notification Type: PROBLEM
Service: System-Event-Log
Host: S-CDC-01
Address: 192.168.152.14
State: CRITICAL
Date/Time: Wed Dec 14 11:21:12 PST 2011
Additional Info:
error: DCOM: (1), eventlog: 1 critical
If I look at the event log, though, I don't see any DCOM errors. I also see other errors I should have been notified of, but wasn't.
I'm only testing this on one server right now, against one log: the system log. I was running it against the application log yesterday and getting lots of responses, again for things I didn't see in the log. If I read my query right, it should return only errors generated in the last hour, yet I was getting responses for errors I couldn't find at all.
I just need some help locking this down to get a competent query that returns useful information.
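To make the timing in this setup concrete, here is a minimal sketch of how a one-hour lookback interacts with a 5-minute check interval. It assumes that filter-generated=\>1h drops events older than one hour (the reading given above) and that interval_length is at its default of 60 seconds, so normal_check_interval 5 means five minutes. The event time and check schedule are made up for illustration.

```python
from datetime import datetime, timedelta


def checks_seeing_event(event_time, first_check, check_interval, lookback, n_checks):
    """Return the check times whose lookback window contains event_time."""
    seen = []
    for i in range(n_checks):
        check_time = first_check + i * check_interval
        # The event is reported if it was generated at or before the check
        # and less than `lookback` before it.
        if check_time - lookback < event_time <= check_time:
            seen.append(check_time)
    return seen


interval = timedelta(minutes=5)  # normal_check_interval 5 (assumed minutes)
lookback = timedelta(hours=1)    # assumed meaning of filter-generated=\>1h

# Hypothetical schedule: checks every 5 minutes starting at 10:00,
# and a single error event logged at 10:21:12.
first_check = datetime(2011, 12, 14, 10, 0)
event_time = datetime(2011, 12, 14, 10, 21, 12)

hits = checks_seeing_event(event_time, first_check, interval, lookback, n_checks=24)
print(len(hits), "checks report the event, from", hits[0].time(), "to", hits[-1].time())
```

Under these assumptions a single event is picked up by about a dozen consecutive checks, so a notification can refer to an entry that is almost an hour old by the time the log is inspected.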
Monitoring Windows Event Logs
Re: Monitoring Windows Event Logs
We also tried NSClient++ EventLog filtering over a year ago and found it a bit flaky; it also killed one of our Exchange servers. We are currently using http://exchange.nagios.org/directory/Ad ... og/details, which has its own problems but seems to work more reliably and hasn't killed anything yet.
john.newman, Posts: 6, Joined: Wed Dec 21, 2011 1:26 pm
Re: Monitoring Windows Event Logs
^ +1 on all of that. We started with pure NSClient++ but found it a bit lacking, and even quirky, with the event log. The second tool there has been better. It's not great that we have to install two things, but it has been at least OK. It would be nice if those two projects combined their ideas and merged into a single service.