Cool, thanks! I'm entrenched in using PowerShell as my solution, but I'm going to keep that example and toy with it later.
Here is the PowerShell script I am using. It has been deployed to a few test servers, so it's still in a "beta" stage. The only caveat is that my situation requires different events to trigger alerts at different levels. I know that one service cannot return both a Warning and a Critical state simultaneously, and I don't want to create two separate services for this check, since the events that trigger warnings are (as you'd assume) not a huge deal. What I've done is set the logic at the end to check for Critical events first and, if any exist, alarm only on those. If there are none, it moves on to Warning events and, if any exist, reports only on those. If neither exists, an "all is well" message is displayed with an OK status.
If only one event in one log needs to be monitored, a lot of this script can be stripped away, since much of the logic deals with gathering status and perfdata values across all of the events I want to watch.
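For the single-event case, a stripped-down check might look something like the sketch below. The function name `Test-SingleEvent`, its parameters, and the "Event log is clean" wording are illustrative assumptions, not part of the script that follows. The events are passed in as a parameter so the filtering can be tried without touching a real log.

```powershell
# Hypothetical stripped-down check: one log, one message pattern.
function Test-SingleEvent {
    param(
        [object[]]$Events,   # entries already read from the log, e.g. via Get-EventLog
        [datetime]$Since,    # start of the look-back window
        [string]$Pattern     # wildcard pattern matched against each message
    )
    # @() guarantees an array, so .Count is reliable for zero or one match
    $hits = @($Events | Where-Object { $_.TimeGenerated -gt $Since -and $_.Message -like $Pattern })
    $perf = "queue_errors=$($hits.Count);;1"
    if ($hits.Count -ge 1) {
        return New-Object PSObject -Property @{ Code = 2; Text = "CRITICAL: Error reading message on queue|$perf" }
    }
    return New-Object PSObject -Property @{ Code = 0; Text = "OK: Event log is clean|$perf" }
}

# Possible use in a real check (the log name is a placeholder):
# $result = Test-SingleEvent -Events (Get-EventLog -LogName Application) -Since (Get-Date).AddMinutes(-5) -Pattern '*Error reading message on queue*'
# Write-Host $result.Text
# exit $result.Code
```

With only one event to watch there is no Warning branch at all, so the precedence logic disappears and the check collapses to "matches found or not".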
I'm no PowerShell expert, so I'm always open to ideas/critiques/insults. Here's the script (pardon the substitutions for sensitive stuff):
#151007
$returnStateOK = 0
$returnStateWarning = 1
$returnStateCritical = 2
$returnStateUnknown = 3
$critMsg = @()
$warnMsg = @()
$perf = @()
# Cast to [int] so the argument is always treated as a whole number of minutes
$checkHz = [int]$args[0]
$window = (Get-Date).AddMinutes(-$checkHz)
<#
The $checkHz (check frequency) variable is passed from the corresponding Nagios service $ARG2$ value.
It sets how far back (in minutes) events are read.
The logic: this value should match the interval at which the service runs, so that only events created between runs are read.
#>
# Monitor [custom event log name] event log for "Error reading message on queue"
# @() forces an array so .Count is reliable even with zero or one match
$event1 = @(Get-EventLog -LogName [custom event log name] | Where-Object {$_.TimeGenerated -gt $window -and $_.Message -like '*Error reading message on queue*'})
$count1 = $event1.Count
$perf += "queue_errors=$count1;;1"
if ($count1 -ge 1){
    # Takes 9 characters starting at offset 95 of each message; assumes a fixed message layout
    $primary = $event1 | ForEach-Object { $_.Message.Substring(95,9) } | Select-Object -Unique
    $critMsg += "[custom event log name]: Error reading message on queue (primary app: $primary)"
}
# Monitor Application event log for "Insufficient system resources exist to complete the requested service"
$event2 = @(Get-EventLog -LogName Application | Where-Object {$_.TimeGenerated -gt $window -and $_.Message -like '*Insufficient system resources exist to complete the requested service*'})
$count2 = $event2.Count
$perf += "resource_errors=$count2;;1"
if ($count2 -ge 1){
    $warnMsg += "Application Event Log: Insufficient system resources"
}
# Monitor System event log for "operation initiated by the Registry failed"
$event3 = @(Get-EventLog -LogName System | Where-Object {$_.TimeGenerated -gt $window -and $_.Message -like '*operation initiated by the Registry failed*'})
$count3 = $event3.Count
$perf += "io_errors=$count3;;1"
if ($count3 -ge 1){
    $critMsg += "System Event Log: I/O Error"
}
# Monitor [custom event log name] event log for "invalid user"
$event4 = @(Get-EventLog -LogName [custom event log name] | Where-Object {$_.TimeGenerated -gt $window -and $_.Message -like '*invalid user*'})
$count4 = $event4.Count
$perf += "user_errors=$count4;;1"
if ($count4 -ge 1){
    $warnMsg += "[custom event log name]: Invalid user found"
}
# Evaluate, format, and return data to Nagios (Critical takes precedence over Warning)
if ($critMsg){
    Write-Host CRITICAL: (($critMsg) -join ", ")"|"(($perf) -join " ")
    exit $returnStateCritical
}
elseif ($warnMsg){
    Write-Host WARNING: (($warnMsg) -join ", ")"|"(($perf) -join " ")
    exit $returnStateWarning
}
else{
    Write-Host "OK: Event logs are clean|"(($perf) -join " ")
    exit $returnStateOK
}
# Fallback only: every branch above exits, so this is not reached in normal operation
Write-Host "UNKNOWN script state"
exit $returnStateUnknown
The output then looks like this (Critical and OK examples):
CRITICAL: [custom event log name]: Error reading message on queue on primary app server [hostname]| queue_errors=1864;;1 resource_errors=0;;1 io_errors=0;;1 user_errors=0;;1
OK: Event logs are clean| queue_errors=0;;1 resource_errors=0;;1 io_errors=0;;1 user_errors=0;;1
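Each output line is the status text, a pipe, and then the perfdata tokens in the Nagios `label=value;warn;crit` form (here the warn field is empty and crit is 1). The precedence described earlier (Critical first, then Warning, otherwise OK) could also be wrapped in a small function. This is only a sketch: `Get-NagiosResult` and its parameter names are made up for illustration and are not in the script above.

```powershell
# Hypothetical helper: picks the highest-severity state and builds the plugin output line.
function Get-NagiosResult {
    param(
        [string[]]$CritMsg = @(),   # messages for Critical events
        [string[]]$WarnMsg = @(),   # messages for Warning events
        [string[]]$Perf    = @()    # perfdata tokens such as "queue_errors=0;;1"
    )
    if ($CritMsg.Count -gt 0) {
        # Critical wins: Warning messages are deliberately left out of the text
        $code = 2; $text = 'CRITICAL: ' + ($CritMsg -join ', ')
    }
    elseif ($WarnMsg.Count -gt 0) {
        $code = 1; $text = 'WARNING: ' + ($WarnMsg -join ', ')
    }
    else {
        $code = 0; $text = 'OK: Event logs are clean'
    }
    # Perfdata is always appended, whatever the state
    New-Object PSObject -Property @{ Code = $code; Text = "$text|" + ($Perf -join ' ') }
}

# Possible use at the end of a check script:
# $result = Get-NagiosResult -CritMsg $critMsg -WarnMsg $warnMsg -Perf $perf
# Write-Host $result.Text
# exit $result.Code
```

Returning the exit code and text together keeps the decision in one place, and it means the precedence can be tried out without reading any event logs.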
I call the PowerShell script from a batch file that bypasses the execution policy, so my line in nsclient.ini looks like this:
app_eventlogs=scripts\app_eventlogs.bat $ARG1$
And finally, the service looks like this:
define service {
    host_name               [hostname]
    service_description     Event logs
    use                     service-template-preprod
    check_command           check_nrpe_2arg!app_eventlogs!5!!!!!!
    register                1
}
Or, from a shell on the Nagios server:
./check_nrpe -H nchapp049 -t 10 -c app_eventlogs -a 5