Cluster Service Check Fails
Posted: Fri Nov 01, 2019 6:09 pm
by kwhogster
Nagios Core 4.3.4
Windows 2012 R2 Clustered SQL Server
Host: TGCS014-N1
Service: Cluster SQLS Cluster
Status: CRITICAL
Last Check: 11-01-2019 17:51:28
Duration: 0d 11h 23m 58s
Attempt: 10/10
Status Information: CLUSTER CRITICAL: SQL Server (SHAREPOINT): 0 ok, 0 warning, 2 unknown, 0 critical
My check:
Code:
define service{
    use                     generic-service
    host_name               TGCS014-N1
    service_description     Cluster SQLS Cluster
    check_interval          10080
    notification_interval   10080
    servicegroups           Clusters
    check_command           check_service_cluster!"SQL Server (SHAREPOINT)"!0!1!$SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$,$SERVICESTATEID:TGCS014-N2:MSSQLSHAREPOINT$
    check_period            backup_WIN12
    notification_period     backup_WIN12
}
Code:
# 'check service cluster' command definition
define command{
    command_name    check_service_cluster
    command_line    /usr/local/nagios/libexec/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$
}
This was working fine. We had a power outage last night, and after the servers came back up I got this error.
Any ideas?
Thank you
Tom

Re: Cluster Service Check Fails
Posted: Mon Nov 04, 2019 8:24 am
by scottwilkerson
It thinks both of these are in an UNKNOWN state:
Code:
$SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$,$SERVICESTATEID:TGCS014-N2:MSSQLSHAREPOINT$
Can you show the current state for the following HOST:SERVICE pairs?
Code:
TGCS014-N1:MSSQLSHAREPOINT
TGCS014-N2:MSSQLSHAREPOINT
Re: Cluster Service Check Fails
Posted: Mon Nov 04, 2019 8:48 am
by kwhogster
Scott,
That's what I was thinking too, but looking at this I am puzzled.
scottwilkerson wrote:Can you show the current state for the following HOST:SERVICE
How do I do that? With a cluster PowerShell command? If so, which one?
Re: Cluster Service Check Fails
Posted: Mon Nov 04, 2019 9:03 am
by scottwilkerson
In the Nagios Core web interface, search for
MSSQLSHAREPOINT
and post the results.
Re: Cluster Service Check Fails
Posted: Mon Nov 04, 2019 11:28 am
by kwhogster
Searching for MSSQLSHAREPOINT in Nagios Core
results 1- 0 of 0 Matching Services
Re: Cluster Service Check Fails
Posted: Mon Nov 04, 2019 11:42 am
by scottwilkerson
kwhogster wrote:Searching for MSSQLSHAREPOINT in Nagios Core
results 1- 0 of 0 Matching Services
Well, that is likely the problem.
The cluster check reads the status of each of these services, and it looks like they don't exist.
Re: Cluster Service Check Fails
Posted: Mon Nov 04, 2019 12:01 pm
by kwhogster
Where should they exist?
Re: Cluster Service Check Fails
Posted: Mon Nov 04, 2019 12:08 pm
by scottwilkerson
kwhogster wrote:Where should they exist?
In your Nagios configuration files.
Re: Cluster Service Check Fails
Posted: Mon Nov 04, 2019 12:14 pm
by kwhogster
Did you review my service check in my first post?
That is the only place I have the definition.
Also, do you know of a way I can check this manually?
Re: Cluster Service Check Fails
Posted: Mon Nov 04, 2019 12:38 pm
by scottwilkerson
kwhogster wrote:Did you review my service check in my first post?
That is the only place I have the definition.
Also, do you know of a way I can check this manually?
Yes, I did, and I also understand exactly how check_cluster works. Nagios takes the macros you pass as
$SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$,$SERVICESTATEID:TGCS014-N2:MSSQLSHAREPOINT$ and expands each one to a state ID before check_cluster runs.
The macro
$SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$ tells Nagios to grab the SERVICESTATEID of the service MSSQLSHAREPOINT on the host TGCS014-N1. A service with that exact description must be defined on that host.
See the On-Demand Macros Section here
https://assets.nagios.com/downloads/nag ... acros.html
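One way to confirm whether those services exist is to search the object configuration from the shell. This is only a minimal sketch: it assumes the config files live under /usr/local/nagios/etc (the usual prefix for a source install and consistent with the libexec path in the command definition, but still an assumption), and find_in_config is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed default config directory; override NAGIOS_ETC if yours differs.
NAGIOS_ETC="${NAGIOS_ETC:-/usr/local/nagios/etc}"

# Print every *.cfg line that mentions the given string,
# prefixed with file name and line number.
find_in_config() {
    if [ ! -d "$NAGIOS_ETC" ]; then
        echo "config directory not found: $NAGIOS_ETC" >&2
        return 1
    fi
    grep -rn --include='*.cfg' -- "$1" "$NAGIOS_ETC"
}

find_in_config "MSSQLSHAREPOINT" || echo "no matches"
```

If the only hit is the check_command line of the cluster service itself, then no service with service_description MSSQLSHAREPOINT is defined, and the on-demand macros have nothing to read from.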
Usage for check_cluster
Code:
[root@localhost nagiosxi]# /usr/local/nagios/libexec/check_cluster --help
check_cluster v2.2.1 (nagios-plugins 2.2.1)
Copyright (c) 2000-2004 Ethan Galstad ([email protected])
Copyright (c) 2000-2014 Nagios Plugin Development Team
    <[email protected]>

Host/Service Cluster Plugin for Nagios 2

Usage:
 check_cluster (-s | -h) -d val1[,val2,...,valn] [-l label]
 [-w threshold] [-c threshold] [-v] [--help]

Options:
 --extra-opts=[section][@file]
    Read options from an ini file. See
    https://www.nagios-plugins.org/doc/extra-opts.html
    for usage and examples.
 -s, --service
    Check service cluster status
 -h, --host
    Check host cluster status
 -l, --label=STRING
    Optional prepended text output (i.e. "Host cluster")
 -w, --warning=THRESHOLD
    Specifies the range of hosts or services in cluster that must be in a
    non-OK state in order to return a WARNING status level
 -c, --critical=THRESHOLD
    Specifies the range of hosts or services in cluster that must be in a
    non-OK state in order to return a CRITICAL status level
 -d, --data=LIST
    The status codes of the hosts or services in the cluster, separated by
    commas
 -v, --verbose
    Show details for command-line debugging (Nagios may truncate output)

Notes:
 See:
 https://www.nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT
 for THRESHOLD format and examples.

Examples:
 check_cluster -s -d 2,0,2,0 -c @3:
    Will alert critical if there are 3 or more service data points in a non-OK
    state.

Send email to [email protected] if you have questions regarding use
of this software. To submit patches or suggest improvements, send email to
[email protected]
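On the question of checking this manually: check_cluster only needs literal state IDs after -d, so it can be run by hand without Nagios expanding any macros. The sketch below reuses the plugin path, label, and thresholds from the first post. run_cluster_check is a hypothetical wrapper name, and the state IDs follow the standard Nagios mapping (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN).

```shell
#!/bin/sh
# Plugin path taken from the command definition in the first post;
# override PLUGIN if yours is installed elsewhere.
PLUGIN="${PLUGIN:-/usr/local/nagios/libexec/check_cluster}"

# $1 = comma-separated state IDs, one per cluster node
# (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN)
run_cluster_check() {
    if [ ! -x "$PLUGIN" ]; then
        echo "check_cluster not found at $PLUGIN" >&2
        return 3
    fi
    "$PLUGIN" --service -l "SQL Server (SHAREPOINT)" -w 0 -c 1 -d "$1"
}

# Both nodes OK
rc=0; run_cluster_check 0,0 || rc=$?
echo "exit code: $rc"

# Both nodes UNKNOWN
rc=0; run_cluster_check 3,3 || rc=$?
echo "exit code: $rc"
```

With 3,3 the plugin should report the same "2 unknown" counts as the alert in the first post, which helps separate a problem in the plugin from a problem in the data Nagios passes to it.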