
Cluster Service Check Fails

Posted: Fri Nov 01, 2019 6:09 pm
by kwhogster
Nagios Core 4.3.4
Windows 2012 R2 Clustered SQL Server


TGCS014-N1 Cluster SQLS Cluster
CRITICAL 11-01-2019 17:51:28 0d 11h 23m 58s 10/10 CLUSTER CRITICAL: SQL Server (SHAREPOINT): 0 ok, 0 warning, 2 unknown, 0 critical

My check

Code:

define service{
        use                     generic-service
        host_name               TGCS014-N1
        service_description     Cluster SQLS Cluster
        check_interval          10080
        notification_interval   10080
        servicegroups           Clusters
        check_command           check_service_cluster!"SQL Server (SHAREPOINT)"!0!1!$SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$,$SERVICESTATEID:TGCS014-N2:MSSQLSHAREPOINT$
        check_period            backup_WIN12
        notification_period     backup_WIN12
        }

Code:

# 'check service cluster' command definition
define command{
    command_name        check_service_cluster
    command_line        /usr/local/nagios/libexec/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$
}


This was working fine. We had a power outage last night, and after the servers came back up I got this error.
Any ideas?

Thank you
Tom :o

Re: Cluster Service Check Fails

Posted: Mon Nov 04, 2019 8:24 am
by scottwilkerson
It thinks both of these are in an UNKNOWN state:

Code:

$SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$,$SERVICESTATEID:TGCS014-N2:MSSQLSHAREPOINT$

Can you show the current state for each of the following HOST:SERVICE pairs?

Code:

TGCS014-N1:MSSQLSHAREPOINT
TGCS014-N2:MSSQLSHAREPOINT

Re: Cluster Service Check Fails

Posted: Mon Nov 04, 2019 8:48 am
by kwhogster
Scott

That's what I was thinking too, but looking at this I am puzzled.

"Can you show the current state for the following HOST:SERVICE"

How do I do that? With a cluster PowerShell command? If so, which one?

Re: Cluster Service Check Fails

Posted: Mon Nov 04, 2019 9:03 am
by scottwilkerson
In Nagios Core, search for
MSSQLSHAREPOINT

Show the results

Re: Cluster Service Check Fails

Posted: Mon Nov 04, 2019 11:28 am
by kwhogster
Searching for MSSQLSHAREPOINT in Nagios Core

results 1- 0 of 0 Matching Services

Re: Cluster Service Check Fails

Posted: Mon Nov 04, 2019 11:42 am
by scottwilkerson
kwhogster wrote:Searching for MSSQLSHAREPOINT in Nagios Core

results 1- 0 of 0 Matching Services
Well, that is likely the problem.

The cluster check reads the status of each of these services, and it looks like they don't exist.

Re: Cluster Service Check Fails

Posted: Mon Nov 04, 2019 12:01 pm
by kwhogster
Where should they exist?

Re: Cluster Service Check Fails

Posted: Mon Nov 04, 2019 12:08 pm
by scottwilkerson
kwhogster wrote:Where should they exist?
In your Nagios configuration files.
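
One way to confirm whether such a service is defined is to search the object configuration for its service_description. This is only a sketch: the directory /usr/local/nagios/etc is an assumption based on the plugin path used earlier in this thread, and find_service_description is a made-up helper name, so adjust both to your install.

```shell
#!/bin/sh
# Assumption: object configs live under /usr/local/nagios/etc
# (override by exporting NAGIOS_ETC before running).
NAGIOS_ETC="${NAGIOS_ETC:-/usr/local/nagios/etc}"

# find_service_description DIR NAME  (hypothetical helper)
# Prints file:line for every service_description that is exactly NAME.
# NAME is used inside a regular expression, so keep it to plain characters.
find_service_description() {
    grep -rnE "^[[:space:]]*service_description[[:space:]]+$2[[:space:]]*$" "$1"
}

find_service_description "$NAGIOS_ETC" "MSSQLSHAREPOINT" 2>/dev/null \
    || echo "no service named MSSQLSHAREPOINT is defined under $NAGIOS_ETC"
```

If nothing is printed for either node, the on-demand macros have no service to read a state from.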

Re: Cluster Service Check Fails

Posted: Mon Nov 04, 2019 12:14 pm
by kwhogster
Did you review my service check in my first posting?

That is the only place I have the definition located.


Also, do you know of a way I can check this manually?

Re: Cluster Service Check Fails

Posted: Mon Nov 04, 2019 12:38 pm
by scottwilkerson
kwhogster wrote:Did you review my service check in my first posting?

That is the only place I have the definition located.


Also, do you know of a way I can check this manually?
Yes, I did, and I also understand exactly how check_cluster works. Nagios expands the macros you pass as $SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$,$SERVICESTATEID:TGCS014-N2:MSSQLSHAREPOINT$ into state IDs and hands that list to the plugin through -d.

The macro $SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$ means: grab the SERVICESTATEID of the service MSSQLSHAREPOINT on the host TGCS014-N1 in Nagios. That service must exist in your configuration.

See the On-Demand Macros section here:
https://assets.nagios.com/downloads/nag ... acros.html
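
To make the macro layout concrete, here is a small sketch that splits a service on-demand macro into its three parts. The function name parse_on_demand_macro is made up for illustration; Nagios does this expansion internally and this is not its code.

```shell
#!/bin/sh
# parse_on_demand_macro '$MACRONAME:host:service$'  (hypothetical helper)
# Prints the macro name, host and service, one per line.
parse_on_demand_macro() {
    inner=${1#\$}        # strip the leading $
    inner=${inner%\$}    # strip the trailing $
    macro=${inner%%:*}   # text before the first colon
    rest=${inner#*:}     # text after the first colon
    host=${rest%%:*}     # text between the first and second colon
    service=${rest#*:}   # everything after the second colon
    printf 'macro=%s\nhost=%s\nservice=%s\n' "$macro" "$host" "$service"
}

parse_on_demand_macro '$SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$'
```

The service part has to match the service_description of a service defined for that host.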


Usage for check_cluster

Code:

[root@localhost nagiosxi]# /usr/local/nagios/libexec/check_cluster --help
check_cluster v2.2.1 (nagios-plugins 2.2.1)
Copyright (c) 2000-2004 Ethan Galstad ([email protected])
Copyright (c) 2000-2014 Nagios Plugin Development Team
        <[email protected]>

Host/Service Cluster Plugin for Nagios 2

Usage:
 check_cluster (-s | -h) -d val1[,val2,...,valn] [-l label]
[-w threshold] [-c threshold] [-v] [--help]

Options:
 --extra-opts=[section][@file]
    Read options from an ini file. See
    https://www.nagios-plugins.org/doc/extra-opts.html
    for usage and examples.
 -s, --service
    Check service cluster status
 -h, --host
    Check host cluster status
 -l, --label=STRING
    Optional prepended text output (i.e. "Host cluster")
 -w, --warning=THRESHOLD
    Specifies the range of hosts or services in cluster that must be in a
    non-OK state in order to return a WARNING status level
 -c, --critical=THRESHOLD
    Specifies the range of hosts or services in cluster that must be in a
    non-OK state in order to return a CRITICAL status level
 -d, --data=LIST
    The status codes of the hosts or services in the cluster, separated by
    commas
 -v, --verbose
    Show details for command-line debugging (Nagios may truncate output)

Notes:
 See:
 https://www.nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT
 for THRESHOLD format and examples.

Examples:
 check_cluster -s -d 2,0,2,0 -c @3:
    Will alert critical if there are 3 or more service data points in a non-OK
    state.

Send email to [email protected] if you have questions regarding use
of this software. To submit patches or suggest improvements, send email to
[email protected]
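
To check this by hand, you can pass literal state IDs to -d instead of macros. The sketch below first tallies a list of state IDs the way the plugin's summary line reports them (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN), then runs the real plugin if it is present. The tally_states function is an illustration, not the plugin's code, and the value 3,3 is an assumption about what the two macros expanded to, inferred from the "2 unknown" in the alert.

```shell
#!/bin/sh
# tally_states LIST  (hypothetical helper)
# Counts service state IDs in a comma-separated list and prints them
# in the same order as the check_cluster summary line.
tally_states() {
    ok=0 warn=0 crit=0 unk=0
    old_ifs=$IFS
    IFS=,
    for id in $1; do
        case $id in
            0) ok=$((ok + 1)) ;;
            1) warn=$((warn + 1)) ;;
            2) crit=$((crit + 1)) ;;
            3) unk=$((unk + 1)) ;;
        esac
    done
    IFS=$old_ifs
    echo "$ok ok, $warn warning, $unk unknown, $crit critical"
}

tally_states 3,3

# Run the real plugin with the same arguments the command definition
# from the first post would build, if it is installed at that path.
PLUGIN=/usr/local/nagios/libexec/check_cluster
if [ -x "$PLUGIN" ]; then
    "$PLUGIN" --service -l "SQL Server (SHAREPOINT)" -w 0 -c 1 -d 3,3 \
        || echo "exit status: $?"
fi
```

Swapping 3,3 for 0,0 in the plugin call shows what the check returns once both node services report OK.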