Cluster Service Check Fails

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
kwhogster
Posts: 644
Joined: Wed Oct 14, 2015 6:51 pm
Location: Wood Ridge NJ USA

Cluster Service Check Fails

Post by kwhogster »

Nagios Core 4.3.4
Windows 2012 R2 Clustered SQL Server


Host: TGCS014-N1
Service: Cluster SQLS Cluster
Status: CRITICAL
Last Check: 11-01-2019 17:51:28
Duration: 0d 11h 23m 58s
Attempt: 10/10
Status Information: CLUSTER CRITICAL: SQL Server (SHAREPOINT): 0 ok, 0 warning, 2 unknown, 0 critical

My check

Code:

define service{
        use                     generic-service
        host_name               TGCS014-N1
        service_description     Cluster SQLS Cluster
        check_interval          10080
        notification_interval   10080
        servicegroups           Clusters
        check_command           check_service_cluster!"SQL Server (SHAREPOINT)"!0!1!$SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$,$SERVICESTATEID:TGCS014-N2:MSSQLSHAREPOINT$
        check_period            backup_WIN12
        notification_period     backup_WIN12
        }

Code:

# 'check service cluster' command definition
define command{
    command_name        check_service_cluster
    command_line        /usr/local/nagios/libexec/check_cluster --service -l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$
}
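To make the relationship between these two definitions concrete: Nagios splits check_command at each "!" and drops the pieces into $ARG1$, $ARG2$, and so on in the command's command_line. Below is a minimal Python sketch of that substitution, not Nagios's own code. The function and constant names are hypothetical, it ignores escaped "\!" characters, and it leaves the $SERVICESTATEID:...$ on-demand macros untouched (Nagios resolves those too before it runs the plugin).

```python
# Hypothetical sketch of Nagios's $ARGn$ substitution. Simplified: no "\!"
# escaping, and on-demand macros such as $SERVICESTATEID:...$ are left as-is.


def expand_check_command(check_command, command_line):
    """Split check_command on '!' and put each piece into $ARG1$, $ARG2$, ..."""
    _command_name, *args = check_command.split("!")
    expanded = command_line
    for index, value in enumerate(args, start=1):
        # "$ARG1$" cannot match inside "$ARG10$", so plain replace is safe here.
        expanded = expanded.replace(f"$ARG{index}$", value)
    return expanded


# Values copied from the service and command definitions above.
CHECK_COMMAND = (
    'check_service_cluster!"SQL Server (SHAREPOINT)"!0!1!'
    "$SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$,"
    "$SERVICESTATEID:TGCS014-N2:MSSQLSHAREPOINT$"
)
COMMAND_LINE = (
    "/usr/local/nagios/libexec/check_cluster --service "
    "-l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$"
)

if __name__ == "__main__":
    print(expand_check_command(CHECK_COMMAND, COMMAND_LINE))
```

Run as a script, it prints the check_cluster command line with -l, -w, -c and -d filled in from the service definition, which shows exactly what each $ARGn$ ends up as.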


This was working fine until we had a power outage last night. After the servers came back up, I got this error.
Any ideas?

Thank you
Tom :o
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Cluster Service Check Fails

Post by scottwilkerson »

It thinks 2 of these are in an UNKNOWN state:

Code:

$SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$,$SERVICESTATEID:TGCS014-N2:MSSQLSHAREPOINT$
Can you show the current state of the following HOST:SERVICE pairs?

Code:

TGCS014-N1:MSSQLSHAREPOINT
TGCS014-N2:MSSQLSHAREPOINT
Former Nagios employee
Creator:
ahumandesign.com
enneagrams.com

Re: Cluster Service Check Fails

Post by kwhogster »

Scott

That's what I was thinking too, but looking at this I am puzzled.

scottwilkerson wrote:Can you show the current state for the following HOST:SERVICE

How do I do that? With a cluster PowerShell command? If so, which one?

Re: Cluster Service Check Fails

Post by scottwilkerson »

In Nagios Core, search for MSSQLSHAREPOINT and show the results.

Re: Cluster Service Check Fails

Post by kwhogster »

Searching for MSSQLSHAREPOINT in Nagios Core

results 1- 0 of 0 Matching Services

Re: Cluster Service Check Fails

Post by scottwilkerson »

kwhogster wrote:Searching for MSSQLSHAREPOINT in Nagios Core

results 1- 0 of 0 Matching Services
Well, that is likely the problem.

The cluster check is checking the status of each of these services, and it looks like they don't exist.

Re: Cluster Service Check Fails

Post by kwhogster »

Where should they exist?

Re: Cluster Service Check Fails

Post by scottwilkerson »

kwhogster wrote:Where should they exist?
In your Nagios configuration files.

Re: Cluster Service Check Fails

Post by kwhogster »

Did you review the service check in my first post?

That is the only place where I have the definition.

Also, do you know of a way I can check this manually?

Re: Cluster Service Check Fails

Post by scottwilkerson »

kwhogster wrote:Did you review my service check in my first posting?

That is the only place I have the definition located.

also do you know of a way I can manually check this?
Yes, I did, and I understand exactly how check_cluster works. Nagios expands the macros you pass, $SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$,$SERVICESTATEID:TGCS014-N2:MSSQLSHAREPOINT$, into the current state ID of each of those services, and check_cluster then counts those values.

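The macro $SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$ means: grab the SERVICESTATEID of the service MSSQLSHAREPOINT on the host TGCS014-N1 in Nagios. For that to work, a service with the service_description MSSQLSHAREPOINT must be defined on host TGCS014-N1 in your Nagios configuration.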

See the On-Demand Macros section here:
https://assets.nagios.com/downloads/nag ... acros.html
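If you want to check this yourself rather than through the web UI search, one approach is to compare the on-demand macros in the check_command against the host/service pairs that are actually defined. Below is a minimal Python sketch of that comparison. The function name and the `defined` set are hypothetical; building that set from your real configuration (for example from the object cache file named by object_cache_file in nagios.cfg) is not shown.

```python
import re

# On-demand service macro: $SERVICESTATEID:host_name:service_description$
SERVICE_STATE_MACRO = re.compile(r"\$SERVICESTATEID:([^:$]+):([^$]+)\$")

# The -d argument ($ARG4$) from the service definition in the first post.
DATA_ARG = (
    "$SERVICESTATEID:TGCS014-N1:MSSQLSHAREPOINT$,"
    "$SERVICESTATEID:TGCS014-N2:MSSQLSHAREPOINT$"
)


def missing_service_references(text, defined_services):
    """Return (host, service) pairs named by $SERVICESTATEID:...$ macros in
    text that do not appear in defined_services."""
    return [
        (host, service)
        for host, service in SERVICE_STATE_MACRO.findall(text)
        if (host, service) not in defined_services
    ]


if __name__ == "__main__":
    # Hypothetical: the only service defined on these hosts is the cluster check.
    defined = {("TGCS014-N1", "Cluster SQLS Cluster")}
    for host, service in missing_service_references(DATA_ARG, defined):
        print(f"not defined: host={host} service={service}")
```

With only the cluster check defined, both MSSQLSHAREPOINT references are reported as missing; once services with that service_description exist on both hosts, the list comes back empty.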


Usage for check_cluster

Code:

[root@localhost nagiosxi]# /usr/local/nagios/libexec/check_cluster --help
check_cluster v2.2.1 (nagios-plugins 2.2.1)
Copyright (c) 2000-2004 Ethan Galstad (nagios@nagios.org)
Copyright (c) 2000-2014 Nagios Plugin Development Team
        <devel@nagios-plugins.org>

Host/Service Cluster Plugin for Nagios 2

Usage:
 check_cluster (-s | -h) -d val1[,val2,...,valn] [-l label]
[-w threshold] [-c threshold] [-v] [--help]

Options:
 --extra-opts=[section][@file]
    Read options from an ini file. See
    https://www.nagios-plugins.org/doc/extra-opts.html
    for usage and examples.
 -s, --service
    Check service cluster status
 -h, --host
    Check host cluster status
 -l, --label=STRING
    Optional prepended text output (i.e. "Host cluster")
 -w, --warning=THRESHOLD
    Specifies the range of hosts or services in cluster that must be in a
    non-OK state in order to return a WARNING status level
 -c, --critical=THRESHOLD
    Specifies the range of hosts or services in cluster that must be in a
    non-OK state in order to return a CRITICAL status level
 -d, --data=LIST
    The status codes of the hosts or services in the cluster, separated by
    commas
 -v, --verbose
    Show details for command-line debugging (Nagios may truncate output)

Notes:
 See:
 https://www.nagios-plugins.org/doc/guidelines.html#THRESHOLDFORMAT
 for THRESHOLD format and examples.

Examples:
 check_cluster -s -d 2,0,2,0 -c @3:
    Will alert critical if there are 3 or more service data points in a non-OK
    state.

Send email to help@nagios-plugins.org if you have questions regarding use
of this software. To submit patches or suggest improvements, send email to
devel@nagios-plugins.org
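For anyone following along, here is a rough Python model of what the help text above describes for --service: count the state values passed with -d, add up the non-OK ones, and compare that number against the -c and -w ranges using the plugin THRESHOLD format. It is a sketch, not the plugin's source. The function names are made up, the mapping 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN is the standard Nagios service state ID, and checking critical before warning plus skipping any threshold you leave out are assumptions that may not match the real plugin's defaults.

```python
import math

# Standard Nagios service state IDs, as returned by $SERVICESTATEID$.
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def parse_range(spec):
    """Parse a threshold such as '1', '3:', '~:5', '10:20' or '@3:'.

    Returns (start, end, inside). With inside=False an alert fires when the
    value is outside start..end; a leading '@' flips that to inside.
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    start_text, _, end_text = spec.rpartition(":")
    if start_text == "~":
        start = -math.inf
    else:
        start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else math.inf
    return start, end, inside


def range_alerts(value, spec):
    """True if value should raise an alert for the given threshold range."""
    start, end, inside = parse_range(spec)
    within = start <= value <= end
    return within if inside else not within


def check_service_cluster(states, warning=None, critical=None,
                          label="Service cluster"):
    """Hypothetical model of `check_cluster --service`."""
    counts = {name: 0 for name in STATE_NAMES.values()}
    for state in states:
        if state in STATE_NAMES:  # this sketch ignores values outside 0-3
            counts[STATE_NAMES[state]] += 1
    non_ok = counts["WARNING"] + counts["UNKNOWN"] + counts["CRITICAL"]

    if critical is not None and range_alerts(non_ok, critical):
        status = "CRITICAL"
    elif warning is not None and range_alerts(non_ok, warning):
        status = "WARNING"
    else:
        status = "OK"

    message = (
        f"CLUSTER {status}: {label}: {counts['OK']} ok, "
        f"{counts['WARNING']} warning, {counts['UNKNOWN']} unknown, "
        f"{counts['CRITICAL']} critical"
    )
    return status, message


if __name__ == "__main__":
    # Thread values: -w 0 -c 1, both nodes reporting 3 (UNKNOWN).
    print(check_service_cluster([3, 3], warning="0", critical="1",
                                label="SQL Server (SHAREPOINT)")[1])
```

With the thread's thresholds (-w 0 -c 1), two non-OK values fall outside 0..1 and give CRITICAL, and the sketch prints the same text as the status line in the first post. One non-OK value gives WARNING, and none gives OK. The help example `-c @3:` only goes critical once 3 or more values are non-OK.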