Nagios Support Forum

Posted: **Mon Nov 27, 2017 4:44 pm**

Afternoon everyone,

I am trying to troubleshoot an issue where on one of our MSSQL Clusters, when I query the SQL instance for checks using check_mssql_server.py, I get the following error:

<type 'exceptions.IndexError'>
Caught unexpected error. This could be caused by your sysperfinfo not containing the proper entries for this query, and you may delete this service check.

What makes this especially strange is that it works on one cluster, but not on another. To that effect, I've been trying to decipher what's different about them to see why it works on one but not the other. Both servers are SQL Server 2012, and both are using a local "nagios" SQL user that has the same database-wide permissions. Both of the instances are named MSSQLSERVER, and the "MSSQL Connection Time" check does work. All of the others create the error above.

One of the first things I thought to try, based on the messages above, was to run nmap and see which ports were accessible on each server. Here are the results:

NOT WORKING SERVER

Starting Nmap 6.25 ( http://nmap.org ) at 2017-11-27 14:24 EST
Nmap scan report for (not working server) (10.100.96.14)
Host is up (0.00043s latency).
Not shown: 986 closed ports
PORT      STATE SERVICE
80/tcp    open  http
135/tcp   open  msrpc
139/tcp   open  netbios-ssn
445/tcp   open  microsoft-ds
1433/tcp  open  ms-sql-s
1556/tcp  open  veritas_pbx
2383/tcp  open  ms-olap4
2701/tcp  open  sms-rcinfo
3389/tcp  open  ms-wbt-server
5666/tcp  open  nrpe
13782/tcp open  netbackup
49152/tcp open  unknown
49153/tcp open  unknown
49154/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 2.48 seconds

WORKING SERVER

Starting Nmap 6.25 ( http://nmap.org ) at 2017-11-27 16:04 EST
Nmap scan report for (working server) (10.100.8.34)
Host is up (0.00018s latency).
rDNS record for 10.100.8.34: (hidden)
Not shown: 978 closed ports
PORT      STATE SERVICE
80/tcp    open  http
111/tcp   open  rpcbind
135/tcp   open  msrpc
139/tcp   open  netbios-ssn
445/tcp   open  microsoft-ds
1023/tcp  open  netvenuechat
1433/tcp  open  ms-sql-s
1556/tcp  open  veritas_pbx
2301/tcp  open  compaqdiag
2381/tcp  open  compaq-https
2383/tcp  open  ms-olap4
2701/tcp  open  sms-rcinfo
3389/tcp  open  ms-wbt-server
5666/tcp  open  nrpe
6129/tcp  open  unknown
8009/tcp  open  ajp13
8080/tcp  open  http-proxy
8443/tcp  open  https-alt
13782/tcp open  netbackup
49152/tcp open  unknown
49153/tcp open  unknown
49154/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 1.57 seconds

Things I notice based on this:

* The working server has an rDNS entry ... the non-working server doesn't.
* The working server has more open ports, but the port I would expect we need for this to work, 1433, is open on both servers, so I don't think the differences in open ports matter a whole lot here.
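
To make the port comparison above easier to check, here is a minimal Python sketch that diffs the two nmap listings. The port numbers are copied from the scans; the variable names are only illustrative.

```python
# Open TCP ports copied from the two nmap scans above.
not_working = {80, 135, 139, 445, 1433, 1556, 2383, 2701, 3389, 5666,
               13782, 49152, 49153, 49154}
working = {80, 111, 135, 139, 445, 1023, 1433, 1556, 2301, 2381, 2383,
           2701, 3389, 5666, 6129, 8009, 8080, 8443, 13782, 49152,
           49153, 49154}

# Ports that appear on one server but not the other.
only_working = sorted(working - not_working)
only_not_working = sorted(not_working - working)

print("Only on working server:", only_working)
print("Only on non-working server:", only_not_working)
print("1433 open on both:", 1433 in (working & not_working))
```

Every port open on the non-working server is also open on the working one, which supports the reading that the port list is probably not the cause.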

Would love everyone's thoughts as to what I might be missing here. Thanks!

Posted: **Tue Nov 28, 2017 12:28 pm**

Are you using the same password on both MSSQL servers? Is it possible that you have "special" characters in the password on the "not working server"?

Can you show us the actual check, run from the command line along with the output of it? Remove sensitive info (IPs, passwords, etc.).

Posted: **Wed Dec 13, 2017 2:31 pm**

Verified that the password does not have any special characters that could cause an issue. When running from the command line of the Nagios server (as the nagios user):

-bash-4.1$ ./check_mssql_server.py -H <OUR HOST> -U 'nagios' -P <OUR PASSWORD> -I 'MSSQLSERVER' --locktimeouts --warning 200 --critical 300
<type 'exceptions.TypeError'>
Caught unexpected error. This could be caused by your sysperfinfo not containing the proper entries for this query, and you may delete this service check.

Posted: **Thu Dec 14, 2017 12:50 pm**

Have you tried adding the port -p 1433?

Does that make any difference?

./check_mssql_server.py -H <OUR HOST> -U 'nagios' -P <OUR PASSWORD> -I 'MSSQLSERVER' -p 1433 --locktimeouts --warning 200 --critical 300

If not, is it the same output?

./check_mssql_server.py -H <OUR HOST> -U 'nagios' -P <OUR PASSWORD> -p 1433

Try taking out the instance. Are you able to make a connection?
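
Since the error text points at sysperfinfo "not containing the proper entries", it may also be worth checking whether the counter rows exist at all. The sketch below is only an illustration and not the plugin's actual code: it assumes a check that reads the first row of a counter query, and shows how an empty result would surface as an IndexError. `first_value`, `describe`, and `DIAGNOSTIC_QUERY` are hypothetical names; the query uses the standard `sys.dm_os_performance_counters` view and could be run by hand as the nagios SQL user on both servers to compare the output.

```python
# Hypothetical diagnostic query (not taken from the plugin): lists the
# lock-timeout counters so the object_name values on both servers can
# be compared.
DIAGNOSTIC_QUERY = (
    "SELECT object_name, counter_name, instance_name, cntr_value "
    "FROM sys.dm_os_performance_counters "
    "WHERE counter_name LIKE 'Lock Timeouts%';"
)


def first_value(rows):
    """Return the first column of the first row, as a naive check might.

    Raises IndexError when the query matched no rows.
    """
    return rows[0][0]


def describe(rows):
    """Report an empty result explicitly instead of raising IndexError."""
    if not rows:
        return "no matching counter rows"
    return "value=%s" % first_value(rows)


print(describe([]))      # simulated empty result set
print(describe([(0,)]))  # simulated single counter row
```

If the query returns no rows on the non-working cluster, or the `object_name` prefix differs between the two clusters, that would be consistent with the error message, though it does not prove that is what the plugin is tripping over.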

check_mssql_server.py type 'exceptions.IndexError'