Facing Issues with check_sql and check_mssql_health
Posted: Fri May 08, 2026 3:09 am
Hi All,
Environment Details:
Nagios XI Server: 2026R1.4
Target SQL Servers: Windows SQL Server (Mixed Mode enabled)
Authentication: Windows Service Account (using service account)
Port: 4070 (Custom)
I am experiencing a sudden monitoring failure on three specific SQL servers. These servers were working correctly until I configured a Datadog Availability Group failover monitor using the MSOLEDBSQL driver and Trusted_Connection=yes. While Datadog is working fine, Nagios check plugins (check_sql and check_mssql_health) now fail with authentication errors.
I am trying to run the following service checks from Nagios:
MSSQL AG Sync Status Check
/usr/local/nagios/libexec/check_sql -H 127.0.0.1 -d Sybase -D master -p 11.177.10.654 -U "BT\206894543" -P "PASSWORD" -q "SELECT CAST(COUNT(*) AS VARCHAR) + ' AG NAME='+ag.name+' DATABASE:'+adc.database_name+' AG STATUS='+drs.synchronization_health_desc FROM sys.dm_hadr_database_replica_states AS drs INNER JOIN sys.availability_databases_cluster AS adc ON drs.group_id = adc.group_id AND drs.group_database_id = adc.group_database_id INNER JOIN sys.availability_groups AS ag ON ag.group_id = drs.group_id INNER JOIN sys.availability_replicas AS ar ON drs.group_id = ar.group_id AND drs.replica_id = ar.replica_id where drs.synchronization_health_desc <> 'HEALTHY' and is_local=1 group by adc.database_name,drs.synchronization_health_desc,ag.name" -C 0 -s
MSSQL LongTransaction
/usr/local/nagios/libexec/check_sql -H 11.177.10.65 -d "Sybase" -D "master" -p 4070 -U "BT\206894543" -P "PASSWORD" -q "SELECT COUNT(*) FROM sys.dm_tran_database_transactions AS s_tdt JOIN sys.dm_tran_session_transactions AS s_tst ON s_tst.transaction_id = s_tdt.transaction_id JOIN sys.databases AS s_db ON s_tdt.database_id = s_db.database_id JOIN sys.dm_exec_sessions AS s_es ON s_tst.session_id = s_es.session_id JOIN sys.dm_exec_requests AS s_er ON s_es.session_id = s_er.session_id CROSS APPLY sys.dm_exec_sql_text(s_er.sql_handle) AS s_sql WHERE s_tdt.database_transaction_begin_time IS NOT NULL AND s_db.name NOT IN ('DBA_ADMIN', 'msdb', 'master') AND s_es.status = 'running' AND DATEDIFF(SECOND, s_tdt.database_transaction_begin_time, GETDATE()) > 600" -C 0 -s
MSSQL SuspectDBCnt
/usr/local/nagios/libexec/check_sql -H 1.177.10.654 -d Sybase -D master -p 4070 -U "BT\206894543" -P "PASSWORD" -q "SELECT COUNT(*) FROM sys.databases where state_desc = 'SUSPECT'" -C 0 -s
MSSQL DB Connect
/usr/local/nagios/libexec/check_mssql_health --hostname 11.177.10.654 --mode database-online --port 4070 --username "BT\206894543" --password "PASSWORD" --warning 0 --critical 0 --commit --notemp
MSSQL DB Deadlocks Rate
/usr/local/nagios/libexec/check_mssql_health --hostname 11.177.10.654 --statefilesdir=/tmp/check_mssql_health --mode locks-deadlocks --port 4070 --username "BT\206894543" --password "PASSWORD" --warning 0 --critical 1 --commit --notemp
These are the errors I get when running the plugins from the Nagios GUI:
While using check_mssql_health:
CRITICAL - DBI connect(':host=11.177.10.654:port=4070:encryptPassword=1','BT\206894543',...) failed: OpenClient message: LAYER = (0) ORIGIN = (0) SEVERITY = (78) NUMBER = (44)
Server , database
Message String: Server name not found in configuration files.
Server message number=18452 severity=14 state=1 line=1 server=ICLDBWV00212\\V0345EC04 text=Login failed. The login is from an untrusted domain and cannot be used with Integrated authentication. OpenClient message: LAYER = (0) ORIGIN = (0) SEVERITY = (78) NUMBER = (34)
Server , database
Message String: Adaptive Server connection failed
OpenClient message: LAYER = (0) ORIGIN = (0) SEVERITY = (78) NUMBER = (34)
Server , database
Message String: Adaptive Server connection failed
at /usr/local/nagios/libexec/check_mssql_health line 6929.
While using check_sql:
CHECK_SQL UNKNOWN - Login failed
Troubleshooting Already Performed:
Authentication Mode: Verified that SQL Server is still set to Mixed Mode.
Plugin Arguments: Tested escaping the backslash (BWT3\\206894628) and using double/single quotes for credentials.
Removing Datadog: Uninstalled Datadog from these servers and re-ran the checks from the CLI.
Command Execution: Tested using both --hostname <IP> and --server <freetds_name> with the same result.
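For reference on the quoting variants, here is a minimal sketch of what I understand the shell hands to the plugin in each case, assuming a POSIX shell. The variable names are only for illustration, and the account name is the one from the commands above. It uses printf '%s' rather than echo, because some shells' echo interprets backslash sequences.

```shell
#!/bin/sh
# Illustration only: variable names are hypothetical.
# Inside double quotes a backslash stays literal unless it precedes $ ` " \ or newline.
u_double_plain="BT\206894543"      # double quotes, single backslash
u_double_escaped="BT\\206894543"   # double quotes, escaped backslash
u_single_plain='BT\206894543'      # single quotes, single backslash
u_single_escaped='BT\\206894543'   # single quotes, doubled backslash

# printf '%s\n' prints each value without interpreting backslashes.
printf '%s\n' "$u_double_plain"    # BT\206894543
printf '%s\n' "$u_double_escaped"  # BT\206894543
printf '%s\n' "$u_single_plain"    # BT\206894543
printf '%s\n' "$u_single_escaped"  # BT\\206894543 (two backslashes reach the plugin)
```

So only the single-quoted, doubled-backslash form should differ from the others at the shell level. This says nothing about how the plugin or the driver treats the value afterwards.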
It would be very helpful if anyone could suggest steps towards a resolution.
NOTE: I am facing this issue on only 3 SQL servers; the service checks for the other 20-30 SQL servers are in an OK state.
Thanks in advance