Hi All,
Environment Details:
Nagios XI Server: 2026R1.4
Target SQL Servers: Windows SQL Server (Mixed Mode enabled)
Authentication: Windows Service Account (using service account)
Port: 4070 (Custom)
I am experiencing a sudden monitoring failure on three specific SQL servers. These servers were working correctly until I configured a Datadog Availability Group failover monitor using the MSOLEDBSQL driver and Trusted_Connection=yes. While Datadog is working fine, Nagios check plugins (check_sql and check_mssql_health) now fail with authentication errors.
I am trying to check the following services using Nagios:
MSSQL AG Sync Status Check
/usr/local/nagios/libexec/check_sql -H 127.0.0.1 -d Sybase -D master -p 11.177.10.654 -U "BT\206894543" -P "PASSWORD" -q "SELECT CAST(COUNT(*) AS VARCHAR) + ' AG NAME='+ag.name+' DATABASE:'+adc.database_name+' AG STATUS='+drs.synchronization_health_desc FROM sys.dm_hadr_database_replica_states AS drs INNER JOIN sys.availability_databases_cluster AS adc ON drs.group_id = adc.group_id AND drs.group_database_id = adc.group_database_id INNER JOIN sys.availability_groups AS ag ON ag.group_id = drs.group_id INNER JOIN sys.availability_replicas AS ar ON drs.group_id = ar.group_id AND drs.replica_id = ar.replica_id where drs.synchronization_health_desc <> 'HEALTHY' and is_local=1 group by adc.database_name,drs.synchronization_health_desc,ag.name" -C 0 -s
MSSQL LongTransaction
/usr/local/nagios/libexec/check_sql -H 11.177.10.65 -d "Sybase" -D "master" -p 4070 -U "BT\206894543" -P "PASSWORD" -q "SELECT COUNT(*) FROM sys.dm_tran_database_transactions AS s_tdt JOIN sys.dm_tran_session_transactions AS s_tst ON s_tst.transaction_id = s_tdt.transaction_id JOIN sys.databases AS s_db ON s_tdt.database_id = s_db.database_id JOIN sys.dm_exec_sessions AS s_es ON s_tst.session_id = s_es.session_id JOIN sys.dm_exec_requests AS s_er ON s_es.session_id = s_er.session_id CROSS APPLY sys.dm_exec_sql_text(s_er.sql_handle) AS s_sql WHERE s_tdt.database_transaction_begin_time IS NOT NULL AND s_db.name NOT IN ('DBA_ADMIN', 'msdb', 'master') AND s_es.status = 'running' AND DATEDIFF(SECOND, s_tdt.database_transaction_begin_time, GETDATE()) > 600" -C 0 -s
MSSQL SuspectDBCnt
/usr/local/nagios/libexec/check_sql -H 1.177.10.654 -d Sybase -D master -p 4070 -U "BT\206894543" -P "PASSWORD" -q "SELECT COUNT(*) FROM sys.databases where state_desc = 'SUSPECT'" -C 0 -s
MSSQL DB Connect
/usr/local/nagios/libexec/check_mssql_health --hostname 11.177.10.654 --mode database-online --port 4070 --username "BT\206894543" --password "PASSWORD" --warning 0 --critical 0 --commit --notemp
MSSQL DB Deadlocks Rate
/usr/local/nagios/libexec/check_mssql_health --hostname 11.177.10.654 --statefilesdir=/tmp/check_mssql_health --mode locks-deadlocks --port 4070 --username "BT\206894543" --password "PASSWORD" --warning 0 --critical 1 --commit --notemp
These are the errors I get when running these plugins from the Nagios GUI:
While using check_mssql_health:
CRITICAL - DBI connect(':host=11.177.10.654:port=4070:encryptPassword=1','BT\206894543',...) failed: OpenClient message: LAYER = (0) ORIGIN = (0) SEVERITY = (78) NUMBER = (44)
Server , database
Message String: Server name not found in configuration files.
Server message number=18452 severity=14 state=1 line=1 server=ICLDBWV00212\\V0345EC04 text=Login failed. The login is from an untrusted domain and cannot be used with Integrated authentication. OpenClient message: LAYER = (0) ORIGIN = (0) SEVERITY = (78) NUMBER = (34)
Server , database
Message String: Adaptive Server connection failed
OpenClient message: LAYER = (0) ORIGIN = (0) SEVERITY = (78) NUMBER = (34)
Server , database
Message String: Adaptive Server connection failed
at /usr/local/nagios/libexec/check_mssql_health line 6929.
While using check_sql:
CHECK_SQL UNKNOWN - Login failed
Troubleshooting Already Performed:
Authentication Mode: Verified that SQL Server is still set to Mixed Mode.
Plugin Arguments: Tested escaping the backslash (BWT3\\206894628) and using double/single quotes for credentials.
Removing Datadog: Uninstalled Datadog from these servers and re-tested the checks from the CLI.
Command Execution: Tested using both --hostname <IP> and --server <freetds_name> with the same result.
It would be very helpful if anyone could suggest steps towards a resolution.
NOTE: I am facing this issue with only 3 SQL servers, while the service checks for the other 20-30 SQL servers are in an OK state.
Thanks in advance
Facing Issues with check_sql and check_mssql_health
-
DileepKumar
- Posts: 49
- Joined: Thu May 22, 2025 10:43 am
Re: Facing Issues with check_sql and check_mssql_health
Hi @DileepKumar ,
Is it possible you have a typo in your hostname? I see many references to 11.177.10.654 and one to 11.177.10.65.
Cheers,
- Cole
-
DileepKumar
Re: Facing Issues with check_sql and check_mssql_health
Yes @cdietsch, sorry, my mistake. Please take the first hostname as the correct one.
Re: Facing Issues with check_sql and check_mssql_health
Hey @DileepKumar ,
We have some steps that should help you troubleshoot your issue.
Let’s carefully break down your check_mssql_health error and figure out the root cause and how to fix it. This sounds like a classic case of a connection/authentication mismatch with Microsoft SQL Server.
1. Parsing the Error
Here’s what your error messages are telling us:
1. Server name not found in configuration files
This usually happens when the SQL client library (probably FreeTDS, which check_mssql_health uses through Perl's DBD::Sybase driver) cannot find or resolve the server name you provided.
The host 11.177.10.654 is not a valid IP address; IP octets must be between 0–255. So that alone will prevent connection.
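The octet rule above can be checked before a host value ever reaches a plugin. The following is a minimal sketch in POSIX shell; the function name is_valid_ipv4 is made up for illustration and is not part of Nagios or FreeTDS.

```shell
# Hypothetical helper: succeed only for a dotted-quad IPv4 address
# whose four fields are all numeric and no larger than 255.
is_valid_ipv4() {
    # Reject empty input, non-digit/non-dot characters, and stray dots.
    case "$1" in
        ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
    esac
    # Split the address on dots into positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    for octet in "$@"; do
        [ "${#octet}" -le 3 ] || return 1   # at most three digits
        [ "$octet" -le 255 ] || return 1    # 654 fails here
    done
    return 0
}

# Example: compare the two addresses mentioned in this thread.
for ip in 11.177.10.654 11.177.10.65; do
    if is_valid_ipv4 "$ip"; then
        printf '%s looks valid\n' "$ip"
    else
        printf '%s is not a valid IPv4 address\n' "$ip"
    fi
done
```

If the addresses in the post were altered for anonymity, run the same check against the real values in the service definitions.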
2. Login failed. The login is from an untrusted domain and cannot be used with Integrated authentication
This tells us that the connection is attempting Windows Integrated Authentication (NTLM/Kerberos) with a login that either isn't recognized or is coming from a machine that isn't trusted by the domain.
If your Nagios server is not part of the domain, Integrated Authentication won’t work.
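To see which kind of login a given -U / --username value will trigger, a small classifier can help. This is a rough sketch that assumes the behaviour described in the FreeTDS User Guide, where a username written as DOMAIN\user is sent as a Windows domain (NTLM) login and any other username is sent as a SQL Server login. The function name login_style and the login nagios_monitor are hypothetical.

```shell
# Hypothetical helper: classify a username the way FreeTDS is assumed
# to treat it. A backslash means "domain login", anything else "sql".
login_style() {
    case "$1" in
        *\\*) printf 'domain\n' ;;
        *)    printf 'sql\n' ;;
    esac
}

# Example with the account from this thread and a made-up SQL login.
login_style 'BT\206894543'    # domain
login_style 'nagios_monitor'  # sql
```

A "domain" result means SQL Server has to be able to validate that Windows account itself, which is the step that error 18452 reports as failing.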
3. Adaptive Server connection failed
This is a generic failure from FreeTDS when the client cannot connect to the server at all.
2. Common Causes
- Incorrect IP address or hostname.
- Using Windows authentication when the server requires SQL authentication.
- Missing or misconfigured FreeTDS freetds.conf or ODBC DSN.
- The port might be wrong, or the SQL Server instance name may be required.
Step A: Validate your connection details
- Make sure the server IP is valid. For example, 11.177.10.654 is invalid because 654 is greater than 255. A valid example: 11.177.10.65.
- Ensure you're using the correct port. A default instance listens on 1433; your checks use the custom port 4070, so confirm the instance is really listening there.
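Step B: Check the authentication method
- Your command seems to be using Integrated Authentication: a username of the form DOMAIN\user (BT\206894543) makes FreeTDS attempt a Windows domain (NTLM) login. The :encryptPassword=1 in the DSN is a DBD::Sybase password-encryption setting and does not by itself select NTLM/Kerberos.
- Switch to a SQL login.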
Step C: Test FreeTDS manually
- Install FreeTDS if not already: sudo yum install freetds or apt install freetds-bin
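- Create or edit /etc/freetds/freetds.conf (on RHEL-family systems the file is usually /etc/freetds.conf) and add a stanza named after the server:
[ICLDBWV00212]
host = 11.177.10.65
port = 4070
tds version = 7.4
Then test the connection:
tsql -S ICLDBWV00212 -U your_sql_user -P 'YourPassword'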
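If several servers need the same treatment, the stanza can be generated rather than typed by hand. The sketch below only prints text; the helper name freetds_stanza is hypothetical, and the alias, host, and port are the example values from this post, so substitute your real ones.

```shell
# Hypothetical helper: print a freetds.conf stanza.
# $1 = server alias, $2 = host or IP, $3 = TCP port,
# $4 = TDS protocol version (optional, defaults to 7.4 as above).
freetds_stanza() {
    printf '[%s]\n' "$1"
    printf '\thost = %s\n' "$2"
    printf '\tport = %s\n' "$3"
    printf '\ttds version = %s\n' "${4:-7.4}"
}

# Example: the server discussed in this thread.
freetds_stanza ICLDBWV00212 11.177.10.65 4070

# After appending the output to freetds.conf, test it by hand
# (needs FreeTDS installed and network access to the server):
#   tsql -S ICLDBWV00212 -U your_sql_user -P 'YourPassword'
# To bypass freetds.conf and test the host and port directly:
#   tsql -H 11.177.10.65 -p 4070 -U your_sql_user -P 'YourPassword'
# To see which freetds.conf directory this build reads:
#   tsql -C
```

Testing with both -S and -H helps separate a freetds.conf lookup problem from a plain network or login problem.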
Step D: Verify Nagios plugin configuration
- Use the correct DSN or direct host.
- Ensure the Nagios user can read any .odbc.ini or .freetds.conf files if using DSN.
- If your Nagios server is not on the domain, don't use Integrated Authentication (a DOMAIN\user login, or -E in tools such as sqlcmd).
- Use SQL login credentials.
In short:
- Fix the invalid IP.
- Use SQL authentication, not Integrated Auth.
- Configure FreeTDS properly or use a direct DSN.
- Test manually with tsql to confirm credentials and connectivity before letting Nagios check it.
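When testing by hand, it helps to run the plugin as the same account Nagios uses, so file permissions and environment match what the scheduler sees. Below is a small sketch; it assumes the checks run as a local user named nagios, and run_as_nagios and the SQL login nagios_monitor are hypothetical names. The plugin flags are the ones already used earlier in this thread.

```shell
# Hypothetical wrapper: run a command as the nagios user.
# With DRY_RUN=1 it only prints what would be executed.
run_as_nagios() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'sudo -u nagios'
        for arg in "$@"; do
            printf ' %s' "$arg"
        done
        printf '\n'
    else
        sudo -u nagios "$@"
    fi
}

# Example (remove DRY_RUN=1 to actually execute the check):
#   DRY_RUN=1 run_as_nagios /usr/local/nagios/libexec/check_mssql_health \
#       --hostname 11.177.10.65 --port 4070 \
#       --username nagios_monitor --password 'PASSWORD' \
#       --mode database-online
```

If the check succeeds as root but fails through this wrapper, look at the permissions on freetds.conf and on the state file directory.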
Cheers,
- Cole