Hello folks,
We are running the latest version of NagiosXI on the pre-built VM and are seeing strange behavior when running an Oracle SQL query check against 2 identical database servers. We would appreciate some guidance as to why this may be happening.
1. Connecting to 2 different Oracle servers that have exactly the same database, with bidirectional replication
2. Oracle accounts and user access rights on both databases are identical
3. I can run the same query manually with SQL*Plus on both servers under the same username and I get a valid result
4. NagiosXI running the check on one server works properly but on the second server I get a nonsensical response.
Server 1 check command:
check_xi_oraclequery!--connect 'dbserver-1:1537/MYDB' --username 'user' --password 'password' --mode sql --name="select avg(response_time) from myuser.activity_logs where activity_date > sysdate - (2/1440)" --warning 50 --critical 200
Server 1 response when there is no activity for the last 2 minutes:
UNKNOWN - got no valid response for select avg(response_time) from myuser.activity_logs where activity_date > sysdate - (2/1440)
Server 2 check command:
check_xi_oraclequery!--connect 'dbserver-2:1537/MYDB' --username 'user' --password 'password' --mode sql --name="select avg(response_time) from myuser.activity_logs where activity_date > sysdate - (2/1440)" --warning 50 --critical 200
Server 2 response when there is no activity for the last 2 minutes:
UNKNOWN - got no valid response for select avg(response_time) from tim_user.activity_logs where activity_date > sysdate - (2/1440) - ORA-00942: table or view does not exist (DBD ERROR: error possibly near <*> indicator at char 21 in ' SELECT version FROM
Thank you,
Alex
Oracle Query check inconsistency across different servers
-
gsl_ops_practice
- Posts: 151
- Joined: Thu Apr 09, 2015 9:14 pm
Re: Oracle Query check inconsistency across different server
The first thing that comes to mind here is some sort of networking issue. Can you confirm that port 1537 is open for external connections on dbserver2, and that the Nagios server is whitelisted? You can check this with an nmap scan or a telnet to that port:
telnet dbserver2 1537
nmap dbserver2
Can you run the 'check_xi_oraclequery' command from the CLI with no issues?
If you use the IP of the server instead of the domain name, does it work?
Could you verify on server 2 that traffic is reaching it from the Nagios server via tcpdump or a similar tool?
If the checks are identical and the servers are identical, it must be something in the middle.
Best,
Jesse
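The port check suggested above can also be scripted so it is easy to repeat from the Nagios server. This is only a minimal sketch, assuming bash (for its /dev/tcp pseudo-device) and the coreutils `timeout` command are available; the function name `port_state` and the 5-second limit are illustrative choices, not part of Nagios XI or the Oracle plugin.

```shell
# port_state HOST PORT (hypothetical helper)
# Prints "open" if a TCP connection to HOST:PORT succeeds within 5 seconds,
# otherwise prints "unreachable" (closed, filtered, timed out, or bad host).
port_state() {
    local host="$1" port="$2"
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # host and port are passed as arguments rather than spliced into the script.
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "unreachable"
    fi
}
```

For example, `port_state dbserver2 1537` tests only the listener port. If nmap is used instead, naming the port with `nmap -p 1537 dbserver2` is safer, because a default nmap scan covers only the 1,000 most common ports and may not include 1537 at all.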
Re: Oracle Query check inconsistency across different server
Hi Jesse,
These 2 database servers are separated by a couple of oceans, so I will have to raise this as a possible network issue, since nothing else comes to mind.
I can telnet to the box with no problem:
[root@localhost ~]# telnet dbserver-2 1537
Trying NNN.NNN.NNN.NNN...
Connected to dbserver-2.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
but nmap doesn't show my port 1537 as open...
[root@localhost ~]# nmap dbserver-2
Starting Nmap 5.51 ( http://nmap.org ) at 2015-04-16 19:44 UTC
Nmap scan report for dbserver-2 (NNN.NNN.NNN.NNN)
Host is up (0.33s latency).
Not shown: 996 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
111/tcp  open  rpcbind
5500/tcp open  hotline
5666/tcp open  nrpe
I will have to speak to our network team to see if the firewalls on both sites are configured correctly and if there are any dropped packets or errors on this link.
Thanks,
Alex
Re: Oracle Query check inconsistency across different server
Sounds good to me, Alex. I'll wait for your response.
Best,
Jesse