check_oracle_health - call filename

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
Chite
Posts: 23
Joined: Tue Jan 26, 2016 5:32 pm

check_oracle_health - call filename

Post by Chite »

how do you pass a file with a sql query to oracle using the check_oracle_health command? ./check_oracle_health --connect <SID> --username '<username>' --password '<password>' --mode sql --name <filename>.sql

this command were using doesn't seem to be looking at the sql query in the file.

thanks
User avatar
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: check_oracle_health - call filename

Post by mcapra »

Based on the usage instructions for this plugin, I don't think --name accepts a SQL file as a parameter:

Code: Select all

-–name
Here the check can be limited to a single object (Latch, Enqueue, Tablespace, Datafile). If this parameter is omitted all objects are checked. (Instead of –tablespace or –datafile this parameter can and should be used. It servers the purpose to standardize the CLI interface.)
I do not see any parameters that allow you to provide a sql file to this plugin.
Former Nagios employee
https://www.mcapra.com/
Chite
Posts: 23
Joined: Tue Jan 26, 2016 5:32 pm

Re: check_oracle_health - call filename

Post by Chite »

maybe there's a better way or better command/module to do what I'm trying to do.... I'll explain my situation:

We have an Oracle database that we'd like to run a query against (to find any table blocking or locking). it seems the sql that I've received from my DBA doesn't work in nagiosxi. I can run the code from a sql plus command line and it works and returns a 0 or >1 value, but when I run it in nagios xi it fails.

here's the error in nagios:
(No output on stdout) stderr: /bin/sh: -c: line 0: unexpected EOF while looking for matching `"'
/bin/sh: -c: line 1: syntax error: unexpected end of file

here's the query were using:
check_xi_oraclequery!--connect '<servername:port#>/<instance name>' --username '<username>' --password '<password>' --mode sql --name "select count(1) FROM sys.v_$process p1 , sys.v_$process p2 , sys.v_$session s1 , sys.v_$session s2 , dba_locks w , dba_locks h WHERE h.mode_held != 'None' AND h.mode_held != 'Null' AND w.mode_requested != 'None' AND w.lock_type (+) = h.lock_type AND w.lock_id1 (+) = h.lock_id1 AND w.lock_id2 (+) = h.lock_id2 AND w.session_id = s1.sid (+) AND h.session_id = s2.sid (+) AND s1.paddr = p1.addr (+) AND s2.paddr = p2.addr (+)" --critical @1!!!!!!!

maybe there's a better way to pull locked files. I'm not much of a DBA so I have to trust what my DBA is giving me is correct.

any thoughts would be appreciated.
thanks!
User avatar
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: check_oracle_health - call filename

Post by mcapra »

I notice there's a few $ characters in this query. You may need to escape them so that the CLI isn't trying to parse them as system/script variables. Like so using backslashes:

Code: Select all

check_xi_oraclequery!--connect '<servername:port#>/<instance name>' --username '<username>' --password '<password>' --mode sql --name "select count(1) FROM sys.v_\$process p1 , sys.v_\$process p2 , sys.v_\$session s1 , sys.v_\$session s2 , dba_locks w , dba_locks h WHERE h.mode_held != 'None' AND h.mode_held != 'Null' AND w.mode_requested != 'None' AND w.lock_type (+) = h.lock_type AND w.lock_id1 (+) = h.lock_id1 AND w.lock_id2 (+) = h.lock_id2 AND w.session_id = s1.sid (+) AND h.session_id = s2.sid (+) AND s1.paddr = p1.addr (+) AND s2.paddr = p2.addr (+)" --critical @1!!!!!!!
Can you also try running the command directly from the CLI and share it's output? It might look something like this:

Code: Select all

/usr/local/nagios/libexec/check_oracle_health --connect '<servername:port#>/<instance name>' --username '<username>' --password '<password>' --mode sql --name "select count(1) FROM sys.v_\$process p1 , sys.v_\$process p2 , sys.v_\$session s1 , sys.v_\$session s2 , dba_locks w , dba_locks h WHERE h.mode_held != 'None' AND h.mode_held != 'Null' AND w.mode_requested != 'None' AND w.lock_type (+) = h.lock_type AND w.lock_id1 (+) = h.lock_id1 AND w.lock_id2 (+) = h.lock_id2 AND w.session_id = s1.sid (+) AND h.session_id = s2.sid (+) AND s1.paddr = p1.addr (+) AND s2.paddr = p2.addr (+)" --critical @1
Former Nagios employee
https://www.mcapra.com/
Chite
Posts: 23
Joined: Tue Jan 26, 2016 5:32 pm

Re: check_oracle_health - call filename

Post by Chite »

mcapra, your a genius :) the escape characters worked like a charm, thank you! one additional question, I must not be adding my "critical" value in properly. when I put the --critical @0 at the end of the query It returns an "OK" when in reality it should return a critical because there were no blocking locks on the DB. thoughts on my syntax? thanks!

what I'd really like to see is if there is any number greater than 0 - the alarm goes critical after 5 min.

output from query:
./check_oracle_health --connect '<servername>:<port>/<instance>' --username '<username>' --password '<password>' --mode sql --name "select count(1) FROM sys.v_\$process p1 , sys.v_\$process p2 , sys.v_\$session s1 , sys.v_\$session s2 , dba_locks w , dba_locks h WHERE h.mode_held != 'None' AND h.mode_held != 'Null' AND w.mode_requested != 'None' AND w.lock_type (+) = h.lock_type AND w.lock_id1 (+) = h.lock_id1 AND w.lock_id2 (+) = h.lock_id2 AND w.session_id = s1.sid (+) AND h.session_id = s2.sid (+) AND s1.paddr = p1.addr (+) AND s2.paddr = p2.addr (+)" --critical 0
OK - select count(1) from sys.v_$process p1 , sys.v_$process p2 , sys.v_$session s1 , sys.v_$session s2 , dba_locks w , dba_locks h where h.mode_held != 'none' and h.mode_held != 'null' and w.mode_requested != 'none' and w.lock_type (+) = h.lock_type and w.lock_id1 (+) = h.lock_id1 and w.lock_id2 (+) = h.lock_id2 and w.session_id = s1.sid (+) and h.session_id = s2.sid (+) and s1.paddr = p1.addr (+) and s2.paddr = p2.addr (+): 0 | 'select'=0;1;0


thanks!
User avatar
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: check_oracle_health - call filename

Post by mcapra »

--critical 0 should work in this case. If you wanted to be alerted after the state is held for more than 5 minutes, you could do that with some combination of the "retry interval" and "max check attempts" values.

For example, you could set your "retry interval" to 1 and your "max check attempts" value to 5. This would mean that, when a problem is first detected, the check is executed every 1 minute up to a maximum of 5 times. This effectively would mean that the problem state has to be around for at least 5 minutes before an alert is triggered.
Former Nagios employee
https://www.mcapra.com/
Locked