error with check_oracle_health on long qry

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
WillH
Posts: 54
Joined: Mon Aug 03, 2020 10:37 am

error with check_oracle_health on long qry

Post by WillH »

When running a moderately complex qry we get the error:
UNKNOWN - got no valid response for (presents the entire qry, pasted in below, translated back from url encoding)

we're feeding
./check_oracle_health --connect=<DB> --user=<USR> --password=<PW> --ident --encode --mode sql --name <URLENCODEDQUERY>

We get the same errors from the command line and CCM.
A shorter simple qry, such as select%20JOB_TYPE%20from%20apt_v_job works fine

The qry, below, that is translated out of URLencoding we presented, will run fine in SQL developer

Any formatting changes to make, or a way to break the qry across arguments? :?

|
WITH
get_var AS (SELECT TRUNC(sysdate) - TRUNC(sysdate-2) nbr_of_days FROM dual),
get_last_backup_date AS (SELECT client_id, TRUNC(sysdate)-TRUNC(MAX(finish_date)) last_backup FROM apt_v_job WHERE summary_status < 2 AND finish_date > sysdate-45 GROUP BY client_id),
get_counts AS (
SELECT
to_char(start_date,'MM/DD/YYYY') the_date,
client_id,
server_id,
rtrim(lower(client_name),'.') client_name,
MAX(vendor_status) max_vendor_status,
NVL(SUM(DECODE(summary_status,0,1,0)),0) success_count,
NVL(SUM(DECODE(summary_status,1,1,0)),0) partial_count,
NVL(SUM(DECODE(summary_status,2,1,0)),0) failed_count
FROM apt_v_job
WHERE
start_date BETWEEN sysdate-2 AND sysdate
AND client_id <> server_id
AND job_type IN ('101','102')
AND vendor_status NOT IN ('90','129','150','99','230','239','240','245','196','999999')
AND rtrim(lower(client_name), '.') NOT IN (select distinct lower(media_server_name) from apt_v_nbu_media_server)
AND client_name NOT like 'nbum%'
GROUP BY
to_char(start_date,'MM/DD/YYYY'),
client_id,
server_id,
rtrim(lower(client_name), '.')
),
get_sums AS (
SELECT
c.client_id,
c.server_id,
s1.display_name client,
s1.os_platform,
s2.display_name master,
MAX(lb.last_backup) last_backup,
MAX(c.max_vendor_status) max_vendor_status,
SUM(c.success_count) success_count,
SUM(c.partial_count) partial_count,
SUM(c.failed_count) failed_count
FROM get_counts c
INNER JOIN apt_v_server s1 ON s1.server_id = c.client_id
INNER JOIN apt_v_server s2 ON s2.server_id = c.server_id
LEFT JOIN get_last_backup_date lb ON lb.client_id = c.client_id
WHERE s1.os_platform IS NOT NULL
GROUP BY
c.client_id,
c.server_id,
s2.display_name,
s1.os_platform,
s1.display_name
)
SELECT
max_vendor_status,
master,
os_platform,
trim(both ' ' from client) client,
NVL(last_backup,45) last_backup
FROM get_sums, get_var
WHERE failed_count >= nbr_of_days
AND (last_backup >= nbr_of_days OR last_backup IS NULL)
ORDER BY 1,2,3,4
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Re: error with check_oracle_health on long qry

Post by gsmith »

Hi

Just a quick thought. If you put the ncpa client on the Oracle machine you could create a script
on the Oracle machine that contains your long query, and then call it from your XI server
using check_ncpa.

If you prefer to use the check_oracle_health that's coll too. Just let me know and I will
dig into getting that query to work with that.

Thanks
WillH
Posts: 54
Joined: Mon Aug 03, 2020 10:37 am

Re: error with check_oracle_health on long qry

Post by WillH »

Thanks gsmith
Do you mean the Oracle DB server hosting the DB?
If so, we try to avoid that approach because their is no guarantee the DBAs will tell the team responsible for application monitoring that a DB has been moved.
If that's not what you meant, is there an example you can point me to for setting up that call?

We're trying to keep as much stuff on our XIs as possible, or on a specific plugin that we can roll out globally
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Re: error with check_oracle_health on long qry

Post by gsmith »

Hi,

I totally understand your situation. I'll get going on your ticket soon - realistically it
will be tomorrow before I can start digging in.

Thanks
WillH
Posts: 54
Joined: Mon Aug 03, 2020 10:37 am

Re: error with check_oracle_health on long qry

Post by WillH »

Thanks g,
We've got a little runway on this, so it's not a runaway train. I'll be off Friday, barring emergencies; I won't be back on the clock until Tuesday July 6.

From what we can see, digging in parallel
the query is not the issue but the real issue is Nagios script couldn't handle tableview
the syntax is fine but the script is breaking when it use tableview in the query.
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Re: error with check_oracle_health on long qry

Post by gsmith »

HI

I got the plugin installed and started playing around with it.

After quite awhile I figured out the --mode encode bit. It's a convenience item, and not to be used
in the command Nagios runs.

If you enter:

Code: Select all

 ./check_oracle_health  --connect="nagios/[email protected]:1521" --mode encode
and
hit <Enter> it wil just sit there (I think that is what's happening to you.
However if you paste or type something like:

Code: Select all

select count(*) from v$session where status = 'ACTIVE'
It will come back with:

Code: Select all

select%20count%28%2A%29%20from%20v%24session%20where%20status%20%3D%20%27ACTIVE%27
So change your command to:

Code: Select all

./check_oracle_health --connect="username/passwd@server:1521"--mode sql --name <URLENCODEDQUERY>
And let me know what happens.

Thanks
WillH
Posts: 54
Joined: Mon Aug 03, 2020 10:37 am

Re: error with check_oracle_health on long qry

Post by WillH »

gsmith,
thanks for the reply, but still getting an error at the cmd line

We get as a return, the fully formed qry w/o urlencoding, that I can past into sql developer and execute without error. It is bookended with
UNKNOWN - got no valid response for
<fully formed qry w/o url encoding>
- ORA-00900: invalid SQL statement (DBD ERROR: OCIStmtExecute/Describe)
gsmith
Posts: 1253
Joined: Tue Mar 02, 2021 11:15 am

Re: error with check_oracle_health on long qry

Post by gsmith »

Hi,

I tried some simple simple queries like:

Code: Select all

select 'test' from dual;
select 1 from dual;
The first one fails, the second succeeds.

I know you tested

Code: Select all

select%20JOB_TYPE%20from%20apt_v_job
, I am guessing that returns an int?

My suspicion is the query has to result in a number, check it:

Code: Select all

./check_oracle_health --connect="orcl" --user=system --password=new123 --mode sql --name select%20%27test%27%20from%20dual
UNKNOWN - got no valid response for select 'test' from dual
./check_oracle_health --connect="orcl" --user=system --password=new123 --mode sql --name select%201%20from%20dual
OK - select 1 from dual: 1 | 'select'=1;1;5
 ./check_oracle_health --connect="orcl" --user=system --password=new123 --mode sql --name select%20JOB_TYPE%20from%20apt_v_job
CRITICAL - select job_type from apt_v_job: 101 | 'select'=101;1;5
Thanks
Locked