Issue Monitoring Oracle Processes with 'check_procs'

lazzarinof
Posts: 50
Joined: Thu Sep 23, 2021 12:26 pm

Post by lazzarinof »

Good afternoon Nagios team!

I'm trying to monitor some Oracle processes using check_procs, but I'm hitting a snag when processes have similar names. It appears to default to an open wildcard at the end of the name?
Example: I'm trying to monitor ora_pmon_cetststcdb, but the check also catches ora_pmon_cetststcdb19.

I can see in the documentation for the check_procs plugin that there's supposed to be an exclusion option (-X, as shown at https://nagios-plugins.org/doc/man/check_procs.html). However, -X appears to be ignored completely when added to the arguments.

Is there some way to either insert a break, so the plugin knows to stop at the end of "ora_pmon_cetststcdb," or has anyone had any luck getting -X to work?

Thank you!

Post by lazzarinof »

Still fighting with this one, and hoping someone knows why check_procs doesn't work as intended.
kfanselow
Posts: 241
Joined: Tue Aug 31, 2021 3:25 pm

Post by kfanselow »

Hi lazzarinof,

Can you send us the command string you're using so we can take a look? I believe check_procs does a substring match by default, using a strstr call.
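
As a rough illustration of what a substring match means for the two process names in question, here is a minimal sketch in plain shell. It is not the plugin's actual code, and substring_match is a hypothetical helper name used only for this example:

```shell
#!/bin/sh
# Illustration only: mimics a strstr-style "needle anywhere in haystack" test.
# substring_match is a hypothetical helper, not part of check_procs.
substring_match() {
  # $1 = process argument string (haystack), $2 = search string (needle)
  case "$1" in
    *"$2"*) echo match ;;
    *)      echo no-match ;;
  esac
}

substring_match "ora_pmon_cetststcdb"   "ora_pmon_cetststcdb"
substring_match "ora_pmon_cetststcdb19" "ora_pmon_cetststcdb"
substring_match "ora_pmon_other"        "ora_pmon_cetststcdb"
```

The first two calls both report a match, which is why a count threshold of 1:1 would trip when both databases are running.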

To fetch the command string from a configured service please do the following:

Configure (top) -> Core Config Manager -> Services (left, under Monitoring) -> find the service check and click on the wrench (right side) -> click the "Run Check Command" button and confirm its follow-up prompt.

Thanks and Best Regards,
Keith

Post by lazzarinof »

Good afternoon Keith,

Here is some of what I've tried:

Without the -X switch (this is the check we were previously using, but we will need to change it so we can add more databases and still monitor in a reasonable way):

Code: Select all

[nagios@nagiosxi.state.nv.us ~]$ /usr/local/nagios/libexec/check_nrpe -2 -H 10.128.2.72 -t 30 -c check_active_procs -a ""1:1 ora_pmon_cetststcdb" "
PROCS CRITICAL: 2 processes with args 'ora_pmon_cetststcdb'
Adding the -X switch inside quotes, as part of $ARG1$ under Service Management:

Code: Select all

[nagios@nagiosxi.state.nv.us ~]$ /usr/local/nagios/libexec/check_nrpe -2 -H 10.128.2.72 -t 30 -c check_active_procs -a ""1:1 ora_pmon_cetststcdb -X ora_pmon_cetststcdb19" "
PROCS CRITICAL: 2 processes with args 'ora_pmon_cetststcdb'
Adding the -X switch outside of the quotes, as part of $ARG2$ under Service Management (this seems to at least acknowledge the switch, even if it doesn't work. I'd note this is also the response we get if we run our initial check without quotes):

Code: Select all

[nagios@nagiosxi.state.nv.us ~]$ /usr/local/nagios/libexec/check_nrpe -2 -H 10.128.2.72 -t 30 -c check_active_procs -a ""1:1 ora_pmon_cetststcdb" -X ora_pmon_cetststcdb19"
Usage: check_procs -w <range> -c <range> [-m metric] [-s state] [-p ppid]
[-u user] [-r rss] [-z vsz] [-P %cpu] [-a argument-array]
[-C command] [-t timeout] [-v]
Adding the -X switch without quotes to $ARG2$ under Service Management (appears to ignore this argument and run without it):

Code: Select all

[nagios@nagiosxi.state.nv.us ~]$ /usr/local/nagios/libexec/check_nrpe -2 -H 10.128.2.72 -t 30 -c check_active_procs -a ""1:1 ora_pmon_cetststcdb " "
PROCS CRITICAL: 2 processes with args 'ora_pmon_cetststcdb'
Adding the -X switch with quotes to $ARG2$ under Service Management (appears to ignore this argument and run without it):

Code: Select all

[nagios@nagiosxi.state.nv.us ~]$ /usr/local/nagios/libexec/check_nrpe -2 -H 10.128.2.72 -t 30 -c check_active_procs -a ""1:1 ora_pmon_cetststcdb " "
PROCS CRITICAL: 2 processes with args 'ora_pmon_cetststcdb'

Thank you,
Frank

Post by kfanselow »

Hi Frank,

Thanks for sending those. So we have two components at play here: 1) check_nrpe and 2) check_procs.

I'm going to work through the process backwards and replicate the problem as best I can. I've got two resident processes, abc123 and abc1234. So the first thing I want to test is check_procs itself:

XI server:

Code: Select all

[root@kf-centos-79 libexec]# pwd 
/usr/local/nagios/libexec

[root@kf-centos-79 libexec]# ps -ax |grep abc123 
21105 pts/0    S      0:00 ./abc123 -l 2000
21289 pts/0    S+     0:00 ./abc1234 -l 2001
24668 pts/1    R+     0:00 grep --color=auto abc123

[root@kf-centos-79 libexec]# ./check_procs -a abc123 
PROCS OK: 2 processes with args 'abc123' | procs=2;;;0;

[root@kf-centos-79 libexec]# ./check_procs -a abc123 -X abc1234  
PROCS OK: 1 process with args 'abc123', exclude progs 'abc1234' | procs=1;;;0;
It looks like when check_procs is run with the -X flag on the command line, it does what we expect: it excludes the process with the 4 at the end (abc1234). Now let's add NRPE into the mix. I have check_procs configured in my nrpe.cfg file as follows (it's FreeBSD, so the path is different, but the same processes are running):

Code: Select all

command[check_procs]=/usr/local/libexec/nagios/check_procs $ARG1$ 
### with don't blame nrpe set to 1  
dont_blame_nrpe=1 
When I pass the same arguments that worked for check_procs on the CLI, the check works through NRPE as well. I think the key difference is that I am passing a "-a" inside the string. check_nrpe consumes the first -a itself, so the string being passed needs to contain its own "-a" for check_procs.

Code: Select all

[root@kf-centos-79 libexec]# /usr/local/nagios/libexec/check_nrpe -2 -H 192.168.130.228 -t 30 -c check_procs -a "-a abc123" 
PROCS OK: 2 processes with args 'abc123' | procs=2;;;0;

[root@kf-centos-79 libexec]# /usr/local/nagios/libexec/check_nrpe -2 -H 192.168.130.228 -t 30 -c check_procs -a "-a abc123 -X abc1234" 
PROCS OK: 1 process with args 'abc123', exclude progs 'abc1234' | procs=1;;;0;
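
To make the double "-a" easier to picture, here is a simplified model of the substitution, assuming the agent simply replaces the literal text $ARG1$ in the command definition with the string it receives. expand_command is a hypothetical helper written for this illustration, not anything shipped with NRPE:

```shell
#!/bin/sh
# Simplified model (assumption): the agent swaps the literal text $ARG1$ in
# the configured command line for the string passed after check_nrpe's -a.
# expand_command is a hypothetical helper for illustration only.
expand_command() {
  # $1 = command template from the agent config, $2 = value for $ARG1$
  # Note: the value must not contain '|' or '&' for this simple sed call.
  printf '%s\n' "$1" | sed -e 's|\$ARG1\$|'"$2"'|'
}

template='/usr/local/libexec/nagios/check_procs $ARG1$'

# check_nrpe's own -a is consumed on the Nagios side; only the quoted
# string travels to the agent, so it has to carry check_procs' flags.
expand_command "$template" "-a abc123 -X abc1234"
```

The printed line is the command the remote host would end up running under this model, with the inner -a and -X intact.
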
Let's add in your threshold to make sure that is working as well:

Code: Select all

[root@kf-centos-79 libexec]# /usr/local/nagios/libexec/check_nrpe -2 -H 192.168.130.228 -t 30 -c check_procs -a " 1:1 -a abc123 -X abc1234" 
PROCS OK: 1 process with args 'abc123', exclude progs 'abc1234' | procs=1;1:1;;0;
[root@kf-centos-79 libexec]# /usr/local/nagios/libexec/check_nrpe -2 -H 192.168.130.228 -t 30 -c check_procs -a " 1:1 -a abc123 " 
PROCS WARNING: 2 processes with args 'abc123' | procs=2;1:1;;0;
So based on that I would say give this a try and let us know how it goes:

Code: Select all

/usr/local/nagios/libexec/check_nrpe -2 -H 10.128.2.72 -t 30 -c check_active_procs -a "1:1 -a ora_pmon_cetststcdb -X ora_pmon_cetststcdb19"
Hope this is helpful.

Thanks and Best Regards,
Keith

Post by lazzarinof »

Good afternoon Keith,

I tried adding the -a, without luck:


Current argument, returns 2 processes:

Code: Select all

[nagios@nagiosxi.state.nv.us ~]$ /usr/local/nagios/libexec/check_nrpe -2 -H 10.128.2.72 -t 30 -c check_active_procs -a ""1:1 ora_pmon_cetststcdb""
PROCS CRITICAL: 2 processes with args 'ora_pmon_cetststcdb'
As posted, with quotes to escape. Appears to be reading "-a" as the process name.

Code: Select all

[nagios@nagiosxi.state.nv.us ~]$ /usr/local/nagios/libexec/check_nrpe -2 -H 10.128.2.72 -t 30 -c check_active_procs -a ""1:1 -a ora_pmon_cetststcdb -X ora_pmon_cetststcdb19""
PROCS OK: 1 process with args '-a'

As posted, without quotes to escape. Our system seems to require quotes to escape automatic quotes added to the $ARG$ fields:

Code: Select all

[nagios@nagiosxi.state.nv.us ~]$ /usr/local/nagios/libexec/check_nrpe -2 -H 10.128.2.72 -t 30 -c check_active_procs -a "1:1 -a ora_pmon_cetststcdb -X ora_pmon_cetststcdb19"
Usage: check_procs -w <range> -c <range> [-m metric] [-s state] [-p ppid]
[-u user] [-r rss] [-z vsz] [-P %cpu] [-a argument-array]
[-C command] [-t timeout] [-v]

Now, as I keep working on why the -X is ignored, I'm also curious: why would our system automatically add quotes when yours apparently does not? And I can see yours already has a -a. Why would the second -a force the check? Anywho... back to the drawing board!
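
On the quoting question, one thing that can be checked locally is how the shell itself splits the doubled quotes before check_nrpe ever sees them. The sketch below uses a hypothetical helper, show_args, that just prints each argument it receives:

```shell
#!/bin/sh
# show_args is a hypothetical helper: it prints every argument it is given,
# numbered the way check_nrpe would number the values after its -a.
show_args() {
  i=1
  for a in "$@"; do
    printf 'ARG%d=[%s]\n' "$i" "$a"
    i=$((i + 1))
  done
}

# Doubled quotes: "" is an empty string glued to the next word, so the
# spaces are effectively unquoted and the text splits into separate words.
show_args ""1:1 -a ora_pmon_cetststcdb -X ora_pmon_cetststcdb19""

# One pair of quotes: the whole text arrives as a single argument.
show_args "1:1 -a ora_pmon_cetststcdb -X ora_pmon_cetststcdb19"
```

In the doubled-quote form, "-a" arrives as the second argument on its own. That would be consistent with the "1 process with args '-a'" result if the remote check_active_procs command is defined with separate $ARG1$ and $ARG2$ placeholders, but that is an assumption, since its definition is not shown in this thread.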

Thank you,
-Frank

Post by kfanselow »

Hi Frank,

What version of the check_procs plugin are you currently using on the target system? I noticed in your last post that the usage message doesn't list the -X option the way the latest version does:

Code: Select all

$ /usr/local/nagios/libexec/check_procs -V
check_procs v2.3.3 (nagios-plugins 2.3.3)
$ /usr/local/nagios/libexec/check_procs -Q
/usr/local/nagios/libexec/check_procs: invalid option -- 'Q'
Usage:
check_procs -w <range> -c <range> [-m metric] [-s state] [-p ppid] [-j jid]
 [-u user] [-r rss] [-z vsz] [-P %cpu] [-a argument-array]
 [-C command] [-X process_to_exclude] [-k] [-t timeout] [-v]
Note the -X option in the bottom line of the usage output.
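
If it helps to check this on several hosts, a small sketch along these lines could scan a plugin's usage text for the -X option. usage_lists_exclude is a hypothetical helper name, and the two sample strings are copied from the usage output shown in this thread:

```shell
#!/bin/sh
# usage_lists_exclude is a hypothetical helper: it reads usage text on stdin
# and prints "yes" if a "[-X ..." option appears in it, otherwise "no".
usage_lists_exclude() {
  if grep -q -e '\[-X '; then
    echo yes
  else
    echo no
  fi
}

# Last usage line as returned by the monitored host earlier in the thread.
old_usage='[-C command] [-t timeout] [-v]'
# Last usage line from check_procs v2.3.3.
new_usage='[-C command] [-X process_to_exclude] [-k] [-t timeout] [-v]'

printf '%s\n' "$old_usage" | usage_lists_exclude
printf '%s\n' "$new_usage" | usage_lists_exclude
```

On a host where you do have shell access, the plugin's own help output could be piped into the same function instead of the sample strings.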

Thanks and Best Regards
Keith

Post by lazzarinof »

Good morning Keith,

I apologize for the long delay: this took a back burner for a while.

I can confirm I'm running v2.4.0 on Nagios itself. However, I don't have access to many of the systems we monitor (including this one). I am willing to bet it is older, as the return does not show the -X option. Is there some way to push updates to checks on systems? It could be quite an issue (to the point of requiring dedicated staff) to roll through updates across our infrastructure.

Thank you!

Post by kfanselow »

Hi Frank,

Unfortunately, NRPE does not have a mechanism for deploying plugin updates. If you need to update manually anyway, we strongly recommend moving to NCPA, which is our supported agent; NRPE is deprecated and we are no longer developing it.

What version of XI are you using?

Thanks and Best Regards,
Keith

Post by lazzarinof »

Good afternoon Keith,

We're on 5.8.7, and due to upgrade once we've done some migrations with LS.

We're updating to NCPA and liking it quite a bit, but it's going to take a while to switch over completely, especially because we're hitting a bit of the same issue with monitoring Oracle processes: NCPA sees *everything* as "Oracle" and isn't subdividing by process name.

Thank you,
-Frank