Re: check_snmp_process_wizard.pl lag?

Posted: Tue Mar 19, 2013 3:42 pm
by jbennett
scottwilkerson wrote:Everything looks correct, but as this is an SNMP check it is utilizing UDP connections, and as UDP is stateless, packets can get dropped. This is likely what is happening.

With the config you posted, though, it shouldn't be sending notifications if it is only down for 30 seconds; it should try 5 times at 1-minute intervals before sending a notification.
That's what's confusing me. The process isn't down at all. Is there another option to check processes that might not utilize UDP connections?

Re: check_snmp_process_wizard.pl lag?

Posted: Tue Mar 19, 2013 3:46 pm
by slansing
Yes, you could call the check_procs plugin through NRPE, for example:

http://nagiosplugins.org/man/check_procs

http://linuxsysadminblog.com/2009/02/na ... s-running/

Re: check_snmp_process_wizard.pl lag?

Posted: Tue Mar 26, 2013 11:00 am
by jbennett
I'm wondering what the difference in Nagios server load would be between these two? After making the change for one of the process checks, I've noticed that we are no longer getting false alarms for the process being critical.

Am I right in thinking that it would take some of the load off of the server in running this (and other) checks via NRPE as opposed to SNMP?

If I have a number of processes that are unique to our set-up that I am currently checking via SNMP, would I notice a decrease in load on the Nagios server if I moved those checks to NRPE?

Also, can you check more than one process with a single check using NRPE or do I need to have a single check for each of the different processes that I want to check?

Re: check_snmp_process_wizard.pl lag?

Posted: Tue Mar 26, 2013 11:14 am
by abrist
jbennett wrote:I'm wondering what the difference in Nagios server load would be between these two? After making the change for one of the process checks, I've noticed that we are no longer getting false alarms for the process being critical.
That is most likely because NRPE uses TCP instead of UDP and, as Scott stated, you may be experiencing dropped packets with UDP.
jbennett wrote:Am I right in thinking that it would take some of the load off of the server in running this (and other) checks via NRPE as opposed to SNMP?
NRPE may generate less load than SNMP, but not much less.
jbennett wrote: If I have a number of processes that are unique to our set-up that I am currently checking via SNMP, would I notice a decrease in load on the Nagios server if I moved those checks to NRPE?
Negligible, though on a large enough scale it may be noticeable.
jbennett wrote: Also, can you check more than one process with a single check using NRPE or do I need to have a single check for each of the different processes that I want to check?
You could do it either way, although if you set them all up on one check, the entire check will fail when any of the parts fail. If you want each service check to be specific to each process checked so you get granular warnings/criticals, you will need separate checks. If you don't mind monolithic alerts, you could wrap up the whole lot of the process checks into a single service check script.

Re: check_snmp_process_wizard.pl lag?

Posted: Tue Mar 26, 2013 11:20 am
by jbennett
abrist wrote:You could do it either way, although if you set them all up on one check, the entire check will fail when any of the parts fail. If you want each service check to be specific to each process checked so you get granular warnings/criticals, you will need separate checks. If you don't mind monolithic alerts, you could wrap up the whole lot of the process checks into a single service check script.
I wouldn't mind that since any of these services failing would warrant attention.

When it did fail, would it spit back the process that failed or just that something in the string of processes to check failed?

I suppose I would just have the command as follows on the box I'm checking?

command[check_procs]=/usr/local/nagios/libexec/check_procs -c 1:1 -C proc1 proc2 proc3

Re: check_snmp_process_wizard.pl lag?

Posted: Tue Mar 26, 2013 1:03 pm
by abrist
I do not think check_procs supports more than 1 specified process for the "-C" (command) switch. You would have to script a custom solution or check out the exchange: http://exchange.nagios.org/index.php?op ... word=procs
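
As a rough idea of what a custom solution could look like, here is a minimal sketch of a wrapper that checks a colon-separated list of processes and reports which ones are missing. The names check_multi and CHECK_ONE are hypothetical, and the default per-process test assumes pgrep is installed on the monitored box; it is not taken from any existing plugin.

```shell
#!/bin/sh
# Sketch only: check_multi and CHECK_ONE are hypothetical names, not part of
# check_procs or any Nagios Exchange plugin.
# CHECK_ONE is the command run once per process name; it must exit 0 when the
# process is running. The default assumes pgrep is installed.
CHECK_ONE="${CHECK_ONE:-pgrep -x}"

# Usage: check_multi "proc1:proc2:proc3"
check_multi() {
    failed=""
    old_ifs=$IFS
    IFS=:                      # split the list on ':' when the loop starts
    for name in $1; do
        IFS=$old_ifs           # restore so $CHECK_ONE splits into words normally
        if ! $CHECK_ONE "$name" >/dev/null 2>&1; then
            failed="$failed $name"
        fi
    done
    IFS=$old_ifs

    if [ -n "$failed" ]; then
        # Exit code 2 is CRITICAL in the Nagios plugin convention.
        echo "PROCS CRITICAL: not running:$failed"
        return 2
    fi
    echo "PROCS OK: all listed processes are running"
    return 0
}
```

To run it as a plugin you would add a last line such as `check_multi "$1"; exit $?` and point an NRPE command at the script. Setting CHECK_ONE to something like "/usr/local/nagios/libexec/check_procs -c 1: -C" should let check_procs do the per-process test instead, though I have not tried that here.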

Re: check_snmp_process_wizard.pl lag?

Posted: Tue Mar 26, 2013 3:51 pm
by jbennett
I have found the following: http://exchange.nagios.org/directory/Un ... cs/details

I'm running into an issue though and I'm not sure where to go for help. It says that the owner is nagiosexchange.

Basically, I'm getting the following:

 ./check_multi_procs.pl -b /usr/local/nagios/libexec/check_procs -u root -f proc1; proc2
Multiple process check failed on :
PROCS CRITICAL: 2 processes with UID = 0 (root), args 'proc1'
If I remove the -u root switch, the check doesn't work at all (throws back usage directions).

Re: check_snmp_process_wizard.pl lag?

Posted: Wed Mar 27, 2013 7:05 am
by scottwilkerson
I'm not totally familiar with this plugin but I am sure you will need to either escape the ; between the procs

./check_multi_procs.pl -b /usr/local/nagios/libexec/check_procs -u root -f proc1\;proc2
or quote them

./check_multi_procs.pl -b /usr/local/nagios/libexec/check_procs -u root -f 'proc1;proc2'
This particular plugin does appear to require you to pass in the owner of the processes you are checking.
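
To illustrate why the semicolon matters, here is a small sketch of how the shell treats it. last_arg is a hypothetical helper written only for this example; it prints the final argument it receives, which stands in for what the plugin would see after -f.

```shell
# last_arg is a hypothetical helper that prints the final argument it receives.
last_arg() {
    for arg in "$@"; do :; done
    printf '%s\n' "$arg"
}

last_arg -f 'proc1;proc2'   # prints proc1;proc2
last_arg -f proc1\;proc2    # prints proc1;proc2

# Unquoted, the shell ends the command at ';' and then tries to run proc2 as
# a separate command, so the plugin only ever sees proc1:
#   last_arg -f proc1; proc2
```

That would explain why the earlier output only mentions args 'proc1'.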

Re: check_snmp_process_wizard.pl lag?

Posted: Wed Mar 27, 2013 8:15 am
by jbennett
I've tried your suggestions in a few different ways without any luck.

The readme has the following:

# Config Example : 
# check_command check_nrpe!check_multi_procs!user!"proc1:proc2:proc3" 
While the help information shows the following:

# ./check_multi_procs.pl -b /usr/local/nagios/libexec/check_procs -u root -f lane; ves_ocr
Multiple process check failed on :
PROCS CRITICAL: 2 processes with UID = 0 (root), args 'lane'
# ./check_multi_procs.pl -b /usr/local/nagios/libexec/check_procs -f proc1;proc2
Usage : ./check_multi_procs.pl -f filer -u user [-b check_proc_bin] [-s min_proc] [-x max_proc] [-h]
        -u : Set user owner of process
        -f : Give process to check (must be string and could be separated by ';')
        -b : Specify the check_proc Nagios plugin binary
        -s : Set min process to be available (default 1)
        -x : Set max process to be available (default 1)
        -h : Print this help message
It seems that this would help alleviate some load on the server as I have about 400 boxes that I need to check 5 processes on each (2000 checks total). If I am able to implement this check instead, it would lower that to only 400 checks.

If any one of these processes goes awry it is considered a critical, so a blanket notification is just fine here.

Am I correct in thinking that going this route instead of individual service checks for each process would help lower the load?

We are looking to add a number of other services checks in the near future on these boxes. We are looking at a potential for about 25-30 more service checks per box. If I can work to streamline the current checks, it would help us with load in the future.
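
For reference, the arithmetic behind those numbers can be sketched as below. The host and process counts are the ones given above; the variable names are just illustrative.

```shell
hosts=400            # boxes being monitored
procs_per_host=5     # processes checked on each box

separate=$((hosts * procs_per_host))   # one service check per process
combined=$hosts                        # one combined check per box

echo "separate process checks: $separate"
echo "combined process checks: $combined"
echo "active checks saved: $((separate - combined))"
```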

Re: check_snmp_process_wizard.pl lag?

Posted: Wed Mar 27, 2013 8:41 am
by scottwilkerson
I looked at the code, and the process list is supposed to be split on : NOT ; (really bad help file...)

Let's try:

./check_multi_procs.pl -b /usr/local/nagios/libexec/check_procs -u root -f lane:ves_ocr
jbennett wrote:Am I correct in thinking that going this route instead of individual service checks for each process would help lower the load?
It certainly would on the XI server. Another possibility would be to use NRDS and set up the checks individually (available under Admin -> NRDS Config Manager).
Here's a video:
http://library.nagios.com/library/produ ... s-tutorial