check_snmp_process_wizard.pl lag?

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: check_snmp_process_wizard.pl lag?

Post by jbennett »

scottwilkerson wrote:Everything looks correct, but as this is an SNMP check it uses UDP, which is connectionless, so packets can get dropped. This is likely what is happening.

With the config you posted, though, it shouldn't be sending notifications if it is only down for 30 seconds; it should be trying 5 times at 1-minute intervals before sending a notification.
That's what's confusing me. The process isn't down at all. Is there another option to check processes that might not utilize UDP connections?
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: check_snmp_process_wizard.pl lag?

Post by slansing »

Yes, you could call the check_procs plugin through NRPE, for example:

http://nagiosplugins.org/man/check_procs

http://linuxsysadminblog.com/2009/02/na ... s-running/
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: check_snmp_process_wizard.pl lag?

Post by jbennett »

I'm wondering what the difference in Nagios server load would be between these two? After making the change for one of the process checks, I've noticed that we are no longer getting false alarms for the process being critical.

Am I right in thinking that it would take some of the load off of the server in running this (and other) checks via NRPE as opposed to SNMP?

If I have a number of processes that are unique to our set-up that I am checking via SNMP currently, would I notice a decrease in load on the Nagios server if I moved those checks to NRPE?

Also, can you check more than one process with a single check using NRPE or do I need to have a single check for each of the different processes that I want to check?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: check_snmp_process_wizard.pl lag?

Post by abrist »

jbennett wrote:I'm wondering what the difference in Nagios server load would be between these two? After making the change for one of the process checks, I've noticed that we are no longer getting false alarms for the process being critical.
That is most likely because NRPE uses TCP instead of UDP and, as Scott stated, you may be experiencing dropped packets with UDP.
jbennett wrote:Am I right in thinking that it would take some of the load off of the server in running this (and other) checks via NRPE as opposed to SNMP?
NRPE may put less load on the server than SNMP, but not much less.
jbennett wrote: If I have a number of processes that are unique to our set-up that I am checking via SNMP currently, would I notice a decrease in load on the Nagios server if I moved those checks to NRPE?
Negligible, though on a large enough scale it may be noticeable.
jbennett wrote: Also, can you check more than one process with a single check using NRPE or do I need to have a single check for each of the different processes that I want to check?
You could do it either way, although if you set them all up on one check, the entire check will fail when any of the parts fail. If you want each service check to be specific to each process checked so you get granular warnings/criticals, you will need separate checks. If you don't mind monolithic alerts, you could wrap up the whole lot of the process checks into a single service check script.
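As a rough illustration of the single-check route: the wrapper can still name the process that failed. This is a minimal sketch in Python, assuming the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name `combine_results`, the severity ordering, and the sample data are made up for illustration.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Severity order used to pick the overall state. Ranking UNKNOWN below
# CRITICAL is a choice made for this sketch, not a Nagios rule.
SEVERITY = {OK: 0, WARNING: 1, UNKNOWN: 2, CRITICAL: 3}


def combine_results(results):
    """Fold per-process results into one (exit_code, status_line).

    `results` maps a process name to the exit code of its individual check.
    """
    if not results:
        return UNKNOWN, "PROCS UNKNOWN: no processes were checked"
    worst = max(results.values(), key=lambda code: SEVERITY[code])
    failed = sorted(name for name, code in results.items() if code != OK)
    if not failed:
        return OK, "PROCS OK: %d processes running" % len(results)
    return worst, "PROCS %s: problem with %s" % (LABELS[worst], ", ".join(failed))


code, line = combine_results({"proc1": OK, "proc2": CRITICAL, "proc3": OK})
print(line)  # PROCS CRITICAL: problem with proc2
```

The whole service still goes critical as one unit, but the status line tells you which process to look at.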
Former Nagios employee
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: check_snmp_process_wizard.pl lag?

Post by jbennett »

abrist wrote:You could do it either way, although if you set them all up on one check, the entire check will fail when any of the parts fail. If you want each service check to be specific to each process checked so you get granular warnings/criticals, you will need separate checks. If you don't mind monolithic alerts, you could wrap up the whole lot of the process checks into a single service check script.
I wouldn't mind that since any of these services failing would warrant attention.

When it did fail, would it spit back the process that failed or just that something in the string of processes to check failed?

I suppose I would just have the command as follows on the box I'm checking?

Code:

command[check_procs]=/usr/local/nagios/libexec/check_procs -c 1:1 -C proc1 proc2 proc3
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: check_snmp_process_wizard.pl lag?

Post by abrist »

I do not think check_procs supports more than 1 specified process for the "-C" (command) switch. You would have to script a custom solution or check out the exchange: http://exchange.nagios.org/index.php?op ... word=procs
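A custom solution could be as small as a wrapper that runs check_procs once per process name. The sketch below is a hypothetical Python wrapper, not an existing plugin: `build_commands` and `run_checks` are made-up names, the binary path is the one from the command posted above, and `-c 1:1 -C <name>` is the same check_procs usage as in that command, just with one name per invocation.

```python
import subprocess

CHECK_PROCS = "/usr/local/nagios/libexec/check_procs"  # path used in this thread


def build_commands(names, critical_range="1:1", binary=CHECK_PROCS):
    """Return one check_procs argv per process name (-C takes a single name)."""
    return [[binary, "-c", critical_range, "-C", name] for name in names]


def run_checks(names, runner=subprocess.run):
    """Run each command and collect (name, exit_code, first line of output).

    `runner` is injectable so the logic can be exercised without check_procs.
    """
    results = []
    for name, argv in zip(names, build_commands(names)):
        done = runner(argv, stdout=subprocess.PIPE, universal_newlines=True)
        output = done.stdout.strip()
        first_line = output.splitlines()[0] if output else ""
        results.append((name, done.returncode, first_line))
    return results
```

Called as `run_checks(["proc1", "proc2", "proc3"])` on the monitored box, this returns one tuple per process. A real wrapper would then print a one-line summary and exit with the worst code so that NRPE can relay it.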
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: check_snmp_process_wizard.pl lag?

Post by jbennett »

I have found the following: http://exchange.nagios.org/directory/Un ... cs/details

I'm running into an issue though and I'm not sure where to go for help. It says that the owner is nagiosexchange.

Basically, I'm getting the following:

Code:

 ./check_multi_procs.pl -b /usr/local/nagios/libexec/check_procs -u root -f proc1; proc2
Multiple process check failed on :
PROCS CRITICAL: 2 processes with UID = 0 (root), args 'proc1'

If I remove the -u root switch, the check doesn't work at all (it throws back the usage directions).
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: check_snmp_process_wizard.pl lag?

Post by scottwilkerson »

I'm not totally familiar with this plugin, but I am sure you will need to either escape the ; between the procs:

Code:

./check_multi_procs.pl -b /usr/local/nagios/libexec/check_procs -u root -f proc1\; proc2

or quote them:

Code:

./check_multi_procs.pl -b /usr/local/nagios/libexec/check_procs -u root -f 'proc1;proc2'

This particular plugin does appear to require you to pass in the owner of the processes you are checking.
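The unquoted form fails because the shell treats `;` as a command separator, so `proc2` never reaches the plugin. The sketch below uses Python's `shlex` module to approximate how a POSIX shell tokenizes the three command lines in this thread. It is an approximation for illustration only (the helper name `split_commands` is made up, and Python 3.8+ is assumed so that `punctuation_chars` and `whitespace_split` can be combined); it is not what bash literally executes.

```python
import shlex


def split_commands(line):
    """Approximate shell parsing: return a list of argv lists, one per command."""
    lexer = shlex.shlex(line, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # only whitespace and punctuation end a word
    commands, current = [], []
    for token in lexer:
        # A bare ';' token ends the current command. (This sketch cannot tell
        # a lone quoted ';' apart, which is fine for the lines shown here.)
        if token == ";":
            commands.append(current)
            current = []
        else:
            current.append(token)
    if current:
        commands.append(current)
    return commands


unquoted = split_commands("./check_multi_procs.pl -u root -f proc1; proc2")
escaped = split_commands(r"./check_multi_procs.pl -u root -f proc1\; proc2")
quoted = split_commands("./check_multi_procs.pl -u root -f 'proc1;proc2'")

print(len(unquoted), unquoted[1])  # 2 ['proc2']  -> proc2 becomes its own command
print(escaped[0][-2:])             # ['proc1;', 'proc2'] -> still two arguments
print(quoted[0][-1])               # proc1;proc2 -> a single argument
```

Note that with the space left in, the escaped form still hands the plugin two separate arguments; only the quoted form delivers the whole list as one `-f` value.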
Former Nagios employee
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: check_snmp_process_wizard.pl lag?

Post by jbennett »

I've tried your suggestions in a few different ways without any luck.

The readme has the following:

Code:

# Config Example : 
# check_command check_nrpe!check_multi_procs!user!"proc1:proc2:proc3" 

While the help information shows the following:

Code:

# ./check_multi_procs.pl -b /usr/local/nagios/libexec/check_procs -u root -f lane; ves_ocr
Multiple process check failed on :
PROCS CRITICAL: 2 processes with UID = 0 (root), args 'lane'
# ./check_multi_procs.pl -b /usr/local/nagios/libexec/check_procs -f proc1;proc2
Usage : ./check_multi_procs.pl -f filer -u user [-b check_proc_bin] [-s min_proc] [-x max_proc] [-h]
        -u : Set user owner of process
        -f : Give process to check (must be string and could be separated by ';')
        -b : Specify the check_proc Nagios plugin binary
        -s : Set min process to be available (default 1)
        -x : Set max process to be available (default 1)
        -h : Print this help message

It seems that this would help alleviate some load on the server, as I have about 400 boxes with 5 processes to check on each (2000 checks total). If I am able to implement this check instead, it would lower that to only 400 checks.

If any one of these processes goes awry it is considered a critical, so a blanket notification is just fine here.

Am I correct in thinking that going this route instead of individual service checks for each process would help lower the load?

We are looking to add a number of other service checks on these boxes in the near future, potentially about 25-30 more per box. If I can streamline the current checks, it would help us with load in the future.
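The arithmetic behind that estimate can be written out using only the numbers in this post (400 boxes, 5 process checks each, 25 to 30 further checks planned per box). The helper name `total_checks` is made up, and the sketch only counts scheduled service checks; it says nothing about how expensive each check is.

```python
def total_checks(hosts, process_checks_per_host, other_checks_per_host=0):
    """Number of service checks the Nagios server has to schedule."""
    return hosts * (process_checks_per_host + other_checks_per_host)


HOSTS = 400

print(total_checks(HOSTS, 5))  # 2000 (one check per process)
print(total_checks(HOSTS, 1))  # 400  (one combined check per box)

# With the planned 25-30 additional checks per box:
for extra in (25, 30):
    separate = total_checks(HOSTS, 5, extra)
    combined = total_checks(HOSTS, 1, extra)
    print(extra, separate, combined, separate - combined)
# 25 12000 10400 1600
# 30 14000 12400 1600
```

Combining the process checks saves 1600 checks either way, but that saving shrinks from 80% of today's total to roughly 11-13% once the additional checks are in place.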
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: check_snmp_process_wizard.pl lag?

Post by scottwilkerson »

I looked at the code and it is supposed to be split by : NOT ; (really bad help file...)

Let's try:

Code:

./check_multi_procs.pl -b /usr/local/nagios/libexec/check_procs -u root -f lane:ves_ocr
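To see why the separator matters: going by the reading of the code above, the plugin splits the `-f` value on `:`. The Python function below is a hypothetical stand-in for that step (it is not the plugin's Perl code, and `parse_process_list` is a made-up name). It shows that a `;`-separated string comes through as one process name that does not exist.

```python
def parse_process_list(value, separator=":"):
    """Split the -f value into process names, dropping empty entries."""
    return [name.strip() for name in value.split(separator) if name.strip()]


print(parse_process_list("lane:ves_ocr"))  # ['lane', 'ves_ocr']
print(parse_process_list("lane;ves_ocr"))  # ['lane;ves_ocr']
```

That would explain why even the quoted `'proc1;proc2'` form did not help: the list reached the plugin intact but was never split.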
jbennett wrote:Am I correct in thinking that going this route instead of individual service checks for each process would help lower the load?
Certainly would on the XI server. Another possibility would be to use NRDS and set up the checks individually (available under Admin -> NRDS Config Manager).
Here's a video:
http://library.nagios.com/library/produ ... s-tutorial