Checking for multiple instances of the same process

Post by **GreatWolfResorts** » Tue Sep 20, 2011 1:16 pm

What would be the best method for checking the number of running processes on a machine other than the local Nagios server? For example, we have servers at each company location that run interface processes called ifc8.exe. There may be upwards of 8 ifc8.exe processes running on a single server. I want to monitor that exactly 8, no more and no less, are running on that server. If the quantity varies in either direction, I'd like a critical alert to be sent out.

Any help is greatly appreciated. Thanks!

- Dan

Post by **mguthrie** » Wed Sep 21, 2011 11:01 am

I ran this search on exchange.nagios.org and came across a few results. I don't think we've tested most of these, but they might get you what you need.
http://exchange.nagios.org/index.php?op ... ss%20count

Post by **GreatWolfResorts** » Fri Nov 11, 2011 10:54 am

It's been a little while, but I finally got a chance to look back into this issue, and here are my results. Because we use NSClient++ to do many of our basic level checks already, I went with the following route:

1. Install NSClient++ on the remote Windows machine whose process counts you want to check.
2. Modify the NSC.ini file, enabling the NRPE protocol and the following settings:

----------------------------------
[NRPE]
;# NRPE PORT NUMBER
;  This is the port the NRPEListener.dll will listen to.
port=5666
;
;# COMMAND TIMEOUT
;  This specifies the maximum number of seconds that the NRPE daemon will allow plug-ins to finish executing before killing them off.
command_timeout=60
;
;# COMMAND ARGUMENT PROCESSING
;  This option determines whether or not the NRPE daemon will allow clients to specify arguments to commands that are executed.
allow_arguments=1
;
;# COMMAND ALLOW NASTY META CHARS
;  This option determines whether or not the NRPE daemon will allow clients to specify nasty (as in |`&><'"\[]{}) characters in arguments.
allow_nasty_meta_chars=0
;
;# USE SSL SOCKET
;  This option controls if SSL should be used on the socket.
use_ssl=1
;
;# BIND TO ADDRESS
;  Allows you to bind server to a specific local address. This has to be a dotted IP address, not a hostname.
;  Leaving this blank will bind to all available IP addresses.
; bind_to_address=
;
;# ALLOWED HOST ADDRESSES
;  This is a comma-delimited list of IP addresses of hosts that are allowed to talk to the NRPE daemon.
;  If you leave this blank the global version will be used instead.
allowed_hosts=
;
;# SCRIPT DIRECTORY
;  All files in this directory will become check commands.
;  *WARNING* This is undoubtedly dangerous so use with care!
;script_dir=scripts\
;
;# SOCKET TIMEOUT
;  Timeout when reading packets on incoming sockets. If the data has not arrived within this time we will bail out.
socket_timeout=30

----------------------------------

3. Create a new service check using the "Check_NRPE" command already provided in Nagios XI.
4. $ARG1$ = CheckProcState
5. $ARG2$ = -a MinCritCount=10 MaxCritCount=12 ifc8.exe=started
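
For reference, steps 3 to 5 amount to Nagios running a single command line on the server. The sketch below just assembles that command line from the two arguments so you can see what ends up being sent to the remote agent. It assumes the check_nrpe command is defined as `check_nrpe -H <host> -c $ARG1$ $ARG2$`, which may differ on your installation; `build_nrpe_command` is a hypothetical helper name and 10.10.10.10 is a placeholder address.

```shell
#!/bin/sh
# Sketch: show the command line that steps 3-5 resolve to.
# Assumption: the check_nrpe command definition is
#   check_nrpe -H <host> -c $ARG1$ $ARG2$
# build_nrpe_command is a hypothetical helper, not part of Nagios.
build_nrpe_command() {
    host="$1"   # address of the remote Windows machine
    arg1="$2"   # $ARG1$: the NSClient++ command to run
    arg2="$3"   # $ARG2$: everything passed after the command name
    printf './check_nrpe -H %s -c %s %s\n' "$host" "$arg1" "$arg2"
}

# Placeholder host; thresholds taken from step 5.
build_nrpe_command 10.10.10.10 CheckProcState \
    "-a MinCritCount=10 MaxCritCount=12 ifc8.exe=started"
```

If the printed line does not match what you would type by hand under /usr/local/nagios/libexec, compare it against the command definition on your own server.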

We were experiencing issues on a server where we have 11 ifc8.exe processes running (one for each property interface). Periodically an interface would visibly fail, and the technician at the site would start a new one to resume functionality. In the background, the original ifc8.exe process was still loaded in Windows and would begin consuming a constant share of the CPU, eventually maxing it out. We could certainly adjust our CPU check threshold to compensate for the higher utilization and get notified that way, but I feel this method targets the actual cause of the high utilization more directly. If something else were to spike the CPU, we'd be chasing our tails assuming it was an additional process.
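
The idea in this paragraph, alerting when the process count leaves an expected band rather than waiting for the CPU to spike, can be sketched in a few lines of shell. This is only an illustration of the decision, not NSClient++'s code: `classify_count` and its inclusive `min_ok`/`max_ok` bounds are hypothetical, and whether NSClient++ treats MinCritCount and MaxCritCount as inclusive is not confirmed here.

```shell
#!/bin/sh
# Illustration only: decide OK vs CRITICAL from a process count.
# classify_count and its inclusive bounds are hypothetical; this is
# not how NSClient++ is implemented.
classify_count() {
    count="$1"    # number of matching processes found
    min_ok="$2"   # lowest count still considered healthy
    max_ok="$3"   # highest count still considered healthy
    if [ "$count" -lt "$min_ok" ]; then
        echo "CRITICAL: only $count running, expected at least $min_ok"
        return 2   # Nagios plugin exit code for CRITICAL
    elif [ "$count" -gt "$max_ok" ]; then
        echo "CRITICAL: $count running, expected at most $max_ok"
        return 2
    fi
    echo "OK: $count running"
    return 0       # Nagios plugin exit code for OK
}

# 11 interfaces expected; a 12th suggests an orphaned ifc8.exe is still loaded.
classify_count 12 11 11 || true
```

A count check like this fires as soon as the extra process appears, independent of how much CPU it happens to be using at that moment.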

Once you configure the NSC.ini and restart the NSClient service, remote into your Nagios server and run the following check under /usr/local/nagios/libexec:

./check_nrpe -H <IP Address> -c CheckProcState -a MinCritCount=<count> MaxCritCount=<count> <process name>=started
Example: ./check_nrpe -H 10.10.10.10 -c CheckProcState -a MinCritCount=1 MaxCritCount=3 notepad.exe=started

This should return OK or CRITICAL depending on what is running. If you receive an UNKNOWN response, review your nsclient.log and NSC.ini for configuration issues.
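
When running the check by hand, the exit status tells you which state Nagios would record. The sketch below maps the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) to their names; `state_name` is a hypothetical helper, and the commented usage line reuses the example address and thresholds from the command above.

```shell
#!/bin/sh
# Map a Nagios plugin exit code to its state name.
# state_name is a hypothetical helper for manual testing.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3, or anything unexpected
    esac
}

# Usage on the Nagios server, from /usr/local/nagios/libexec:
#   ./check_nrpe -H 10.10.10.10 -c CheckProcState -a MinCritCount=1 MaxCritCount=3 notepad.exe=started
#   state_name $?
state_name 2
```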

Hope this helps!

Dan

Post by **mguthrie** » Fri Nov 11, 2011 11:42 am

Thanks for the update, we appreciate it!

Nagios Support Forum
