NSClient check timeouts fixed, but tmp Check files continue

vAJ · Post by **vAJ** » Wed May 01, 2013 3:36 pm

I think I had the other thread hijacked for long enough. Time to start fresh.

I assumed that the frequent timeouts of checks to Windows hosts via NSClient were related to the growth of check* files in /tmp. However, I've modified the timeout for check_nt globally, and while I no longer get those alerts, a fair number of check files have still been created since that change.

Should I be concerned about this? 50k+ of these before I flushed the directory the other day was only a few hundred MB, not a problem. But if this is indicative of this system being overtaxed, I'd like to know.

System:

Nagios XI Version: 2012R1.6
nagios.XXXXXXX.com 2.6.32-220.el6.x86_64 x86_64
CentOS release 6.2 (Final)
Gnome is not installed

Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31
Server Name: nagios.XXXXXXXX.com
Server Address: XXX.XXX.XXX.XXX
Server Port: 80

Date/Time
PHP Timezone: America/Los_Angeles
PHP Time: Wed, 01 May 2013 13:34:56 -0700
System Time: Wed, 01 May 2013 13:34:56 -0700

Nagios XI Data
nagios (pid 24117) is running...
NPCD running (pid 1857).
ndo2db (pid 31575) is running...
CPU Load 15: 0.91
Total Hosts: 791
Total Services: 13518

slansing · Post by **slansing** » Wed May 01, 2013 3:45 pm

Can you catch one of these tmp files and open it? What is the output? (Please block out any sensitive information.) This will give us a direction to go in and help us find additional checks that may be timing out. Your system should not be the root cause of this; more likely, checks are not returning within the time allotted by their command/service definitions.

vAJ · Post by **vAJ** » Wed May 01, 2013 4:35 pm

Code:

### Active Check Result File ###
file_time=1367010987

### Nagios Service Check Result ###
# Time: Fri Apr 26 14:16:27 2013
host_name=vweb447
service_description=ASP Requests Per Second
check_type=0
check_options=0
scheduled_check=1
reschedule_check=1
latency=48.832000
start_time=1367010987.832873
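
For anyone who wants to inspect many of these files at once, here is a minimal Python sketch that reads the key=value lines of a check result file into a dictionary. The function name `parse_check_result` is made up for illustration, and the sample text is just the file pasted above. The `return_code` test at the end is an assumption about what a completed result file contains, so treat it as a hint and not as a diagnosis.

```python
def parse_check_result(text):
    """Parse key=value lines from a check result file, skipping comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or '#' comment/header line
        key, sep, value = line.partition("=")
        if sep:
            fields[key] = value
    return fields


# Sample taken from the file posted in this thread.
sample = """### Active Check Result File ###
file_time=1367010987

### Nagios Service Check Result ###
# Time: Fri Apr 26 14:16:27 2013
host_name=vweb447
service_description=ASP Requests Per Second
check_type=0
check_options=0
scheduled_check=1
reschedule_check=1
latency=48.832000
start_time=1367010987.832873
"""

result = parse_check_result(sample)
print(result["host_name"], "|", result["service_description"])

# Assumption: a finished check normally also records a return_code line.
# A file without one may belong to a check that never reported back.
print("looks incomplete:", "return_code" not in result)
```

Running this over every check* file and counting by `service_description` would show whether the leftovers come from one service or many.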

vAJ · Post by **vAJ** » Wed May 01, 2013 4:56 pm

They are all NSClient Perfmon counter checks.

They were set up by the previous owner of monitoring tools and I can't find where they are defined.

They do not appear in CCM, legacy CCM or the services cfg file for any of the hosts. So I'm guessing they are part of a master service definition somewhere.

scottwilkerson · Post by **scottwilkerson** » Wed May 01, 2013 5:16 pm

Can you run the following to try to track these down?

Code:

grep -R "ASP Requests Per Second" /usr/local/nagios/etc/

vAJ · Post by **vAJ** » Thu May 02, 2013 9:06 am

Brilliant. Never thought to do that.

Found one service cfg file that defines the service for the large hostgroup it is bound to. The weird thing is that the host_name in the definition does not match the host that the service config file is named for.

Code:

define service {
        host_name                       vweb001
        service_description             ASP Requests Per Second
        use                             xiwizard_windowsserver_nsclient_service
        hostgroup_name                  vweb-servers
        check_command                   check_xi_service_nsclient!XXXXXXXXX!COUNTER!-l "\\Active Server Pages\\Requests/Sec","Requests /sec: %.f" -w 1800 -c 2500!!!!!
        max_check_attempts              5
        check_interval                  1
        retry_interval                  5
        check_period                    24x7
        notification_interval           60
        notification_period             24x7
        contact_groups                  noc
        register                        1
        }
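
To find other definitions like this one, a small Python sketch can scan config text for service blocks that set both host_name and hostgroup_name. The parser below is a simplified, hypothetical helper (`parse_objects`), not anything shipped with Nagios, and the sample is an abbreviated copy of the definition above with check_command left out for brevity. My understanding is that Nagios Core applies such a service to the named host and to every member of the hostgroup, but verify that against your own running config.

```python
def parse_objects(text):
    """Return a list of (object_type, directives) for 'define <type> {' blocks."""
    objects = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("define ") and line.endswith("{"):
            obj_type = line[len("define "):-1].strip()
            current = (obj_type, {})
        elif line == "}" and current is not None:
            objects.append(current)
            current = None
        elif current is not None:
            # Directive name, then the rest of the line as its value.
            parts = line.split(None, 1)
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return objects


# Abbreviated copy of the definition posted above (check_command omitted).
sample_cfg = """
define service {
        host_name                       vweb001
        service_description             ASP Requests Per Second
        use                             xiwizard_windowsserver_nsclient_service
        hostgroup_name                  vweb-servers
        check_interval                  1
        retry_interval                  5
        register                        1
        }
"""

for obj_type, directives in parse_objects(sample_cfg):
    if obj_type == "service" and "host_name" in directives and "hostgroup_name" in directives:
        print(directives["service_description"],
              "host_name=" + directives["host_name"],
              "hostgroup_name=" + directives["hostgroup_name"])
```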

So, does applying a service check to a large hostgroup cause performance issues?

Post by **lmiltchev** » Thu May 02, 2013 9:58 am

So, does applying a service check to a large hostgroup cause performance issues?

Not necessarily. It is no different from setting up this service on each host (each member of the hostgroup) individually. Applying a service to a hostgroup does have some advantages, though: you can modify it in one place, and the service is applied automatically to any new members you add to the hostgroup.
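
As a rough illustration of the load involved, the arithmetic can be sketched in Python. The member count of 200 is purely hypothetical (the size of vweb-servers is not given in this thread), and the function assumes check_interval is expressed in units of interval_length, which defaults to 60 seconds.

```python
def checks_per_minute(member_count, check_interval, interval_length_seconds=60):
    """Approximate active checks per minute for one service bound to a hostgroup.

    member_count: number of hosts in the hostgroup (hypothetical here)
    check_interval: the service's check_interval, in interval_length units
    """
    seconds_between_checks = check_interval * interval_length_seconds
    return member_count * 60.0 / seconds_between_checks


# Hypothetical hostgroup of 200 hosts, check_interval 1 as in the definition above.
print(checks_per_minute(200, 1))
# The same hostgroup with a 5-unit interval, for comparison.
print(checks_per_minute(200, 5))
```

The result is the same whether the service is defined once on the hostgroup or once per host; only the interval and the number of hosts matter.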

vAJ · Post by **vAJ** » Thu May 02, 2013 10:41 am

That's what I figured. The sanity check is always appreciated!

Any ideas on how to determine why the check files continue to accumulate in /tmp?
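
One way to start narrowing this down is to list only the check files that have been sitting around for a while, since files only a few seconds old may simply be waiting to be processed. Below is a minimal Python sketch; the function name `stale_check_files` and the 5-minute threshold are arbitrary choices for illustration, and it only reads file metadata without deleting anything.

```python
import glob
import os
import time


def stale_check_files(directory, min_age_seconds, now=None):
    """Return (path, age_in_seconds) for check* files older than the threshold."""
    now = time.time() if now is None else now
    stale = []
    for path in glob.glob(os.path.join(directory, "check*")):
        try:
            age = now - os.path.getmtime(path)
        except OSError:
            continue  # file disappeared between listing and stat
        if age >= min_age_seconds:
            stale.append((path, age))
    # Oldest files first.
    return sorted(stale, key=lambda item: item[1], reverse=True)


# Report files in /tmp untouched for at least 5 minutes (threshold is arbitrary).
leftovers = stale_check_files("/tmp", 300)
print(len(leftovers), "check files older than 5 minutes")
for path, age in leftovers[:10]:
    print("%8.0f s  %s" % (age, path))
```

If the stale files cluster around particular times, comparing those times with the Nagios log may show which checks were running then.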

Post by **lmiltchev** » Thu May 02, 2013 10:55 am

Do you have scripts that take a fairly long time to execute? Try following the steps outlined on our wiki page:

http://support.nagios.com/wiki/index.ph ... ely_manner

Hope this helps.

vAJ · Post by **vAJ** » Thu May 02, 2013 11:05 am

Yes, that was already in place. I forgot to mention it in this thread, but it was mentioned in the thread I hijacked earlier.

Here are my perf metrics:

Code:

Host Check Latency
Min	0.00 sec	
Max	1.65 sec	
Avg	0.58 sec	
 
Host Check Execution Time
Min	0.01 sec	
Max	10.01 sec	
Avg	0.04 sec	
 
Service Check Latency
Min	0.00 sec	
Max	1.51 sec	
Avg	0.29 sec	
 
Service Check Execution Time
Min	0.01 sec	
Max	10.11 sec	
Avg	0.22 sec

Should I look at increasing the timeout in /etc/init.d/nagios?

Nagios Support Forum