NSClient check timeouts fixed, but tmp Check files continue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Post by vAJ »

I think I had the other thread hijacked for long enough. Time to start fresh.

I assumed that the frequent timeouts of checks to Windows hosts via NSClient were related to the growth of check* files in /tmp. However, I've modified the timeout for check_nt globally, and while I no longer get those alerts, a fair number of check files have still been created since that change.

Should I be concerned about this? The 50k+ files that had piled up before I flushed the directory the other day came to only a few hundred MB, so disk space is not a problem. But if this is indicative of this system being overtaxed, I'd like to know.
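For anyone wanting to put numbers on that growth, a rough sketch like the one below counts the leftover files and sums their sizes. It assumes the files match the glob `check*` directly under `/tmp`, as described above; `summarize_check_files` is a made-up helper name, not something that ships with Nagios XI.

```python
import glob
import os


def summarize_check_files(directory="/tmp", pattern="check*"):
    """Return (file_count, total_bytes) for files matching pattern in directory.

    Assumption: leftover check result files match the glob "check*",
    as described in the post; adjust the pattern if yours differ.
    """
    count = 0
    total_bytes = 0
    for path in glob.glob(os.path.join(directory, pattern)):
        try:
            if os.path.isfile(path):
                total_bytes += os.path.getsize(path)
                count += 1
        except OSError:
            # Files can disappear while we are scanning; skip those.
            continue
    return count, total_bytes


if __name__ == "__main__":
    n, size = summarize_check_files()
    print("%d check files, %.1f MB" % (n, size / (1024.0 * 1024.0)))
```

Running it a few minutes apart would show whether the count is still climbing or has levelled off since the timeout change.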

System:

Nagios XI Version : 2012R1.6
nagios.XXXXXXX.com 2.6.32-220.el6.x86_64 x86_64
CentOS release 6.2 (Final)
Gnome is not installed
Apache Information

PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31
Server Name: nagios.XXXXXXXX.com
Server Address: XXX.XXX.XXX.XXX
Server Port: 80
Date/Time

PHP Timezone: America/Los_Angeles
PHP Time: Wed, 01 May 2013 13:34:56 -0700
System Time: Wed, 01 May 2013 13:34:56 -0700
Nagios XI Data

nagios (pid 24117) is running...
NPCD running (pid 1857).
ndo2db (pid 31575) is running...
CPU Load 15: 0.91
Total Hosts: 791
Total Services: 13518
Andrew J. - Do you even grok?
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: NSClient check timeouts fixed, but tmp Check files conti

Post by slansing »

Can you catch one of these tmp files and open it? What is the output? Please block out any sensitive information. This will give us a direction to go in and help us find additional checks which may be timing out. Your system should not be the root cause of this; more likely, checks are not returning within the time allotted by their command/service definitions.
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Re: NSClient check timeouts fixed, but tmp Check files conti

Post by vAJ »

Code:

### Active Check Result File ###
file_time=1367010987

### Nagios Service Check Result ###
# Time: Fri Apr 26 14:16:27 2013
host_name=vweb447
service_description=ASP Requests Per Second
check_type=0
check_options=0
scheduled_check=1
reschedule_check=1
latency=48.832000
start_time=1367010987.832873
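With tens of thousands of these files, opening them by hand does not scale. Below is a minimal sketch that reads the `key=value` lines shown above and tallies the leftovers by service description. It assumes one result per file and that `#` lines are comments, which is what the sample looks like; `parse_check_result` and `tally_services` are hypothetical helper names, and `SAMPLE` is simply the file as posted.

```python
from collections import Counter

# The check result file exactly as posted above.
SAMPLE = """### Active Check Result File ###
file_time=1367010987

### Nagios Service Check Result ###
# Time: Fri Apr 26 14:16:27 2013
host_name=vweb447
service_description=ASP Requests Per Second
check_type=0
check_options=0
scheduled_check=1
reschedule_check=1
latency=48.832000
start_time=1367010987.832873
"""


def parse_check_result(text):
    """Parse key=value lines into a dict, skipping blanks and '#' comments.

    Assumption: one result per file, as in the sample; a file holding
    several results would need splitting on the section comments first.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            fields[key] = value
    return fields


def tally_services(texts):
    """Count results per service_description across many file contents."""
    counts = Counter()
    for text in texts:
        fields = parse_check_result(text)
        counts[fields.get("service_description", "(unknown)")] += 1
    return counts


if __name__ == "__main__":
    fields = parse_check_result(SAMPLE)
    print(fields["host_name"], "/", fields["service_description"])
```

Feeding `tally_services` the contents of every leftover file should show quickly whether one service, or one family of checks, accounts for most of them.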
Andrew J. - Do you even grok?
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Re: NSClient check timeouts fixed, but tmp Check files conti

Post by vAJ »

They are all NSClient Perfmon counter checks.

They were set up by the previous owner of monitoring tools and I can't find where they are defined.

They do not appear in CCM, legacy CCM or the services cfg file for any of the hosts. So I'm guessing they are part of a master service definition somewhere.
Andrew J. - Do you even grok?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: NSClient check timeouts fixed, but tmp Check files conti

Post by scottwilkerson »

Can you run the following to try to track these down?

Code:

grep -R "ASP Requests Per Second" /usr/local/nagios/etc/
Former Nagios employee
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Re: NSClient check timeouts fixed, but tmp Check files conti

Post by vAJ »

Brilliant. Never thought to do that.

Found one service cfg file that defines the service for the large host group it is bound to. Weird thing is, the host_name in the definition does not match the hostname that the service config file is named after.

Code:

define service {
        host_name                       vweb001
        service_description             ASP Requests Per Second
        use                             xiwizard_windowsserver_nsclient_service
        hostgroup_name                  vweb-servers
        check_command                   check_xi_service_nsclient!XXXXXXXXX!COUNTER!-l "\\Active Server Pages\\Requests/Sec","Requests /sec: %.f" -w 1800 -c 2500!!!!!
        max_check_attempts              5
        check_interval                  1
        retry_interval                  5
        check_period                    24x7
        notification_interval           60
        notification_period             24x7
        contact_groups                  noc
        register                        1
        }
So, does applying a service check to a large hostgroup cause performance issues?
Andrew J. - Do you even grok?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: NSClient check timeouts fixed, but tmp Check files conti

Post by lmiltchev »

So, does applying a service check to a large hostgroup cause performance issues?
Not necessarily. It is no different from setting up this service on each host (member of this hostgroup) individually. Applying a service to a hostgroup has some advantages, though: you can modify it in one place, and the service is applied automatically to any new members you add to the hostgroup.
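To illustrate, here is a simplified model of how one definition fans out to hosts. It assumes a service applies to the union of its host_name entries and the members of its hostgroup_name groups, which is how Nagios Core's object expansion is generally understood to work; exclusions, wildcards, and nested hostgroups are ignored. The hostgroup membership below is invented for the example, and `expand_service` is a hypothetical helper, not a Nagios API.

```python
def expand_service(service, hostgroups):
    """Return the sorted list of hosts a service definition applies to.

    Simplified model: union of the comma-separated host_name entries and
    the members of every group listed in hostgroup_name. Exclusions ("!"),
    wildcards, and nested hostgroups are deliberately not handled.
    """
    hosts = set()
    for name in service.get("host_name", "").split(","):
        if name.strip():
            hosts.add(name.strip())
    for group in service.get("hostgroup_name", "").split(","):
        group = group.strip()
        if group:
            hosts.update(hostgroups.get(group, ()))
    return sorted(hosts)


# Hypothetical membership; the real vweb-servers group is much larger.
hostgroups = {"vweb-servers": ["vweb001", "vweb002", "vweb447"]}

service = {
    "host_name": "vweb001",
    "hostgroup_name": "vweb-servers",
    "service_description": "ASP Requests Per Second",
}

if __name__ == "__main__":
    for host in expand_service(service, hostgroups):
        print(host, "->", service["service_description"])
```

Under this model, a check result for vweb447 can come from a definition that only names vweb001 explicitly, because vweb447 is picked up through the hostgroup.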
Be sure to check out our Knowledgebase for helpful articles and solutions!
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Re: NSClient check timeouts fixed, but tmp Check files conti

Post by vAJ »

That's what I figured. The sanity check is always appreciated!

Any ideas on how to determine why the check files continue to accumulate in /tmp?
Andrew J. - Do you even grok?
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: NSClient check timeouts fixed, but tmp Check files conti

Post by lmiltchev »

Do you have scripts that take a fairly long time to execute? Try following the steps outlined on our wiki page:

http://support.nagios.com/wiki/index.ph ... ely_manner

Hope this helps.
Be sure to check out our Knowledgebase for helpful articles and solutions!
vAJ
Posts: 456
Joined: Thu Nov 08, 2012 5:09 pm
Location: Austin, TX

Re: NSClient check timeouts fixed, but tmp Check files conti

Post by vAJ »

Yes, that was already in place. I forgot to mention it in this thread, but it came up in the thread I hijacked earlier.

Here are my perf metrics:

Code:

Host Check Latency
Min    0.00 sec
Max    1.65 sec
Avg    0.58 sec

Host Check Execution Time
Min    0.01 sec
Max    10.01 sec
Avg    0.04 sec

Service Check Latency
Min    0.00 sec
Max    1.51 sec
Avg    0.29 sec

Service Check Execution Time
Min    0.01 sec
Max    10.11 sec
Avg    0.22 sec
Should I look to increase the timeout in /etc/init.d/nagios?
Andrew J. - Do you even grok?