NSClient check timeouts fixed, but tmp Check files continue
Posted: Wed May 01, 2013 3:36 pm
by vAJ
I think I've hijacked the other thread for long enough. Time to start fresh.
I assumed that the frequent timeouts of checks to Windows hosts via NSClient were related to the growth of check* files in /tmp. However, I've since modified the timeout for check_nt globally, and while I no longer get those alerts, a fair number of check files have still been created since that change.
Should I be concerned about this? The 50k+ of these I had before I flushed the directory the other day added up to only a few hundred MB, so disk space is not a problem. But if this indicates that the system is overtaxed, I'd like to know.
System:
Nagios XI Version : 2012R1.6
nagios.XXXXXXX.com 2.6.32-220.el6.x86_64 x86_64
CentOS release 6.2 (Final)
Gnome is not installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31
Server Name: nagios.XXXXXXXX.com
Server Address: XXX.XXX.XXX.XXX
Server Port: 80
Date/Time
PHP Timezone: America/Los_Angeles
PHP Time: Wed, 01 May 2013 13:34:56 -0700
System Time: Wed, 01 May 2013 13:34:56 -0700
Nagios XI Data
nagios (pid 24117) is running...
NPCD running (pid 1857).
ndo2db (pid 31575) is running...
CPU Load 15: 0.91
Total Hosts: 791
Total Services: 13518
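One way to tell whether the check* files are stale leftovers or a steady trickle of new ones is to look at how many there are, how much space they use, and how old the oldest and newest are. A minimal sketch, assuming the files sit directly in /tmp and their names start with "check" as described above; the function name summarize_check_files is hypothetical.

```python
from pathlib import Path
import time


def summarize_check_files(spool_dir="/tmp", prefix="check", now=None):
    """Return count, total bytes, and oldest/newest age (seconds) of matching files.

    Assumption: leftover check files live directly in spool_dir and their
    names start with prefix; adjust both for your install.
    """
    now = time.time() if now is None else now
    count = 0
    total_bytes = 0
    ages = []
    for entry in Path(spool_dir).iterdir():
        if not entry.name.startswith(prefix):
            continue
        try:
            if not entry.is_file():
                continue  # skip directories and other non-regular entries
            st = entry.stat()
        except FileNotFoundError:
            continue  # the file may have been removed since the listing
        count += 1
        total_bytes += st.st_size
        ages.append(now - st.st_mtime)
    return {
        "count": count,
        "total_bytes": total_bytes,
        "oldest_age_s": max(ages) if ages else None,
        "newest_age_s": min(ages) if ages else None,
    }
```

For example, run print(summarize_check_files()) on the Nagios server a few minutes apart. If the newest file is only seconds old and the count keeps growing between runs, files are still being created rather than left over from before the timeout change.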
Re: NSClient check timeouts fixed, but tmp Check files conti
Posted: Wed May 01, 2013 3:45 pm
by slansing
Can you catch one of these tmp files and open it? What does it contain? (Please block out any sensitive information.) This will give us a direction to go in and help us find other checks that may be timing out. Your system itself should not be the root cause; more likely, some checks are not returning within the time allotted by their command/service definitions.
Re: NSClient check timeouts fixed, but tmp Check files conti
Posted: Wed May 01, 2013 4:35 pm
by vAJ
### Active Check Result File ###
file_time=1367010987
### Nagios Service Check Result ###
# Time: Fri Apr 26 14:16:27 2013
host_name=vweb447
service_description=ASP Requests Per Second
check_type=0
check_options=0
scheduled_check=1
reschedule_check=1
latency=48.832000
start_time=1367010987.832873
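With tens of thousands of these files, tallying them by host and service is quicker than opening them one at a time. A minimal sketch, assuming every file follows the key=value layout shown above, with lines starting with # treated as comments; parse_check_result and tally_by_service are hypothetical helper names.

```python
from collections import Counter
from pathlib import Path


def parse_check_result(text):
    """Return the key=value pairs of one check result file as a dict."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "### ... ###" / "# Time:" comments
        key, sep, value = line.partition("=")  # split on the first "=" only
        if sep:
            fields[key] = value
    return fields


def tally_by_service(paths):
    """Count result files per (host_name, service_description) pair."""
    counts = Counter()
    for path in paths:
        fields = parse_check_result(Path(path).read_text(errors="replace"))
        counts[(fields.get("host_name"), fields.get("service_description"))] += 1
    return counts
```

For example, tally_by_service(Path("/tmp").glob("check*")).most_common(10) lists the ten host/service pairs that left the most files behind.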
Re: NSClient check timeouts fixed, but tmp Check files conti
Posted: Wed May 01, 2013 4:56 pm
by vAJ
They are all NSClient Perfmon counter checks.
They were set up by the previous owner of the monitoring tools, and I can't find where they are defined.
They do not appear in CCM, legacy CCM or the services cfg file for any of the hosts. So I'm guessing they are part of a master service definition somewhere.
Re: NSClient check timeouts fixed, but tmp Check files conti
Posted: Wed May 01, 2013 5:16 pm
by scottwilkerson
Can you run the following to try to track these down:
grep -R "ASP Requests Per Second" /usr/local/nagios/etc/
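grep shows only the matching line; to see which hosts or hostgroups a matching definition is bound to, a small parser can print the whole block. This is a simplified sketch, not a full Nagios config parser: it assumes each definition is a plain define service { ... } block with one directive per line, and it ignores templates (use), cfg_file/cfg_dir includes, and inheritance. The helper names are hypothetical.

```python
import re
from pathlib import Path

# Non-greedy match up to the first closing brace; assumes no "}" inside values.
BLOCK_RE = re.compile(r"define\s+service\s*\{(.*?)\}", re.DOTALL)


def parse_blocks(text):
    """Yield each service definition in text as a dict of directive -> value."""
    for match in BLOCK_RE.finditer(text):
        directives = {}
        for raw in match.group(1).splitlines():
            line = raw.strip()
            if not line or line.startswith(("#", ";")):
                continue  # skip blank and comment lines
            parts = line.split(None, 1)  # directive name, then the rest of the line
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        yield directives


def find_service(root, description):
    """Return (path, directives) for every *.cfg definition matching description."""
    hits = []
    for path in Path(root).rglob("*.cfg"):
        for directives in parse_blocks(path.read_text(errors="replace")):
            if directives.get("service_description") == description:
                hits.append((path, directives))
    return hits
```

For example, iterate over find_service("/usr/local/nagios/etc", "ASP Requests Per Second") and print each path together with its host_name and hostgroup_name directives.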
Re: NSClient check timeouts fixed, but tmp Check files conti
Posted: Thu May 02, 2013 9:06 am
by vAJ
Brilliant. Never thought to do that.
I found one service cfg file that defines this service for the large hostgroup it's bound to. The weird thing is that the host_name in the definition doesn't match the host that the service config file is named after.
define service {
host_name vweb001
service_description ASP Requests Per Second
use xiwizard_windowsserver_nsclient_service
hostgroup_name vweb-servers
check_command check_xi_service_nsclient!XXXXXXXXX!COUNTER!-l "\\Active Server Pages\\Requests/Sec","Requests /sec: %.f" -w 1800 -c 2500!!!!!
max_check_attempts 5
check_interval 1
retry_interval 5
check_period 24x7
notification_interval 60
notification_period 24x7
contact_groups noc
register 1
}
So, does applying a service check to a large hostgroup cause performance issues?
Re: NSClient check timeouts fixed, but tmp Check files conti
Posted: Thu May 02, 2013 9:58 am
by lmiltchev
vAJ wrote: "So, does applying a service check to a large hostgroup cause performance issues?"
Not necessarily. It is no different than setting up this service on each host (member of this hostgroup) individually. Applying a service to a hostgroup does have some advantages, though: you can modify it in one place, and the service will automatically be applied to any new members you add to the hostgroup.
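The equivalence can be made concrete. In a Nagios service definition, host_name and hostgroup_name can be used together, and the service then applies to the listed hosts plus every member of the listed hostgroups, so the load is the same as defining the service on each host and scales with the number of hosts times the check frequency. A minimal sketch; the hostgroup membership is made up for illustration, and it assumes the default interval_length of 60 seconds, so check_interval 1 means one check per host per minute.

```python
def expand_service(host_names, hostgroup_names, hostgroups):
    """Return the sorted, de-duplicated list of hosts a service applies to."""
    hosts = set(host_names)
    for group in hostgroup_names:
        hosts.update(hostgroups.get(group, []))  # unknown groups add no hosts
    return sorted(hosts)


def checks_per_minute(num_hosts, check_interval, interval_length=60):
    """Scheduled checks per minute for one service across num_hosts hosts.

    Assumption: interval_length is 60 seconds (the Nagios default), so
    check_interval is expressed in minutes.
    """
    seconds_between_checks = check_interval * interval_length
    return num_hosts * 60 / seconds_between_checks


# Hypothetical membership, for illustration only.
HOSTGROUPS = {"vweb-servers": ["vweb001", "vweb002", "vweb447"]}

hosts = expand_service(["vweb001"], ["vweb-servers"], HOSTGROUPS)
print(hosts)                                            # ['vweb001', 'vweb002', 'vweb447']
print(checks_per_minute(len(hosts), check_interval=1))  # 3.0
```

vweb001 appears both as host_name and as a hostgroup member but is counted once, which is why the result matches defining the service on each host individually.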
Re: NSClient check timeouts fixed, but tmp Check files conti
Posted: Thu May 02, 2013 10:41 am
by vAJ
That's what I figured. The sanity check is always appreciated!
Any ideas on how to determine why the check files keep accumulating in /tmp?
Re: NSClient check timeouts fixed, but tmp Check files conti
Posted: Thu May 02, 2013 10:55 am
by lmiltchev
Do you have scripts that take a fairly long time to execute? Try following the steps outlined on our wiki page:
http://support.nagios.com/wiki/index.ph ... ely_manner
Hope this helps.
Re: NSClient check timeouts fixed, but tmp Check files conti
Posted: Thu May 02, 2013 11:05 am
by vAJ
Yes, that was already in place. I forgot to mention it in this thread, but I did mention it in the thread I hijacked earlier.
Here are my perf metrics:
                              Min        Max        Avg
Host Check Latency            0.00 sec   1.65 sec   0.58 sec
Host Check Execution Time     0.01 sec   10.01 sec  0.04 sec
Service Check Latency         0.00 sec   1.51 sec   0.29 sec
Service Check Execution Time  0.01 sec   10.11 sec  0.22 sec
Should I increase the timeout in /etc/init.d/nagios?
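In these metrics, both maximum execution times land just past 10 seconds while the averages stay well under half a second, which could mean a small number of checks are running into a roughly 10-second limit rather than the whole system being slow. A minimal sketch that parses the metrics as posted and flags any maximum near a timeout; the 10-second threshold and 0.5-second slack are assumptions chosen to match the numbers above, not values read from this system's configuration, and the helper names are hypothetical.

```python
import re

# The metrics as posted: a metric name line followed by Min/Max/Avg lines.
METRICS = """Host Check Latency
Min 0.00 sec
Max 1.65 sec
Avg 0.58 sec
Host Check Execution Time
Min 0.01 sec
Max 10.01 sec
Avg 0.04 sec
Service Check Latency
Min 0.00 sec
Max 1.51 sec
Avg 0.29 sec
Service Check Execution Time
Min 0.01 sec
Max 10.11 sec
Avg 0.22 sec
"""

STAT_RE = re.compile(r"^(Min|Max|Avg)\s+([\d.]+)\s+sec$")


def parse_metrics(text):
    """Return {metric name: {"Min": x, "Max": y, "Avg": z}} in input order."""
    metrics = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = STAT_RE.match(line)
        if match and current is not None:
            metrics[current][match.group(1)] = float(match.group(2))
        else:
            current = line  # any other line starts a new metric
            metrics[current] = {}
    return metrics


def near_timeout(metrics, timeout_s=10.0, slack_s=0.5):
    """Names of metrics whose Max is within slack_s of timeout_s, or above it."""
    return [name for name, stats in metrics.items()
            if stats.get("Max", 0.0) >= timeout_s - slack_s]
```

Here near_timeout(parse_metrics(METRICS)) flags only the two execution-time metrics; neither latency figure comes close. If that pattern holds, finding which individual checks take about 10 seconds may be more useful than raising a global timeout.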