NSClient check timeouts fixed, but tmp Check files continue
I think I had the other thread hijacked for long enough. Time to start fresh.
I assumed that the frequent timeouts of checks to Windows hosts via NSClient were related to the growth of check* files in /tmp. However, I've modified the timeout for check_nt globally, and while I no longer get those alerts, a fair number of check files have still been created since that change.
Should I be concerned about this? The 50k+ files that had piled up before I flushed the directory the other day only came to a few hundred MB, so disk space is not a problem. But if this is a sign that the system is overtaxed, I'd like to know.
System:
Nagios XI Version : 2012R1.6
nagios.XXXXXXX.com 2.6.32-220.el6.x86_64 x86_64
CentOS release 6.2 (Final)
Gnome is not installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31
Server Name: nagios.XXXXXXXX.com
Server Address: XXX.XXX.XXX.XXX
Server Port: 80
Date/Time
PHP Timezone: America/Los_Angeles
PHP Time: Wed, 01 May 2013 13:34:56 -0700
System Time: Wed, 01 May 2013 13:34:56 -0700
Nagios XI Data
nagios (pid 24117) is running...
NPCD running (pid 1857).
ndo2db (pid 31575) is running...
CPU Load 15: 0.91
Total Hosts: 791
Total Services: 13518
Andrew J. - Do you even grok?
slansing
Re: NSClient check timeouts fixed, but tmp Check files conti
Can you catch one of these tmp files and open it? What is the output? Please block out any sensitive information. This will give us a direction to go in and help us find additional checks that may be timing out. Your system should not be the root cause of this; more likely, checks are not returning within the time allotted by their command/service definitions.
Re: NSClient check timeouts fixed, but tmp Check files conti
Code: Select all
### Active Check Result File ###
file_time=1367010987
### Nagios Service Check Result ###
# Time: Fri Apr 26 14:16:27 2013
host_name=vweb447
service_description=ASP Requests Per Second
check_type=0
check_options=0
scheduled_check=1
reschedule_check=1
latency=48.832000
start_time=1367010987.832873
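To see which checks these files belong to without opening them one by one, a tally along these lines might help. This is only a minimal sketch: it assumes the files are named check* and sit in /tmp as described above, that each result block starts with a host_name= line, and the function names parse_check_results and tally are made up for illustration.

```python
from collections import Counter
from pathlib import Path


def parse_check_results(text):
    """Parse key=value lines into one dict per result block.

    Assumption: a new result block begins at each host_name= line.
    Comment lines (starting with '#') and blank lines are ignored.
    """
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key == "host_name":
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


def tally(directory="/tmp", pattern="check*"):
    """Count result blocks per (host, service) across matching files."""
    counts = Counter()
    for path in Path(directory).glob(pattern):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # file vanished or is unreadable; skip it
        for rec in parse_check_results(text):
            service = rec.get("service_description", "(host check)")
            counts[(rec.get("host_name"), service)] += 1
    return counts


if __name__ == "__main__":
    # Print the 20 most frequent host/service pairs.
    for (host, service), n in tally().most_common(20):
        print(f"{n:6d}  {host}  {service}")
```

If one service description dominates the list, that is probably the definition to track down first.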
Re: NSClient check timeouts fixed, but tmp Check files conti
They are all NSClient Perfmon counter checks.
They were set up by the previous owner of monitoring tools and I can't find where they are defined.
They do not appear in CCM, legacy CCM or the services cfg file for any of the hosts. So I'm guessing they are part of a master service definition somewhere.
scottwilkerson
Re: NSClient check timeouts fixed, but tmp Check files conti
Can you run the following to try to track these down?
Code: Select all
grep -R "ASP Requests Per Second" /usr/local/nagios/etc/
Re: NSClient check timeouts fixed, but tmp Check files conti
Brilliant. Never thought to do that.
Found one service cfg file that defines this check against the large hostgroup it is bound to. Weird thing is, the host_name in the definition does not match the hostname of the service cfg file it lives in.
Code: Select all
define service {
    host_name              vweb001
    service_description    ASP Requests Per Second
    use                    xiwizard_windowsserver_nsclient_service
    hostgroup_name         vweb-servers
    check_command          check_xi_service_nsclient!XXXXXXXXX!COUNTER!-l "\\Active Server Pages\\Requests/Sec","Requests /sec: %.f" -w 1800 -c 2500!!!!!
    max_check_attempts     5
    check_interval         1
    retry_interval         5
    check_period           24x7
    notification_interval  60
    notification_period    24x7
    contact_groups         noc
    register               1
}
So, does applying a service check to a large hostgroup cause performance issues?
Re: NSClient check timeouts fixed, but tmp Check files conti
Not necessarily. It is no different from setting up this service individually on each host that is a member of the hostgroup. Applying a service to a hostgroup has some advantages, though: you can modify it in one place, and the service is automatically applied to any new members you add to the hostgroup.
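To make the equivalence concrete, here is a rough sketch of how a hostgroup-bound service fans out into one check per member. It is an illustration only: the member list is made up, expand_service and checks_per_minute are hypothetical helper names, it assumes that host_name and hostgroup_name in the same definition are combined, and it assumes the default interval_length of 60 seconds, so that check_interval 1 means one check per host per minute.

```python
def expand_service(service, hostgroups):
    """Return one (host, service_description) pair per host the service covers.

    Assumption: hosts listed in host_name and members of hostgroup_name
    are merged, with duplicates collapsed.
    """
    hosts = set()
    for name in service.get("host_name", "").split(","):
        if name.strip():
            hosts.add(name.strip())
    for group in service.get("hostgroup_name", "").split(","):
        if group.strip():
            hosts.update(hostgroups[group.strip()])
    return sorted((host, service["service_description"]) for host in hosts)


def checks_per_minute(n_services, check_interval, interval_length=60):
    """Scheduled checks per minute for n_services at the given interval."""
    return n_services * 60.0 / (check_interval * interval_length)


# Hypothetical hostgroup membership; the real vweb-servers group is larger.
hostgroups = {"vweb-servers": ["vweb001", "vweb002", "vweb447"]}

service = {
    "host_name": "vweb001",
    "hostgroup_name": "vweb-servers",
    "service_description": "ASP Requests Per Second",
    "check_interval": 1,
}

expanded = expand_service(service, hostgroups)
for host, description in expanded:
    print(host, description)
print(checks_per_minute(len(expanded), service["check_interval"]), "checks/min")
```

The load comes from the number of resulting checks and their interval, not from whether they were defined per host or via the hostgroup.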
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: NSClient check timeouts fixed, but tmp Check files conti
That's what I figured. The sanity check is always appreciated!
Any ideas on how to determine why the check files continue to accumulate in /tmp?
Re: NSClient check timeouts fixed, but tmp Check files conti
Do you have scripts that take a fairly long time to execute? Try following the steps outlined on our wiki page:
http://support.nagios.com/wiki/index.ph ... ely_manner
Hope this helps.
Re: NSClient check timeouts fixed, but tmp Check files conti
Yes, that was already in place. I forgot to mention it in this thread, but it was mentioned in the thread I hijacked earlier.
Should I look to increase the timeout in /etc/init.d/nagios?
Here are my perf metrics:
Code: Select all
Host Check Latency
Min 0.00 sec
Max 1.65 sec
Avg 0.58 sec
Host Check Execution Time
Min 0.01 sec
Max 10.01 sec
Avg 0.04 sec
Service Check Latency
Min 0.00 sec
Max 1.51 sec
Avg 0.29 sec
Service Check Execution Time
Min 0.01 sec
Max 10.11 sec
Avg 0.22 sec