Hundreds of Active Check result files in /tmp

jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Hundreds of Active Check result files in /tmp

Post by jbennett »

I'm not sure if this is normal.

I have hundreds (thousands?) of files like the following in my /tmp folder on my Nagios box:

Code: Select all

[root@nagiosxivm tmp]# vi check6ve7op
### Active Check Result File ###
file_time=1343197496

### Nagios Service Check Result ###
# Time: Wed Jul 25 06:24:56 2012
host_name=xxxxx
service_description=xxxxxxxxxx
check_type=0
check_options=4
scheduled_check=1
reschedule_check=1
latency=1.200000
start_time=1343197496.200880
Is this normal? If not, what is causing all of these files to collect in the /tmp folder?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Hundreds of Active Check result files in /tmp

Post by scottwilkerson »

It is normal for them to be there temporarily. If they are older than the last several minutes, they were likely left behind by multiple nagios processes running at some point in the past.

It is safe to remove these:

Code: Select all

rm -f /tmp/check*
If they build back up and don't remove themselves, we need to investigate further, as it could be related to the following wiki article:
http://support.nagios.com/wiki/index.ph ... g_Orphaned
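Before deleting anything, it may help to see which of the files are actually stale. The following is a minimal sketch, not an official Nagios tool: the function name list_stale_check_files and the 10-minute default are assumptions chosen for illustration. It only lists check* files directly inside a directory that were last modified more than the given number of minutes ago, and it deletes nothing.

```shell
# Hypothetical helper: list (do not delete) check result files older than a given age.
# Usage: list_stale_check_files DIR MINUTES
list_stale_check_files() {
    dir=${1:-/tmp}        # directory to scan (assumed default: /tmp)
    minutes=${2:-10}      # age threshold in minutes (assumed default: 10)
    # -maxdepth 1 keeps the search out of subdirectories;
    # -mmin +N matches files modified more than N minutes ago.
    find "$dir" -maxdepth 1 -type f -name 'check*' -mmin +"$minutes"
}

# Example: count the stale files currently in /tmp
list_stale_check_files /tmp 10 | wc -l
```

If the count stays near zero after a cleanup, the files are probably being reaped normally. A count that keeps growing would suggest the problem is still present.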
Former Nagios employee
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Hundreds of Active Check result files in /tmp

Post by jbennett »

When I make these changes, gradually, every single host & service comes back with a critical check.

Change it back and everything goes back to normal.

Is this to be expected at first?

I should note that I did manage to update to 3.3.
yancy
Posts: 523
Joined: Thu Oct 06, 2011 10:12 am

Re: Hundreds of Active Check result files in /tmp

Post by yancy »

jbennett,

Can you be more specific about what you're changing that makes the checks return critical?

Did you try the following?

Code: Select all

 rm -f /tmp/check* 

Code: Select all

  killall -9 nagios 
Then restart Nagios from the Admin menu of the web interface.
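
To see whether more than one Nagios daemon is running before killing anything, something like the following could be used. This is a rough sketch under an assumption: a daemonized Nagios master is taken to have parent PID 1, while the short-lived children it forks for checks have the master as their parent. The function name count_nagios_daemons is hypothetical.

```shell
# Hypothetical helper: read "pid ppid comm" lines on stdin and count
# processes named "nagios" whose parent is PID 1 (assumed to be daemon masters).
count_nagios_daemons() {
    awk '$3 == "nagios" && $2 == 1 { n++ } END { print n + 0 }'
}

# Example: feed it the live process table
ps -eo pid,ppid,comm | count_nagios_daemons
```

A result greater than 1 would be consistent with multiple instances running. A result of 1 suggests the leftover files have another cause.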

Regards,

-Yancy
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Hundreds of Active Check result files in /tmp

Post by jbennett »

The changes I'm making are those that were linked. Per the above link:

Code: Select all

Check Services Being Orphaned
Some users have encountered large numbers of warning messages that accumulate quickly that read as follows:

Warning: The check of service <Your Service> on host <Your Host> looks like it was orphaned (results never came back). I'm scheduling an immediate check of the service..

This is most likely caused by multiple instances of Nagios running. To fix this kill all instances of Nagios and then restart the process.

 killall -9 nagios
Then restart Nagios from the Admin menu of the web interface.

Related forum post can be read here.


If the issue continues to persist after reboots and restarts of the Nagios service, then the issue is most likely caused by either a memory leak in embedded perl, or system ulimit restrictions. Symptoms can include the /tmp directory filling up quickly with check* files, and the following errors in the nagios log.

 [1331905537] Warning: The check of service 'SERVICE' on host 'NAMESERVER' looks like it was 
 orphaned (results never came back). I'm scheduling an immediate check of the service...
 [1331755699] Warning: The check of service 'SWAP' on host 'nameserver' could not be performed due 
 to a fork() error: 'Resource temporarily unavailable'. The check will be rescheduled. 
Try the following solutions:

Edit /etc/security/limits.conf

 * hard memlock 128     #locked memory
 * soft memlock 128
 * soft nofile 4096      #open files
 * hard nofile 4096
 * hard nproc 4096     #max user processes
 * soft nproc 4096
 * hard stack 20480     #stack size
 * soft stack 20480
and restart the server. Run

 ulimit -a 
to verify that the new settings are in place.


And also update the settings in your nagios.cfg file to match the following:

 enable_embedded_perl=0
 use_embedded_perl_implicitly=0
I tried making the changes and killing Nagios from the command line (service nagios stop / killall -9 nagios / service nagios start), and I also tried a full reboot of the server. Either way, I would still end up with hundreds of hosts and services showing critical (for instance, ping times were exceptionally long, with over 50% packet loss).

Yes, I did clear out all of the orphaned checks per the suggestion.
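
For reference, the "run ulimit and verify" step from the wiki text can be sketched as below. This is only an illustration: the function name limit_ok is hypothetical, and the 4096 threshold is simply the nofile value quoted above. Limits are per session, so the numbers that matter are probably those seen by the user that runs Nagios, not necessarily root's.

```shell
# Hypothetical helper: succeed when a current limit meets the wanted minimum.
# A value of "unlimited" always passes.
limit_ok() {
    actual=$1
    wanted=$2
    [ "$actual" = "unlimited" ] || [ "$actual" -ge "$wanted" ]
}

# Example: compare this shell's open-files limit with the 4096 from limits.conf
if limit_ok "$(ulimit -n)" 4096; then
    echo "nofile OK"
else
    echo "nofile too low: $(ulimit -n)"
fi
```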
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Hundreds of Active Check result files in /tmp

Post by mguthrie »

How large is your install? (how many hosts + services?)

Also, try giving Nagios more time to restart by updating the following line (line 157) in /etc/init.d/nagios.

Change:

Code: Select all

                for i in 1 2 3 4 5 6 7 8 9 10 ; do
To:

Code: Select all

                for i in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 ; do
This will give the nagios process plenty of time to cleanly close off any remaining checks. If you have a lot of checks that take a long time to complete, some temp files could be left over when the process restarts.
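
The idea behind the longer loop, which is to poll for up to N seconds and stop early once the process is gone, can be sketched on its own as follows. This is not the contents of the init script. The function name wait_until_stopped is hypothetical, and the probe command is whatever you pass in, assumed to succeed while Nagios is still running.

```shell
# Hypothetical helper: poll a probe command once per second.
# Usage: wait_until_stopped TRIES PROBE [ARGS...]
# Returns 0 as soon as the probe fails (process gone), 1 if it is
# still succeeding after TRIES attempts.
wait_until_stopped() {
    tries=$1
    shift
    i=0
    while [ "$i" -lt "$tries" ]; do
        "$@" >/dev/null 2>&1 || return 0   # probe failed: nothing left running
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Example (not run here): wait up to 30 seconds for nagios to exit
# wait_until_stopped 30 pgrep -x nagios
```

Compared with spelling out 1 through 30 in the for loop, the timeout here is a single number, which may be easier to adjust on larger installs.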
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Hundreds of Active Check result files in /tmp

Post by jbennett »

Code: Select all

 Monitoring Performance
Service Check Execution Time:	0.00 / 12.72 / 1.457 sec
Service Check Latency:	0.00 / 9036.63 / 3097.635 sec
Host Check Execution Time:	0.00 / 12.39 / 1.553 sec
Host Check Latency:	0.00 / 9035.63 / 3370.838 sec
# Active Host / Service Checks:	2867 / 5613
# Passive Host / Service Checks:	0 / 1
This is immediately after rebooting the server this morning.

I came in to find about 300 hosts down with packet loss on ping. From my desktop, I'm able to ping without issue. Once I reboot the system, it seems to resolve, for a little while anyway.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Hundreds of Active Check result files in /tmp

Post by mguthrie »

Are you using DNX or Mod Gearman on your install?
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Hundreds of Active Check result files in /tmp

Post by jbennett »

I am not using a distributed setup. I have checked the plugin page and do not see any plugins for either DNX or Gearman installed.

I took over this install from a previous admin, who did not leave any documentation or other information about how the system was set up, so I am having to figure it out as I go.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Hundreds of Active Check result files in /tmp

Post by lmiltchev »

Can you post your nagios.cfg file? Thank you!