Re: Hundreds of Active Check result files in /tmp

Posted: Mon Oct 08, 2012 1:13 pm
by jbennett
Please find attached & thank you for the assistance!

Re: Hundreds of Active Check result files in /tmp

Posted: Mon Oct 08, 2012 1:35 pm
by scottwilkerson
Out of curiosity, has the "Monitoring Performance" normalized since the reboot?

Also, what kind of load is the system running at? How many CPUs does the machine have?

Re: Hundreds of Active Check result files in /tmp

Posted: Mon Oct 08, 2012 4:17 pm
by jbennett
Here's the latest monitoring performance:

Code:

Monitoring Performance
Service Check Execution Time:	0.00 / 16.52 / 2.438 sec
Service Check Latency:	0.00 / 9036.62 / 1313.110 sec
Host Check Execution Time:	0.00 / 11.44 / 1.480 sec
Host Check Latency:	0.00 / 7059.46 / 1916.317 sec
# Active Host / Service Checks:	3008 / 5613
# Passive Host / Service Checks:	0 / 1

I have just deactivated 741 hosts, but added about 100 for a new area that is opening within the company.

Some of the deactivated hosts are still showing up as down, though, even after stopping and starting Nagios.

This machine is a VM running on 8 CPUs & 16GB memory.

Code:

[root@nagiosxivm ~]# top
top - 15:52:19 up 55 min,  1 user,  load average: 6.80, 10.51, 10.44
Tasks: 248 total,   2 running, 245 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.6%us,  0.6%sy,  0.0%ni, 65.4%id, 32.6%wa,  0.1%hi,  0.6%si,  0.0%st
Mem:  15463692k total,  1486024k used, 13977668k free,   121540k buffers
Swap:   262136k total,        0k used,   262136k free,   679180k cached

Code:

[root@nagiosxivm ~]# mpstat -P ALL
Linux 2.6.32-220.4.1.el6.i686 (nagiosxivm)      10/08/2012      _i686_  (8 CPU)

03:54:08 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
03:54:08 PM  all    5.43    0.00    6.06   14.90    0.23    1.36    0.00    0.00   72.02
03:54:08 PM    0    3.62    0.00    4.70   16.94    1.20    1.66    0.00    0.00   71.89
03:54:08 PM    1    6.69    0.00    6.97   12.49    0.10    0.73    0.00    0.00   73.03
03:54:08 PM    2    4.43    0.00    5.26   17.60    0.08    2.05    0.00    0.00   70.57
03:54:08 PM    3    6.70    0.00    7.15   12.96    0.11    0.60    0.00    0.00   72.48
03:54:08 PM    4    4.36    0.00    5.53   16.94    0.08    2.31    0.00    0.00   70.78
03:54:08 PM    5    6.93    0.00    7.06   13.08    0.08    0.52    0.00    0.00   72.34
03:54:08 PM    6    4.36    0.00    5.36   16.99    0.07    2.34    0.00    0.00   70.89
03:54:08 PM    7    6.43    0.00    6.53   12.02    0.10    0.62    0.00    0.00   74.30
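
A note on comparing the two outputs above: top shows 32.6%wa at the moment it was sampled, while mpstat shows 14.90 %iowait for "all". Run without an interval, mpstat reports averages since boot, so the two figures are not directly comparable. Below is a minimal sketch for pulling the all-CPU %iowait out of mpstat output. The function name iowait_all is hypothetical, and the column is located from the header row because its position depends on the time format.

```shell
#!/bin/sh
# Hypothetical helper: read mpstat output on stdin and print the %iowait
# value of every row whose CPU column is "all".
iowait_all() {
    awk '
        # Header row: remember which fields hold "CPU" and "%iowait".
        /%iowait/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "%iowait") col = i
                if ($i == "CPU")     cpu = i
            }
            next
        }
        # Data row for all CPUs combined.
        col && $cpu == "all" { print $col }
    '
}

# Example (not run here): mpstat -P ALL | iowait_all
```

Fed the mpstat output posted above, this prints 14.90. Passing mpstat an interval and count (for example "mpstat -P ALL 5 1") samples current activity instead of the since-boot average; that form also prints an Average block, so the function prints one value per "all" row.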

Re: Hundreds of Active Check result files in /tmp

Posted: Tue Oct 09, 2012 10:54 am
by jbennett
I just went through and de-activated quite a few more hosts, for a total of around 1500 deactivated. I applied the change as well.

However, my active host / service check count is still the same:

Code:

# Active Host / Service Checks:	3008 / 5613

I'm also still seeing some of the deactivated hosts showing up.

I am wondering if running a database repair might help?

Re: Hundreds of Active Check result files in /tmp

Posted: Tue Oct 09, 2012 11:26 am
by scottwilkerson
Can you post the output of

Code:

ps -ef|grep bin/nagios

Re: Hundreds of Active Check result files in /tmp

Posted: Tue Oct 09, 2012 11:36 am
by jbennett
I am currently running a backup on the same machine, but here are the results:

Code:

[root@nagiosxivm ~]# ps -ef|grep bin/nagios
nagios    9147     1  0 10:21 ?        00:00:42 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   17667  9147  0 11:36 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     17670 17651  2 11:36 pts/1    00:00:00 grep bin/nagios
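
For reading the output above: PID 9147 has parent PID 1, so it is the Nagios daemon itself, while PID 17667 is a child of 9147 and the last line is the grep. Presumably the point of the command is to check whether more than one daemon is running at once. A minimal sketch of that check, assuming "ps -ef" column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD); the function name count_nagios_daemons is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ps -ef" output on stdin and count processes
# whose parent is PID 1 and whose executable path ends in bin/nagios.
# Forked children and the grep process itself are not counted.
count_nagios_daemons() {
    awk '$3 == 1 && $8 ~ /bin\/nagios$/ { n++ } END { print n + 0 }'
}

# Example (not run here): ps -ef | count_nagios_daemons
```

For the three lines posted above this prints 1. A result greater than 1 would mean several top-level daemons are running side by side.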

Re: Hundreds of Active Check result files in /tmp

Posted: Tue Oct 09, 2012 2:41 pm
by jbennett
I just ran the database repair script per this document (http://assets.nagios.com/downloads/nagi ... tabase.pdf) and have the same results.

Re: Hundreds of Active Check result files in /tmp

Posted: Tue Oct 09, 2012 3:05 pm
by slansing
Did the repair run into any errors that you could see? I know the output scrolls by quite fast, but it can provide a good breadcrumb trail to the root issue.

Re: Hundreds of Active Check result files in /tmp

Posted: Tue Oct 09, 2012 3:08 pm
by jbennett
I just scrolled up as far as I could through the process, and I didn't see any errors.

I was able to go up to:

Code:

- recovering (with sort) MyISAM-table 'nagios_logentries.MYI'
Data records: 265875
- Fixing index 1
- Fixing index 2
- Fixing index 3
- Fixing index 4

I did not watch the process closely though. Should I try it once more and watch it this time around?

Re: Hundreds of Active Check result files in /tmp

Posted: Tue Oct 09, 2012 3:20 pm
by slansing
It certainly would not hurt to run it again.

I would also run the following while the repair is running, if possible:

Code:

tail -f /var/log/mysqld.log
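
Since the repair output scrolls past faster than the terminal keeps it, one option is to save a copy to a file and search it afterwards. A minimal sketch, with assumptions: run_logged and count_error_lines are hypothetical helper names, the repair command is whatever the linked document specifies (it is not named in this thread, so a placeholder is used), and matching on the word "error" is only a guess at what a failure would look like.

```shell
#!/bin/sh
# Hypothetical helper: run a command, show its output as usual, and keep
# a copy of stdout and stderr in the given log file.
run_logged() {
    _log=$1
    shift
    "$@" 2>&1 | tee "$_log"
}

# Hypothetical helper: count log lines containing "error" (any case).
# The pattern is an assumption, not a documented message format.
count_error_lines() {
    grep -ci 'error' "$1" || true
}

# Example (not run here; replace the placeholder with the real command):
#   run_logged /tmp/repair.log your_repair_command
#   count_error_lines /tmp/repair.log
```

The saved log can then be read from the top at leisure, alongside whatever turns up in /var/log/mysqld.log during the same window.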