Hundreds of Active Check result files in /tmp

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Hundreds of Active Check result files in /tmp

Post by jbennett »

Please find the files attached, and thank you for the assistance!
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Hundreds of Active Check result files in /tmp

Post by scottwilkerson »

Out of curiosity, has the "Monitoring Performance" normalized since the reboot?

Also, what kind of load is the system running at? How many CPUs does the machine have?
Former Nagios employee
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Hundreds of Active Check result files in /tmp

Post by jbennett »

Here's the latest monitoring performance:

Code: Select all

Monitoring Performance
Service Check Execution Time:	0.00 / 16.52 / 2.438 sec
Service Check Latency:	0.00 / 9036.62 / 1313.110 sec
Host Check Execution Time:	0.00 / 11.44 / 1.480 sec
Host Check Latency:	0.00 / 7059.46 / 1916.317 sec
# Active Host / Service Checks:	3008 / 5613
# Passive Host / Service Checks:	0 / 1
I have just deactivated 741 hosts, but also added about 100 for a new area that is opening within the company.

Some of the deactivated hosts are still showing up as down, though, even after stopping and starting Nagios.

This machine is a VM with 8 CPUs and 16 GB of memory.

Code: Select all

[root@nagiosxivm ~]# top
top - 15:52:19 up 55 min,  1 user,  load average: 6.80, 10.51, 10.44
Tasks: 248 total,   2 running, 245 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.6%us,  0.6%sy,  0.0%ni, 65.4%id, 32.6%wa,  0.1%hi,  0.6%si,  0.0%st
Mem:  15463692k total,  1486024k used, 13977668k free,   121540k buffers
Swap:   262136k total,        0k used,   262136k free,   679180k cached

Code: Select all

[root@nagiosxivm ~]# mpstat -P ALL
Linux 2.6.32-220.4.1.el6.i686 (nagiosxivm)      10/08/2012      _i686_  (8 CPU)

03:54:08 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
03:54:08 PM  all    5.43    0.00    6.06   14.90    0.23    1.36    0.00    0.00   72.02
03:54:08 PM    0    3.62    0.00    4.70   16.94    1.20    1.66    0.00    0.00   71.89
03:54:08 PM    1    6.69    0.00    6.97   12.49    0.10    0.73    0.00    0.00   73.03
03:54:08 PM    2    4.43    0.00    5.26   17.60    0.08    2.05    0.00    0.00   70.57
03:54:08 PM    3    6.70    0.00    7.15   12.96    0.11    0.60    0.00    0.00   72.48
03:54:08 PM    4    4.36    0.00    5.53   16.94    0.08    2.31    0.00    0.00   70.78
03:54:08 PM    5    6.93    0.00    7.06   13.08    0.08    0.52    0.00    0.00   72.34
03:54:08 PM    6    4.36    0.00    5.36   16.99    0.07    2.34    0.00    0.00   70.89
03:54:08 PM    7    6.43    0.00    6.53   12.02    0.10    0.62    0.00    0.00   74.30
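
To put the load average in the context of the CPU count, here is a minimal sketch. The helper name `load_per_cpu` is made up for illustration, and it assumes a Linux host with `awk` and `nproc` available.

```shell
#!/bin/sh
# load_per_cpu LOAD CPUS: print the load average divided by the CPU count.
# (Hypothetical helper, not part of Nagios or any standard tool.)
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Live values on a Linux host: the first field of /proc/loadavg is the
# 1-minute load average.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one_min _rest < /proc/loadavg
    load_per_cpu "$one_min" "$(nproc)"
fi
```

With the figures from the top output above, `load_per_cpu 6.80 8` prints 0.85. On Linux the load average also counts tasks in uninterruptible (typically disk) wait, so together with the 32.6%wa figure this suggests processes waiting on I/O rather than a shortage of CPU.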
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Hundreds of Active Check result files in /tmp

Post by jbennett »

I just went through and de-activated quite a few more hosts, for a total of around 1500 deactivated. I applied the change as well.

However, my active host / service check count is still showing the same numbers:

Code: Select all

# Active Host / Service Checks:	3008 / 5613
Some of the deactivated hosts are also still showing up.

I am wondering if running a database repair might help?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Hundreds of Active Check result files in /tmp

Post by scottwilkerson »

Can you post the output of

Code: Select all

ps -ef|grep bin/nagios
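
A variant that keeps the grep process itself out of the listing, as a minimal sketch. The function name `only_nagios` is made up for illustration; the bracket trick is a common shell idiom, not something specific to Nagios.

```shell
#!/bin/sh
# only_nagios: keep lines from stdin that mention bin/nagios.
# The bracket in '[b]in/nagios' stops the pattern from matching grep's own
# command line, which contains the literal text "[b]in/nagios".
# (Hypothetical helper name.)
only_nagios() {
    grep '[b]in/nagios'
}

ps -ef | only_nagios || echo "no nagios process found"
```

The PPID column (third field of `ps -ef`) shows which of the listed processes is the parent daemon and which are its children.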
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Hundreds of Active Check result files in /tmp

Post by jbennett »

I am currently running a backup on the same machine, but here are the results:

Code: Select all

[root@nagiosxivm ~]# ps -ef|grep bin/nagios
nagios    9147     1  0 10:21 ?        00:00:42 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios   17667  9147  0 11:36 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     17670 17651  2 11:36 pts/1    00:00:00 grep bin/nagios
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Hundreds of Active Check result files in /tmp

Post by jbennett »

I just ran the database repair script per this document (http://assets.nagios.com/downloads/nagi ... tabase.pdf) and got the same results.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Hundreds of Active Check result files in /tmp

Post by slansing »

Did the repair report any errors that you could see? I know the output scrolls by quite fast, but it sometimes leaves a good breadcrumb trail to the root issue.
jbennett
Posts: 522
Joined: Mon Apr 16, 2012 3:00 pm

Re: Hundreds of Active Check result files in /tmp

Post by jbennett »

I just scrolled up as far as I could through the output, and I didn't see any errors.

The furthest back I could scroll was:

Code: Select all

- recovering (with sort) MyISAM-table 'nagios_logentries.MYI'
Data records: 265875
- Fixing index 1
- Fixing index 2
- Fixing index 3
- Fixing index 4
I did not watch the process closely though. Should I try it once more and watch it this time around?
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Hundreds of Active Check result files in /tmp

Post by slansing »

It certainly would not hurt to run it again.

I would also run the following whilst running the repair if possible:

Code: Select all

tail -f /var/log/mysqld.log
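
If the log is noisy, the output can be narrowed to lines that look like problems. This is a minimal sketch: the function name `mysql_problems` and the keyword list are assumptions made for illustration, not an official list of MySQL messages.

```shell
#!/bin/sh
# mysql_problems: keep only lines that look like errors or table damage.
# The keyword list is a guess and may need adjusting for your log.
# (Hypothetical helper name.)
mysql_problems() {
    grep --line-buffered -iE 'error|crashed|corrupt'
}

# Usage while the repair runs (log path as given above):
#   tail -f /var/log/mysqld.log | mysql_problems
```

The `--line-buffered` option makes grep print each match as soon as it arrives instead of waiting for a full buffer, which matters when reading from `tail -f`.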