Extremely high load spikes - rebuild database?

Post by **ira** » Thu May 07, 2015 6:41 pm

Hi there,

I've got a licensed version, but I can't seem to post in the official support forum.

I'm having high load spikes.

At the start of each spike I'm getting:

Runtime Warning 2015-05-08 07:01:29 Warning: Host performance data file processing command '/bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/xidpe/1431032481.perfdata.host' timed out after 5 seconds

Output from "cat /usr/local/nagios/etc/nagios.cfg | grep 'broker'":

broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
event_broker_options=-1

---

A sanitized system profile:

Nagios XI Version : 2014R2.7
nagios 2.6.32-504.12.2.el6.i686 i686
CentOS release 6.6 (Final)
Gnome is not installed
Apache Information

PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36
Server Name: developer
Server Address: 192.168.x.x
Server Port: 80
Date/Time

Nagios XI Data

License ends in: STUORM

nagios (pid 23408) is running...
NPCD running (pid 1514).
ndo2db (pid 1592) is running...
CPU Load 15: 2.42
Total Hosts: 61
Total Services: 343
Function 'get_base_uri' returns: http://developer/nagiosxi/
Function 'get_base_url' returns: http://developer/nagiosxi/
Function 'get_backend_url(internal_call=false)' returns: http://developer/nagiosxi/includes/comp ... rofile.php
Function 'get_backend_url(internal_call=true)' returns: http://localhost/nagiosxi/backend/
Ping Test localhost

Running:
/bin/ping -c 3 localhost 2>&1
PING localhost.localdomain (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=1 ttl=64 time=0.044 ms
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=2 ttl=64 time=0.035 ms
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=3 ttl=64 time=0.039 ms

--- localhost.localdomain ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.035/0.039/0.044/0.006 ms
Test wget To localhost

WGET From URL: http://localhost/nagiosxi/includes/components/ccm/
Running:
/usr/bin/wget http://localhost/nagiosxi/includes/components/ccm/
--2015-05-08 09:38:27-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... ::1, 127.0.0.1
Connecting to localhost|::1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: "/usr/local/nagiosxi/tmp/ccm_index.tmp"

0K ........ 22.7M=0s

2015-05-08 09:38:28 (22.7 MB/s) - "/usr/local/nagiosxi/tmp/ccm_index.tmp" saved [8385]

Post by **lmiltchev** » Fri May 08, 2015 9:36 am

Have you checked to see what process has the highest CPU usage?

Code:

top | head -15

Do you have any errors in the mysqld.log (crashed tables)?

Code:

tail -20 /var/log/mysqld.log
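If the last 20 lines are mostly routine notes, it may help to filter the log first. The following is a minimal sketch, not an official Nagios XI tool: it assumes the log path from the command above, and `filter_mysql_log` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: show only warning/error lines from the MySQL error log.
# The default path is the one used in the post above; override LOG if yours differs.
LOG=${LOG:-/var/log/mysqld.log}

# Keep lines tagged [Warning] or [ERROR], plus any mention of crashed tables.
filter_mysql_log() {
    grep -E '\[(Warning|ERROR)\]|marked as crashed'
}

# Only read the log if it exists and is readable by the current user.
if [ -r "$LOG" ]; then
    filter_mysql_log < "$LOG" | tail -20
fi
```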

Post by **ira** » Sun May 10, 2015 6:10 pm

mysqld.log is showing the following error:

Code:

150509  7:18:51 [Warning] Disk is full writing '/tmp/SThOiQty' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
150509  7:18:51 [Warning] Retry in 60 secs. Message reprinted in 600 secs
150510  7:17:08 [Warning] Disk is full writing './nagios/nagios_systemcommands.MYD' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
150510  7:17:08 [Warning] Retry in 60 secs. Message reprinted in 600 secs
150511  7:17:14 [Warning] Disk is full writing './nagios/nagios_contactnotificationmethods.TMD' (Errcode: 28). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)

But having a look at the disk for /tmp and the database store at "/var/lib/mysql/nagios/":

Code:

root@nagios:~ $ df -P /var/lib/mysql/nagios/ | tail -1 | cut -d' ' -f 1
/dev/mapper/VolGroup-lv_root

Follow up:

Code:

root@nagios:~ $ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup-lv_root
                       27G   24G  2.5G  91% /
tmpfs                 1.3G     0  1.3G   0% /dev/shm
/dev/sda1             477M  110M  342M  25% /boot

Now there's 2.5G free on "/dev/mapper/VolGroup-lv_root", which seems like plenty of space. And it's not an inode issue:

Code:

root@nagios:~ $ df -i
Filesystem            Inodes  IUsed   IFree IUse% Mounted on
/dev/mapper/VolGroup-lv_root
                     1792752 115783 1676969    7% /
tmpfs                 185171      1  185170    1% /dev/shm
/dev/sda1             128016     64  127952    1% /boot

I'll try to see what is causing the CPU spike when it occurs next.
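One way to catch the next spike is to log free disk space alongside the load average, since the df output above only shows the state at the moment it was run, while the "Disk is full" warnings in the log are all stamped shortly after 7:17. This is only a sketch, assuming a Linux host with /proc/loadavg; `sample_line` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print one timestamped sample of free space and 1-minute load.
# $1 is any path on the filesystem to watch (e.g. / or /var/lib/mysql/nagios).
sample_line() {
    ts=$(date '+%Y-%m-%d %H:%M:%S')
    # Column 4 of POSIX-format df output is the available space in KB.
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    # The first field of /proc/loadavg is the 1-minute load average.
    load=$(cut -d' ' -f1 /proc/loadavg)
    echo "$ts avail_kb=$avail_kb load1=$load"
}

sample_line /
```

Run once a minute (for example from cron, appending to a file) across the time the spikes occur, the samples should show whether free space actually drops toward zero during a spike.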

Post by **lmiltchev** » Mon May 11, 2015 9:33 am

How large is the database? You may need twice as much space as the size of the database. The "Disk is full" message in the log is quite clear. You will need to add more disk space.
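To put a number on "how large is the database", a rough check could look like the sketch below. It assumes the database lives under /var/lib/mysql/nagios/ as shown earlier in the thread; `needs_more_space` is a hypothetical helper name, and the factor of two is simply the rule of thumb from the reply above.

```shell
#!/bin/sh
# Sketch: compare free space with twice the on-disk size of the database.

# needs_more_space DB_KB AVAIL_KB
# Succeeds (exit 0) when the available space is below 2x the database size.
needs_more_space() {
    db_kb=$1
    avail_kb=$2
    [ "$avail_kb" -lt $((db_kb * 2)) ]
}

# Assumed location, taken from the thread; override DB_DIR if yours differs.
DB_DIR=${DB_DIR:-/var/lib/mysql/nagios}

if [ -d "$DB_DIR" ]; then
    # du -sk prints "<size in KB><TAB><path>"; keep the size only.
    db_kb=$(du -sk "$DB_DIR" | cut -f1)
    # Column 4 of POSIX-format df output is the available space in KB.
    avail_kb=$(df -Pk "$DB_DIR" | awk 'NR==2 {print $4}')
    echo "database: ${db_kb} KB, available: ${avail_kb} KB"
    if needs_more_space "$db_kb" "$avail_kb"; then
        echo "less than 2x the database size is free"
    else
        echo "at least 2x the database size is free"
    fi
fi
```

Reading the database directory usually requires root or the mysql user, so the du figure may be incomplete when run as an unprivileged user.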
