Re: Performance Issues / fork() errors

Posted: Tue Jan 29, 2013 5:34 pm
by slansing
This error is the same as the one you described above, correct? You get "Cannot Connect to The Database" when accessing the CCM? Have you checked this link out?

http://support.nagios.com/wiki/index.ph ... ig_Manager

Re: Performance Issues / fork() errors

Posted: Fri Feb 01, 2013 8:52 am
by Gavin
That link seems to be more to do with a complete lack of access to CCM. Our issue is intermittent.

Bizarrely, the load seems to stay at 20/30 for hours, and then sit quite happily under 10 for hours. I can't see any one process dominating the rest, but I'm also in a position where turning on verbose logging will negatively impact performance because of I/O.

Is there some sort of debug log I can switch on, get the CCM error to appear, and then send to you? If it's load related, so be it, but this part of Nagios seems to be more sensitive to load than anything else.

Many thanks,

Gavin

Re: Performance Issues / fork() errors

Posted: Fri Feb 01, 2013 10:11 am
by mguthrie
Try upgrading to XI 1.5. We added some fixes in the hope of resolving the intermittent error with the CCM.

Re: Performance Issues / fork() errors

Posted: Fri Feb 01, 2013 11:12 am
by chrisp
I couldn't see anything in the change log directly relating to our issues, but I'm all for staying up-to-date.

We're now on: Nagios XI 2012R1.5

Re: Performance Issues / fork() errors

Posted: Fri Feb 01, 2013 12:37 pm
by abrist
Great. Is XI still exhibiting the same behavior?

Re: Performance Issues / fork() errors

Posted: Fri Feb 01, 2013 1:59 pm
by Gavin
Unfortunately, yes. It's a really annoying error because you can't use the back button either.

Thanks

Gavin

Re: Performance Issues / fork() errors

Posted: Fri Feb 01, 2013 3:47 pm
by mguthrie
Are you seeing any entries in the apache log related to this? We added some debugging info in 1.5 to give some more clues if this issue shows up.

Re: Performance Issues / fork() errors

Posted: Thu Feb 14, 2013 8:23 am
by chrisp
Hi All,

I have an update and some questions...

UPDATE: -

We've moved to a new dedicated server, with SSDs instead of slow HDDs. Our I/O is way better and we have not seen the "Cannot Connect to The Database" when accessing the CCM since.

However, we are experiencing problems with something called "grsec", which seems to be preventing our desired programs from working properly.

We’re seeing loads of entries like this in /var/log/messages: -

Code: Select all

Feb 14 12:48:23 Nagios kernel: grsec: failed fork with errno EAGAIN by /usr/local/nagios/bin/nagios[nagios:4525] uid/euid:508/508 gid/egid:100/100, parent /sbin/init[init:1] uid/euid:0/0 gid/egid:0/0
Feb 14 12:48:23 Nagios kernel: grsec: failed fork with errno EAGAIN by /usr/local/nagios/bin/nagios[nagios:595] uid/euid:508/508 gid/egid:100/100, parent /usr/local/nagios/bin/nagios[nagios:4525] uid/euid:508/508 gid/egid:100/100
Feb 14 12:48:23 Nagios kernel: grsec: failed fork with errno EAGAIN by /usr/local/nagios/bin/nagios[nagios:601] uid/euid:508/508 gid/egid:100/100, parent /usr/local/nagios/bin/nagios[nagios:4525] uid/euid:508/508 gid/egid:100/100
These are followed by the Nagios system logging a problem like so: -

Code: Select all

Feb 14 12:48:23 Nagios nagios: Warning: The check of service 'someservice' on host 'somehost' could not be performed due to a fork() error: 'Resource temporarily unavailable'.  The check will be rescheduled.
Q1. Is "grsec" something that Nagios XI installed, and can we configure it? Or is there something else we can do to stop this happening?
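
For anyone wanting to quantify these failures, here is a minimal Python sketch that tallies the grsec "failed fork" entries by binary and uid. The regular expression is inferred only from the three sample lines above, and the function name `count_fork_failures` is made up for illustration, so treat it as a starting point rather than a complete grsec log parser.

```python
import re
from collections import Counter

# Pattern inferred from the three sample lines quoted in this post only;
# other grsec message formats may not match.
GRSEC_FORK = re.compile(
    r"grsec: failed fork with errno (?P<errno>\w+) by "
    r"(?P<path>\S+)\[(?P<comm>[^:\]]+):(?P<pid>\d+)\] "
    r"uid/euid:(?P<uid>\d+)/(?P<euid>\d+)"
)


def count_fork_failures(lines):
    """Return a Counter keyed by (binary path, uid) of failed-fork entries."""
    counts = Counter()
    for line in lines:
        match = GRSEC_FORK.search(line)
        if match:
            # Only the forking process is counted, not the "parent" part.
            counts[(match.group("path"), int(match.group("uid")))] += 1
    return counts
```

Feeding it the lines of /var/log/messages (for example from `open("/var/log/messages", errors="replace")`) would show whether the nagios user accounts for all of the failures.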

I note that /var/log/messages is full of Nagios messages, which seem to be duplicates of those also being written to /usr/local/nagios/var/nagios.log. That seems really inefficient with regard to I/O and somewhat redundant.

I can also see that these log entries have triggered imuxsock rate limiting, producing entries like this (PID 4525 was nagios): -

Code: Select all

Feb 14 12:18:08 Nagios rsyslogd-2177: imuxsock begins to drop messages from pid 4525 due to rate-limiting
Feb 14 12:18:20 Nagios rsyslogd-2177: imuxsock lost 90 messages from pid 4525 due to rate-limiting
I Googled and found this page (http://www.rsyslog.com/tag/rate-limiting/), which allowed me to alter the rates so that rate limiting no longer hampered my debugging.
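
To get a feel for how much was being dropped before the rates were changed, a small Python sketch along these lines could total the "imuxsock lost N messages" entries per PID. The pattern is based solely on the two rsyslogd lines quoted above, and `dropped_by_pid` is a hypothetical helper name.

```python
import re

# Based only on the rsyslogd wording quoted in this post.
LOST = re.compile(
    r"imuxsock lost (\d+) messages from pid (\d+) due to rate-limiting"
)


def dropped_by_pid(lines):
    """Sum the number of messages rsyslog reports as lost, keyed by PID."""
    totals = {}
    for line in lines:
        match = LOST.search(line)
        if match:
            lost, pid = int(match.group(1)), int(match.group(2))
            totals[pid] = totals.get(pid, 0) + lost
    return totals
```

The "begins to drop" lines carry no count, so only the "lost" lines contribute to the totals.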

Q2. Is there something we can do to stop that additional logging to /var/log/messages?

I was also wondering if we needed to upgrade the system, just in case there were updates/fixes for the problem stuff, so I had a look at what was available with "yum update": -

Code: Select all

================================================================================================================================================
 Package                                  Arch                   Version                                          Repository               Size
================================================================================================================================================
Updating:
 bind                                     x86_64                 32:9.8.2-0.10.rc1.el6_3.6                        updates                 4.0 M
 bind-chroot                              x86_64                 32:9.8.2-0.10.rc1.el6_3.6                        updates                  70 k
 bind-libs                                x86_64                 32:9.8.2-0.10.rc1.el6_3.6                        updates                 871 k
 bind-utils                               x86_64                 32:9.8.2-0.10.rc1.el6_3.6                        updates                 182 k
 cpio                                     x86_64                 2.10-11.el6_3                                    updates                 192 k
 dbus                                     x86_64                 1:1.2.24-7.el6_3                                 updates                 207 k
 dbus-libs                                x86_64                 1:1.2.24-7.el6_3                                 updates                 127 k
 device-mapper                            x86_64                 1.02.74-10.el6_3.3                               updates                 135 k
 device-mapper-event                      x86_64                 1.02.74-10.el6_3.3                               updates                  88 k
 device-mapper-event-libs                 x86_64                 1.02.74-10.el6_3.3                               updates                  83 k
 device-mapper-libs                       x86_64                 1.02.74-10.el6_3.3                               updates                 163 k
 dhclient                                 x86_64                 12:4.1.1-31.0.1.P1.el6.centos.1                  updates                 317 k
 dhcp-common                              x86_64                 12:4.1.1-31.0.1.P1.el6.centos.1                  updates                 141 k
 epel-release                             noarch                 6-8                                              epel                     14 k
 initscripts                              x86_64                 9.03.31-2.el6.centos.1                           updates                 935 k
 irqbalance                               x86_64                 2:0.55-35.el6_3                                  updates                  25 k
 libblkid                                 x86_64                 2.17.2-12.7.el6_3                                updates                 112 k
 libuuid                                  x86_64                 2.17.2-12.7.el6_3                                updates                  65 k
 lvm2                                     x86_64                 2.02.95-10.el6_3.3                               updates                 615 k
 lvm2-libs                                x86_64                 2.02.95-10.el6_3.3                               updates                 680 k
 nspr                                     x86_64                 4.9.2-0.el6_3.1                                  updates                 111 k
 nss                                      x86_64                 3.13.6-2.el6_3                                   updates                 770 k
 nss-sysinit                              x86_64                 3.13.6-2.el6_3                                   updates                  32 k
 nss-tools                                x86_64                 3.13.6-2.el6_3                                   updates                 730 k
 nss-util                                 x86_64                 3.13.6-1.el6_3                                   updates                  53 k
 openssh                                  x86_64                 5.3p1-81.el6_3                                   updates                 236 k
 openssh-clients                          x86_64                 5.3p1-81.el6_3                                   updates                 358 k
 openssh-server                           x86_64                 5.3p1-81.el6_3                                   updates                 300 k
 psacct                                   x86_64                 6.3.2-63.el6_3.3                                 updates                  68 k
 python                                   x86_64                 2.6.6-29.el6_3.3                                 updates                 4.8 M
 python-libs                              x86_64                 2.6.6-29.el6_3.3                                 updates                 623 k
 redhat-logos                             noarch                 60.0.14-12.el6.centos                            updates                  15 M
 selinux-policy                           noarch                 3.7.19-155.el6_3.14                              updates                 1.3 M
 selinux-policy-targeted                  noarch                 3.7.19-155.el6_3.14                              updates                 2.6 M
 strace                                   x86_64                 4.5.19-1.11.el6_3.2                              updates                 171 k
 sudo                                     x86_64                 1.7.4p5-13.el6_3                                 updates                 423 k
 tzdata                                   noarch                 2012j-1.el6                                      updates                 453 k
 util-linux-ng                            x86_64                 2.17.2-12.7.el6_3                                updates                 1.5 M

Transaction Summary
================================================================================================================================================
Upgrade      38 Package(s)
Q3. I said no to the update, but is this something which we should be doing periodically?

Cheers,
--
ChrisP

Re: Performance Issues / fork() errors

Posted: Thu Feb 14, 2013 10:24 am
by mguthrie
You might be hitting some ulimits on the system:
http://support.nagios.com/wiki/index.ph ... g_Orphaned
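
One documented cause of fork() failing with EAGAIN is reaching the per-user process limit (RLIMIT_NPROC), so it may be worth checking what limit the running nagios process actually has. The following is a rough Python sketch, assuming a Linux /proc filesystem. The helper name `max_processes` is made up for illustration, and whether grsec applies additional restrictions of its own is not something this check can show.

```python
import re


def max_processes(limits_text):
    """Return (soft, hard) from the 'Max processes' row of /proc/<pid>/limits.

    Each value is an int, or None where the kernel reports 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max processes"):
            # Columns are separated by runs of spaces: name, soft, hard, units.
            fields = re.split(r"\s{2,}", line.strip())
            soft, hard = fields[1], fields[2]
            return tuple(
                None if value == "unlimited" else int(value)
                for value in (soft, hard)
            )
    raise ValueError("no 'Max processes' row found")


# Example use (the PID is whatever the nagios daemon currently has):
#     with open("/proc/4525/limits") as fh:
#         print(max_processes(fh.read()))
```

Comparing the soft limit with the number of processes owned by the nagios user at the time of the errors would indicate whether the limit is the likely culprit.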

As for the redundant logging, you can turn off logging to syslog by setting:

Code: Select all

use_syslog=0
in the main nagios.cfg file.

Re: Performance Issues / fork() errors

Posted: Thu Feb 14, 2013 10:45 am
by chrisp
Thanks,

Redundant logging => OFF!

Right, I also called out to the hosting service provider, just in case grsec was their doing (the CentOS 6.3 build is via their automated provisioning system). They responded thusly: -
"Grsec is extra security added to the kernel, see http://en.wikipedia.org/wiki/Grsec

If you do not want this option then you would need to install your own kernel."

Am I OK to do that & also "yum update" while I'm at it?

The following are available to me: -

Code: Select all

# yum list >yl ; grep -i "^kernel" yl 
kernel-headers.x86_64                    2.6.32-279.22.1.el6            @updates
kernel.x86_64                            2.6.32-279.22.1.el6            updates 
kernel-debug.x86_64                      2.6.32-279.22.1.el6            updates 
kernel-debug-devel.x86_64                2.6.32-279.22.1.el6            updates 
kernel-devel.x86_64                      2.6.32-279.22.1.el6            updates 
kernel-doc.noarch                        2.6.32-279.22.1.el6            updates 
kernel-firmware.noarch                   2.6.32-279.22.1.el6            updates 
kerneloops.x86_64                        0.11-1.el6.rf                  rpmforge