Performance Issues / fork() errors

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Performance Issues / fork() errors

Post by slansing »

This error is the same as the one you described above, correct? You get "Cannot Connect to The Database" when accessing the CCM? Have you checked this link out?

http://support.nagios.com/wiki/index.ph ... ig_Manager
Gavin
Posts: 58
Joined: Mon Dec 24, 2012 4:56 am

Re: Performance Issues / fork() errors

Post by Gavin »

That link seems to be more to do with a complete lack of access to CCM. Our issue is intermittent.

Bizarrely, the load seems to stay at 20/30 for hours, and then sit quite happily under 10 for hours. I can't see any one process dominating the rest, but I'm also in a position where turning on verbose logging will negatively impact performance because of I/O.

Is there some sort of debug log I can switch on, get the CCM error to appear, and then send to you? If it's load related, so be it, but this part of Nagios seems to be more sensitive to load than anything else.

Many thanks,

Gavin
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Performance Issues / fork() errors

Post by mguthrie »

Try upgrading to XI 1.5; we added some fixes in the hope of resolving the intermittent error with the CCM.
chrisp
Posts: 71
Joined: Fri Dec 28, 2012 11:35 am

Re: Performance Issues / fork() errors

Post by chrisp »

I couldn't see anything in the change log directly relating to our issues, but I'm all for staying up-to-date.

We're now on: Nagios XI 2012R1.5
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Performance Issues / fork() errors

Post by abrist »

Great. Is XI still exhibiting the same behavior?
Former Nagios employee
Gavin
Posts: 58
Joined: Mon Dec 24, 2012 4:56 am

Re: Performance Issues / fork() errors

Post by Gavin »

Unfortunately, yes. It's a really annoying error because you can't use the back button either.

Thanks

Gavin
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Performance Issues / fork() errors

Post by mguthrie »

Are you seeing any entries in the Apache log related to this? We added some debugging info in 1.5 to give some more clues if this issue shows up.
chrisp
Posts: 71
Joined: Fri Dec 28, 2012 11:35 am

Re: Performance Issues / fork() errors

Post by chrisp »

Hi All,

I have an update and some questions...

UPDATE: -

We've moved to a new dedicated server, with SSDs instead of slow HDDs. Our I/O is way better and we have not seen the "Cannot Connect to The Database" error when accessing the CCM since.

However, we are now experiencing problems with something called “grsec”, which seems to be preventing our programs from working properly.

We’re seeing loads of entries like this in /var/log/messages: -

Code: Select all

Feb 14 12:48:23 Nagios kernel: grsec: failed fork with errno EAGAIN by /usr/local/nagios/bin/nagios[nagios:4525] uid/euid:508/508 gid/egid:100/100, parent /sbin/init[init:1] uid/euid:0/0 gid/egid:0/0
Feb 14 12:48:23 Nagios kernel: grsec: failed fork with errno EAGAIN by /usr/local/nagios/bin/nagios[nagios:595] uid/euid:508/508 gid/egid:100/100, parent /usr/local/nagios/bin/nagios[nagios:4525] uid/euid:508/508 gid/egid:100/100
Feb 14 12:48:23 Nagios kernel: grsec: failed fork with errno EAGAIN by /usr/local/nagios/bin/nagios[nagios:601] uid/euid:508/508 gid/egid:100/100, parent /usr/local/nagios/bin/nagios[nagios:4525] uid/euid:508/508 gid/egid:100/100

These are followed by the Nagios system logging a problem like so: -

Code: Select all

Feb 14 12:48:23 Nagios nagios: Warning: The check of service 'someservice' on host 'somehost' could not be performed due to a fork() error: 'Resource temporarily unavailable'.  The check will be rescheduled.
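
To see whether the failed forks cluster at particular times (for example, during the periods of high load), the kernel lines can be tallied per minute. This is only a minimal sketch, assuming the syslog line layout shown above; the function name `count_fork_failures` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: tally grsec "failed fork" lines per minute.
# Assumes the classic syslog layout "Mon DD HH:MM:SS host ..." shown above.
count_fork_failures() {
    awk '/grsec: failed fork with errno EAGAIN/ {
             # $1 = month, $2 = day, $3 = HH:MM:SS; keep only HH:MM
             key = $1 " " $2 " " substr($3, 1, 5)
             count[key]++
         }
         END { for (k in count) print k, count[k] }' "$1" | sort
}
```

It would be run as `count_fork_failures /var/log/messages`. The output is sorted lexically, so it is only in time order within a single day.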

Q1. Is "grsec" something that Nagios XI installed and can we configure it, or is there something else we can do to stop this happening?

I note that /var/log/messages is full of Nagios messages, which seem to be duplicates of those also being written to /usr/local/nagios/var/nagios.log. That seems really inefficient with regard to I/O and somewhat redundant.

These log entries have also triggered imuxsock rate limiting, and I can see log entries like this (PID 4525 was nagios): -

Code: Select all

Feb 14 12:18:08 Nagios rsyslogd-2177: imuxsock begins to drop messages from pid 4525 due to rate-limiting
Feb 14 12:18:20 Nagios rsyslogd-2177: imuxsock lost 90 messages from pid 4525 due to rate-limiting

I Googled and found this page (http://www.rsyslog.com/tag/rate-limiting/), which allowed me to alter the rates so that the rate limiting didn't hamper my debugging any more.
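
Before changing the limits, it can help to know how many messages rsyslog actually discarded. A rough sketch, assuming the "imuxsock lost N messages" wording shown above; `sum_lost_messages` is an illustrative name, not part of rsyslog.

```shell
#!/bin/sh
# Hypothetical helper: total the messages rsyslog reports as lost to rate limiting.
sum_lost_messages() {
    awk '/imuxsock lost [0-9]+ messages/ {
             # The count is the field right after the word "lost"
             for (i = 1; i < NF; i++)
                 if ($i == "lost") { total += $(i + 1); break }
         }
         END { print total + 0 }' "$1"
}
```

It would be run as `sum_lost_messages /var/log/messages`. On rsyslog 5.x the limits themselves are, as far as I can tell, set with the `$SystemLogRateLimitInterval` and `$SystemLogRateLimitBurst` directives in /etc/rsyslog.conf; check the linked page for the values that suit your version.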

Q2. Is there something we can do to stop that additional logging to /var/log/messages?

I was also wondering if we needed to upgrade the system, just in case there were updates/fixes for the problem stuff, so I had a look at what was available with "yum update": -

Code: Select all

================================================================================================================================================
 Package                                  Arch                   Version                                          Repository               Size
================================================================================================================================================
Updating:
 bind                                     x86_64                 32:9.8.2-0.10.rc1.el6_3.6                        updates                 4.0 M
 bind-chroot                              x86_64                 32:9.8.2-0.10.rc1.el6_3.6                        updates                  70 k
 bind-libs                                x86_64                 32:9.8.2-0.10.rc1.el6_3.6                        updates                 871 k
 bind-utils                               x86_64                 32:9.8.2-0.10.rc1.el6_3.6                        updates                 182 k
 cpio                                     x86_64                 2.10-11.el6_3                                    updates                 192 k
 dbus                                     x86_64                 1:1.2.24-7.el6_3                                 updates                 207 k
 dbus-libs                                x86_64                 1:1.2.24-7.el6_3                                 updates                 127 k
 device-mapper                            x86_64                 1.02.74-10.el6_3.3                               updates                 135 k
 device-mapper-event                      x86_64                 1.02.74-10.el6_3.3                               updates                  88 k
 device-mapper-event-libs                 x86_64                 1.02.74-10.el6_3.3                               updates                  83 k
 device-mapper-libs                       x86_64                 1.02.74-10.el6_3.3                               updates                 163 k
 dhclient                                 x86_64                 12:4.1.1-31.0.1.P1.el6.centos.1                  updates                 317 k
 dhcp-common                              x86_64                 12:4.1.1-31.0.1.P1.el6.centos.1                  updates                 141 k
 epel-release                             noarch                 6-8                                              epel                     14 k
 initscripts                              x86_64                 9.03.31-2.el6.centos.1                           updates                 935 k
 irqbalance                               x86_64                 2:0.55-35.el6_3                                  updates                  25 k
 libblkid                                 x86_64                 2.17.2-12.7.el6_3                                updates                 112 k
 libuuid                                  x86_64                 2.17.2-12.7.el6_3                                updates                  65 k
 lvm2                                     x86_64                 2.02.95-10.el6_3.3                               updates                 615 k
 lvm2-libs                                x86_64                 2.02.95-10.el6_3.3                               updates                 680 k
 nspr                                     x86_64                 4.9.2-0.el6_3.1                                  updates                 111 k
 nss                                      x86_64                 3.13.6-2.el6_3                                   updates                 770 k
 nss-sysinit                              x86_64                 3.13.6-2.el6_3                                   updates                  32 k
 nss-tools                                x86_64                 3.13.6-2.el6_3                                   updates                 730 k
 nss-util                                 x86_64                 3.13.6-1.el6_3                                   updates                  53 k
 openssh                                  x86_64                 5.3p1-81.el6_3                                   updates                 236 k
 openssh-clients                          x86_64                 5.3p1-81.el6_3                                   updates                 358 k
 openssh-server                           x86_64                 5.3p1-81.el6_3                                   updates                 300 k
 psacct                                   x86_64                 6.3.2-63.el6_3.3                                 updates                  68 k
 python                                   x86_64                 2.6.6-29.el6_3.3                                 updates                 4.8 M
 python-libs                              x86_64                 2.6.6-29.el6_3.3                                 updates                 623 k
 redhat-logos                             noarch                 60.0.14-12.el6.centos                            updates                  15 M
 selinux-policy                           noarch                 3.7.19-155.el6_3.14                              updates                 1.3 M
 selinux-policy-targeted                  noarch                 3.7.19-155.el6_3.14                              updates                 2.6 M
 strace                                   x86_64                 4.5.19-1.11.el6_3.2                              updates                 171 k
 sudo                                     x86_64                 1.7.4p5-13.el6_3                                 updates                 423 k
 tzdata                                   noarch                 2012j-1.el6                                      updates                 453 k
 util-linux-ng                            x86_64                 2.17.2-12.7.el6_3                                updates                 1.5 M

Transaction Summary
================================================================================================================================================
Upgrade      38 Package(s)

Q3. I said no to the update, but is this something which we should be doing periodically?

Cheers,
--
ChrisP
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Performance Issues / fork() errors

Post by mguthrie »

You might be hitting some ulimits on the system:
http://support.nagios.com/wiki/index.ph ... g_Orphaned

As for the redundant logging, you can turn off logging to syslog by setting:

Code: Select all

use_syslog=0
in the main nagios.cfg file.
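
On the ulimit point: fork() returning "Resource temporarily unavailable" (EAGAIN) is what you would expect when the nagios user reaches its process limit. Here is a quick way to compare that limit with current usage. It is a sketch only, assuming /proc/<pid>/limits is available (it should be on 2.6.32 kernels) and a procps-style `ps`; the helper names are purely illustrative.

```shell
#!/bin/sh
# Hypothetical helpers for comparing a user's process limit with its usage.

# Print the soft "Max processes" limit from a /proc/<pid>/limits style file.
soft_nproc_limit() {
    awk '/^Max processes/ { print $3 }' "$1"
}

# Count the tasks (processes and threads) currently owned by a user.
# Threads are included because the per-user process limit counts them too.
count_user_tasks() {
    ps -L -u "$1" -o lwp= | wc -l | tr -d ' '
}
```

For example, `soft_nproc_limit /proc/4525/limits` (4525 being the nagios PID from the logs above) against `count_user_tasks nagios`. On CentOS 6 the default soft limit usually comes from /etc/security/limits.d/90-nproc.conf, so that file is worth checking too.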
chrisp
Posts: 71
Joined: Fri Dec 28, 2012 11:35 am

Re: Performance Issues / fork() errors

Post by chrisp »

Thanks,

Redundant logging => OFF!

Right, I also called out to the hosting service provider, just in case grsec was their doing (the CentOS 6.3 build is via their automated provisioning system). They responded thusly: -

"Grsec is extra security is added to the kernel, see http://en.wikipedia.org/wiki/Grsec

If you do not want this option then you would need to install your own kernel."

Am I OK to do that & also "yum update" while I'm at it?

The following are available to me: -

Code: Select all

# yum list >yl ; grep -i "^kernel" yl 
kernel-headers.x86_64                    2.6.32-279.22.1.el6            @updates
kernel.x86_64                            2.6.32-279.22.1.el6            updates 
kernel-debug.x86_64                      2.6.32-279.22.1.el6            updates 
kernel-debug-devel.x86_64                2.6.32-279.22.1.el6            updates 
kernel-devel.x86_64                      2.6.32-279.22.1.el6            updates 
kernel-doc.noarch                        2.6.32-279.22.1.el6            updates 
kernel-firmware.noarch                   2.6.32-279.22.1.el6            updates 
kerneloops.x86_64                        0.11-1.el6.rf                  rpmforge