Performance Issues / fork() errors
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Performance Issues / fork() errors
This error is the same as the one you described above, correct? You get "Cannot Connect to The Database" when accessing the CCM? Have you checked this link out?
http://support.nagios.com/wiki/index.ph ... ig_Manager
Re: Performance Issues / fork() errors
That link seems to be more to do with a complete lack of access to CCM. Our issue is intermittent.
Bizarrely, the load seems to stay at 20/30 for hours, and then sit quite happily under 10 for hours. I can't see any one process dominating the rest, but I'm also in a position where turning on verbose logging will negatively impact performance because of I/O.
Is there some sort of debug log I can switch on, get the CCM error to appear, and then send to you? If it's load related, so be it, but this part of Nagios seems to be more sensitive to load than anything else.
Many thanks,
Gavin
Re: Performance Issues / fork() errors
Try upgrading to XI 1.5; we added some fixes in the hope of resolving the intermittent error with the CCM.
Re: Performance Issues / fork() errors
I couldn't see anything in the change log directly relating to our issues, but I'm all for staying up-to-date.
We're now on: Nagios XI 2012R1.5
Re: Performance Issues / fork() errors
Great. Is XI still exhibiting the same behavior?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Performance Issues / fork() errors
Unfortunately, yes. It's a really annoying error because you can't use the back button either.
Thanks
Gavin
Re: Performance Issues / fork() errors
Are you seeing any entries in the Apache log related to this? We added some debugging info in 1.5 to give some more clues if this issue shows up.
Re: Performance Issues / fork() errors
Hi All,
I have an update and some questions...
UPDATE: -
We've moved to a new dedicated server, with SSDs instead of slow HDDs. Our I/O is way better and we have not seen the "Cannot Connect to The Database" error when accessing the CCM since.
However, we are experiencing problems with something called "grsec", which seems to be preventing our desired programs from working properly.
We're seeing loads of entries like this in /var/log/messages: -
Code: Select all
Feb 14 12:48:23 Nagios kernel: grsec: failed fork with errno EAGAIN by /usr/local/nagios/bin/nagios[nagios:4525] uid/euid:508/508 gid/egid:100/100, parent /sbin/init[init:1] uid/euid:0/0 gid/egid:0/0
Feb 14 12:48:23 Nagios kernel: grsec: failed fork with errno EAGAIN by /usr/local/nagios/bin/nagios[nagios:595] uid/euid:508/508 gid/egid:100/100, parent /usr/local/nagios/bin/nagios[nagios:4525] uid/euid:508/508 gid/egid:100/100
Feb 14 12:48:23 Nagios kernel: grsec: failed fork with errno EAGAIN by /usr/local/nagios/bin/nagios[nagios:601] uid/euid:508/508 gid/egid:100/100, parent /usr/local/nagios/bin/nagios[nagios:4525] uid/euid:508/508 gid/egid:100/100
These are followed by the Nagios system logging a problem like so: -
Code: Select all
Feb 14 12:48:23 Nagios nagios: Warning: The check of service 'someservice' on host 'somehost' could not be performed due to a fork() error: 'Resource temporarily unavailable'. The check will be rescheduled.
Q1. Is "grsec" something that Nagios XI installed and can we configure it, or is there something else we can do to stop this happening?
I note that /var/log/messages is full of Nagios messages, which seem to be duplicates of those also being written to /usr/local/nagios/var/nagios.log. That seems really inefficient with regard to I/O and somewhat redundant.
I can also see that these message log entries have triggered imuxsock rate limiting, and can see log entries like this (PID 4525 was nagios): -
Code: Select all
Feb 14 12:18:08 Nagios rsyslogd-2177: imuxsock begins to drop messages from pid 4525 due to rate-limiting
Feb 14 12:18:20 Nagios rsyslogd-2177: imuxsock lost 90 messages from pid 4525 due to rate-limiting
I Googled and found this (http://www.rsyslog.com/tag/rate-limiting/), which allowed me to alter the rates so that it didn't hamper my debugging any more.
Q2. Is there something we can do to stop that additional logging to /var/log/messages?
I was also wondering if we needed to upgrade the system, just in case there were updates/fixes for the problem stuff, so I had a look at what was available with "yum update": -
Code: Select all
====================================================================================
Package                   Arch    Version                          Repository  Size
====================================================================================
Updating:
bind                      x86_64  32:9.8.2-0.10.rc1.el6_3.6        updates     4.0 M
bind-chroot               x86_64  32:9.8.2-0.10.rc1.el6_3.6        updates     70 k
bind-libs                 x86_64  32:9.8.2-0.10.rc1.el6_3.6        updates     871 k
bind-utils                x86_64  32:9.8.2-0.10.rc1.el6_3.6        updates     182 k
cpio                      x86_64  2.10-11.el6_3                    updates     192 k
dbus                      x86_64  1:1.2.24-7.el6_3                 updates     207 k
dbus-libs                 x86_64  1:1.2.24-7.el6_3                 updates     127 k
device-mapper             x86_64  1.02.74-10.el6_3.3               updates     135 k
device-mapper-event       x86_64  1.02.74-10.el6_3.3               updates     88 k
device-mapper-event-libs  x86_64  1.02.74-10.el6_3.3               updates     83 k
device-mapper-libs        x86_64  1.02.74-10.el6_3.3               updates     163 k
dhclient                  x86_64  12:4.1.1-31.0.1.P1.el6.centos.1  updates     317 k
dhcp-common               x86_64  12:4.1.1-31.0.1.P1.el6.centos.1  updates     141 k
epel-release              noarch  6-8                              epel        14 k
initscripts               x86_64  9.03.31-2.el6.centos.1           updates     935 k
irqbalance                x86_64  2:0.55-35.el6_3                  updates     25 k
libblkid                  x86_64  2.17.2-12.7.el6_3                updates     112 k
libuuid                   x86_64  2.17.2-12.7.el6_3                updates     65 k
lvm2                      x86_64  2.02.95-10.el6_3.3               updates     615 k
lvm2-libs                 x86_64  2.02.95-10.el6_3.3               updates     680 k
nspr                      x86_64  4.9.2-0.el6_3.1                  updates     111 k
nss                       x86_64  3.13.6-2.el6_3                   updates     770 k
nss-sysinit               x86_64  3.13.6-2.el6_3                   updates     32 k
nss-tools                 x86_64  3.13.6-2.el6_3                   updates     730 k
nss-util                  x86_64  3.13.6-1.el6_3                   updates     53 k
openssh                   x86_64  5.3p1-81.el6_3                   updates     236 k
openssh-clients           x86_64  5.3p1-81.el6_3                   updates     358 k
openssh-server            x86_64  5.3p1-81.el6_3                   updates     300 k
psacct                    x86_64  6.3.2-63.el6_3.3                 updates     68 k
python                    x86_64  2.6.6-29.el6_3.3                 updates     4.8 M
python-libs               x86_64  2.6.6-29.el6_3.3                 updates     623 k
redhat-logos              noarch  60.0.14-12.el6.centos            updates     15 M
selinux-policy            noarch  3.7.19-155.el6_3.14              updates     1.3 M
selinux-policy-targeted   noarch  3.7.19-155.el6_3.14              updates     2.6 M
strace                    x86_64  4.5.19-1.11.el6_3.2              updates     171 k
sudo                      x86_64  1.7.4p5-13.el6_3                 updates     423 k
tzdata                    noarch  2012j-1.el6                      updates     453 k
util-linux-ng             x86_64  2.17.2-12.7.el6_3                updates     1.5 M
Transaction Summary
====================================================================================
Upgrade 38 Package(s)
Q3. I said no to the update, but is this something which we should be doing periodically?
Cheers,
--
ChrisP
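To see which processes are hitting the failed forks, and how often, the grsec lines quoted in this post can be tallied with a short script. This is only a minimal sketch in Python: it assumes the log lines look exactly like the three samples above, and the function name `count_failed_forks` and the regular expression are made up for illustration (they are not part of Nagios, grsec or rsyslog).

```python
import re
from collections import Counter

# Matches kernel lines of the form quoted above, e.g.
#   "... grsec: failed fork with errno EAGAIN by /path/to/prog[name:PID] ..."
# Assumption: the format is exactly as in the sample lines from /var/log/messages.
FAILED_FORK = re.compile(
    r"grsec: failed fork with errno (?P<errno>\w+) by "
    r"(?P<path>\S+)\[(?P<name>[^:\]]+):(?P<pid>\d+)\]"
)


def count_failed_forks(lines):
    """Return a Counter keyed by (program path, PID) of failed fork attempts."""
    counts = Counter()
    for line in lines:
        match = FAILED_FORK.search(line)
        if match:
            # Only the process that attempted the fork is counted,
            # not the "parent ..." process named later on the same line.
            counts[(match.group("path"), int(match.group("pid")))] += 1
    return counts
```

Passing an open file handle for /var/log/messages as `lines` and calling `.most_common()` on the result would list the worst offenders first. Bear in mind that the rate limiting mentioned above only shows rsyslog dropping messages from the nagios PID; if kernel messages were ever dropped as well, the counts would be a lower bound.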
Re: Performance Issues / fork() errors
You might be hitting some ulimits on the system:
http://support.nagios.com/wiki/index.ph ... g_Orphaned
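As a rough illustration of checking those limits: on Linux, one documented cause of fork() failing with EAGAIN ("Resource temporarily unavailable") is the calling user reaching its process limit (RLIMIT_NPROC). The sketch below uses only Python's standard `resource` module. It reports the limits of whichever user runs it, so it would need to be run as the nagios user to say anything about Nagios, and the helper name `describe_limit` is invented for this example.

```python
import resource


def describe_limit(name, limit):
    """Format one (soft, hard) resource limit pair for display."""
    soft, hard = resource.getrlimit(limit)

    def fmt(value):
        # RLIM_INFINITY means no limit is set for that resource.
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


# RLIMIT_NPROC caps the number of processes for the user;
# RLIMIT_NOFILE caps the open file descriptors per process.
for name, limit in (("max user processes", resource.RLIMIT_NPROC),
                    ("max open files", resource.RLIMIT_NOFILE)):
    print(describe_limit(name, limit))
```

The soft values are the same numbers that `ulimit -u` and `ulimit -n` report in a shell for that user.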
As for the redundant logging, you can turn off logging to syslog by setting the following in the main nagios.cfg file:
Code: Select all
use_syslog=0
Re: Performance Issues / fork() errors
Thanks,
Redundant logging => OFF!
Right, I also called out to the hosting service provider, just in case grsec was their doing (the CentOS 6.3 build is via their automated provisioning system). They responded thusly: -
"Grsec is extra security added to the kernel, see http://en.wikipedia.org/wiki/Grsec
If you do not want this option then you would need to install your own kernel."
Am I OK to do that & also "yum update" while I'm at it?
The following are available to me: -
Code: Select all
# yum list >yl ; grep -i "^kernel" yl
kernel-headers.x86_64 2.6.32-279.22.1.el6 @updates
kernel.x86_64 2.6.32-279.22.1.el6 updates
kernel-debug.x86_64 2.6.32-279.22.1.el6 updates
kernel-debug-devel.x86_64 2.6.32-279.22.1.el6 updates
kernel-devel.x86_64 2.6.32-279.22.1.el6 updates
kernel-doc.noarch 2.6.32-279.22.1.el6 updates
kernel-firmware.noarch 2.6.32-279.22.1.el6 updates
kerneloops.x86_64 0.11-1.el6.rf rpmforge