Performance Issues / fork() errors
Posted: Sat Jan 26, 2013 5:24 pm
by Gavin
I'm seeing a lot of these errors in 'nagios.log':
[1359237728] Warning: The check of service 'service' on host 'host' looks like it was orphaned (results never came back). I'm scheduling an immediate check of the service...
[1359237518] Warning: The check of service 'service' on host 'host' could not be performed due to a fork() error: 'Resource temporarily unavailable'. The check will be rescheduled.
The load average on the Nagios server is around 30, despite our using a combination of a RAM disk and rrdcached. I'm beginning to think the 7,200 RPM hard drives in the system are a limitation and that our performance issues are I/O-related. We're running approximately 1,500 checks, most of which run every minute. I've configured 'dumb' spacing of services, which seemed to help quite a bit.
However, the error above suggests something else is going wrong (it's logged very frequently). Googling suggests it could be caused by the use of embedded Perl, but I don't know enough about Linux, Perl, or the inner workings of Nagios to tell whether that's the case, or what we can do to resolve it.
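To see how often each warning actually appears, a short script can tally the two messages quoted above. This is a minimal sketch: it assumes every nagios.log entry starts with a bracketed Unix timestamp (as in the excerpt) and matches on the exact message wording shown there. The function names and the "orphaned"/"fork_error" labels are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Optional

# Each nagios.log entry begins with "[<unix timestamp>] ".
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def classify(message: str) -> Optional[str]:
    """Map a log message to a (hypothetical) warning label, or None."""
    if "looks like it was orphaned" in message:
        return "orphaned"
    if "due to a fork() error" in message:
        return "fork_error"
    return None


def summarize(lines: Iterable[str]):
    """Count each warning type and record when it was first seen (UTC)."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a timestamped entry
        kind = classify(match.group(2))
        if kind is None:
            continue  # some other message
        counts[kind] += 1
        when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
        if kind not in first_seen or when < first_seen[kind]:
            first_seen[kind] = when
    return counts, first_seen
```

Passing an open handle on nagios.log (under /usr/local/nagios/var/ on a default source install, which is an assumption here) to `summarize` returns the per-type counts plus the first occurrence of each, which makes it easy to tell a one-off burst from a steady stream.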
The server is a Quad Core Xeon with 16GB RAM.
Any ideas?
Many thanks,
Gavin
Re: Performance Issues / fork() errors
Posted: Sat Jan 26, 2013 11:30 pm
by scottwilkerson
This error actually sounds like you may have multiple Nagios processes running:
http://support.nagios.com/wiki/index.ph ... g_Orphaned
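Independently of duplicate instances, fork() fails with EAGAIN, which is printed as 'Resource temporarily unavailable', when a system-imposed limit on processes or threads is reached, most commonly the per-user RLIMIT_NPROC limit. A rough way to see how close a user is to that limit is to compare the soft limit with the number of tasks the user owns. This is a Linux-only sketch under stated assumptions: it reads /proc/<pid>/status, counts threads per real UID, and the helper names are hypothetical. `getrlimit` reports the calling process's own limits, so it should be run as the user Nagios runs as; the running daemon's actual limits are in /proc/<pid>/limits.

```python
import os
import resource
from typing import Tuple


def user_task_count(uid: int, proc_root: str = "/proc") -> int:
    """Sum the 'Threads:' field of every process whose real UID is `uid`.

    RLIMIT_NPROC is enforced against tasks (threads) per real user ID,
    so threads are counted rather than processes.
    """
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # the process exited while we were scanning
        uid_field = fields.get("Uid")
        if uid_field is None:
            continue
        if int(uid_field.split()[0]) == uid:  # first column is the real UID
            total += int(fields.get("Threads", "1"))
    return total


def nproc_headroom(proc_root: str = "/proc") -> Tuple[int, int, int]:
    """Return (soft limit, hard limit, tasks in use) for the calling user."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return soft, hard, user_task_count(os.getuid(), proc_root)
```

A soft limit equal to `resource.RLIM_INFINITY` means no per-user cap; otherwise, a "tasks in use" figure close to the soft limit would make EAGAIN from fork() unsurprising.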
As for performance: first, if you are not running XI 2012R1.4, I highly recommend upgrading, as it includes significant performance improvements.
Beyond that, the best thing you could do at this point would be to offload MySQL to another server, which would help the I/O problem a lot.
http://assets.nagios.com/downloads/nagi ... Server.pdf
Re: Performance Issues / fork() errors
Posted: Mon Jan 28, 2013 1:24 am
by Gavin
The fork() error seems to have been a red herring and is no longer occurring. As for performance, we are considering either offloading MySQL or switching to SSDs.
We are running the latest version of XI on a system with the following specifications:
* Intel Xeon E3-1225, quad-core 3.2 GHz
* 16GB 1333 MHz DDR3 RAM
* 2x 2TB Hitachi 7200RPM SATA3 Drives
We're using our XI instance as follows:
* 1500 active service checks every minute, most with performance data
* 110 hosts
* 5 users logged into the portal simultaneously
We've configured RRDCached (with a delay of five minutes), as well as a RAM disk (as per the published document).
The server rarely gets a load average lower than 20. Does this seem normal? I know it's difficult to say for sure, but I'd be interested to hear what you think. We'll probably end up with just under 4,000 active checks and a further 4,000 passive checks on this box eventually, so it would be good to get some idea of the performance we can expect.
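To put the planned growth in numbers, the active-check rate is simply checks divided by interval. A back-of-the-envelope sketch, assuming every active check runs on a 60-second interval (the posts say most do, so this is an upper bound) and ignoring passive results, whose arrival rate isn't stated. The function name is hypothetical.

```python
def checks_per_second(checks: int, interval_s: float = 60.0) -> float:
    """Average number of active checks Nagios must launch per second."""
    return checks / interval_s


current = checks_per_second(1500)  # today's ~1,500 one-minute checks
planned = checks_per_second(4000)  # the planned ~4,000 active checks
growth = planned / current         # relative increase in launch rate
```

That works out to 25 checks per second today and roughly 67 per second at the planned size, about 2.7 times the current rate. Since each active check is run as a separate child process, the fork/exec rate grows by the same factor.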
Thanks again for your help, excellent as always.
Thanks,
Gavin
Re: Performance Issues / fork() errors
Posted: Mon Jan 28, 2013 8:44 am
by scottwilkerson
Gavin wrote: The server rarely gets a load average lower than 20. Does this seem normal? I know it's difficult to say for sure, but I'd be interested to hear what you think.
Given what you've described, with most of these checked every minute, it is probably normal. The remaining thing you could do to reduce the load (which is still most likely caused by I/O wait) would be to offload the MySQL server.
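On Linux, the load average counts tasks in uninterruptible sleep (typically blocked on disk I/O) as well as runnable ones, so a load of 20 to 30 on a four-core machine is consistent with I/O wait rather than CPU saturation. One way to confirm this is to sample the iowait counter in /proc/stat. A minimal sketch, assuming the standard field order of the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are hypothetical.

```python
import time
from typing import List


def cpu_times(stat_text: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into jiffy counters."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            return [int(v) for v in line.split()[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_fraction(before: List[int], after: List[int]) -> float:
    """Share of CPU time spent in iowait between two samples.

    Only the first eight fields (user .. steal) are summed, because guest
    time is already included in user/nice.
    """
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas[:8])
    return deltas[4] / total if total > 0 else 0.0


def sample_iowait(interval_s: float = 5.0, path: str = "/proc/stat") -> float:
    """Read /proc/stat twice, `interval_s` seconds apart."""
    with open(path) as f:
        before = cpu_times(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_times(f.read())
    return iowait_fraction(before, after)
```

A consistently high `sample_iowait()` result while the load average sits above 20 would support the I/O explanation; a low one would point elsewhere.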
http://library.nagios.com/library/produ ... ote-server
Going further than that would mean a Mod Gearman setup.
http://library.nagios.com/library/produ ... -nagios-xi
Re: Performance Issues / fork() errors
Posted: Tue Jan 29, 2013 5:11 pm
by Gavin
Thanks Scott. We're looking into those options.
I had assumed the 'Cannot connect to database' error we've been seeing in CCM was due to load, but every time it happens, ndoutils seems to die. If I grep the messages log for ndo, I get:
Jan 29 21:17:47 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
Jan 29 21:17:48 Nagios nagios: ndomod: NDOMOD 1.5.1 (05-15-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Jan 29 21:17:48 Nagios nagios: ndomod: Successfully connected to data sink. 0 queued items to flush.
Jan 29 21:17:48 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Jan 29 21:27:32 Nagios nagios: ndomod: Shutdown complete.
Jan 29 21:27:32 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
Jan 29 21:27:33 Nagios nagios: ndomod: NDOMOD 1.5.1 (05-15-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Jan 29 21:27:33 Nagios nagios: ndomod: Successfully connected to data sink. 0 queued items to flush.
Jan 29 21:27:33 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Jan 29 21:30:35 Nagios nagios: ndomod: Shutdown complete.
Jan 29 21:30:35 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
Jan 29 21:30:36 Nagios nagios: ndomod: NDOMOD 1.5.1 (05-15-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Jan 29 21:30:36 Nagios nagios: ndomod: Successfully connected to data sink. 0 queued items to flush.
Jan 29 21:30:36 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Jan 29 21:37:59 Nagios nagios: ndomod: Shutdown complete.
Jan 29 21:37:59 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
Jan 29 21:38:00 Nagios nagios: ndomod: NDOMOD 1.5.1 (05-15-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Jan 29 21:38:00 Nagios nagios: ndomod: Successfully connected to data sink. 0 queued items to flush.
Jan 29 21:38:00 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Jan 29 21:39:53 Nagios nagios: ndomod: Shutdown complete.
Jan 29 21:39:53 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
Jan 29 21:39:54 Nagios nagios: ndomod: NDOMOD 1.5.1 (05-15-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Jan 29 21:39:54 Nagios nagios: ndomod: Successfully connected to data sink. 0 queued items to flush.
Jan 29 21:39:54 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Jan 29 21:55:04 Nagios nagios: ndomod: Shutdown complete.
Jan 29 21:55:04 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
Jan 29 21:55:05 Nagios nagios: ndomod: NDOMOD 1.5.1 (05-15-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Jan 29 21:55:05 Nagios nagios: ndomod: Successfully connected to data sink. 0 queued items to flush.
Jan 29 21:55:05 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Jan 29 21:57:39 Nagios nagios: ndomod: Shutdown complete.
Jan 29 21:57:39 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
Jan 29 21:57:40 Nagios nagios: ndomod: NDOMOD 1.5.1 (05-15-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Jan 29 21:57:40 Nagios nagios: ndomod: Successfully connected to data sink. 0 queued items to flush.
Jan 29 21:57:40 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Jan 29 22:01:26 Nagios nagios: ndomod: Shutdown complete.
Jan 29 22:01:26 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
Jan 29 22:01:28 Nagios nagios: ndomod: NDOMOD 1.5.1 (05-15-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
Jan 29 22:01:28 Nagios nagios: ndomod: Successfully connected to data sink. 0 queued items to flush.
Jan 29 22:01:28 Nagios nagios: Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
Any ideas? This is really annoying when trying to configure a host or service, and it happens very frequently. I've repaired the database, to no avail.
Thanks,
Gavin
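The excerpt shows ndomod completing a shutdown and being re-initialized about a second later, over and over, which is the pattern one would expect if Nagios Core itself were being restarted or reloaded (event broker modules are unloaded and loaded again each time). Measuring the gap between shutdowns makes the frequency concrete. A sketch, assuming classic syslog timestamps without a year (so the year is passed in) and the exact "ndomod: Shutdown complete." wording shown above; the function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List

# Syslog prefix such as "Jan 29 21:27:32" (no year), then the ndomod message.
SHUTDOWN_RE = re.compile(
    r"^([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) .*ndomod: Shutdown complete\."
)


def shutdown_times(lines: Iterable[str], year: int) -> List[datetime]:
    """Timestamps of every 'ndomod: Shutdown complete.' line."""
    times = []
    for line in lines:
        match = SHUTDOWN_RE.match(line)
        if match:
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_minutes(times: List[datetime]) -> List[float]:
    """Minutes elapsed between consecutive shutdowns."""
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
```

Applied to the excerpt above, the seven shutdowns between 21:27 and 22:01 are separated by gaps of between about 1.9 and 15.2 minutes.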
Re: Performance Issues / fork() errors
Posted: Tue Jan 29, 2013 5:18 pm
by abrist
What version of XI are you running?
Re: Performance Issues / fork() errors
Posted: Tue Jan 29, 2013 5:20 pm
by Gavin
2012R1.4 on CentOS 6.3
Thanks,
Gavin
Re: Performance Issues / fork() errors
Posted: Tue Jan 29, 2013 5:21 pm
by abrist
I assume you have verified that you only have one Nagios process running?
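One way to answer that without eyeballing ps output is to look for top-level nagios processes. Nagios forks short-lived children to run checks, and a forked child carries its parent's name until it execs, so a plain count of matches can overstate things. This Linux-only sketch (helper names hypothetical) therefore keeps only nagios processes whose parent is not itself a nagios process; more than one PID in the result would suggest duplicate instances.

```python
import os
from typing import Dict, List, Tuple


def read_processes(proc_root: str = "/proc") -> Dict[int, Tuple[str, int]]:
    """Map each PID to (name, parent PID) using /proc/<pid>/status."""
    procs = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = dict(line.split(":", 1) for line in f if ":" in line)
        except OSError:
            continue  # process exited mid-scan
        name = fields.get("Name", "").strip()
        ppid = int(fields.get("PPid", "0"))
        procs[int(entry)] = (name, ppid)
    return procs


def top_level_instances(name: str = "nagios", proc_root: str = "/proc") -> List[int]:
    """PIDs named `name` whose parent is not also named `name`."""
    procs = read_processes(proc_root)
    return sorted(
        pid
        for pid, (proc_name, ppid) in procs.items()
        if proc_name == name and procs.get(ppid, ("", 0))[0] != name
    )
```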
Re: Performance Issues / fork() errors
Posted: Tue Jan 29, 2013 5:29 pm
by Gavin
I just did a 'killall -9 nagios' and started it again to be sure, and the database problem recurred very quickly. We're not seeing the fork errors any more; that seems to have been a one-off.
Thanks,
Gavin
Re: Performance Issues / fork() errors
Posted: Tue Jan 29, 2013 5:32 pm
by abrist
Fair enough. I am still digging on this one.