[Nagios-devel] [PATCH 1/5] base/events: Don't get stuck in

Post by **Guest** » Mon Nov 05, 2012 1:59 pm

From: Robin Sonefors

In a couple of instances, by completely breaking libnagios, I've managed
to create situations where there are no workers connected to nagios.
When that happens, nagios gets stuck in an infinite loop: polling for
events always returns -1 (EINVAL) immediately, and nagios responds by
immediately trying again, making my machine warm and tired. Exiting the
event loop when that happens seems more reasonable - we can't run any
more checks, and we can't notify anyone about this, because we have no
worker to do so for us.

It would be possible to set sigrestart to TRUE to force nagios to
restart all its workers, but that could create the same crash loop
that I'm trying to fix, only somewhat larger and slower. If all sockets
died, something is seriously broken, so leave the event loop and make
it possible for an external watchdog daemon to find out.

Signed-off-by: Robin Sonefors
---
base/events.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/base/events.c b/base/events.c
index 9f059c2..cdaa188 100644
--- a/base/events.c
+++ b/base/events.c
@@ -1003,6 +1003,11 @@ int event_execution_loop(void) {
 			poll_time_ms, iobroker_get_num_fds(nagios_iobs),
 			squeue_size(nagios_squeue), nagios_iobs);
 		inputs = iobroker_poll(nagios_iobs, poll_time_ms);
+		if (inputs < 0) {
+			logit(NSLOG_RUNTIME_ERROR, TRUE, "Error polling for input, giving up");
+			break;
+		}
+
 		log_debug_info(DEBUGL_IPC, 2, "## %d descriptors had input\n", inputs);

 		/* 100 milliseconds allowance for firing off events early */
--
1.7.11.7
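
The behaviour change in the patch above can be illustrated outside of Nagios with a minimal sketch. Everything here is hypothetical: `sketch_poll()` only imitates the failure described in the commit message (an immediate -1 when no descriptors are registered) and is not the real `iobroker_poll()`, and `sketch_event_loop()` is a stand-in for `event_execution_loop()` with an iteration cap added so the pre-patch spin can be observed safely.

```c
#include <stdio.h>

/* Hypothetical stand-in for iobroker_poll(): fails immediately with -1
 * when no descriptors are registered, otherwise reports one ready input. */
int sketch_poll(int num_fds)
{
	if (num_fds <= 0)
		return -1;
	return 1;
}

/* Hypothetical stand-in for the event loop. Polls at most max_iterations
 * times and returns how many polls were made.
 * give_up_on_error == 0: a failed poll is retried right away (pre-patch).
 * give_up_on_error != 0: the loop is left on the first failed poll (patched). */
int sketch_event_loop(int num_fds, int max_iterations, int give_up_on_error)
{
	int polls = 0;

	while (polls < max_iterations) {
		int inputs = sketch_poll(num_fds);
		polls++;

		if (inputs < 0 && give_up_on_error) {
			fprintf(stderr, "Error polling for input, giving up\n");
			break;
		}
		/* Real code would dispatch inputs and run due events here. */
	}
	return polls;
}
```

With no descriptors registered, `sketch_event_loop(0, 1000, 0)` uses up all 1000 iterations without doing any work, while `sketch_event_loop(0, 1000, 1)` stops after the first failed poll. In the real daemon there is no iteration cap, so the pre-patch case spins until the process is killed.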

This post was automatically imported from historical nagios-devel mailing list archives