Nagios 4 Load issues - OS X 10.9

sreinhardt
-fno-stack-protector
Posts: 4366
Joined: Mon Nov 19, 2012 12:10 pm

Re: Nagios 4 Load issues - OS X 10.9

Post by sreinhardt »

Actually he just got back to me that this is resolved.

https://github.com/NagiosEnterprises/na ... ca3b89a8c6

One thing to note is that Core gets very angry about time changes, because they severely mess with the internal scheduler, especially for tasks that have already been scheduled but not yet run. There really isn't a good way to resolve this without a large rewrite, and it is not really considered a use case for Core. A better alternative is to use an agent on those systems to push passive checks back to your main instance. This would resolve all sleep issues.
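
As a rough illustration of the passive-check approach, the sketch below formats a result as a Nagios external command (PROCESS_SERVICE_CHECK_RESULT) and writes it to the command file. The host name, service name, and command-file path are assumptions for the example, and how the result travels from the agent to the main instance (NSCA, for example) is left out.

```python
import time


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external-command line."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return_code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    timestamp = int(time.time() if now is None else now)
    # A newline ends the command, so the plugin output is flattened to one line.
    clean_output = output.replace("\n", " ")
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{clean_output}\n"


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the line to the Nagios command file (path is an assumed default)."""
    # The command file is a named pipe that the Nagios daemon reads from.
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

Because the timestamp travels with the result, the scheduler on the main instance never has to schedule or time an active check against the sleeping machine.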
jpotter
Posts: 9
Joined: Wed May 28, 2014 7:20 am

Re: Nagios 4 Load issues - OS X 10.9

Post by jpotter »

Thanks!

The issue we're seeing is on single nodes -- Nagios running on OS X (on a dev laptop) checking services only on the same node -- e.g. all localhost.
emislivec
Posts: 52
Joined: Tue Feb 25, 2014 10:06 am

Re: Nagios 4 Load issues - OS X 10.9

Post by emislivec »

jpotter wrote: Thanks! I'm happy to send in a bug report / patch, just didn't see where to do it. :)
You can open issues or send pull requests to us on GitHub: https://github.com/NagiosEnterprises/nagioscore
We also have our Mantis bug-tracker at http://tracker.nagios.org
Hate mail can go to emislivec@nagios.com

The forums here are a good place to start since we do have multiple projects/products, not all issues are strictly problems with the code, and some issues may have already been resolved in a newer version.

Let us know if you have other pain points on OS X and we can start working to get those resolved.
emislivec
Posts: 52
Joined: Tue Feb 25, 2014 10:06 am

Re: Nagios 4 Load issues - OS X 10.9

Post by emislivec »

jpotter wrote: The issue we're seeing is on single nodes -- Nagios running on OS X (on a dev laptop) checking services only on the same node -- e.g. all localhost.
Adding a hook to stop Nagios on sleep and start it on wake could be a good solution. It's not running checks while sleeping anyway, and a clean start/stop would prevent time-related issues like this one, or others such as file ages going off. I'm not sure how to implement this on modern OS X though.
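
Without an OS-level sleep hook, one fallback is to notice a wake after the fact by polling the wall clock and checking whether far more time passed than the polling interval explains. The sketch below shows only that detection step; `on_wake` is a hypothetical callback where a Nagios restart would go, and the interval and tolerance values are arbitrary assumptions. It cannot stop Nagios before the machine sleeps, and a manual clock change forward would trigger it too.

```python
import time


def looks_like_wake(previous_wall, current_wall, interval, tolerance=5.0):
    """Return True when more wall-clock time passed than one poll explains."""
    return (current_wall - previous_wall) > (interval + tolerance)


def watch(on_wake, interval=10.0, sleep=time.sleep, clock=time.time, max_polls=None):
    """Poll the wall clock and call on_wake(gap_seconds) after a suspected suspend."""
    previous = clock()
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval)
        current = clock()
        if looks_like_wake(previous, current, interval):
            # The process was paused for much longer than it asked to sleep.
            on_wake(current - previous)
        previous = current
        polls += 1
```

The `sleep` and `clock` parameters exist only so the loop can be exercised with fake time; in normal use the defaults apply.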
jpotter
Posts: 9
Joined: Wed May 28, 2014 7:20 am

Re: Nagios 4 Load issues - OS X 10.9

Post by jpotter »

Hmm. OS X API stuff is above my pay grade, but presumably there must be an API for registering for a call-back for system suspension/sleep. Seems like overkill.

I don't think the underlying bug is caused by the system having been in hibernation, just that resuming from sleep happens to trigger a condition. Just a wild-ass-guess, but this seems like a parent process/thread spinning on reading from something that's closed / EOF, e.g. where the reader is trying to read until it gets something but failing to do an EOF check on the file handle.

-J
emislivec
Posts: 52
Joined: Tue Feb 25, 2014 10:06 am

Re: Nagios 4 Load issues - OS X 10.9

Post by emislivec »

jpotter wrote: Hmm. OS X API stuff is above my pay grade, but presumably there must be an API for registering for a call-back for system suspension/sleep. Seems like overkill.
It looks like there are some tools out there; SleepWatcher seems to be a popular recommendation. The general idea isn't that different from startup/shutdown: we just need a way to get the init scripts to run on sleep/wake.
jpotter wrote: I don't think the underlying bug is caused by the system having been in hibernation, just that resuming from sleep happens to trigger a condition. Just a wild-ass-guess, but this seems like a parent process/thread spinning on reading from something that's closed / EOF, e.g. where the reader is trying to read until it gets something but failing to do an EOF check on the file handle.
There are a few issues from the time change when waking (timeouts on running jobs, timers, waiting for I/O), and the

read(0x12, "\0", 0x1000) = -1 Err#35
are related.
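
For reference, errno 35 on OS X is EAGAIN: a read on a non-blocking descriptor that has no data available yet. That is a different condition from EOF, where read returns 0 bytes. The sketch below is not Nagios Core's code (Core is written in C); it is a minimal Python illustration, with a hypothetical `drain` helper, of a reader that handles both cases without spinning: it waits in `select()` on EAGAIN and stops on a zero-length read.

```python
import os
import select


def drain(fd, timeout=1.0, chunk_size=4096):
    """Read from a non-blocking descriptor until EOF; return all bytes read."""
    chunks = []
    while True:
        try:
            data = os.read(fd, chunk_size)
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: nothing to read yet. Wait in select() until
            # the descriptor is readable instead of retrying immediately.
            readable, _, _ = select.select([fd], [], [], timeout)
            if not readable:
                raise TimeoutError("no data within timeout")
            continue
        if data == b"":
            # A zero-length read means the writer closed its end (EOF).
            break
        chunks.append(data)
    return b"".join(chunks)
```

A loop that retries immediately on EAGAIN, or never checks for the zero-length read, would burn CPU in the way the load reports in this thread describe.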

How long are you putting the machines to sleep? Do you see this behavior consistently when waking?
jpotter
Posts: 9
Joined: Wed May 28, 2014 7:20 am

Re: Nagios 4 Load issues - OS X 10.9

Post by jpotter »

Thanks for the suggestion of SleepWatcher -- hadn't heard of it before; looks like a nifty tool for side-stepping the problem.

Re: sleep duration, our developers are typically seeing it when they close the laptop and go home or come back into the office -- so at least a ~20 minute sleep period, but presumably sometimes overnight. I've been able to reproduce it, but not 100% of the time. I'll see if I can get more data and report back.
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: Nagios 4 Load issues - OS X 10.9

Post by lmiltchev »

I've been able to reproduce it, but not 100% of the time. I'll see if I can get more data and report back.
Sure. Let us know when you have more info.