
Problems with Mac OS X Agent

Posted: Tue Dec 04, 2012 1:11 pm
by jwestlake
As mentioned in my previous post http://support.nagios.com/forum/viewtop ... f=6&t=8509, I had problems with the Nagios agent for Mac OS X, which prompted me to look for other avenues. I'm revisiting the agent one more time to see if I can figure it all out. At the end of the agent installation (with the ./fullinstall command) I get an error:

Starting NRPE Agent
launchctl start error: Bad file descriptor

I see org.nagios.nrpe.plist in /Library/LaunchAgents, and "launchctl list" confirms it's running, so I don't know what's up there. However, the Nagios server still reports "Connection refused by host" for all services except ping. I added the IP of the Nagios server to the allowed list during install.

Host server is Mac OS X 10.8.2 Server, Nagios is XI 2012R1.2.

Any ideas?

Thanks!

Re: Problems with Mac OS X Agent

Posted: Tue Dec 04, 2012 2:21 pm
by slansing
It is possible that this is related to file permissions. Can you go over your NRPE installation and make sure that the required files have the correct permissions?

To my knowledge this bug has not been reported yet. If the system does in fact say that NRPE is running, have you tried creating some test checks to send from your Nagios server? Also, where did you get the installation .tar? Be sure it was from our assets site.

Re: Problems with Mac OS X Agent

Posted: Wed Dec 05, 2012 8:42 am
by jwestlake
Well, when I said it's running, I only meant that launchd is running the .plist file. I don't see any processes that look like nrpe. Looking into it further this morning, I see that the .plist is attempting to load /usr/local/nagios/bin/nrpe, but that file does not exist:

bash-3.2# ls -al /usr/local/nagios/bin/
total 0
drwxr-xr-x 2 nagios nagios 68 Nov 14 12:56 .
drwxr-xr-x 6 nagios nagios 204 Nov 14 12:56 ..
bash-3.2#

So something's breaking during the installation, which is probably where that error about the bad file descriptor is coming from. And if nrpe is not running, that explains the "Connection refused" errors on the Nagios server too. The question is: why is the installation breaking and not installing all the parts?
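
One way to catch this kind of thing earlier is to read the program path out of the launchd job and check that the file is actually there. A minimal sketch in Python, assuming the job uses the standard launchd keys Program or ProgramArguments. The helper name launchd_program is made up for illustration; the plist path is the one mentioned earlier in the thread.

```python
import os
import plistlib


def launchd_program(plist_bytes):
    """Return the executable path a launchd job would run, or None."""
    job = plistlib.loads(plist_bytes)
    # A job names its executable either in Program or as the
    # first entry of ProgramArguments.
    if "Program" in job:
        return job["Program"]
    args = job.get("ProgramArguments") or []
    return args[0] if args else None


if __name__ == "__main__":
    path = "/Library/LaunchAgents/org.nagios.nrpe.plist"
    if os.path.exists(path):
        with open(path, "rb") as fh:
            program = launchd_program(fh.read())
        ok = program is not None and os.access(program, os.X_OK)
        print(program, "executable" if ok else "MISSING or not executable")
    else:
        print("no such plist:", path)
```

With an empty /usr/local/nagios/bin/ like the listing above, this should report the nrpe path as missing even though launchd has the job loaded.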

Thanks!

Re: Problems with Mac OS X Agent

Posted: Wed Dec 05, 2012 9:21 am
by jwestlake
Well, I derp'd... I found the logs in /tmp for the last install and noticed weirdness in the subcomponents log. Following up on that, I saw that the command line tools didn't get installed properly from within Xcode. </facepalm>

Ok, so everything now is reading properly except CPU stats. It shows "NRPE: Unable to read output". Running the command locally on the host results in the same error:

bash-3.2# ./check_nrpe -H localhost -t 30 -c check_cpu_stats -a '-w 85 -c 90'
NRPE: Unable to read output

Any ideas there?

Thanks!

Re: Problems with Mac OS X Agent

Posted: Wed Dec 05, 2012 10:34 am
by scottwilkerson
Glad you got the bulk of it working.

There is a problem with that check, and a bug report has already been filed:

http://tracker.nagios.com/view.php?id=307

Re: Problems with Mac OS X Agent

Posted: Fri Dec 07, 2012 1:44 pm
by jwestlake
So things seem to be mostly OK, however my logs are filling up with a lot of errory-looking stuff:

Dec 7 13:39:16 <server.domain.com> nrpe[58414]: Starting up daemon
Dec 7 13:39:16 <server.domain.com> nrpe[58414]: Cannot write to pidfile '/var/run/nrpe.pid' - check your privileges.
Dec 7 13:39:16 <server.domain.com> nrpe[58414]: Warning: Could not set effective GID=11
Dec 7 13:39:16 <server.domain.com> nrpe[58414]: Warning: Unable to change supplementary groups using initgroups()
Dec 7 13:39:16 <server.domain.com> nrpe[58414]: Warning: Could not set effective UID=11
Dec 7 13:39:16 <server.domain.com> nrpe[58414]: Network server bind failure (48: Address already in use)
Dec 7 13:39:26 <servername> com.apple.launchd.peruser.501[192] (org.nagios.nrpe): Throttling respawn: Will start in 10 seconds

I noticed that in the LaunchAgent .plist file the group was set to "anAppropriateGroup", which I changed to nagios before unloading and reloading the job in launchd. No change.

I've since installed on another Mac and have the same errors in the logs, so it seems to be something happening w/ the agent configuration as well as the error in the .plist file.
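
For the "Address already in use" line: that message means something already holds the port when the new copy of nrpe tries to bind. A rough Python sketch for checking that from the host, assuming NRPE is on its default port 5666 (adjust if nrpe.cfg sets a different server_port). port_in_use is an illustrative helper, not part of the agent.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding host:port fails with EADDRINUSE."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # permission errors etc. are a different problem
    finally:
        sock.close()
    return False


if __name__ == "__main__":
    # Note: recently closed connections on the port may also
    # make this report True for a short while.
    print("5666 in use:", port_in_use(5666))
```

If this prints True while launchd keeps logging "Throttling respawn", an earlier nrpe process is probably still holding the port, though that is a guess from the log rather than something confirmed here.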

Any thoughts here?

Thanks!

Jason

More trouble with Mac OS X Agent

Posted: Mon Dec 10, 2012 8:29 am
by jwestlake
Continuing on with issues...

One of the Mac servers I've just added is reporting a couple of false positives: memory usage through nrpe shows critical, but Activity Monitor on the same machine shows 80% memory free, and one external volume is showing 0 bytes free space when in fact there are a couple of TBs free.

Any help with this? We really want to purchase Nagios XI but it has to be reliable...

Re: Problems with Mac OS X Agent

Posted: Mon Dec 10, 2012 12:52 pm
by slansing
On the note of your check memory plugin, make sure that the logic was not reversed somehow, for instance:

80% free on the system could trigger a critical if you set the threshold to 80% when it needed to be 20%.
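
To make the reversed-logic case concrete, here is a small Python sketch. It assumes a check that alerts when the measured value is at or above the threshold (the usual "higher is worse" style); the function name and the 70/80 thresholds are illustrative only.

```python
def status(value, warn, crit):
    """'Higher is worse' thresholds; value is a percentage."""
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


free_pct = 80.0
used_pct = 100.0 - free_pct

# Thresholds meant for "% used" but fed "% free": a healthy box looks bad.
print(status(free_pct, warn=70, crit=80))   # CRITICAL
# Same thresholds applied to "% used": the healthy box looks healthy.
print(status(used_pct, warn=70, crit=80))   # OK
```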

As for the drive monitoring issue, is this a custom plugin or a prepackaged one? Could you let us know what its name is?

Re: Problems with Mac OS X Agent

Posted: Mon Dec 10, 2012 1:08 pm
by jwestlake
Hey there,

This is with the bog-standard Nagios agent for Mac OS X I downloaded thru the Monitoring Wizard's link, so I'm using whatever its defaults are.

I see what you're saying on the reverse logic idea, but that's not the case here. The plugin is simply reporting incorrect information, period. Here's the current output from 'top':

PhysMem: 2060M wired, 1446M active, 319M inactive, 3824M used, 16G free

and from the nrpe command:

/usr/local/nagios/libexec/check_nrpe -H localhost -c check_mem -a '-w 80 -c 90'
CRITICAL: Memory Usage - 99.58% RAM, 3.90% Swap | phyUsed=99.58%;80;90;0;100 swpUsed=3.90%;;;0;100

From top, you see that 16 GB are free (that's out of 20 GB total) but the nrpe command says nearly 100% is used. It's just reporting the wrong thing altogether.
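
To put numbers on the mismatch, here is a quick Python sketch that works out the used percentage from the 'top' line and pulls the plugin's figure out of its perfdata. Both input strings are copied from the output above; the helper names are just for illustration, and it assumes "used" plus "free" adds up to total RAM.

```python
import re

# Convert top's K/M/G suffixes to megabytes.
UNIT_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def physmem_used_pct(line):
    """Percent used from a top 'PhysMem:' line with 'used' and 'free' fields."""
    fields = {}
    for num, unit, name in re.findall(r"(\d+)([KMG]) (\w+)", line):
        fields[name] = int(num) * UNIT_MB[unit]
    return 100.0 * fields["used"] / (fields["used"] + fields["free"])


def perfdata_value(output, label):
    """Pull one numeric value out of the perfdata after the '|'."""
    perf = output.split("|", 1)[1]
    match = re.search(re.escape(label) + r"=([\d.]+)", perf)
    return float(match.group(1))


top_line = "PhysMem: 2060M wired, 1446M active, 319M inactive, 3824M used, 16G free"
plugin = ("CRITICAL: Memory Usage - 99.58% RAM, 3.90% Swap | "
          "phyUsed=99.58%;80;90;0;100 swpUsed=3.90%;;;0;100")

print(round(physmem_used_pct(top_line), 2))   # 18.92
print(perfdata_value(plugin, "phyUsed"))      # 99.58
```

So by top's own numbers the box is at roughly 19% used, against the 99.58% the plugin reports.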

Thanks for your help!

Re: Problems with Mac OS X Agent

Posted: Mon Dec 10, 2012 3:18 pm
by jwestlake
Ok, here's something to throw in the mix... the host that's showing "Unable to read output" for CPU stats is running Mac OS X 10.8.2 Server. I just installed the Mac OS X agent on another Mac OS X 10.8.2 Server machine and I do not have the same problem there. Its CPU monitoring is reporting correctly. I've checked that the commands being used are identical. Both machines are Mac minis: the one that shows no errors is a 2009 model, and the one that has errors is a 2010 model.

As an aside, if anyone's able to do bug fixes on the Mac OS X agent, a couple things I've found:

- The path that nrpe tries to write the PID file to, /var/run/nrpe.pid, is not writeable by the nagios user. I've had to manually create the directory /var/run/nagios, chown it to nagios, and point the PID file setting in nrpe.cfg at it.
- The launchd .plist file org.nagios.nrpe.plist has the wrong group name: it is set to "anAppropriateGroup" rather than "nagios", which I'm sure is what it should be.
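
Both of those can be checked quickly before loading the job. A small Python sketch, meant to be run as the nagios user since os.access tests the current user's permissions. The helper names are made up, and the file name /var/run/nagios/nrpe.pid is an assumption (only the directory is stated above).

```python
import grp
import os


def group_exists(name):
    """True if the local account database has a group with this name."""
    try:
        grp.getgrnam(name)
        return True
    except KeyError:
        return False


def can_create_pidfile(pid_path):
    """True if the current user may create files in the pidfile's directory."""
    directory = os.path.dirname(pid_path) or "."
    return os.access(directory, os.W_OK | os.X_OK)


if __name__ == "__main__":
    for name in ("nagios", "anAppropriateGroup"):
        print(name, "exists" if group_exists(name) else "missing")
    # Second path is a hypothetical file name inside the new directory.
    for pid in ("/var/run/nrpe.pid", "/var/run/nagios/nrpe.pid"):
        print(pid, "ok" if can_create_pidfile(pid) else "directory not writable")
```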

I'd be happy to work with a developer on tightening up the Mac agent... I make an excellent guinea pig and have lots of different Macs to experiment with! Just throwing that offer out there :-)