As mentioned in my previous post (http://support.nagios.com/forum/viewtop ... f=6&t=8509), I had problems with the Nagios agent for Mac OS X, which prompted me to look for other avenues. I'm revisiting the agent one more time to see if I can figure it all out. At the end of the agent installation (with the ./fullinstall command) I get an error:
Starting NRPE Agent
launchctl start error: Bad file descriptor
I see org.nagios.nrpe.plist in /Library/LaunchAgents, and "launchctl list" confirms it's running, so I don't know what's up there. However, the Nagios server still reports "Connection refused by host" for all services except ping. I added the IP of the Nagios server to the allowed list during install.
Host server is Mac OS X 10.8.2 Server, Nagios is XI 2012R1.2.
Any ideas?
Thanks!
Problems with Mac OS X Agent
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Problems with Mac OS X Agent
This may be related to file permissions. Can you go over your NRPE installation and make sure that the required files have the correct permissions?
To my knowledge, this bug has not been reported yet. If the system does in fact say that NRPE is running, have you tried creating some test checks to send from your Nagios server? Also, where did you get the installation .tar? Be sure it was from our assets site.
Re: Problems with Mac OS X Agent
Well, when I said it's running, I only meant that launchd is running the .plist file. I don't see any processes that look like nrpe. Looking into it further this morning, I see that the .plist is attempting to load /usr/local/nagios/bin/nrpe, but that file does not exist:
bash-3.2# ls -al /usr/local/nagios/bin/
total 0
drwxr-xr-x 2 nagios nagios 68 Nov 14 12:56 .
drwxr-xr-x 6 nagios nagios 204 Nov 14 12:56 ..
bash-3.2#
So something's breaking during the installation, which is probably where the "Bad file descriptor" error is coming from. And if nrpe is not running, that explains the "Connection refused" errors on the Nagios server too. The question is: why is the installation breaking and not installing all the parts?
Thanks!
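The check described above (launchd has the job loaded, but the program it points at is missing) can be sketched as a small shell helper. This is only an illustration: the function name check_job_binary is made up here, and the path is the one quoted from the .plist in this post.

```shell
#!/bin/sh
# Hypothetical helper: report whether the program a launchd job points at
# is actually present and executable.
check_job_binary() {
  bin=$1
  if [ -x "$bin" ]; then
    echo "ok: $bin exists and is executable"
  else
    echo "missing: $bin not found or not executable"
    return 1
  fi
}

# Example (path taken from the post above):
#   check_job_binary /usr/local/nagios/bin/nrpe
# To see whether an nrpe process is really running (the bracket trick keeps
# grep from matching itself):
#   ps ax | grep '[n]rpe'
```

A loaded job in "launchctl list" only shows that launchd knows about the .plist; it does not by itself show that the daemon started.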
Re: Problems with Mac OS X Agent
Well, I derp'd... I found the logs in /tmp for the last install and noticed weirdness in the subcomponents log. Following up on that, I saw that the command line tools didn't get installed properly from within Xcode. </facepalm>
Ok, so everything now is reading properly except CPU stats. It shows "NRPE: Unable to read output". Running the command locally on the host results in the same error:
bash-3.2# ./check_nrpe -H localhost -t 30 -c check_cpu_stats -a '-w 85 -c 90'
NRPE: Unable to read output
Any ideas there?
Thanks!
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Problems with Mac OS X Agent
Glad you got the bulk of it working.
There is a problem with that check, and a bug report has already been filed:
http://tracker.nagios.com/view.php?id=307
Re: Problems with Mac OS X Agent
So things seem to be mostly OK, however my logs are filling up with a lot of errory-looking stuff:
Dec 7 13:39:16 <server.domain.com> nrpe[58414]: Starting up daemon
Dec 7 13:39:16 <server.domain.com> nrpe[58414]: Cannot write to pidfile '/var/run/nrpe.pid' - check your privileges.
Dec 7 13:39:16 <server.domain.com> nrpe[58414]: Warning: Could not set effective GID=11
Dec 7 13:39:16 <server.domain.com> nrpe[58414]: Warning: Unable to change supplementary groups using initgroups()
Dec 7 13:39:16 <server.domain.com> nrpe[58414]: Warning: Could not set effective UID=11
Dec 7 13:39:16 <server.domain.com> nrpe[58414]: Network server bind failure (48: Address already in use)
Dec 7 13:39:26 <servername> com.apple.launchd.peruser.501[192] (org.nagios.nrpe): Throttling respawn: Will start in 10 seconds
I notice that in the LaunchAgent .plist file, the group was "anAppropriateGroup", which I changed to nagios and then unloaded/loaded in launchd. No change.
I've since installed on another Mac and have the same errors in the logs, so it seems to be something happening w/ the agent configuration as well as the error in the .plist file.
Any thoughts here?
Thanks!
Jason
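One way to triage the log lines quoted above is to map each message to a likely cause. The sketch below is a guess at the usual meaning of these messages, not a confirmed diagnosis of this agent; the function name nrpe_log_hints and the hint wording are invented for illustration. The "peruser" hint relies on the assumption that a job logged under com.apple.launchd.peruser.501 is being run by the per-user launchd of UID 501 rather than by root.

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print one hint per
# kind of NRPE startup failure seen. The hints are assumptions, not facts
# established in this thread.
nrpe_log_hints() {
  awk '
    /Cannot write to pidfile/        { pid = 1 }
    /Could not set effective [GU]ID/ { ids = 1 }
    /Address already in use/         { bind = 1 }
    /launchd\.peruser\./             { peruser = 1 }
    END {
      if (pid)     print "pidfile: the pid file path is not writable by the user nrpe runs as"
      if (ids)     print "ids: nrpe could not switch UID/GID, so it is probably not started as root"
      if (bind)    print "bind: something is already listening on the NRPE port"
      if (peruser) print "peruser: the job is being respawned by a per-user launchd"
    }
  '
}

# Example:
#   grep nrpe /var/log/system.log | nrpe_log_hints
```

If the "bind" hint appears, checking what already holds the port (5666 is NRPE's default) with something like "lsof -nP -iTCP:5666" may show whether a second copy of nrpe is already running.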
More trouble with Mac OS X Agent
Continuing on with issues...
One of the Mac servers I've just added is reporting a couple of false positives: memory usage through nrpe shows critical, but Activity Monitor on the same machine shows 80% memory free, and one external volume is showing 0 bytes free space when in fact there are a couple of TBs free.
Any help with this? We really want to purchase Nagios XI but it has to be reliable...
-
slansing
- Posts: 7698
- Joined: Mon Apr 23, 2012 4:28 pm
- Location: Travelling through time and space...
Re: Problems with Mac OS X Agent
On the note of your check memory plugin, make sure that the logic was not reversed somehow, for instance:
80% free on the system could trigger a critical if you set the threshold to 80% when it needed to be 20%.
As for the drive monitoring issue, is this a custom plugin or a prepackaged one? Could you let us know what the name of it is?
Re: Problems with Mac OS X Agent
Hey there,
This is with the bog-standard Nagios agent for Mac OS X I downloaded thru the Monitoring Wizard's link, so I'm using whatever its defaults are.
I see what you're saying on the reverse logic idea, but that's not the case here. The plugin is simply reporting incorrect information, period. Here's the current output from 'top':
PhysMem: 2060M wired, 1446M active, 319M inactive, 3824M used, 16G free
and from the nrpe command:
/usr/local/nagios/libexec/check_nrpe -H localhost -c check_mem -a '-w 80 -c 90'
CRITICAL: Memory Usage - 99.58% RAM, 3.90% Swap | phyUsed=99.58%;80;90;0;100 swpUsed=3.90%;;;0;100
From top, you see that 16 GB are free (that's out of 20 GB total) but the nrpe command says nearly 100% is used. It's just reporting the wrong thing altogether.
Thanks for your help!
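As a sanity check on the numbers above, the used percentage implied by the 'top' line can be worked out directly. This is a minimal sketch and not how check_mem computes its figure (that is not shown in this thread); the function name physmem_used_pct is made up for illustration, and it assumes the PhysMem line format quoted above, with sizes suffixed K, M, or G.

```shell
#!/bin/sh
# Hypothetical helper: read a "PhysMem:" line from top on stdin and print
# used / (used + free) as a percentage.
physmem_used_pct() {
  awk '
    # Convert a token such as "3824M" or "16G" to megabytes.
    function to_mb(tok,   n, unit) {
      unit = substr(tok, length(tok), 1)
      n = substr(tok, 1, length(tok) - 1) + 0
      if (unit == "G") return n * 1024
      if (unit == "K") return n / 1024
      return n    # assume megabytes
    }
    /PhysMem:/ {
      for (i = 2; i <= NF; i++) {
        # The size precedes its label, e.g. "3824M used," or "16G free".
        if ($i ~ /^used/) used = to_mb($(i - 1))
        if ($i ~ /^free/) free = to_mb($(i - 1))
      }
      if (used + free > 0) printf "%.2f\n", 100 * used / (used + free)
    }
  '
}

# Example:
#   top -l 1 | grep PhysMem | physmem_used_pct
```

For the line quoted above (3824M used, 16G free) this works out to roughly 19% used, which matches the "16 GB free out of 20 GB" reading and is nowhere near the 99.58% the plugin reports.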
Re: Problems with Mac OS X Agent
OK, here's something to throw in the mix: the host that's showing "Unable to read output" for CPU stats is running Mac OS X 10.8.2 Server. I just installed the Mac OS X agent on another Mac OS X 10.8.2 Server machine, and it does not have the same problem; its CPU monitoring is reporting correctly. I've checked that the commands being used are identical. Both machines are Mac minis: the one that shows no errors is a 2009 model, and the one that has errors is a 2010 model.
As an aside, if anyone's able to do bug fixes on the Mac OS X agent, a couple things I've found:
- The path that nrpe tries to write the PID file to, /var/run/nrpe.pid, is not writable by the nagios user. I had to manually create the directory /var/run/nagios, chown it to nagios, and set that path in the nrpe.cfg file.
- The launchd .plist file org.nagios.nrpe.plist contains an error in the group name: it is "anAppropriateGroup" rather than "nagios", as I'm sure it probably should be.
I'd be happy to work with a developer on tightening up the Mac agent... I make an excellent guinea pig and have lots of different Macs to experiment with! Just throwing that offer out there.
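For anyone hitting the same PID file problem, the workaround described above might look something like the following. It is a sketch under assumptions: the function name set_nrpe_pid_file is invented, the nrpe.cfg location in the example is the usual one for a /usr/local/nagios install and may differ on your system, and the pid file name inside /var/run/nagios is a guess since the post only names the directory.

```shell
#!/bin/sh
# Hypothetical helper: point the pid_file directive in an nrpe.cfg at a new
# location, replacing any existing pid_file line.
set_nrpe_pid_file() {
  cfg=$1
  pidfile=$2
  tmp="$cfg.tmp.$$"
  # Drop any existing pid_file line, then append the new one.
  sed '/^pid_file=/d' "$cfg" > "$tmp"
  printf 'pid_file=%s\n' "$pidfile" >> "$tmp"
  # Write back in place so the config keeps its owner and permissions.
  cat "$tmp" > "$cfg"
  rm -f "$tmp"
}

# Example, run as root (paths are assumptions, see above):
#   mkdir -p /var/run/nagios
#   chown nagios:nagios /var/run/nagios
#   set_nrpe_pid_file /usr/local/nagios/etc/nrpe.cfg /var/run/nagios/nrpe.pid
```

After changing the config, the launchd job needs to be unloaded and loaded again for nrpe to pick up the new path.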