Nagios Core/NSClient Reporting incorrect service state

mcannet
Posts: 12
Joined: Sat Jan 01, 2011 2:04 pm


Post by mcannet »

Complete disclosure: I'm 5 days into my Nagios install/config. I'm running Nagios Core 3.2.3 on Fedora. I went through the quick install guide, which was clean and straightforward, no issues there. Nagios is working: I am able to add hosts and such, they appear on the web GUI, etc.

I have two issues:
1. No reporting - at all! My understanding is there shouldn't really be any configs to do, that most of the commands were preprogrammed. I only changed the email address in the nagios.cfg file for the nagios admin. I have a few critical items showing up - such as disk space low and services stopped, etc. I am not getting any emails at all for these things.

2. Speaking of stopped services... I am a 99% Windows shop. I have added a bunch of my Windows servers via the windows.cfg file and installed the most recent NSClient on them. On most of them I am running a service check, and they come up green/OK in Nagios. I have 2 servers, however, that report everything else from NSClient correctly except the service I defined, and I have checked over and over that I typed it correctly. The services are running on the Windows server, but Nagios is reporting them stopped. Example: I have two SQL servers, and in both host configs I am checking for sqlservr.exe. On one server it's reported correctly as running; the other server is reported as not running.

Here is a snapshot of the config:
define service{
        use                     generic-service
        host_name               admin-sql
        service_description     SQL Server
        check_command           check_nt!PROCSTATE!-d SHOWALL -l sqlservr.exe
        }
define service{
        use                     generic-service
        host_name               esm-fmsrv
        service_description     SQL Server
        check_command           check_nt!PROCSTATE!-d SHOWALL -l sqlservr.exe
        }

Nagios reports sqlservr.exe as not running on admin-sql, but as running on esm-fmsrv.

Any help you can provide would be greatly appreciated and please let me know if you need to see more configurations.
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Nagios Core/NSClient Reporting incorrect service state

Post by jsmurphy »

1. No reporting - at all! My understanding is there shouldn't really be any configs to do, that most of the commands were preprogrammed. I only changed the email address in the nagios.cfg file for the nagios admin. I have a few critical items showing up - such as disk space low and services stopped, etc. I am not getting any emails at all for these things.
Make sure you have mail and postfix installed, then make sure your SMTP mail relay is configured in /etc/postfix/main.cf (assuming you have an Exchange server or something similar running as an SMTP relay).
2. The services are running on the windows server, but Nagios is reporting them stopped. Example: I have two SQL servers on both host configs, I am checking for sqlservr.exe - on one server it's reported correctly - as running. The other server is being reported as not running.
This one is a little more interesting... I've never seen that behaviour, though I normally monitor the service rather than the process. It might be worthwhile trying check_nrpe instead (check_nt is slowly becoming deprecated); you can find some examples here: http://nsclient.org/nscp/wiki/CheckProcState
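
One way to narrow this down is to run the plugin by hand from the Nagios box against both hosts and compare the raw output, once for the process and once for the service. The following is only a rough sketch. It assumes the plugins live in /usr/local/nagios/libexec (where the quickstart install usually puts them), that NSClient++ listens on its default check_nt port 12489, and that no password is set on the agent (add -s <password> if there is one). MSSQLSERVER is a guess at the service name for a default SQL Server instance; check the Services console for the real one.

```shell
#!/bin/sh
# Assumption: plugins are where the quickstart guide installs them.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"
# Assumption: NSClient++ is listening on its default check_nt port.
NT_PORT="${NT_PORT:-12489}"

# Print the check_nt command line for a host, a variable and its parameter.
nt_cmd() {
    printf '%s/check_nt -H %s -p %s -v %s -d SHOWALL -l %s\n' \
        "$PLUGIN_DIR" "$1" "$NT_PORT" "$2" "$3"
}

# Show (and, if the plugin is installed, run) the process check and the
# service check against both SQL hosts so the output can be compared.
for host in admin-sql esm-fmsrv; do
    for check in "PROCSTATE sqlservr.exe" "SERVICESTATE MSSQLSERVER"; do
        set -- $check                  # split into variable and parameter
        cmd=$(nt_cmd "$host" "$1" "$2")
        echo "$cmd"
        if [ -x "$PLUGIN_DIR/check_nt" ]; then
            $cmd
        fi
    done
done
```

If the service check comes back fine on admin-sql while the process check does not, that would point at the process lookup on that host rather than at the Nagios config.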
mcannet
Posts: 12
Joined: Sat Jan 01, 2011 2:04 pm

Re: Nagios Core/NSClient Reporting incorrect service state

Post by mcannet »

jsmurphy, thanks for the reply. Do I have to have postfix installed just to send out smtp messages? That functionality isn't already built into nagios?

Also, has anyone else seen incorrect reporting with check_nt? I will try to figure out NRPE, but I am not clear on it yet. Is it a remote-side-only tool where the remote machine reports to Nagios, or does Nagios do the check by calling the NRPE plugin?
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Nagios Core/NSClient Reporting incorrect service state

Post by jsmurphy »

jsmurphy, thanks for the reply. Do I have to have postfix installed just to send out smtp messages? That functionality isn't already built into nagios?
Most Linux distros come with postfix (and/or sendmail) and mail (or mailx) installed by default, and it's usually as trivial as changing one line in the config to make them work. I suppose the Nagios devs decided not to reinvent the wheel. They certainly aren't going to be able to make a mail client and mail relay that's as reliable or fully featured as the existing ones :)
I will try to figure out NRPE, but I am not clear on it yet. Is it a remote-side-only tool where the remote machine reports to Nagios, or does Nagios do the check by calling the NRPE plugin?
The easiest way to think about it is that check_nt and check_nrpe are more or less the same; they are just different ways of communicating with the NSClient++ agent. The key difference is that NRPE has the advantage of being cross-platform and will also work with the NRPE agent component designed for *nix. You will just need to grab the NRPE package (sometimes I forget it's not installed by default):
http://exchange.nagios.org/directory/Ad ... or/details
Install instructions: http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf

Understanding how Nagios communicates really is the first great hurdle when it comes to learning the system. I did do a bit of googling on your PROCSTATE problem, though. The only thing I could find is that it apparently doesn't always report properly on 64-bit applications; I'm not sure whether that was on 2k3 or 2k8. That was from over 6 months ago, so I'm not sure whether it was ever formally filed as a bug report or whether it was fixed.
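
For anyone following along, the NRPE equivalent of the process check might look roughly like this. Treat it as a sketch rather than a tested recipe: it assumes check_nrpe has been installed into /usr/local/nagios/libexec, that the NRPE listener is enabled in NSClient++ on the default port 5666 with arguments allowed, and it uses the CheckProcState syntax from the NSClient++ wiki page linked earlier in the thread.

```shell
#!/bin/sh
# Assumption: check_nrpe was installed alongside the other plugins.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/local/nagios/libexec}"

# Print a check_nrpe command line asking NSClient++ whether a process is running.
nrpe_procstate_cmd() {
    printf '%s/check_nrpe -H %s -p 5666 -c CheckProcState -a ShowAll %s=started\n' \
        "$PLUGIN_DIR" "$1" "$2"
}

cmd=$(nrpe_procstate_cmd admin-sql sqlservr.exe)
echo "$cmd"
# Only run it if the plugin is actually present on this box.
if [ -x "$PLUGIN_DIR/check_nrpe" ]; then
    $cmd
fi
```

Either way the check is started from the Nagios side: Nagios runs check_nrpe on its normal schedule, and the agent on the Windows box answers.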
mcannet
Posts: 12
Joined: Sat Jan 01, 2011 2:04 pm

Re: Nagios Core/NSClient Reporting incorrect service state

Post by mcannet »

ok, thanks. but 1 more question... do I need to configure Nagios to use postfix? If so, where can I find documentation on how to do that?
jsmurphy
Posts: 989
Joined: Wed Aug 18, 2010 9:46 pm

Re: Nagios Core/NSClient Reporting incorrect service state

Post by jsmurphy »

Nagios by default will try to use mail, and mail by default will try to use postfix. For a Windows-based shop, you can think of the "mail" utility as a mini-Outlook and postfix as a mini-Exchange: mail is the utility for sending emails and postfix is the utility for routing them. Nagios is email-ready out of the box; there is rarely any extra configuration required unless you want to change the mail client you are using.

To configure postfix, you will first need to speak to your Exchange administrator to find out the requirements for connecting to your company SMTP relay. Assuming a simple configuration (no auth, no SSL, etc.), open the config file: vi /etc/postfix/main.cf
Add the following two lines (putting in the FQDN of your SMTP relay and the hostname of this server):
relayhost = mysmtprelay.my.company
myhostname = thishost.my.company

Next restart postfix: /etc/init.d/postfix restart
If your company's requirements are a little more stringent, you may need to consult the postfix community for help with the more advanced features of postfix.

To test that mail will work with Nagios run the following on the command line and send a test email to yourself:
/usr/bin/printf "%b" "Test email" | /bin/mail -s "Test Email" [email protected]
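
If you would rather not edit main.cf by hand, postconf can set the same two parameters; main.cf entries take the form "name = value". This is only a sketch: the two hostnames are the placeholder names used in this post, and it does a dry run unless you set APPLY=1 and run it as root.

```shell
#!/bin/sh
# Placeholder names from this post; substitute your own.
RELAY="${RELAY:-mysmtprelay.my.company}"
THISHOST="${THISHOST:-thishost.my.company}"

# Format a setting the way main.cf and "postconf -e" expect it.
main_cf_line() {
    printf '%s = %s\n' "$1" "$2"
}

relay_line=$(main_cf_line relayhost "$RELAY")
host_line=$(main_cf_line myhostname "$THISHOST")

if [ "${APPLY:-0}" = "1" ] && command -v postconf >/dev/null 2>&1; then
    # Write both settings into main.cf, then show what Postfix now uses.
    postconf -e "$relay_line" "$host_line"
    postconf relayhost myhostname
    echo "now restart postfix: /etc/init.d/postfix restart"
else
    # Dry run: just show the lines that would be set.
    echo "$relay_line"
    echo "$host_line"
fi
```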
mcannet
Posts: 12
Joined: Sat Jan 01, 2011 2:04 pm

Re: Nagios Core/NSClient Reporting incorrect service state

Post by mcannet »

I know I'm a total n00b with this, but there is no directory "mail" under /bin. I've reinstalled postfix. Is there anything I'm missing?
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Nagios Core/NSClient Reporting incorrect service state

Post by agriffin »

/bin/mail is not a directory, it's an executable file. It may be located elsewhere; try running 'which mail' and 'which mailx' to find out where it might be on your system.
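
The same lookup can be done in one go with a small helper. This is just a sketch, and find_first_cmd is a made-up name for illustration: it prints the path of the first of the given commands found in PATH.

```shell
#!/bin/sh
# Print the path of the first candidate command found in PATH.
# Returns non-zero if none of them exist.
find_first_cmd() {
    for name in "$@"; do
        if path=$(command -v "$name" 2>/dev/null); then
            echo "$path"
            return 0
        fi
    done
    return 1
}

if mailer=$(find_first_cmd mail mailx); then
    echo "mail client found at: $mailer"
else
    echo "no mail client found; it may simply not be installed"
fi
```

If it turns up somewhere other than /bin/mail, the notification commands in commands.cfg may need their path adjusted to match.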
mcannet
Posts: 12
Joined: Sat Jan 01, 2011 2:04 pm

Re: Nagios Core/NSClient Reporting incorrect service state

Post by mcannet »

[root@esm-nagios ~]# which mail
/usr/bin/which: no mail in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
[root@esm-nagios ~]# which mailx
/usr/bin/which: no mailx in (/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)

now what?
mcannet
Posts: 12
Joined: Sat Jan 01, 2011 2:04 pm

Re: Nagios Core/NSClient Reporting incorrect service state

Post by mcannet »

OK, so I installed mailx (via yum), which obviously installed mail/mailx. I was able to start troubleshooting from there. The issue is with my Exchange server for sure. If I change my email address in contacts.cfg to my Gmail account, emails flow out fine. At least now I know Nagios is working correctly.

Now it's time to figure out why it cannot send mail to my Exchange server.
The IP address of the Nagios box is already listed in the allowed relay servers in Exchange. I can ping the DNS name of my Exchange server and SMTP domain from the Nagios box.
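
A successful ping only shows that name resolution and ICMP work; it says nothing about what happened at the SMTP level. Postfix logs the outcome of each delivery attempt, so the mail log is a reasonable next place to look. The sketch below assumes the log is at /var/log/maillog, as on a stock Fedora box, and simply pulls out recent deliveries that were deferred or bounced.

```shell
#!/bin/sh
# Assumption: Postfix logs here, as on a stock Fedora/Red Hat box.
MAILLOG="${MAILLOG:-/var/log/maillog}"

# Keep only Postfix delivery lines whose status is deferred or bounced.
failed_deliveries() {
    grep -E 'status=(deferred|bounced)'
}

if [ -r "$MAILLOG" ]; then
    # Show the 20 most recent failed delivery attempts, if any.
    failed_deliveries < "$MAILLOG" | tail -n 20
else
    echo "cannot read $MAILLOG (try running as root)"
fi
```

The text in parentheses after status= on each line is usually the remote server's reply or the connection error, which should say whether Exchange refused the relay or the connection never got that far.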

Thank you for your help with this so far. I appreciate it.

Now that I know alerts are being emailed, I would like to get back to the NRPE stuff...