nrds logging and error forwarding

kendallchenoweth · Post by **kendallchenoweth** » Thu Dec 05, 2013 4:00 pm

I'm using Nagios XI 2012R2.5 on a CentOS VM provided by Nagios, with ncpa and nrdp/nrds performing passive client checks. While testing some possible error conditions, I noticed that if I deliberately set the token incorrectly or use a nonexistent check script in the command, the nrds.pl executed by cron captures the error in its argument to send_nrdp.sh, but that error never propagates back to the Nagios server. I see that there is a log_file attribute in the nrds.cfg file, but the file doesn't get created, and if I create it manually, nothing is logged.

I'm trying to put something in place to alert if nrdp/nrds isn't working properly, so I figured I could check a log file for a string. Is there a better way to do this? Is nrdp/nrds supposed to be sending its error state to the Nagios server?
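To make the "check a log file for a string" idea concrete, here is a minimal sketch of such a check. It is an illustration only, not part of nrds: the function name `check_log`, the default search string `ERROR`, and the demo file are all assumptions. The only fixed convention it relies on is the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import tempfile

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_log(path, needle="ERROR"):
    """Scan a log file for `needle` and return (return_code, message)."""
    if not os.path.isfile(path):
        # A missing log is worth reporting too: nothing is being logged.
        return UNKNOWN, "UNKNOWN - log file %s not found" % path
    with open(path, "r", errors="replace") as handle:
        hits = [line.strip() for line in handle if needle in line]
    if hits:
        return CRITICAL, "CRITICAL - %d line(s) contain '%s', last: %s" % (
            len(hits), needle, hits[-1])
    return OK, "OK - no '%s' lines in %s" % (needle, path)


if __name__ == "__main__":
    # Demo against a throwaway file (hypothetical content); a real check
    # would point at whatever log the client actually writes.
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("sent 3 checks\nERROR: bad token\n")
    code, message = check_log(tmp.name)
    print(code, message)
    os.unlink(tmp.name)
```

One limitation of this approach: a check like this keeps reporting CRITICAL until the matching lines are rotated out of the log, so it would probably need to be paired with log rotation or a "lines since last run" offset.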

Thanks in advance!

-Kendall Chenoweth

slansing · Post by **slansing** » Thu Dec 05, 2013 4:28 pm

What version of the agent did you install on that system? And what version is installed in Nagios XI? Check Admin > Manage Components > NRDS Config Manager.

kendallchenoweth · Post by **kendallchenoweth** » Fri Dec 06, 2013 9:10 am

I'm using the stock NRDS installation provided with Nagios XI 2012R2.5. I don't see an NRDS version listed in the Nagios XI web interface. send_nrdp.sh is at revision 0.3; nrds.pl and nrds_updater.pl are at revision 0.1.

kendallchenoweth · Post by **kendallchenoweth** » Fri Dec 06, 2013 11:53 am

I found the problem, and nrds error logging works most of the time. I had a mismatch between the service name in nrds.cfg and the service definition on the Nagios XI server. I apologize for making a stupid mistake. I found the error by looking at /usr/local/nagios/var/nagios.log.
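A name mismatch like this can be caught before it reaches nagios.log. The sketch below pulls the names out of `command[...]` lines (the format shown in the examples in this thread) and reports any that are not in a list of service names known to the server. The helper names, the sample config text, and the hand-supplied list of server services are all hypothetical; nothing here queries Nagios XI itself.

```python
import re

# Matches lines such as: command[ncpa-random]=/path/to/check ...
COMMAND_LINE = re.compile(r"^\s*command\[(?P<name>[^\]]+)\]\s*=\s*(?P<cmd>.+)$")


def service_names(cfg_text):
    """Return the names found in command[...] lines, skipping comments."""
    names = []
    for line in cfg_text.splitlines():
        if line.lstrip().startswith("#"):
            continue
        match = COMMAND_LINE.match(line)
        if match:
            names.append(match.group("name"))
    return names


def find_mismatches(cfg_text, server_services):
    """Names defined in the client config but unknown to the server."""
    known = set(server_services)
    return [name for name in service_names(cfg_text) if name not in known]


if __name__ == "__main__":
    # Illustrative config fragment; the commands are placeholders.
    sample_cfg = """
# sample nrds.cfg fragment
command[ncpa-random]=/path/to/check_a
command[ncpa_cpu]=/path/to/check_b
"""
    # The server list is typed in by hand here (an assumption).
    print(find_mismatches(sample_cfg, ["ncpa-random", "ncpa-cpu"]))
```

In the sample, `ncpa_cpu` on the client does not match `ncpa-cpu` on the server, so it is reported. Any special entries in a real nrds.cfg that are not services would need to be filtered out first.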

An error in execution is flagged if:
1) the token is wrong
2) the port argument is wrong
3) a custom check plugin argument is missing (presumably variants on this)

Interestingly enough, if I provide an invalid check plugin argument, I do not get an error message, e.g.

VALID
command[ncpa-random]=/usr/local/ncpa/plugins/check_ncpa.py -H `hostname -A` -t mytoken -P 5693 -M agent/plugin/random.pl/-w\ 5\ -c\ 7\ -a\ lunchtime
INVALID
command[ncpa-random]=/usr/local/ncpa/plugins/check_ncpa.py -H `hostname -A` -t mytoken -P 5693 -M agent/plugin/script_does_not_exist_or_cant_run.pl/-w\ 5\ -c\ 7\ -a\ lunchtime

Unfortunately, this error is also NOT logged in the client's local /usr/local/ncpa/var/ncpa_listener.log. The last good value remains unrefreshed in the host details on the Nagios server.

If I set up an active/freshness check on the server, I think I can restore monitoring.

Is this a bug? And, if so, what is the workaround to know that something is wrong?

Thanks!
-Kendall Chenoweth

slansing · Post by **slansing** » Fri Dec 06, 2013 2:33 pm

It's not a stupid mistake if you learned more about the subject content and the issue was resolved.

Have you tried checking the client's ncpa_passive.log?

kendallchenoweth · Post by **kendallchenoweth** » Fri Dec 06, 2013 3:40 pm

I didn't see anything in the ncpa_passive.log.

I think there is a feature/bug in the way that the nrds.pl and crontab entry work for ncpa passive client checks.

If there is a problem with the token identity, port, or script argument, everything works as it should, returning an error to the Nagios server. However, if the plugin script passed to check_ncpa.py won't execute (it doesn't exist, or something else is wrong), the output is a single newline, with two side effects:

1) The check fails in such a way as to allow an active takeover of the service check by the Nagios server. This is a good thing.
2) The check remains in a failed state with no error or update. If there is no active takeover, monitoring on this service check is lost without notification. Adding the active takeover helps, but there is still no notification of a problem, so the client issue will never get fixed.

Is this the expected behavior? Am I doing something wrong?

I've tested some minor modifications which could address this issue by

1) adding a check at the beginning of nrds.pl to see if the ncpa_posix_listener process is running. If not, don't do anything further. This assumes that ncpa is in use and is not a general solution, but the idea is to somehow inhibit nrds.pl from running.
2) checking for an empty output result and generating an output string to pass back to the Nagios server, thereby providing enough data to update it.
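The second modification can be sketched outside Perl as well. The Python below is only an illustration of the idea, not the nrds.pl change itself: `run_check` is a hypothetical helper, and mapping empty output to the Nagios UNKNOWN return code (3) is an assumption made here so that a silent failure is not reported with whatever status the command happened to return.

```python
import subprocess

UNKNOWN = 3  # Nagios plugin return code for "unknown"


def run_check(cmd):
    """Run a check command and return (status, output), never empty output."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    output = result.stdout.strip()
    status = result.returncode
    if not output:
        # Assumption: empty output means the plugin could not run, so send
        # an explicit UNKNOWN result instead of silently sending nothing.
        output = "UNKNOWN - no output from: %s" % cmd
        status = UNKNOWN
    return status, output


if __name__ == "__main__":
    print(run_check("echo 'OK - all good'"))  # normal plugin output
    print(run_check("true"))                  # succeeds but prints nothing
```

With a guard like this, the server always receives some result for the service, so the stale "last good" value described earlier is replaced by a visible problem state.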

If there is a problem with the client script while the listener is running, you could get a flapping state between errors and a valid state. (This is most likely if the service check is out of sync with its command line from the nrds config file.) Still, you will get an error.

I also added a verbose option to display what nrds.pl is doing to help understand what's happening.

Here's the diff on the code between version 0.1 and what I added...

root@kchenowe-ubuntu:/usr/local/nrdp/clients/nrds# diff nrds.pl nrds.pl.original 
14c14
< my $RELEASE = "Revision 0.1a"; # this script modified by Kendall Chenoweth
---
> my $RELEASE = "Revision 0.1";
52,59d51
< $cannot_execute_script = 0;
< $listener_running = `ps -eaf | grep ncpa_posix_listener | grep -v grep | wc -l`;
< if ($listener_running < 1)
< {
< 	print ("NCPA listening process is not active\n");
< 	exit 1;
< }
< 
68d59
< 							"verbose|v"		=> \$verbose,
101d91
< 	chop ($output);
108d97
< 
113,117d101
< 		if ($output eq "") 
< 		{
< 			$output = "Unknown error occured processing plugin";
< 			$cannot_execute_script = 1;
< 		}
121,124d104
< 		if ($cannot_execute_script)
< 		{
< 			system ("echo \"ERROR: $output ($status) - $cmd\" >> /usr/local/nrdp/clients/nrds/nrds.log");
< 		}
126,129d105
< if ($verbose)
< {
< 	print ("Sending output from $cmd\n\toutput=$output\n\tstatus=$status\n\tsenddata=$senddata\n");
< }

slansing · Post by **slansing** » Mon Dec 09, 2013 1:41 pm

I'm looking into this along with the creator of NCPA, so we should be able to give some input pretty soon here. Thank you for the reports!

Nagios Support Forum
