Service check results different in GUI vs. from command line

Posted: Mon Aug 10, 2015 10:13 am
by rickwilson7425
I am working on consolidating a couple of Nagios 3.2 servers onto one 3.5 server. I have some DNS checks that use the check_cluster plug-in.

I get a good return of information, formatted properly, when I run the check from the command line as either root or nagios user.

When the check is run from within Nagios I get a critical error saying pretty much the opposite of what the command line says. This happens for a number of checks based off the check_cluster plug-in.

The checks work fine in the existing 3.2 servers.

Rick

Re: Service check results different in GUI vs. from command

Posted: Mon Aug 10, 2015 1:40 pm
by ssax
Please post the exact command (sanitized of course) that you're running from the command line and the exact command and service definition from the non-working one so that we can try to spot any differences.

Re: Service check results different in GUI vs. from command

Posted: Mon Aug 10, 2015 1:50 pm
by rickwilson7425
Here is the working command line:

perl check_agregate -v mque -t "unsent=(#d+).#d+" -d $SERVICEPERFDATA:relay1.dcpn:mailq$,$SERVICEPERFDATA:relay2.dcpn:mailq$,$SERVICEPERFDATA:relay3.dcpn:mailq$,$SERVICEPERFDATA:relay4.dcpn:mailq$,$SERVICEPERFDATA:relay1.wcpn:mailq$,$SERVICEPERFDATA:relay2.wcpn:mailq$

mque OK - =:relay1.dcpn:mailq$= =:relay2.dcpn:mailq$= =:relay3.dcpn:mailq$= =:relay4.dcpn:mailq$= =:relay1.wcpn:mailq$= =:relay2.wcpn:mailq$=

The command and service definitions are in the attachment, along with a clip of the statuses:

Re: Service check results different in GUI vs. from command

Posted: Mon Aug 10, 2015 1:55 pm
by tmcdonald
Can you please post the command and service definition as requested by ssax? We don't know what sort of results you can get from using a third-party web front end (is that just NagiosQL?) to build the configs, so it is best to see the final result directly. Chances are some of the characters in the arguments are causing problems.

Re: Service check results different in GUI vs. from command

Posted: Mon Aug 10, 2015 2:06 pm
by rickwilson7425
That is the Thruk interface for OMD.

Here is the service definition:

define service {
    service_description DNS
    host_name nsal1.dc,nsal2.dc,nsal1.wc,nsal2.wc
    use gen-service
    check_command check_dns!www.genesyslab.com!198.49.180.8
    servicegroups DNS_ALU
}

Here is the command:

define command {
    command_name check_service_cluster
    command_line $USER1$/check_cluster -s -l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$
}

Re: Service check results different in GUI vs. from command

Posted: Mon Aug 10, 2015 4:30 pm
by tmcdonald
That doesn't look like the correct service file. That's using check_dns and in your example you were using check_aggregate. Can you post that definition?

Re: Service check results different in GUI vs. from command

Posted: Mon Aug 10, 2015 4:49 pm
by rickwilson7425
I'm sorry - going blind looking at all this stuff today. The agregate thing is one I fixed already - I got them confused.

**********************************************************************
The problem is with a DNS check, here are the service and command defs:

define service {
    service_description DNS_ALU
    host_name CLUSTER.HOLDER
    use gen-service
    check_command check_service_cluster!"DNS Cluster"!3!3!$SERVICESTATEID:nsal1.dc:DNS$,$SERVICESTATEID:nsal2.dc:DNS$,$SERVICESTATEID:nsal1.wc:DNS$,$SERVICESTATEID:nsal2.wc:DNS$
}

define command {
    command_name check_service_cluster
    command_line $USER1$/check_cluster -s -l $ARG1$ -w $ARG2$ -c $ARG3$ -d $ARG4$
}

*************************************************************

This is the result of running from command line:

./check_cluster -s -l "DNS Cluster" -w 3 -c 3 -d $SERVICESTATEID:nsal1.dc:DNS$,$SERVICESTATEID:nsal2.dc:DNS$,$SERVICESTATEID:nsal1.wc:DNS$,$SERVICESTATEID:nsal2.wc:DNS$

CLUSTER OK: DNS Cluster: 4 ok, 0 warning, 0 unknown, 0 critical

*************************************************************

This is what is showing in the Nagios GUI:

CLUSTER.HOLDER DNS_ALU CRITICAL 14:43:48 3d 4h 27m 20s 2/2 CLUSTER CRITICAL: DNS Cluster: 0 ok, 0 warning, 0 unknown, 4 critical

*************************************************************

The command line is showing 4 OK - the GUI is showing 4 CRITICAL

Re: Service check results different in GUI vs. from command

Posted: Tue Aug 11, 2015 12:08 am
by Box293
rickwilson7425 wrote: This is the result of running from command line:

./check_cluster -s -l "DNS Cluster" -w 3 -c 3 -d $SERVICESTATEID:nsal1.dc:DNS$,$SERVICESTATEID:nsal2.dc:DNS$,$SERVICESTATEID:nsal1.wc:DNS$,$SERVICESTATEID:nsal2.wc:DNS$

CLUSTER OK: DNS Cluster: 4 ok, 0 warning, 0 unknown, 0 critical

I've not played with on-demand macros, but from what I understand, you can't run a plugin at the command line that references macros, as they will not be expanded to their true values. I suspect all of these are evaluating to 0, which is why it runs OK from the command line:

./check_cluster -s -l "DNS Cluster" -w 3 -c 3 -d 0,0,0,0

CLUSTER OK: DNS Cluster: 4 ok, 0 warning, 0 unknown, 0 critical

Can you confirm that each host (nsal1.dc, nsal2.dc, nsal1.wc, nsal2.wc) has a service called DNS? Do all of these services currently have an OK (0) state?
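
To illustrate why the macros cannot carry real values at a shell prompt, here is a minimal sketch of what the shell itself does with one of those arguments before check_cluster ever sees it. It assumes SERVICESTATEID is not set as an environment variable, which would normally be the case in an interactive session. The shell reads $SERVICESTATEID as an ordinary (empty) variable and passes on only the remainder, which is consistent with the =:relay1.dcpn:mailq$= fragments in the check_agregate output earlier in the thread.

```shell
#!/bin/sh
# Assumption: SERVICESTATEID is not defined in the shell environment.
unset SERVICESTATEID

# Double quotes (or no quotes) let the shell expand $SERVICESTATEID,
# which is empty, so only the trailing text survives.
expanded="$SERVICESTATEID:nsal1.dc:DNS$"

# Single quotes keep the text literal, but the plugin still receives
# a string rather than a state ID, because only Nagios expands macros.
literal='$SERVICESTATEID:nsal1.dc:DNS$'

printf 'shell-expanded: %s\n' "$expanded"
printf 'single-quoted:  %s\n' "$literal"
```

Either way the plugin gets a non-numeric string. If check_cluster reads each non-numeric item as 0 (an assumption about how the plugin parses -d, not something confirmed in this thread), the command-line run would always report 4 ok regardless of the real service states.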

Re: Service check results different in GUI vs. from command

Posted: Tue Aug 11, 2015 8:47 am
by rickwilson7425
Yes, the services are running fine. Here are the results from the old server (the same as from the command line on the new server):

CLUSTER.HOLDER DNS_ALU OK 2015-08-11 09:39:18 1736d 16h 19m 40s 1/2 CLUSTER OK: DNS Cluster: 4 ok, 0 warning, 0 unknown, 0 critical

Re: Service check results different in GUI vs. from command

Posted: Tue Aug 11, 2015 5:22 pm
by tmcdonald
Can you run the check from the CLI as the nagios user on the 3.2 server and post the results? There is no way the on-demand macros can be working when run from the CLI manually.
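
One way to see what Nagios actually hands to the plugin is to log the arguments at check time. The following is only a sketch of that idea: log_args and the LOG_FILE path are hypothetical names chosen for illustration, not part of Nagios or check_cluster.

```shell
#!/bin/sh
# Hypothetical debugging helper: record the exact arguments a check receives.
# LOG_FILE is an assumed location; use any path the nagios user can write to.
LOG_FILE="${LOG_FILE:-/tmp/check_cluster_args.log}"

log_args() {
    # One line per invocation: a timestamp, then each argument in brackets
    # so that empty or space-containing arguments stay visible.
    {
        printf '%s' "$(date '+%Y-%m-%d %H:%M:%S')"
        for arg in "$@"; do
            printf ' [%s]' "$arg"
        done
        printf '\n'
    } >> "$LOG_FILE"
}

# Example call using the arguments from this thread with literal state IDs.
log_args -s -l "DNS Cluster" -w 3 -c 3 -d 0,0,0,0
tail -n 1 "$LOG_FILE"
```

Temporarily pointing command_line at a small wrapper script that calls something like this before running check_cluster should show what the 3.5 server is passing after -d, which can then be compared with what the 3.2 server passes.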