[SOLVED] Service has OK status when it shouldn't

Posted: Thu Jul 04, 2013 9:57 am
by sebastiaopburnay
Hi!

This is a strange issue, possibly related to how the Nagios process handles the server's pipes and invokes the check_command.

Nagios is assigning the OK status to a check_nrpe-based probe that should be CRITICAL.

The output is as it should be, but the status is wrongly OK (see the attached image).

I have also included the service and command definitions, as well as the CLI output with the return code echoed to the screen:

## service.cfg
define service{
        use                             my-generic-service,srv-pnp
        host_name                       ODIN
        service_description             Drive_C
        servicegroups                   ERP,todos,E-Mail,Domain
        check_command                   check_nrpe!CheckDriveSize -a ShowAll  MaxWarnUsed=90% MaxCritUsed=95% Drive='C'
}
## command.cfg
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
## CLI output and return code
$> ./check_nrpe -H odin -c CheckDriveSize -a ShowAll  MaxWarnUsed=90% MaxCritUsed=95% Drive='C'
CRITICAL: C:: Total: 19.5G - Used: 18.7G (95%) - Free: 860M (5%) > critical|'C: %'=95%;90;95 'C:'=18.69GB;17.58;18.56;0;19.53
$> echo $?
2

This is not the first time I have noticed similar issues with Nagios, but I have never seen one persist for so long or across so many check attempts.

The server is a virtualized Ubuntu Server 12.04 LTS, running Nagios v3.4.1 and the check_nrpe plugin v2.12.

Thank you for your time and effort.

Best regards,
sebastiaopburnay.

Re: Service has OK status when it shouldn't

Posted: Fri Jul 05, 2013 2:44 pm
by sebastiaopburnay
It took some time, but the problem was actually very simple.

I use a distributed architecture in which remote Nagios servers actively monitor their infrastructures and send the results (via NSCA) to a central Nagios server, where they are stored.

The problem was occurring on a remote server, for service checks that inherit their configuration from a template.

That template had both active and passive checks enabled (as you can see in the attached image).

Removing the passive checks from the template was enough to solve the issue.
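
For anyone who lands here later, the change might look roughly like the sketch below. The template name my-generic-service comes from the service definition in my first post; the file name and the exact values are assumptions, since the real template was only shown in the attached image. active_checks_enabled and passive_checks_enabled are the standard Nagios service directives.

```
## templates.cfg (hypothetical file name) on the remote server
define service{
        name                            my-generic-service   ; template used by Drive_C
        active_checks_enabled           1                    ; the remote server runs check_nrpe itself
        passive_checks_enabled          0                    ; assumed change: was enabled before the fix
        register                        0                    ; template only, not a real service
        }
```

The central server, which receives the results via NSCA, would presumably still need passive checks enabled in its own service definitions.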

Thank you and sorry for the stupid post.