[SOLVED] Service has OK status when it shouldn't

sebastiaopburnay
Posts: 105
Joined: Sun Oct 31, 2010 1:40 pm
Location: Lisbon, Portugal

Post by sebastiaopburnay »

Hi!

This is a strange issue, possibly related to how the Nagios process handles the server's pipes and invokes check commands.

A check_nrpe-based service check is being reported as OK by Nagios when it should be CRITICAL.

The plugin output is correct, but the status is wrongly OK (see the attached image).

Below are the service and command definitions, along with the CLI output and the return code echoed to the screen.

## service.cfg
define service{
        use                             my-generic-service,srv-pnp
        host_name                       ODIN
        service_description             Drive_C
        servicegroups                   ERP,todos,E-Mail,Domain
        check_command                   check_nrpe!CheckDriveSize -a ShowAll  MaxWarnUsed=90% MaxCritUsed=95% Drive='C'
}
## command.cfg
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
## CLI output and return code
$> ./check_nrpe -H odin -c CheckDriveSize -a ShowAll  MaxWarnUsed=90% MaxCritUsed=95% Drive='C'
CRITICAL: C:: Total: 19.5G - Used: 18.7G (95%) - Free: 860M (5%) > critical|'C: %'=95%;90;95 'C:'=18.69GB;17.58;18.56;0;19.53
$> echo $?
2
This is not the first time I've noticed similar issues with Nagios, but I have never seen one persist for so long or across so many check attempts.

The server is a virtualized Ubuntu Server 12.04 LTS, running Nagios v3.4.1 and the check_nrpe plugin v2.12.

Thank you for your time and effort.

Best regards,
sebastiaopburnay.
Attachments
MissClassified.PNG
Last edited by sebastiaopburnay on Fri Jul 05, 2013 2:46 pm, edited 2 times in total.
Re: Service has OK status when it shouldn't

Post by sebastiaopburnay »

It took some time, but the problem was actually very simple.

I use a distributed architecture in which remote Nagios servers actively monitor infrastructures and send the results (via NSCA) to a central Nagios server, where they are stored.
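
For readers unfamiliar with this kind of setup, here is a rough sketch of how the roles are usually split. It assumes the remote servers forward results through Nagios' obsessive-compulsive service processor (OCSP) command, which is one common way to feed NSCA; the post does not say how the forwarding was actually done. The command name `submit_check_result` and the template name `central-passive-service` are hypothetical.

```
# Remote server, nagios.cfg
# Assumption: every service result is handed to an OCSP command that calls send_nsca.
obsess_over_services=1
# Hypothetical command name; it must match a "define command" on the remote server.
ocsp_command=submit_check_result

# Central server, nagios.cfg
# Accept results submitted from outside (here: via the NSCA daemon).
accept_passive_service_checks=1

# Central server, object configuration
define service{
        name                            central-passive-service ; hypothetical template name
        active_checks_enabled           0       ; the central server does not run the check itself
        passive_checks_enabled          1       ; results arrive from the remote servers
        register                        0       ; template only, not a real service
        }
```

The point of the split is that each service has a single source of results: active checks on the remote server, passive results on the central one.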

The problem occurred on a remote server, for service checks inheriting their configuration from a template.

That template had both active and passive checks enabled (as you can see in the attached image).

Disabling passive checks in the template was enough to solve the issue.
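
As a minimal sketch, the corrected template on the remote server might look like the following. The post does not say which of the two templates (`my-generic-service` or `srv-pnp`) carried the setting, so the template name here is an assumption, and any other directives the real template contains are left out.

```
# Template on the remote (actively checking) server.
# Assumption: my-generic-service is the template that had both check types enabled.
define service{
        name                            my-generic-service
        active_checks_enabled           1       ; the remote server runs the check itself
        passive_checks_enabled          0       ; was enabled in the problematic template
        register                        0       ; template only, not a real service
        }
```

Nagios has to be restarted or reloaded on that server before the change takes effect.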

Thank you and sorry for the stupid post.