Nagios fails to update status change on one service only

Posted: Sun Feb 02, 2014 3:29 am
by fatpoe
Hey all. I've been using Nagios for a little while in a home lab. I hadn't had any issues until recently, when I ran into a really baffling problem that I absolutely cannot figure out.

For some reason, when I try to use this script with a check, Nagios fails to see the change in service status. Actually, the script appears to make Nagios go completely sideways. The script is pretty simple.

Code: Select all

#/bin/bash
#dsmcad_check.sh
serverup=$(ps -ef | grep -m 1 dsmca[d] | awk ' {print $8}' | wc -l)
case $serverup in
        "1") echo "DSMCAD is running"
             exit 0
;;
        "0") echo "CRIT: DSMCAD is not running"
             exit 2
;;
        *) echo "Derp?"
           exit 3
;;
esac
I know the service config and everything else is fine. If I change the script to just exit 0 or 2 and force the check, it works. The second I put that code back in, Nagios won't see status changes anymore and starts doing wonky things.

Actually, just now I put the code above back in and ran the script by hand. It exited with 2 because the process isn't running. But when I forced the Nagios check, it said the exit code was 0 and everything was fine. Okie dokie. I'm missing something here. I used this same script (modified for another process) and that one works fine. But even on another machine, the script above does the same thing. :?

Can anyone point me in the right direction? Am I missing something obvious?

Re: Nagios fails to update status change on one service only

Posted: Mon Feb 03, 2014 12:09 pm
by abrist
Can you run the script from the cli?
I tried on my system and it is working as expected. The only thing I can think of is that the serverup shell command is not always working.

Re: Nagios fails to update status change on one service only

Posted: Mon Feb 03, 2014 2:57 pm
by fatpoe
You're talking about just the operating system CLI? If I kill the process or start it up and run the script, it returns the proper output. The script appears to be working, but Nagios doesn't seem to change the status.

I just killed the proc and checked the script. It returned the proper output, but Nagios doesn't detect it. Even after forcing a re-check of the service, Nagios still reports that it's running and everything is A-O-K.

Re: Nagios fails to update status change on one service only

Posted: Mon Feb 03, 2014 3:04 pm
by abrist
Sounds like it is always matching "1". Can you echo serverup in your status message?

Code: Select all

#/bin/bash
#dsmcad_check.sh
serverup=$(ps -ef | grep -m 1 dsmca[d] | awk ' {print $8}' | wc -l)
case $serverup in
        "1") echo "$serverup - DSMCAD is running"
             exit 0
;;
        "0") echo "$serverup - CRIT: DSMCAD is not running"
             exit 2
;;
        *) echo "$serverup - Derp?"
           exit 3
;;
esac

Re: Nagios fails to update status change on one service only

Posted: Mon Feb 03, 2014 3:09 pm
by fatpoe
The echo matches what is expected: either a 0 or a 1, depending on whether it's dead or running. That's why it's somewhat baffling. The script works.

Code: Select all

[root@f20_nagios libexec]# ./dsmcad_check.sh
DSMCAD is running
1
[root@f20_nagios libexec]# ps -ef | grep dsm
root     21658     1  0 14:06 ?        00:00:00 dsmcad
root     21728 21339  0 14:08 pts/0    00:00:00 grep --color=auto dsm
[root@f20_nagios libexec]# kill 21658
[root@f20_nagios libexec]# ./dsmcad_check.sh
CRIT: DSMCAD is not running
0
[root@f20_nagios libexec]# /opt/tivoli/tsm/client/ba/bin/start_tsmsched.sh
[root@f20_nagios libexec]# ps -ef | grep dsm
root     21752     1  0 14:09 ?        00:00:00 dsmcad
root     21757 21339  0 14:09 pts/0    00:00:00 grep --color=auto dsm
[root@f20_nagios libexec]# ./dsmcad_check.sh
DSMCAD is running
1
Actually, there is something wacky with this script. Run as root, it works fine. Run as sudo -u nagios /usr/local/nagios/libexec/dsmcad_check.sh, it returns the wrong results :roll:

Re: Nagios fails to update status change on one service only

Posted: Mon Feb 03, 2014 3:38 pm
by fatpoe
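Oh dear god, it was detecting itself, since the process name is included in the script name. The grep for dsmca[d] also matches the dsmcad_check.sh entry in the ps output, so the count is 1 even when dsmcad itself is dead.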

Derp.
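
A minimal sketch of the self-match, for illustration only. The ps -ef lines below are made up (the nagios PIDs and the /bin/sh invocation are assumptions, not captured output), and the pipeline is the one from the check script, fed from stdin instead of ps -ef:

```shell
#!/bin/bash
# Made-up ps -ef lines for illustration; not captured from a real system.
script_line='nagios   21801 21790  0 14:10 ?        00:00:00 /bin/sh /usr/local/nagios/libexec/dsmcad_check.sh'
grep_line='nagios   21803 21801  0 14:10 ?        00:00:00 grep -m 1 dsmca[d]'

# Same pipeline as the check script, reading process lines from stdin.
# tr strips the padding that some wc implementations put before the count.
serverup_from() {
    grep -m 1 'dsmca[d]' | awk '{print $8}' | wc -l | tr -d ' '
}

# dsmcad is stopped: only the check script and its grep are in the list.
# The [d] trick hides the grep line, but the script path still contains "dsmcad".
stopped=$(printf '%s\n' "$script_line" "$grep_line" | serverup_from)
echo "daemon stopped, serverup=$stopped"   # prints: daemon stopped, serverup=1
```

So with the daemon dead, the script's own entry keeps the count at 1 and the "running" branch fires.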

OK, I fixed it by changing the script filename.
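
Renaming works, but the check still depends on the filename. As an alternative, here is a rough sketch that compares the executable name exactly rather than grepping full command lines. It assumes a Linux procps-style ps, and the function name check_process and the script name check_proc.sh are hypothetical:

```shell
#!/bin/bash
# Sketch only: assumes a procps-style ps where "-o comm=" prints just the
# executable name (no arguments, no header line).

# check_process NAME: prints a status line and returns a Nagios-style code
# (0 = OK, 2 = CRITICAL, 3 = UNKNOWN).
check_process() {
    local name="$1"
    if [ -z "$name" ]; then
        echo "UNKNOWN: no process name given"
        return 3
    fi
    # grep -F -x requires the whole line to equal the name, so a script whose
    # path merely contains the name (dsmcad_check.sh) cannot match.
    if ps -e -o comm= | grep -F -x -q -- "$name"; then
        echo "OK: $name is running"
        return 0
    fi
    echo "CRIT: $name is not running"
    return 2
}

# Run the check only when a process name is passed on the command line,
# e.g. ./check_proc.sh dsmcad
if [ -n "${1:-}" ]; then
    check_process "$1"
    exit $?
fi
```

Note that Linux truncates the comm field to 15 characters, so this exact match only suits short process names like dsmcad.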

Heh, thanks for the help. I knew it was something obvious. Note to self: get more sleep and set up fewer things at 2:30 AM.

Re: Nagios fails to update status change on one service only

Posted: Mon Feb 03, 2014 3:42 pm
by tmcdonald
I was going to mention that, then I thought "No, I'm probably reading that wrong."

But good catch though, go get some sleep ;)