Nagios fails to update status change on one service only

Locked
fatpoe
Posts: 4
Joined: Sun Feb 02, 2014 3:06 am

Nagios fails to update status change on one service only

Post by fatpoe »

Hey all. I've been using nagios for a little while for a home lab. I haven't had any issues until recently with a really baffling problem that I absolutely cannot figure out.

For some reason when I try to use this script with a check, Nagios fails to see the change in service. Actually, the script appears to make Nagios go completely sideways for some reason. The script is pretty simple.

Code:

#/bin/bash
#dsmcad_check.sh
serverup=$(ps -ef | grep -m 1 dsmca[d] | awk ' {print $8}' | wc -l)
case $serverup in
        "1") echo "DSMCAD is running"
             exit 0
;;
        "0") echo "CRIT: DSMCAD is not running"
             exit 2
;;
        *) echo "Derp?"
           exit 3
;;
esac

I know the service config and everything is fine. If I change the script to just exit 0 or 2 and force the check, it works. The second I put that code back in, it stops seeing status changes and starts doing wonky things.
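
For reference, the stub test described here might look something like the following. It is a minimal sketch, not the poster's actual file: the function name `stub_check` and its argument are made up for illustration. It relies only on the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/bash
# stub_check (hypothetical name): print one status line and return the
# matching Nagios plugin exit code, with no process detection involved.
stub_check() {
    case "$1" in
        ok)       echo "OK: stub check";      return 0 ;;
        critical) echo "CRIT: stub check";    return 2 ;;
        *)        echo "UNKNOWN: stub check"; return 3 ;;
    esac
}

# As a plugin, the script would end with something like:
#   stub_check critical; exit $?
```

If Nagios tracks a stub like this correctly, the service definition is fine and the problem lies in how the real script decides between the two states.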

Actually, just now I put the code above back in and ran the script; it exited 2 because dsmcad isn't running. But when I forced the Nagios check, it reported exit 0 and everything fine. Okie dokie. I'm missing something here. I used this script (modified for another process) and it's working fine. But even on another machine, the same script above does the same thing. :?

Can anyone point me in the right direction? Am I missing something obvious?
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios fails to update status change on one service only

Post by abrist »

Can you run the script from the cli?
I tried on my system and it is working as expected. The only thing I can think of is that the serverup shell command is not always working.
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
fatpoe
Posts: 4
Joined: Sun Feb 02, 2014 3:06 am

Re: Nagios fails to update status change on one service only

Post by fatpoe »

You're talking about just the operating system CLI? If I kill the process or start it up and run the script, it returns the proper output. The script appears to be working, but Nagios doesn't seem to change the status.

I just killed the proc and checked the script. It returned the proper output but nagios doesn't detect it. Even after forcing a re-check of the service, nagios still returns that it's running and everything is A-O-K.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: Nagios fails to update status change on one service only

Post by abrist »

Sounds like it is always matching "1". Can you echo $serverup in your status message?

Code:

#/bin/bash
#dsmcad_check.sh
serverup=$(ps -ef | grep -m 1 dsmca[d] | awk ' {print $8}' | wc -l)
case $serverup in
        "1") echo "$serverup - DSMCAD is running"
             exit 0
;;
        "0") echo "$serverup - CRIT: DSMCAD is not running"
             exit 2
;;
        *) echo "$serverup - Derp?"
           exit 3
;;
esac
fatpoe
Posts: 4
Joined: Sun Feb 02, 2014 3:06 am

Re: Nagios fails to update status change on one service only

Post by fatpoe »

The echo matches what is expected: either a 0 or a 1 when it's dead or running. That's why it's somewhat baffling. Script works.

Code:

[root@f20_nagios libexec]# ./dsmcad_check.sh
DSMCAD is running
1
[root@f20_nagios libexec]# ps -ef | grep dsm
root     21658     1  0 14:06 ?        00:00:00 dsmcad
root     21728 21339  0 14:08 pts/0    00:00:00 grep --color=auto dsm
[root@f20_nagios libexec]# kill 21658
[root@f20_nagios libexec]# ./dsmcad_check.sh
CRIT: DSMCAD is not running
0
[root@f20_nagios libexec]# /opt/tivoli/tsm/client/ba/bin/start_tsmsched.sh
[root@f20_nagios libexec]# ps -ef | grep dsm
root     21752     1  0 14:09 ?        00:00:00 dsmcad
root     21757 21339  0 14:09 pts/0    00:00:00 grep --color=auto dsm
[root@f20_nagios libexec]# ./dsmcad_check.sh
DSMCAD is running
1

Actually, there is something wacky with this script. Run as root, it works fine. Run as sudo -u nagios /usr/local/nagios/libexec/dsmcad_check.sh, it returns the wrong results :roll:
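
When a check gives different answers depending on who runs it, it can help to print the line that grep actually matched rather than only the count. Below is a minimal sketch of that idea. The helper name `report_match` is made up for illustration; the sample line is copied from the session above.

```shell
#!/bin/bash
# report_match (hypothetical helper): read `ps -ef` style text on stdin and
# print both the match count and the first line that matched the pattern.
report_match() {
    local pattern=$1 matched count
    # grep exits non-zero when nothing matches; that is not an error here.
    matched=$(grep -m 1 -- "$pattern" || true)
    if [ -n "$matched" ]; then count=1; else count=0; fi
    echo "$count - matched: ${matched:-<nothing>}"
}

# Inside the check script it would be fed from the live process table:
#   ps -ef | report_match 'dsmca[d]'
# Here it runs against a line copied from the session above:
sample='root     21658     1  0 14:06 ?        00:00:00 dsmcad'
report=$(printf '%s\n' "$sample" | report_match 'dsmca[d]')
echo "$report"
```

With the matched line in the plugin output, the Nagios status text itself shows which process is being counted.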
fatpoe
Posts: 4
Joined: Sun Feb 02, 2014 3:06 am

Re: Nagios fails to update status change on one service only

Post by fatpoe »

Oh dear god, it was detecting itself: the script's filename (dsmcad_check.sh) contains the process name, so the script's own entry in ps matched the grep....
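
The self-match can be reproduced without a live process table. The sketch below feeds the script's own pipeline an invented `ps -ef` listing in which dsmcad is not running but the check script is, then repeats the run pretending the script has a different filename. The PIDs, times, the `/bin/sh` invocation, and the name `tsm_check.sh` are all made up for illustration.

```shell
#!/bin/bash
# Invented `ps -ef` style listing: no dsmcad daemon, only the check script.
fake_ps='root       900     1  0 14:00 ?        00:00:00 /usr/sbin/sshd -D
nagios   21800 21790  0 14:10 ?        00:00:00 /bin/sh /usr/local/nagios/libexec/dsmcad_check.sh'

# count_matches: the pipeline from the script, reading stdin instead of ps.
count_matches() {
    grep -m 1 'dsmca[d]' | awk '{print $8}' | wc -l
}

old_name=$(printf '%s\n' "$fake_ps" | count_matches)
# Same listing with the script renamed (tsm_check.sh is a made-up name).
new_name=$(printf '%s\n' "$fake_ps" | sed 's/dsmcad_check/tsm_check/' | count_matches)

echo "script named dsmcad_check.sh: serverup=$old_name"
echo "script named tsm_check.sh:    serverup=$new_name"
```

The `dsmca[d]` bracket trick only keeps grep's own command line out of the results; any other command line containing "dsmcad" still matches. Note also that the script's first line is `#/bin/bash`, without the `!`, so it is a comment rather than a shebang. How the script gets launched, and therefore what its `ps` entry looks like, then depends on the caller, which may be why root and nagios saw different results.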

Derp.

Ok I fixed it by changing the script filename.
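
Renaming works, but the check stays sensitive to any command line that happens to contain the string. An alternative sketch, not what was done in this thread, uses `pgrep -x`, which compares against the process name exactly instead of grepping the whole `ps` line. It assumes procps `pgrep` is installed; the function name `check_proc` is made up for illustration.

```shell
#!/bin/bash
# check_proc (hypothetical name): report on a process by exact name.
# pgrep -x requires the process name to match exactly, and pgrep never
# reports itself, so the script's own filename cannot trigger a match.
check_proc() {
    local name=$1 rc=0
    pgrep -x "$name" >/dev/null 2>&1 || rc=$?
    case $rc in
        0) echo "OK: $name is running";         return 0 ;;
        1) echo "CRIT: $name is not running";   return 2 ;;
        *) echo "UNKNOWN: pgrep failed ($rc)";  return 3 ;;
    esac
}

# As a plugin the script would end with:
#   check_proc dsmcad; exit $?
```

Mapping any other pgrep exit status to UNKNOWN keeps a broken or missing pgrep from being reported as a dead daemon.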

Heh, thanks for the help. I knew it was something obvious. Note to self: get more sleep and set up fewer things at 2:30AM.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios fails to update status change on one service only

Post by tmcdonald »

I was going to mention that, then I thought "No, I'm probably reading that wrong"

But good catch though, go get some sleep ;)
Former Nagios employee