Instead of marking error as critical in red, put it as Ok an
Posted: Wed Feb 13, 2019 11:54 am
by fraguillen
Good Morning
I have this problem:
I have a script that returns a value corresponding to the status of a service: OK, warning or critical. The script does report those states, but the Nagios XI interface always shows the service as OK rather than as critical in red, even though the Status Information column shows it as Error.
This is the script:
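# Nagios plugin exit codes
STATE_CRITICAL=2
STATE_OK=0
process="$1"
# The check is only valid on these two servers
if [ "$(hostname)" = "sclc-omc01-prod" ] || [ "$(hostname)" = "sclc-omc02-prod" ]; then
    # Count the oracle user's processes that match the given name
    CANT=$(ps -fu oracle | grep -v grep | grep -c "$process" 2> /dev/null)
    if [ "$CANT" -gt 0 ]; then
        # Message: the process is running
        echo "OK: Se encuentra activo el proceso: $process"
        Status=$STATE_OK
    else
        # Message: the process is down, please check and restart it
        echo "ERROR: Se encuentra caido el proceso, favor revisar para levantar: $process"
        Status=$STATE_CRITICAL
    fi
else
    # Message: the alarm is misconfigured, it only runs on the two servers above
    echo "ERROR: No se encuentra correctamente configurada la alarma, solo corre en el servidor sclc-omc01-prod o sclc-omc02-prod"
    Status=$STATE_CRITICAL
fi
echo "=======fin======="
exit $Status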
Best regards...
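One thing worth noting about a check of this kind: if the process name argument is empty, `grep -c ""` matches every line, so the count is never zero and the check reports OK. A common hardening is to report UNKNOWN (exit code 3) in that case. The following is only a minimal sketch under that assumption; the helper `state_for_count` and its messages are hypothetical and are not part of the script above.

```shell
STATE_OK=0
STATE_CRITICAL=2
STATE_UNKNOWN=3

# Hypothetical helper: decide the plugin state from a process name
# and the number of matching processes.
state_for_count() {
    name="$1"
    count="$2"
    if [ -z "$name" ]; then
        echo "UNKNOWN: no process name given"
        return "$STATE_UNKNOWN"
    fi
    if [ "$count" -gt 0 ]; then
        echo "OK: process $name is running"
        return "$STATE_OK"
    fi
    echo "CRITICAL: process $name is not running"
    return "$STATE_CRITICAL"
}

# Capture the return code of each case without aborting on non-zero.
rc_ok=0;   state_for_count "LDM_Worker_check.sh" 1 || rc_ok=$?
rc_crit=0; state_for_count "LDM_Worker_check.sh" 0 || rc_crit=$?
rc_unk=0;  state_for_count "" 0 || rc_unk=$?
echo "exit codes: $rc_ok $rc_crit $rc_unk"
```

In a real plugin the count would come from the `ps | grep -c` pipeline and the return code would be passed to `exit`.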
Re: Instead of marking error as critical in red, put it as O
Posted: Wed Feb 13, 2019 12:13 pm
by scottwilkerson
Can you run the script as it would be run from the CLI and then run
Please show the output
Re: Instead of marking error as critical in red, put it as O
Posted: Wed Feb 13, 2019 12:39 pm
by fraguillen
[root@prod libexec]# ./check_val_process_bash LDM_Worker_check.sh
ERROR! Se encuentra caido el proceso, favor revisar para levantar: LDM_Worker_check.sh
=======fin=======
[root@prod libexec]# echo $?
2
[root@prod libexec]#
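For reference, the Nagios plugin convention maps exit codes to states: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN, so the exit code 2 above corresponds to CRITICAL. A minimal sketch of that mapping follows; the names `state_name` and `fake_check` are hypothetical, and `fake_check` merely stands in for a real plugin.

```shell
# Map a plugin exit code to its state name (standard plugin convention).
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "OUT_OF_RANGE" ;;   # anything else is outside the convention
    esac
}

# Hypothetical stand-in for a plugin: prints one status line, returns 2.
fake_check() {
    echo "ERROR: process not running"
    return 2
}

rc=0
output=$(fake_check) || rc=$?   # keep the status text and the exit code
echo "$output -> exit code $rc ($(state_name "$rc"))"
```

Against the real plugin, `fake_check` would be replaced by the plugin command line as configured in Nagios.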
Re: Instead of marking error as critical in red, put it as O
Posted: Wed Feb 13, 2019 1:08 pm
by scottwilkerson
Is this consistent with what it is displaying in the UI?
Also, do you get the same results if you run this as the nagios user?
Code: Select all
su nagios
./check_val_process_bash LDM_Worker_check.sh
echo $?
Re: Instead of marking error as critical in red, put it as O
Posted: Wed Feb 13, 2019 1:36 pm
by fraguillen
In the UI, the Status column shows OK in green, but the Status Information column shows:
ERROR! Se encuentra caido el proceso, favor revisar para levantar: LDM_Worker_check.sh
[nagios@prod libexec]$ whoami
nagios
[nagios@prod libexec]$ ./check_val_process_bash LDM_Worker_check.sh
ERROR! Se encuentra caido el proceso, favor revisar para levantar: LDM_Worker_check.sh
=======fin=======
[nagios@prod libexec]$ echo $?
2
[nagios@prod libexec]$
Re: Instead of marking error as critical in red, put it as O
Posted: Wed Feb 13, 2019 1:40 pm
by scottwilkerson
fraguillen wrote: In the UI, the Status column shows OK in green, but the Status Information column shows:
ERROR! Se encuentra caido el proceso, favor revisar para levantar: LDM_Worker_check.sh
Are you sure you are looking at the service and not the OK for the host?
If you drill into the service details (click service name) does it show CRITICAL?
Re: Instead of marking error as critical in red, put it as O
Posted: Wed Feb 13, 2019 1:51 pm
by fraguillen
How can I upload an image?
Re: Instead of marking error as critical in red, put it as O
Posted: Wed Feb 13, 2019 1:54 pm
by scottwilkerson
Underneath the Submit button on the forum there is an "Upload attachment" tab. Click it,
select the file,
then click the "Add the file" button.
Re: Instead of marking error as critical in red, put it as O
Posted: Wed Feb 13, 2019 2:02 pm
by fraguillen
Attached image of Nagios XI
Re: Instead of marking error as critical in red, put it as O
Posted: Wed Feb 13, 2019 2:16 pm
by scottwilkerson
I just looked at your script again, and it is possible you are missing a newline at the end of the script.
May I suggest the following modifications:
Code: Select all
# Nagios plugin exit codes
STATE_CRITICAL=2
STATE_OK=0
process="$1"
# The check is only valid on these two servers
if [ "$(hostname)" = "sclc-omc01-prod" ] || [ "$(hostname)" = "sclc-omc02-prod" ]; then
    # Count the oracle user's processes that match the given name
    CANT=$(ps -fu oracle | grep -v grep | grep -c "$process" 2> /dev/null)
    if [ "$CANT" -gt 0 ]; then
        echo "OK: Se encuentra activo el proceso: $process"
        exit $STATE_OK
    else
        echo "ERROR: Se encuentra caido el proceso, favor revisar para levantar: $process"
        exit $STATE_CRITICAL
    fi
else
    echo "ERROR: No se encuentra correctamente configurada la alarma, solo corre en el servidor sclc-omc01-prod o sclc-omc02-prod"
    exit $STATE_CRITICAL
fi
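To check the first point, whether a script file actually ends with a newline, something like the following could be used. This is only a sketch; the helper name `ends_with_newline` and the temporary files are hypothetical and exist purely for illustration.

```shell
# Return 0 if the file's last byte is a newline, 1 otherwise.
# Command substitution strips a trailing newline, so an empty result
# from `tail -c 1` means the last byte was a newline (or the file is empty).
ends_with_newline() {
    [ -z "$(tail -c 1 "$1")" ]
}

# Two throwaway files: one with and one without a final newline.
tmpdir=$(mktemp -d)
printf 'exit 0\n' > "$tmpdir/with_newline.sh"
printf 'exit 0'   > "$tmpdir/without_newline.sh"

if ends_with_newline "$tmpdir/with_newline.sh"; then with_nl=yes; else with_nl=no; fi
if ends_with_newline "$tmpdir/without_newline.sh"; then without_nl=yes; else without_nl=no; fi

echo "with_newline.sh ends with a newline: $with_nl"
echo "without_newline.sh ends with a newline: $without_nl"

rm -rf "$tmpdir"
```

Against the real plugin this would be called as `ends_with_newline ./check_val_process_bash` from the libexec directory.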