nagios core check_oracle_health issue

Posted: Tue Dec 18, 2018 5:00 am
by hprdcnagios
Dear Nagios Support,
We run Nagios Core version 4.4.1 and have been testing the check_oracle_health plugin.

From the command line, all of the checks run properly.

In the web GUI we see the following issue:

The Availability check is OK for both Current Status and Status Information, as follows:



DB SOAR2R Availability

OK 12-17-2018 12:07:15 2d 19h 6m 21s 1/4 OK - connection established to 10.51.153.24:1521/SOAR2R.



Current Status: OK
(for 2d 19h 15m 12s)
Status Information: OK - connection established to 10.51.152.24:1521/SOAR2C.
Performance Data:
Current Attempt: 1/3 (HARD state)
Last Check Time: 12-17-2018 12:11:54
Check Type: ACTIVE
Check Latency / Duration: 0,000 / 0,000 seconds
Next Scheduled Check: 12-17-2018 12:21:54
Last State Change: 12-14-2018 17:01:54
Last Notification: N/A (notification 2)
Is This Service Flapping? NO
(0,00% state change)
In Scheduled Downtime? NO

Last Update: 12-17-2018 12:16:56 ( 0d 0h 0m 10s ago)




Active Checks: ENABLED

Passive Checks: ENABLED

Obsessing: ENABLED

Notifications: ENABLED

Event Handler: ENABLED

Flap Detection: ENABLED



All the other checks, however, show Current Status UNKNOWN while Status Information reads OK, as follows:




DB SOAR2R Connected Users

UNKNOWN 12-17-2018 12:07:56 3d 23h 45m 14s 4/4 OK - 165 connected users


DB SOAR2R Connection Time

UNKNOWN 12-17-2018 12:08:33 2d 19h 6m 35s 4/4 OK - 0.17 seconds to connect as NAGIOS



Current Status: UNKNOWN
(for 3d 23h 53m 22s)
Status Information: OK - 152 connected users
Performance Data: connected_users=152;300;500
Current Attempt: 4/4 (HARD state)
Last Check Time: 12-17-2018 12:17:56
Check Type: ACTIVE
Check Latency / Duration: 0,000 / 0,000 seconds
Next Scheduled Check: 12-17-2018 12:22:56
Last State Change: 12-13-2018 12:24:53
Last Notification: 12-17-2018 11:27:57 (notification 351)
Is This Service Flapping? NO
(0,00% state change)
In Scheduled Downtime? NO

Last Update: 12-17-2018 12:18:06 ( 0d 0h 0m 9s ago)




Active Checks: ENABLED

Passive Checks: ENABLED

Obsessing: ENABLED

Notifications: ENABLED

Event Handler: ENABLED

Flap Detection: ENABLED



The command.cfg file contains the following configuration:

define command {

command_name check_DB
command_line /usr/bin/env LD_LIBRARY_PATH=/usr/lib/oracle/12.1/client64/lib/ ORACLE_HOME=/usr/lib/oracle/12.1/client64 $USER1$/check_oracle_health $ARG1$
}


The server configuration in soar2dbpre-scan.cfg is as follows:


# Define a service to check the availability of DB UTLR2R. (This one works properly.)

define service {

use local-service ; Name of service template to use
host_name soar2dbpre-scan
service_description DB UTLR2R Availability
check_command check_DB! --connect '10.51.153.24:1521/UTLR2R' --username 'nagios' --password 'oradbmon' --mode tnsping
}

# Define a service to check the number of currently logged in
# users on the DB UTLR2R. Warning if > 300 users, critical
# if > 500 users. (This one shows Current Status UNKNOWN.)

define service {

use local-service ; Name of service template to use
host_name soar2dbpre-scan
service_description DB UTLR2R Connected Users
check_command check_DB! --connect '10.51.153.24:1521/UTLR2R' --username 'nagios' --password 'oradbmon' --mode connected-users --warning 300 --critical 500
}

# Define a service to check the number of seconds to connect as NAGIOS on DB UTLR2R
# on the local machine. Warning if > 1 second, critical if
# > 5 seconds. (This one shows Current Status UNKNOWN.)

define service {

use local-service ; Name of service template to use
host_name soar2dbpre-scan
service_description DB UTLR2R Connection Time
check_command check_DB! --connect '10.51.153.24:1521/UTLR2R' --username 'nagios' --password 'oradbmon' --mode connection-time --warning 1 --critical 5
}

We attach nagios.cfg and the nagios.debug log for the DB UTLR2R Connected Users check.

Do you have any suggestions for fixing the issue?


Thanks in advance

Best Regards


Walter Rottura
HP RDC Nagios Team

Re: nagios core check_oracle_health issue

Posted: Tue Dec 18, 2018 11:15 am
by lmiltchev
You will need to check with your Oracle admin to make sure that the user has sufficient privileges. I found some examples of how to set up a user with specific privileges, required for collecting the information from the database.

Code: Select all

create user nagios identified by oradbmon;
grant create session to nagios;
grant select any dictionary to nagios;
grant select on V_$SYSSTAT to nagios;
grant select on V_$INSTANCE to nagios;
grant select on V_$LOG to nagios;
grant select on SYS.DBA_DATA_FILES to nagios;
grant select on SYS.DBA_FREE_SPACE to nagios;
-- if somebody still uses Oracle 8.1.7...

grant select on sys.dba_tablespaces to nagios;
grant select on dba_temp_files to nagios;
grant select on sys.v_$Temp_extent_pool to nagios;
grant select on sys.v_$TEMP_SPACE_HEADER  to nagios;
grant select on sys.v_$session to nagios;
Hope this helps.

Re: nagios core check_oracle_health issue

Posted: Wed Dec 19, 2018 5:34 am
by hprdcnagios
Hi,
Thanks for your reply.

We configured the following permissions on all the monitored databases:

grant create session to nagios;
grant select any dictionary to nagios;
grant select on V_$SYSSTAT to nagios;
grant select on V_$INSTANCE to nagios;
grant select on V_$LOG to nagios;
grant select on SYS.DBA_DATA_FILES to nagios;
grant select on SYS.DBA_FREE_SPACE to nagios;


The issue did not change: all checks still show Current Status UNKNOWN while Status Information reads OK.


DB SOAR2C Connected Users
UNKNOWN 12-19-2018 11:28:10 5d 23h 4m 28s 5/5 OK - 93 connected users

DB SOAR2C Connection Time
UNKNOWN 12-19-2018 11:28:16 5d 23h 3m 54s 5/5 OK - 0.17 seconds to connect as NAGIOS

DB UTLR2C Connected Users
UNKNOWN 12-19-2018 11:30:24 5d 23h 12m 42s 5/5 OK - select count(*) from gv$session where type = 'user': 22


DB UTLR2C Connection Time
UNKNOWN 12-19-2018 11:29:34 5d 23h 12m 8s 5/5 OK - 0.20 seconds to connect as NAGIOS


Tablespace DB SOAR2C All Free Space
UNKNOWN 12-19-2018 11:28:23 5d 23h 6m 34s 5/5 OK - tbs WSCERT_WLS has 100.00% free space left, tbs WSCERT_STB has 99.99% free space
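
A state of UNKNOWN next to output text that starts with OK is consistent with how Nagios assigns service state: it uses the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), not the text the plugin prints. The thread does not show the exit code the plugin returned when Nagios ran it, so the following is only a diagnostic sketch and not a confirmed cause. The helper names `run_check` and `text_disagrees` are hypothetical, and the stand-in command below merely imitates a plugin that prints an OK message while exiting with code 3.

```python
import subprocess
import sys

# Standard Nagios plugin return codes and the service states they map to.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv):
    """Run a plugin command and return (state_from_exit_code, first_output_line)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else ""
    # Codes outside 0-3 have no standard meaning; this sketch reports them as UNKNOWN.
    state = STATES.get(proc.returncode, "UNKNOWN")
    return state, first_line


def text_disagrees(state, first_line):
    """True when the output text claims a different state than the exit code."""
    claimed = first_line.split(" ", 1)[0] if first_line else ""
    return claimed in STATES.values() and claimed != state


if __name__ == "__main__":
    # Hypothetical stand-in for the plugin: prints an OK message but exits 3.
    fake_plugin = [
        sys.executable,
        "-c",
        "import sys; print('OK - 93 connected users'); sys.exit(3)",
    ]
    state, text = run_check(fake_plugin)
    print(f"state from exit code: {state}")
    print(f"output text:          {text}")
    print(f"mismatch:             {text_disagrees(state, text)}")
```

To use this against the real plugin, replace `fake_plugin` with the full command line from command.cfg and run the script as the same user and with the same environment as the Nagios daemon, since a check that behaves well in an interactive shell can exit differently there.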

Re: nagios core check_oracle_health issue

Posted: Wed Dec 19, 2018 10:18 am
by tgriep
There may be a compatibility issue with the Perl modules, the Oracle Instant Client, or the version of the plugin that is causing the UNKNOWN status.
Try adding the following option to the check_oracle_health command. It makes the plugin connect to the Oracle server through sqlplus, and it may return the correct status.

Code: Select all

--method sqlplus
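
As a rough illustration of where the option might go, the fragment below appends it to the Connected Users service definition posted earlier in the thread. Putting it at the end of the argument string is an assumption; it could equally be added once to the command_line in command.cfg so that every check_DB service picks it up. This method relies on the sqlplus tool, so presumably that binary has to be installed and reachable by the nagios user.

```
# --method sqlplus appended to the existing arguments (placement is an assumption)
define service {

use local-service ; Name of service template to use
host_name soar2dbpre-scan
service_description DB UTLR2R Connected Users
check_command check_DB! --connect '10.51.153.24:1521/UTLR2R' --username 'nagios' --password 'oradbmon' --mode connected-users --warning 300 --critical 500 --method sqlplus
}
```

After a change like this, the configuration needs to be reloaded before the next scheduled check reflects it.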

Re: nagios core check_oracle_health issue

Posted: Wed Dec 19, 2018 10:39 am
by hprdcnagios
Hi,
We added --method sqlplus to the command, and now everything runs properly.

The issue is fixed.

Thanks a lot

Walter

Re: nagios core check_oracle_health issue

Posted: Wed Dec 19, 2018 10:52 am
by tgriep
You're welcome. Glad that the change fixed the issue.