
Receiving frequent tspace and disk alerts

Posted: Thu Jun 19, 2014 10:00 am
by plakshmi
Hello,

We are running Nagios XI 2012R2.9 on our XI server, with Nagios plugins 1.4.13. We are having a problem monitoring one client server that has 100-130 services configured on it: we see frequent "Service check timed out" alerts for its TSPACE and disk services. The log entries are attached.

Oracle and disk checks are working fine on other servers which are being monitored.

Please let us know if you need any additional information.

Re: Receiving frequent tspace and disk alerts

Posted: Thu Jun 19, 2014 1:06 pm
by slansing
It looks as though you are just hitting your warning thresholds. If your other Oracle checks are fine, the next logical step would be to check this Oracle DB server itself for issues, or whatever sits between XI and this server on the network, since the checks are timing out. A temporary fix would be to increase the "-t" value on these checks to allow for the timeouts.

Re: Receiving frequent tspace and disk alerts

Posted: Thu Jun 26, 2014 3:51 am
by plakshmi
We are no longer seeing service check timeouts for the disk alerts, only for the table space alerts. We did not find any network issue between the Nagios server and the remote host.

We are using the check_oracle_basic plugin to check table spaces. There is no timeout value in its command definition:

check_oracle_basic --tablespace <SERVICE> <WARNING> <CRITICAL>
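
Because this command line has no timeout option, one rough way to tell whether the check itself is slow is to time a manual run and compare it with the timeout Nagios applies. A minimal sketch, assuming a bash shell; time_check is a hypothetical helper name, and the 60-second limit in the usage comment is only an example value, so substitute whatever your service check timeout is set to:

```shell
#!/bin/bash
# Hypothetical helper (not part of Nagios): run a command, report how long it
# took in whole seconds, and succeed only if it finished under the given limit.
time_check() {
    local limit="$1"; shift
    local start end elapsed
    start=$(date +%s)
    # Discard the command's own output; only its duration matters here.
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    elapsed=$((end - start))
    echo "elapsed=${elapsed}s limit=${limit}s"
    [ "$elapsed" -lt "$limit" ]
}

# Usage, with the placeholders from the command definition above filled in:
#   time_check 60 ./check_oracle_basic --tablespace <SERVICE> <WARNING> <CRITICAL>
```

If the manual run regularly comes close to or exceeds the limit, the slowness is in the check or the database rather than in Nagios itself.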

Today a user ran into a peculiar issue. After he acknowledged a critical table space alert, he received no further alerts, even though other table spaces on the same database reached critical usage. He had to remove the acknowledgement in order to receive the other table space alerts.

We are currently looking for Nagios plugins that can either check individual table spaces on a particular DB or show all critical table spaces at once. We would welcome any suggestions.

Re: Receiving frequent tspace and disk alerts

Posted: Thu Jun 26, 2014 4:52 pm
by slansing
Acknowledging one service should not cause others to be acknowledged unless they depend on it. Did you actually see Nagios mark the other services as acknowledged, or are you just not getting notifications from them? It could simply be that they are misconfigured. Or did the user acknowledge the host? Are these individual services in Nagios, or just one service running all of these checks?

Re: Receiving frequent tspace and disk alerts

Posted: Thu Jun 26, 2014 4:54 pm
by tmcdonald
If this is just one service checking all of the table spaces, you can look into the following plugin:

http://exchange.nagios.org/directory/Pl ... ce/details

It will allow you to check the table spaces individually, and to alert on and acknowledge them individually as well.

Re: Receiving frequent tspace and disk alerts

Posted: Fri Jun 27, 2014 8:21 am
by plakshmi
I did not see a way to supply host information when executing the command. Should we run it as an NRPE check from the Nagios server, via nrpe.cfg entries on the remote host?

Re: Receiving frequent tspace and disk alerts

Posted: Fri Jun 27, 2014 12:09 pm
by tmcdonald
For this plugin yes, you will need to use NRPE. Sorry for not mentioning that in my last post.

Re: Receiving frequent tspace and disk alerts

Posted: Mon Jun 30, 2014 9:01 am
by plakshmi
We have tested the plugin and it works. Please do not close this thread yet; keep it open for 2 days while we check with the user. We will confirm with you.

We have about 50-70 table spaces per database, so this suits our monitoring, as we can monitor all table spaces in a database collectively.

bash-3.2$ ./check_oracle_tablespace.sh -s 'ISRVE' -c 90
TABLESPACE CRITICAL: DAILY_ACT_C 90%; GRSDW_DATA 91%; RES_ANAL_NEW 91%; SYSAUX 90%; XSAT_DATA 90%

Re: Receiving frequent tspace and disk alerts

Posted: Mon Jun 30, 2014 9:25 am
by plakshmi
One question: after acknowledging a critical alert, if another table space goes above 90%, will we get an alert for it while the service is in the acknowledged state?

Re: Receiving frequent tspace and disk alerts

Posted: Mon Jun 30, 2014 11:27 am
by tmcdonald
plakshmi wrote: One question: after acknowledging a critical alert, if another table space goes above 90%, will we get an alert for it while the service is in the acknowledged state?
That's the downside of checking multiple table spaces in one service. If you acknowledge the service, it will not send alerts regardless of the table spaces' states.
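
One possible workaround, sketched under assumptions rather than tested against the real plugin, is to split the aggregate result so that each table space gets its own service, which can then be acknowledged on its own. check_one_tablespace is a hypothetical helper, not part of the plugin. It assumes the plugin lists only table spaces at or above the threshold, in the "NAME NN%" format shown in the sample output earlier in the thread, and that table space names contain only letters, digits, and underscores:

```shell
#!/bin/bash
# Hypothetical wrapper: report the state of ONE table space, given the
# aggregate output line printed by check_oracle_tablespace.sh.
# Return codes follow the Nagios plugin convention: 0 = OK, 2 = CRITICAL.
check_one_tablespace() {
    local name="$1" aggregate="$2" match pct
    # Match the exact name (preceded by start of line, space, ';' or ':')
    # followed by its usage percentage.
    match=$(printf '%s\n' "$aggregate" | grep -oE "(^|[ ;:])${name} [0-9]+%")
    if [ -z "$match" ]; then
        # Assumption: a table space that is not listed is below the threshold.
        echo "TABLESPACE OK: ${name} not listed above threshold"
        return 0
    fi
    # Pull the trailing percentage number out of the matched fragment.
    pct=$(printf '%s\n' "$match" | sed -E 's/.* ([0-9]+)%$/\1/')
    echo "TABLESPACE CRITICAL: ${name} ${pct}%"
    return 2
}

# Sample aggregate line copied from the plugin output posted above.
sample='TABLESPACE CRITICAL: DAILY_ACT_C 90%; GRSDW_DATA 91%; RES_ANAL_NEW 91%; SYSAUX 90%; XSAT_DATA 90%'
check_one_tablespace SYSAUX "$sample"   # prints "TABLESPACE CRITICAL: SYSAUX 90%"
```

The trade-off is one service per table space, which with 50-70 table spaces per database means many more services to configure, in exchange for independent acknowledgements.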