Receiving frequent tspace and disk alerts

plakshmi · Post by **plakshmi** » Thu Jun 19, 2014 10:00 am

Hello,

We are running Nagios XI 2012R2.9 with Nagios plugins 1.4.13. We are having a problem monitoring one client server that has 100-130 services configured on it: we frequently see "Service check timed out" alerts for the TSPACE and disk services. Log entries are attached.

Oracle and disk checks are working fine on other servers which are being monitored.

Please let us know if you need any additional information from us.

slansing · Post by **slansing** » Thu Jun 19, 2014 1:06 pm

It appears as though you are just hitting your warning thresholds. If your other Oracle checks are fine, the next logical step would be to check this Oracle DB server itself for issues, or whatever sits between XI and this server on the network, since it looks like the checks are timing out. A temporary fix would be to increase the "-t" value on these checks to account for the timeouts.

plakshmi · Post by **plakshmi** » Thu Jun 26, 2014 3:51 am

We are no longer seeing service check timeouts for the disk alerts, only for the tablespace alerts. We did not find any network issue between the Nagios server and the remote host.

We are using the check_oracle_basic plugin to check tablespaces. There is no timeout value in its command definition:

check_oracle_basic --tablespace <SERVICE> <WARNING> <CRITICAL>
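
Since the command definition above has no timeout option, one possible stopgap is to wrap the plugin with GNU coreutils `timeout`, so that a slow run ends with an explicit UNKNOWN result instead of being cut off on the Nagios side. This is only a sketch and was not proposed in the thread: `run_with_timeout` is a hypothetical helper, and the limit passed to it is an assumption that would need to stay below the `service_check_timeout` value in nagios.cfg. It does not make the check itself any faster.

```shell
# Hypothetical wrapper: run a plugin with a time limit (in seconds).
# Usage sketch (paths and the 50-second limit are assumptions):
#   run_with_timeout 50 /path/to/check_oracle_basic --tablespace <SERVICE> <WARNING> <CRITICAL>
run_with_timeout() {
    limit="$1"
    shift
    # GNU coreutils `timeout` exits with status 124 when the limit is reached.
    output=$(timeout "$limit" "$@")
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "UNKNOWN: check did not finish within ${limit}s"
        return 3    # 3 is the Nagios plugin exit code for UNKNOWN
    fi
    # Otherwise pass the plugin's own output and exit code through unchanged.
    echo "$output"
    return "$status"
}
```

With a wrapper like this, a hung tablespace query would show up as UNKNOWN with a readable message, which makes it easier to tell slow database responses apart from real threshold breaches.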

Today a user ran into a peculiar issue. After he acknowledged a critical tablespace alert, he received no further alerts even though other tablespaces on the same database reached critical usage. He had to remove the acknowledgement in order to receive the other tablespace alerts.

We are currently looking for Nagios plugins that either check individual tablespaces on a particular DB or report all tablespaces in critical state at once. We would welcome any suggestions.

slansing · Post by **slansing** » Thu Jun 26, 2014 4:52 pm

Acknowledging one service should not cause others to be acknowledged, unless the others are dependent on it. Did you actually see the other services being acknowledged by Nagios, or are you just having trouble getting notifications from them? It could simply be that they are misconfigured. Or did they acknowledge the host? Is each of these an individual service in Nagios, or is one service running all of these checks?

tmcdonald · Post by **tmcdonald** » Thu Jun 26, 2014 4:54 pm

If this is just one service checking all of the tablespaces, you can look into the following plugin:

http://exchange.nagios.org/directory/Pl ... ce/details

It will allow you to check the tablespaces individually and to alert and acknowledge individually as well.

plakshmi · Post by **plakshmi** » Fri Jun 27, 2014 8:21 am

I did not see any host information being supplied in the command execution. Should we run these as NRPE checks from the Nagios server, via nrpe.cfg entries on the remote host?

tmcdonald · Post by **tmcdonald** » Fri Jun 27, 2014 12:09 pm

For this plugin yes, you will need to use NRPE. Sorry for not mentioning that in my last post.
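
As a rough illustration of the NRPE setup being described, the sketch below prints an nrpe.cfg command entry for the remote host. It uses the `-s` and `-c` options that appear in plakshmi's later test run of check_oracle_tablespace.sh. The plugin directory, the `check_tablespace_<SID>` command name, and the `nrpe_entry` helper are all assumptions made for the example, not details confirmed in this thread.

```shell
# Assumption: plugin directory on the remote host; adjust to the real install path.
PLUGIN_DIR="/usr/local/nagios/libexec"

# Print one nrpe.cfg command entry for a database SID and critical threshold.
# The command name check_tablespace_<SID> is a hypothetical naming scheme.
nrpe_entry() {
    printf 'command[check_tablespace_%s]=%s/check_oracle_tablespace.sh -s %s -c %s\n' \
        "$1" "$PLUGIN_DIR" "$1" "$2"
}

# Example: entry for the ISRVE database with a 90% critical threshold.
nrpe_entry ISRVE 90

# On the Nagios XI server, the matching check would then be run roughly as
# (the host name and the 60-second timeout are placeholders):
#   check_nrpe -H <remote-host> -c check_tablespace_ISRVE -t 60
```

The printed line goes into nrpe.cfg on the remote host, and the NRPE daemon typically needs a restart before the new command is available. The `-t` option on check_nrpe is also where a longer timeout could be set for slow tablespace queries.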

plakshmi · Post by **plakshmi** » Mon Jun 30, 2014 9:01 am

We have tested the plugin and it works. Please keep this thread open for two more days while we check with the user. We will confirm then.

We have about 50-70 tablespaces per database, so this suits our monitoring: we can monitor all tablespaces in a database collectively.

bash-3.2$ ./check_oracle_tablespace.sh -s 'ISRVE' -c 90
TABLESPACE CRITICAL: DAILY_ACT_C 90%; GRSDW_DATA 91%; RES_ANAL_NEW 91%; SYSAUX 90%; XSAT_DATA 90%

plakshmi · Post by **plakshmi** » Mon Jun 30, 2014 9:25 am

One question: after a critical alert is acknowledged, if another tablespace goes above 90%, will we get an alert for it, given that the service is in an acknowledged state?

tmcdonald · Post by **tmcdonald** » Mon Jun 30, 2014 11:27 am

plakshmi wrote: One question: after a critical alert is acknowledged, if another tablespace goes above 90%, will we get an alert for it, given that the service is in an acknowledged state?

That's the downside of checking multiple tablespaces in one service. If you acknowledge the service, it will not send alerts regardless of the individual tablespaces' states.
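
One way around this trade-off, sketched here purely as an illustration, is to define one Nagios service per tablespace and have each service report only its own entry from the combined plugin output. The `tablespace_state` helper below is hypothetical. It assumes the output format shown in the test run earlier in the thread (`TABLESPACE CRITICAL: NAME PCT%; NAME PCT%; ...`) and assumes that only tablespaces at or above the threshold are listed, which the thread does not confirm.

```shell
# Hypothetical helper: report the state of a single tablespace, given the
# combined output line of check_oracle_tablespace.sh.
#   $1 = tablespace name, $2 = plugin output line
tablespace_state() {
    name="$1"
    line="$2"
    # Drop the "TABLESPACE CRITICAL: " prefix, put each "NAME PCT%" entry on
    # its own line, then pick the percentage for the requested tablespace.
    pct=$(printf '%s\n' "${line#*: }" | tr ';' '\n' |
          awk -v n="$name" '$1 == n { sub(/%/, "", $2); print $2 }')
    if [ -n "$pct" ]; then
        echo "CRITICAL: $name ${pct}%"
        return 2    # Nagios plugin exit code for CRITICAL
    fi
    # Assumption: a tablespace that is not listed is below the threshold.
    echo "OK: $name not listed above threshold"
    return 0
}
```

Each per-tablespace service can then be acknowledged on its own without silencing the others. The cost is that the underlying query runs once per service unless its output is cached, which matters with 50-70 tablespaces per database.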

Nagios Support Forum
