Receiving frequent tspace and disk alerts

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
plakshmi
Posts: 68
Joined: Thu Aug 30, 2012 12:32 pm

Receiving frequent tspace and disk alerts

Post by plakshmi »

Hello,

We are using Nagios XI 2012R2.9 on our XI server. We are seeing a problem monitoring one client server, which has 100-130 services configured on it. We see frequent "Service check timed out" alerts for the TSPACE and disk services. The log entries are attached. We are using Nagios plugins 1.4.13.

Oracle and disk checks are working fine on the other servers being monitored.

Please let us know if you need any additional information from us.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Receiving frequent tspace and disk alerts

Post by slansing »

It appears as though you are just hitting your warning thresholds. If your other Oracle checks are fine, the next logical step would be to check this Oracle DB server itself for issues, or else whatever sits between XI and this server on the network, since it looks like the checks are timing out. A temporary fix would be to increase the "-t" value on these checks to try to account for the timeouts.
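To picture what a longer time limit changes, here is a minimal Python sketch of a wrapper that runs a check command under an explicit timeout and reports a timeout separately from a real threshold breach. It is not part of Nagios or of any plugin mentioned in this thread: `run_check` and the 60-second default are hypothetical names and values chosen for illustration. The exit codes follow the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def run_check(argv, timeout_seconds=60):
    """Run a plugin command line with a time limit (hypothetical helper).

    Returns (exit_code, first_line_of_output). A timeout is reported as
    UNKNOWN so it is not mistaken for a real threshold breach.
    """
    try:
        proc = subprocess.run(
            argv,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return UNKNOWN, f"UNKNOWN: check timed out after {timeout_seconds}s"
    except OSError as exc:
        # For example, the plugin binary does not exist or is not executable.
        return UNKNOWN, f"UNKNOWN: could not run check: {exc}"

    lines = proc.stdout.strip().splitlines()
    message = lines[0] if lines else "(no output)"
    # Any return code outside 0-3 is not a valid plugin status.
    valid = (OK, WARNING, CRITICAL, UNKNOWN)
    code = proc.returncode if proc.returncode in valid else UNKNOWN
    return code, message
```

Calling `run_check([...], timeout_seconds=120)` with the plugin's own command line would give a slow database more time to answer before the check is written off. A longer limit only hides the symptom, so the slow query or host still needs to be looked at.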
plakshmi
Posts: 68
Joined: Thu Aug 30, 2012 12:32 pm

Re: Receiving frequent tspace and disk alerts

Post by plakshmi »

We are no longer seeing service check timeouts on the disk alerts, only on the tablespace alerts. We did not find any network issue between the Nagios server and the remote host.

We are using the check_oracle_basic plugin to check tablespaces. There is no timeout value in this command definition.

check_oracle_basic --tablespace <SERVICE> <WARNING> <CRITICAL>

Today a user ran into a peculiar issue. After acknowledging a critical tablespace alert, he did not receive further alerts, even though other tablespaces on the same database reached critical usage. The user had to remove the acknowledgement in order to receive the other tablespace alerts.

We are currently looking for Nagios plugins that check individual tablespaces on a particular DB, or that show all tablespaces in a critical state at once. We would welcome any suggestions from your end.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Receiving frequent tspace and disk alerts

Post by slansing »

Acknowledging one service should not cause others to be acknowledged, unless the others are dependent on it. Did you actually see the other services being acknowledged by Nagios, or are you just having trouble getting notifications from them? It could simply be that they are misconfigured. Or did they acknowledge the host? Are these individual services in Nagios, or just one service running all of these checks?
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Receiving frequent tspace and disk alerts

Post by tmcdonald »

If this is just one service checking all of the tablespaces, you can look into the following plugin:

http://exchange.nagios.org/directory/Pl ... ce/details

It will allow you to check the tablespaces individually and alert/acknowledge individually as well.
Former Nagios employee
plakshmi
Posts: 68
Joined: Thu Aug 30, 2012 12:32 pm

Re: Receiving frequent tspace and disk alerts

Post by plakshmi »

I do not see a way to supply host information when running the command. Should we run the check through NRPE from the Nagios server, via nrpe.cfg entries on the remote host?
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Receiving frequent tspace and disk alerts

Post by tmcdonald »

For this plugin yes, you will need to use NRPE. Sorry for not mentioning that in my last post.
Former Nagios employee
plakshmi
Posts: 68
Joined: Thu Aug 30, 2012 12:32 pm

Re: Receiving frequent tspace and disk alerts

Post by plakshmi »

We have tested the plugin and it works. Please do not close this thread; keep it open for 2 days until we check with the user. We will confirm back to you.

We have around 50-70 tablespaces per database, so this suits our monitoring, as we can monitor all tablespaces in a database collectively.

bash-3.2$ ./check_oracle_tablespace.sh -s 'ISRVE' -c 90
TABLESPACE CRITICAL: DAILY_ACT_C 90%; GRSDW_DATA 91%; RES_ANAL_NEW 91%; SYSAUX 90%; XSAT_DATA 90%
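
For readers who want to see how a collective check like this can fold many tablespaces into one status line, here is a rough Python sketch. It is not the code of check_oracle_tablespace.sh: `summarize_tablespaces` is a hypothetical helper, the "OK" message format is invented, and the `>=` comparison is only inferred from the output above, where 90% is reported against `-c 90`.

```python
# Conventional Nagios plugin exit codes used here.
OK, CRITICAL = 0, 2


def summarize_tablespaces(usage_percent, critical):
    """Build one status line for many tablespaces (hypothetical helper).

    usage_percent maps tablespace name -> used percentage.
    A tablespace is flagged when usage >= critical (an assumption based on
    the sample output, where 90% is listed for a threshold of 90).
    """
    over = sorted(
        (name, pct) for name, pct in usage_percent.items() if pct >= critical
    )
    if not over:
        # Invented wording; the real plugin's OK message is not shown here.
        return OK, f"TABLESPACE OK: all below {critical}%"
    details = "; ".join(f"{name} {pct}%" for name, pct in over)
    return CRITICAL, f"TABLESPACE CRITICAL: {details}"


# Values taken from the sample output, plus one made-up healthy entry.
sample = {
    "DAILY_ACT_C": 90,
    "GRSDW_DATA": 91,
    "RES_ANAL_NEW": 91,
    "SYSAUX": 90,
    "XSAT_DATA": 90,
    "USERS": 42,  # hypothetical tablespace below the threshold
}
code, line = summarize_tablespaces(sample, critical=90)
print(code, line)
```

Because everything collapses into a single exit code, Nagios sees one service with one state, however many tablespaces are listed in the text.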
plakshmi
Posts: 68
Joined: Thu Aug 30, 2012 12:32 pm

Re: Receiving frequent tspace and disk alerts

Post by plakshmi »

One question: after acknowledging a critical alert, if any other tablespace goes above 90%, do we get an alert for it, given that the service is in an acknowledged state?
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Receiving frequent tspace and disk alerts

Post by tmcdonald »

plakshmi wrote: One question: after acknowledging a critical alert, if any other tablespace goes above 90%, do we get an alert for it, given that the service is in an acknowledged state?
That's the downside to checking multiple tablespaces in one service. If you acknowledge the service, it will not send further alerts, regardless of the individual tablespaces' states.
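
A rough illustration of the gap, as a Python sketch: the service stays CRITICAL the whole time, so the only thing that changes is the text of the status line. The helpers below are hypothetical and not part of Nagios or the plugin. They assume the "TABLESPACE CRITICAL: NAME NN%; ..." shape quoted earlier in this thread and compare two such lines to find tablespaces that crossed the threshold after the acknowledgement.

```python
def parse_critical(status_line):
    """Extract {tablespace: percent} from a CRITICAL status line.

    Assumed shape, taken from the plugin output quoted in this thread:
    "TABLESPACE CRITICAL: NAME 90%; OTHER 91%"
    Lines that are not CRITICAL yield an empty dict.
    """
    prefix, _, details = status_line.partition(":")
    if "CRITICAL" not in prefix:
        return {}
    result = {}
    for item in details.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, pct = item.rpartition(" ")
        result[name] = int(pct.rstrip("%"))
    return result


def newly_critical(previous_line, current_line):
    """Return tablespaces listed now that were not listed before, sorted."""
    before = parse_critical(previous_line)
    now = parse_critical(current_line)
    return sorted(set(now) - set(before))


# The first line stands for the output at acknowledgement time; the second is
# a made-up later result in which one more tablespace has crossed 90%.
acknowledged = "TABLESPACE CRITICAL: DAILY_ACT_C 90%; SYSAUX 90%"
later = "TABLESPACE CRITICAL: DAILY_ACT_C 92%; SYSAUX 90%; XSAT_DATA 90%"
print(newly_critical(acknowledged, later))  # prints ['XSAT_DATA']
```

In this example XSAT_DATA is new, yet the service state never left CRITICAL, so an acknowledged service gives no signal for it. Splitting the check into one service per tablespace avoids this, at the cost of many more services to manage.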
Former Nagios employee