Service Check interval

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
n8860104460
Posts: 32
Joined: Wed Jan 11, 2017 3:36 pm

Service Check interval

Post by n8860104460 »

Hi Team,

We need to set up DB tablespace monitoring for a specific service with a check interval of 55 minutes (that is, Nagios should send the next check to the server 55 minutes after the previous one). The reason for this delay is that when we fetch the data from the Nagios server using the command below, it takes more than 45 minutes to return the DB tablespace data.

We have changed the check interval to 55 minutes, but the GUI still shows no output, while the same command works fine when run from the command line.

The command we use to fetch the data:

$USER1$/check_oracle_health -t 15 --connect $HOSTADDRESS$:$_HOSTDBPORT$/$_HOSTDBNAME$ --username $_HOSTDBUSER$ --password '$_HOSTDBPASS$' --warning $_HOSTTSWARN$ --critical $_HOSTTSCRIT$ --mode tablespace-usage

Error :

(Service Check Timed Out On Worker: usa*********)

Status Details
Service State: Critical
Duration: 21h 54m 4s
Service Stability: Unchanging (stable)
Last Check: 2017-07-25 08:05:16
Next Check: 2017-07-25 09:00:16

Config. File

###############################################################################
# Service configuration file
#
# Created by: Nagios Core Config Manager 2.3.3
# Date: 2017-07-25 07:50:17
# Version: Nagios 3.x config file
#
# --- DO NOT EDIT THIS FILE BY HAND ---
# Nagios CCM will overwrite all manual settings during the next update if you
# would like to edit files manually, place them in the 'static' directory or
# import your configs into the CCM by placing them in the 'import' directory.
#
###############################################################################

define service {
host_name mc0300*****_testing
service_description Oracle DB tablespace usage
use xerox_service_prod
check_command xerox_common_db_oracle_tblspc_60_min!!!!!!!!
check_interval 55
register 1
}

###############################################################################
#
# Service configuration file
#
# END OF FILE
#
###############################################################################

Please advise.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Service Check interval

Post by mcapra »

For a check that takes nearly an hour to return its results, I would highly recommend scheduling it as a cron job and submitting the results to Nagios XI as a passive check. More info on passive checks, if you go that route:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
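
As a rough sketch of the passive-check approach: a cron-driven wrapper could run the slow plugin and write its result to the Nagios external command file using the PROCESS_SERVICE_CHECK_RESULT external command. The command file path and the helper names below are assumptions for illustration and do not come from this thread; the linked document describes the method recommended for Nagios XI.

```python
import subprocess
import time

# Assumed location of the Nagios external command file; verify it on your system.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(time.time() if now is None else now)
    # Keep only the first line of plugin output so the command stays on one line.
    stripped = output.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return "[{}] PROCESS_SERVICE_CHECK_RESULT;{};{};{};{}\n".format(
        timestamp, host, service, return_code, first_line
    )


def run_and_submit(host, service, plugin_argv, command_file=COMMAND_FILE):
    """Run the plugin without a Nagios-side timeout and submit its result."""
    proc = subprocess.run(plugin_argv, stdout=subprocess.PIPE, universal_newlines=True)
    line = format_passive_result(host, service, proc.returncode, proc.stdout)
    with open(command_file, "w") as cmd:  # the command file is a named pipe
        cmd.write(line)
```

A cron entry would then call `run_and_submit` with the host name, the service description, and the full plugin command line. The service needs to accept passive check results for this to show up in the GUI.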

You might also make sure you are using the latest version of check_oracle_health as substantial performance improvements have been made in later versions:
https://labs.consol.de/nagios/check_ora ... index.html
Former Nagios employee
https://www.mcapra.com/
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Service Check interval

Post by dwhitfield »

I agree with @mcapra, but if you want to do it the way you are doing it, it seems to me the interval is not as big of an issue as the timeout. Can you attach the plugin you are using? I found several different links to check_oracle_health so I want to be sure we are using the correct one.
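
To illustrate why the interval is probably not the culprit, here is a small sketch using the values from the status details in the first post. It assumes the default `interval_length` of 60 seconds per interval unit (the posted config does not show it), and the function name is made up for illustration.

```python
from datetime import datetime, timedelta

INTERVAL_LENGTH_SECONDS = 60  # assumption: Nagios default interval_length


def next_check(last_check, check_interval):
    """Return the scheduled time of the next active check."""
    return last_check + timedelta(seconds=check_interval * INTERVAL_LENGTH_SECONDS)


last = datetime(2017, 7, 25, 8, 5, 16)  # "Last Check" from the status details
print(next_check(last, 55))             # 2017-07-25 09:00:16
```

The computed time matches the reported Next Check, which suggests the 55-minute interval is being applied and the failure comes from the timeout instead.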

It may also be useful to see a profile to help determine why things are taking so long. Can you PM me your profile? You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button towards the top. If for whatever reason you cannot download the profile, please put the output of View System Info (5.3.4+, Show Profile if older) in the thread (that will at least get us some info). The profile will give us access to many of the logs we would otherwise ask for individually. If security is a concern, you can unzip the profile, take out what you like, and then zip it up again. We may end up needing something you remove, but we can ask for that specifically.

After you PM the profile, please update this thread. Updating this thread is the only way for it to show back up on our dashboard.
n8860104460
Posts: 32
Joined: Wed Jan 11, 2017 3:36 pm

Re: Service Check interval

Post by n8860104460 »

profile.zip
Hello,

Please find the plugin and system profile attached. See below for more details on the DB instance; I hope this gives you an idea of why the Nagios server takes so long to fetch the details.

Number of tablespaces on the instance: 430
Load on the DB:

load averages: 28.1, 29.1, 29.5; up 156+12:35:23 05:43:37
1343 processes: 1315 sleeping, 3 zombie, 2 stopped, 23 on cpu
CPU states: 62.8% idle, 24.6% user, 12.6% kernel, 0.0% iowait, 0.0% swap
Memory: 192G phys mem, 14G free mem, 38G total swap, 38G free swap
tacolover101
Posts: 432
Joined: Mon Apr 10, 2017 11:55 am

Re: Service Check interval

Post by tacolover101 »

Please find attached plugin and system profile details, and see the more details below for DB instance hope this will give you an idea why Nagios server is taking that much time to fetch the details.
If it's taking 55 minutes to return, this sounds like an issue with your DB, not Nagios. If you're making a large query, remember that a single SQL query generally runs single-threaded, so no matter how large your system is, throwing more resources at it will not solve it.
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Re: Service Check interval

Post by lmiltchev »

Correct me if I am wrong, but it seems like you are using modgearman.
Error :

(Service Check Timed Out On Worker: usa*********)
Is everything on the modgearman worker set up as on the server (plugin, command, service, environment, etc.)? Does the check take such a long time if you disable modgearman?

To help us troubleshoot the issue, you may want to temporarily increase the log verbosity on the worker by setting:

debug=3
in the "/etc/mod_gearman2/worker.conf" file, then restarting the worker process:

service mod-gearman2-worker restart
Next, post the "/var/log/mod_gearman2/mod_gearman_worker.log" log after your check is run.
Be sure to check out our Knowledgebase for helpful articles and solutions!
n8860104460
Posts: 32
Joined: Wed Jan 11, 2017 3:36 pm

Re: Service Check interval

Post by n8860104460 »

Upon checking the gearmand logs, we found that gearmand re-checks the service after a maximum of 300 seconds, and this is what causes the service check timeout error.

The setting applied under Service > Check Settings > check_interval = 60 minutes is not taking effect because of the gearmand re-check.

gearmand log:
[2017-08-02 12:08:39][109164][INFO ] timeout (300s) hit for servicecheck: mc03XXXX_ISRVE - mc0300XXXXX_ISRVE

Can we increase the gearmand check interval?
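
For anyone scanning a worker log for the same symptom, the following is a minimal sketch that extracts the timeout and the check name from lines shaped like the one above. The regular expression is inferred from that single sample line, so treat the pattern and the function name as assumptions rather than a description of the Mod-Gearman log format.

```python
import re

# Pattern inferred from the one sample line in this thread (an assumption).
TIMEOUT_LINE = re.compile(
    r"^\[(?P<when>[^\]]+)\]\[(?P<pid>\d+)\]\[\s*(?P<level>\w+)\s*\]\s+"
    r"timeout \((?P<seconds>\d+)s\) hit for (?P<kind>\w+): (?P<target>.+)$"
)


def parse_timeout_line(line):
    """Return a dict describing a timeout log line, or None if it does not match."""
    match = TIMEOUT_LINE.match(line.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["seconds"] = int(info["seconds"])
    return info


sample = ("[2017-08-02 12:08:39][109164][INFO ] timeout (300s) hit for "
          "servicecheck: mc03XXXX_ISRVE - mc0300XXXXX_ISRVE")
print(parse_timeout_line(sample)["seconds"])  # 300
```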
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Service Check interval

Post by tgriep »

I think the timeout is hard coded in Mod Gearman and cannot be changed in the configuration files.
What you could do is create a service group in Nagios, add that service to the group, and then add the service group to the Gearman server's localservicegroups option, so that the check is not run by Gearman but by Nagios itself.
Take a look at this link for more details:
https://labs.consol.de/nagios/mod-gearm ... er_options
localservicegroups

sets a list of servicegroups which will not be executed by gearman. They are just passed through.
localservicegroups=name1,name2,name3
n8860104460
Posts: 32
Joined: Wed Jan 11, 2017 3:36 pm

Re: Service Check interval

Post by n8860104460 »

Hi Team,

The issue was finally resolved after changing the timeout value to 4000 in the gearmand file, and we are now able to see the tablespace details in Nagios XI.

Thank you for all your suggestions and support.
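
A quick arithmetic check of the numbers in this thread, assuming the 4000 value is in seconds like the 300s in the log (the thread does not state the unit) and taking 45 minutes as the approximate plugin runtime. The function name is illustrative only.

```python
def finishes_in_time(runtime_seconds, timeout_seconds):
    """True if a check with the given runtime completes before the timeout."""
    return runtime_seconds < timeout_seconds


runtime = 45 * 60  # "more than 45 Mins" from the first post, in seconds

print(finishes_in_time(runtime, 300))   # False: the old 300 s limit is hit first
print(finishes_in_time(runtime, 4000))  # True: the new limit, assumed to be seconds
```

Under that assumption, 4000 seconds is roughly 66 minutes, which leaves headroom over the reported runtime.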
bolson

Re: Service Check interval

Post by bolson »

Closing topic as resolved.

Thank you for using the Nagios Support Forum.