We are frequently getting "Service Check Timed Out" alerts for our DB checks. Please let us know how to resolve this issue.
***** Nagios Monitor XI Alert *****
Notification Type: PROBLEM
Service: TSPACE - XXXXX
Host: XXXXXXXXXXX
Address: XXXXXXXX
State: CRITICAL
Date/Time: Mon Dec 01 07:20:51 EST 2013
Additional Info:
(Service Check Timed Out)
Service Check Timed Out
Re: Service Check Timed Out
Can you run the checks from the cli?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
Re: Service Check Timed Out
We are able to run the checks from the CLI. The service shows the "Service Check Timed Out" error for a short period, about 4-5 minutes, and then returns to its normal state. These errors occur quite frequently. Please help us get this resolved.
Re: Service Check Timed Out
Can you post an example of the command you run and its output, wrapped in code tags?
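Since the timeouts are intermittent, a single manual run may not catch one. A small loop that runs the check several times and records the exit code and elapsed time of each run can help. This is only a sketch: `time_check` is a hypothetical helper name, it relies on the coreutils `timeout` command, and the command you pass in is a placeholder for your real plugin invocation (ideally run as the nagios user).

```shell
#!/bin/sh
# time_check RUNS LIMIT COMMAND [ARGS...]
# Runs COMMAND RUNS times, each capped at LIMIT seconds by coreutils `timeout`,
# and prints one line per run: run number, exit code, elapsed whole seconds.
# Hypothetical helper for illustration; not part of Nagios.
time_check() {
    runs=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        # `timeout` exits with status 124 when the command exceeded the limit
        timeout "$limit" "$@" >/dev/null 2>&1
        rc=$?
        end=$(date +%s)
        echo "run=$i rc=$rc elapsed=$((end - start))s"
        i=$((i + 1))
    done
}
```

For example, `time_check 20 60` followed by your full plugin command line would run it 20 times with a 60-second cap. Runs reporting `rc=124`, or elapsed times close to the cap, would suggest the plugin itself is slow at those moments rather than Nagios misreporting.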
Former Nagios employee
Re: Service Check Timed Out
Below is the command. I have intentionally left out the actual argument values.
$USER1$/check_oracle_basic --tablespace $ARG1$ $ARG2$ $ARG3$ $USER40$ $USER41$
Here is the output
01-30-2014 02:21:06 01-30-2014 02:30:26 0d 0h 9m 20s SERVICE CRITICAL (HARD) (Service Check Timed Out)
01-30-2014 02:30:26 01-30-2014 02:56:06 0d 0h 25m 40s SERVICE WARNING (HARD) WARNING - Oracle tablespace TBS_FTD_2011_Q4 - 95% used [ 451 of 8900 MB available ]
01-30-2014 02:56:06 01-30-2014 03:00:08 0d 0h 4m 2s SERVICE CRITICAL (HARD) (Service Check Timed Out)
Re: Service Check Timed Out
I wonder:
How is the health of the machine hosting the DB, and the network connections between your XI server and the DB?
If you go to Reports -> Alert Heatmap, for example, do you see other, broader problems happening alongside the periods when the checks time out?
Or, are any of the other checks on the DB server/VM (beyond the tablespace checks) timing out, or in problem states?
Just a couple of thoughts, I'm probably on the wrong track.
-fno-stack-protector
Re: Service Check Timed Out
I would tend to agree: network health, and potentially Oracle server health, may be the bigger issue. Also, how long is the timeout currently configured for with that plugin?
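For reference, the "(Service Check Timed Out)" text is produced by Nagios Core itself when a service check runs longer than the `service_check_timeout` directive in the main configuration file, independent of any timeout option the plugin may accept. A quick way to see the configured values is sketched below. The path `/usr/local/nagios/etc/nagios.cfg` is an assumption based on a default install, and `show_check_timeouts` is a hypothetical helper name.

```shell
#!/bin/sh
# show_check_timeouts [CONFIG_FILE]
# Prints the active (uncommented) service/host check timeout directives
# from a Nagios main config file.
# The default path below is an assumption; adjust it for your install.
show_check_timeouts() {
    cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    grep -E '^[[:space:]]*(service|host)_check_timeout[[:space:]]*=' "$cfg"
}
```

If the plugin legitimately needs longer than this value under load, raising it and restarting Nagios is one option, though that would likely only mask a slow database or network rather than fix it.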
Nagios-Plugins maintainer exclusively, unless you have other C language bugs with open-source nagios projects, then I am happy to help! Please pm or use other communication to alert me to issues as I no longer track the forum.
Re: Service Check Timed Out
We are unable to open the heatmaps for this server. The timeout is set to 60 seconds.
Beyond the DB checks, it also happens randomly for the disk checks. We monitor around 90 services on this server, and we see this issue regularly for the TSPACE checks.
We observed that the service goes to a WARNING state and then changes to "Service Check Timed Out", as shown below.
02-02-2014 20:04:31 02-03-2014 00:00:00 0d 3h 55m 29s SERVICE WARNING (HARD) WARNING - Oracle tablespace TBS_FTD_2011_Q4 - 95% used [ 451 of 8900 MB available ]
02-03-2014 00:00:00 02-03-2014 02:15:33 0d 2h 15m 33s SERVICE WARNING (HARD) WARNING - Oracle tablespace TBS_FTD_2011_Q4 - 95% used [ 451 of 8900 MB available ]
02-03-2014 02:15:33 02-03-2014 02:34:53 0d 0h 19m 20s SERVICE CRITICAL (HARD) (Service Check Timed Out)
02-03-2014 02:34:53 02-03-2014 06:52:31 0d 4h 17m 38s SERVICE WARNING (HARD) WARNING - Oracle tablespace TBS_FTD_2011_Q4 - 95% used [ 451 of 8900 MB available ]
02-03-2014 17:35:38 02-03-2014 17:39:45 0d 0h 4m 7s SERVICE CRITICAL (HARD) (Service Check Timed Out)
02-03-2014 17:39:45 02-03-2014 20:00:37 0d 2h 20m 52s SERVICE WARNING (HARD) WARNING - Oracle tablespace TBS_FTD_2011_Q4 - 95% used [ 451 of 8900 MB available ]
02-03-2014 20:00:37 02-03-2014 20:04:48 0d 0h 4m 11s SERVICE CRITICAL (HARD) (Service Check Timed Out)
02-03-2014 20:04:48 02-04-2014 00:00:00 0d 3h 55m 12s SERVICE WARNING (HARD) WARNING - Oracle tablespace TBS_FTD_2011_Q4 - 95% used [ 451 of 8900 MB available ]
02-04-2014 00:00:00 02-04-2014 00:31:19 0d 0h 31m 19s SERVICE WARNING (HARD) WARNING - Oracle tablespace TBS_FTD_2011_Q4 - 95% used [ 451 of 8900 MB available ]
Re: Service Check Timed Out
Do you have some sort of query limiting set for the user or listener you are using in your Oracle checks? That would be set on the Oracle system itself.
Re: Service Check Timed Out
Where do I set the timeout value for this message: "(Service Check Timed Out)"?
Also, how can I set a generic timeout in Perl?