
[Nagios-devel] host check strangeness - odd behavior in Nagios

Posted: Tue Jul 07, 2009 1:29 pm
by Guest

Greetings All,

I'm seeing a problem with our host check scheduling. There are two
major issues, and I can't tell whether they are symptoms of the same
problem or two separate issues. I've provided the configs and information
that I know to be applicable. If there's other pertinent information,
please let me know; I'm more than happy to provide it.

First, here's my Nagios setup:
Single Nagios box (no distributed setup)
64-bit RHEL 5.3
Nagios 3.1.2 (I upgraded from 3.0.6 to see if that would fix the issues)


Problem 1. Some host checks are getting *stuck* in the scheduling queue.
When I look at the scheduling queue, these hosts are always listed with
a 'last check' time equal to their 'next check' time. See the attached
screenshot (problem 1). They typically stay at the top of the queue
for an hour or two.
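The hosts showing this symptom can also be listed without the web UI. The following minimal Python sketch scans the status file (the file named by `status_file` in nagios.cfg) for `hoststatus` blocks whose `last_check` equals `next_check`. It assumes the Nagios 3 `status.dat` layout, in which `blocktype {` ... `}` sections contain one `key=value` line each. The function names and the sample data are hypothetical.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Assumed status.dat layout: a line like "hoststatus {" opens a block,
# tab-indented "key=value" lines follow, and a line "}" closes it.
BLOCK_RE = re.compile(r"^(\w+)\s*\{\s*$")


def parse_status(text: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (block_type, fields) for every block in a status.dat-style text."""
    block_type = None
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = BLOCK_RE.match(line)
        if match:
            block_type, fields = match.group(1), {}
        elif line == "}" and block_type is not None:
            yield block_type, fields
            block_type = None
        elif block_type is not None and "=" in line:
            # Split on the first '=' only; plugin output may contain more.
            key, _, value = line.partition("=")
            fields[key] = value


def stuck_hosts(text: str) -> List[str]:
    """Return host names whose last_check equals next_check."""
    return [
        fields["host_name"]
        for kind, fields in parse_status(text)
        if kind == "hoststatus"
        and "last_check" in fields
        and fields["last_check"] == fields.get("next_check")
    ]


# Hypothetical sample input (not taken from the original report).
SAMPLE = """hoststatus {
\thost_name=hostxxx
\tlast_check=1246987740
\tnext_check=1246987740
\t}
hoststatus {
\thost_name=hostyyy
\tlast_check=1246987740
\tnext_check=1246995000
\t}
"""

print(stuck_hosts(SAMPLE))
```

On a live system, you would pass in the contents of the real status file rather than `SAMPLE`. Running the check a few times while a host sits at the top of the queue shows whether its two timestamps ever diverge.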

Host configuration for one of them:


define host {
host_name hostxxx
alias Oracle
use srvhost-os-2000,srvhost-physical,srvhost-oracle,srvhost-non-production,srvhost-all
notification_period aperture
register 1
}

Applicable Templates:

define host {
name generic-host
check_period 24x7
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
notifications_enabled 1
register 0
}


define host {
name generic-pnp
action_url /pnp/index.php?host=$HOSTNAME$' onmouseover="get_g('$HOSTNAME$','_HOST_')" onmouseout="clear_g()"
register 0
}


define host {
name srvhost-all
alias All Servers
check_command check-nt-alive
use generic-pnp,generic-host
max_check_attempts 3
check_interval 60
retry_interval 1
active_checks_enabled 1
passive_checks_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
contact_groups +servers
notification_interval 240
notification_period 24x7
notification_options d,u,r
notifications_enabled 1
register 0
}


define host {
name srvhost-non-production
alias Non production servers
hostgroups +SRV_Cls-non-production
check_interval 120
retry_interval 20
passive_checks_enabled 1
contact_groups +servers
notification_interval 480
notification_period workhours
notification_options d,u,r
notifications_enabled 1
register 0
}


define host {
name srvhost-oracle
alias Oracle servers
hostgroups +SRV_app-oracle
contact_groups +oracle
register 0
}


define host {
name srvhost-physical
alias Servers that are running on physical hardware
hostgroups +SRV_platform-physical
register 0
}


define host {
name srvhost-os-2000
alias Servers running Windo

...[email truncated]...
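This minimal Python sketch checks which scheduling values hostxxx actually ends up with. It assumes the multiple-template precedence described in the Nagios 3 object inheritance documentation:

- Local directives win.
- Templates are then searched in `use` order.
- The search descends into each template's own `use` list before moving on to the next template.

The sketch copies only the scheduling-related directives quoted above. It treats the truncated srvhost-os-2000 template as empty; if that template sets `check_interval`, its value would win. Additive `+` directives (hostgroups, contact_groups) are not modeled. The names `TEMPLATES`, `HOST`, and `resolve` are hypothetical.

```python
from typing import Dict, Optional

# Subset of the templates quoted above. srvhost-os-2000 is truncated in the
# message, so it is modeled as empty (an assumption).
TEMPLATES: Dict[str, dict] = {
    "generic-host": {"use": [], "vars": {"check_period": "24x7"}},
    "generic-pnp": {"use": [], "vars": {}},
    "srvhost-all": {
        "use": ["generic-pnp", "generic-host"],
        "vars": {
            "check_command": "check-nt-alive",
            "max_check_attempts": "3",
            "check_interval": "60",
            "retry_interval": "1",
            "notification_period": "24x7",
        },
    },
    "srvhost-non-production": {
        "use": [],
        "vars": {
            "check_interval": "120",
            "retry_interval": "20",
            "notification_period": "workhours",
        },
    },
    "srvhost-oracle": {"use": [], "vars": {}},
    "srvhost-physical": {"use": [], "vars": {}},
    "srvhost-os-2000": {"use": [], "vars": {}},  # contents unknown (truncated)
}

HOST = {
    "use": ["srvhost-os-2000", "srvhost-physical", "srvhost-oracle",
            "srvhost-non-production", "srvhost-all"],
    "vars": {"notification_period": "aperture"},
}


def resolve(obj: dict, name: str) -> Optional[str]:
    """Return the first value for `name`: the local definition, then each
    template in `use` order, searching a template's parents depth-first
    before moving to the next template. Returns None if nothing defines it."""
    if name in obj["vars"]:
        return obj["vars"][name]
    for template_name in obj["use"]:
        value = resolve(TEMPLATES[template_name], name)
        if value is not None:
            return value
    return None


for directive in ("check_interval", "retry_interval", "max_check_attempts",
                  "notification_period", "check_period"):
    print(f"{directive}: {resolve(HOST, directive)}")
```

Under these assumptions, srvhost-non-production's `check_interval 120` shadows srvhost-all's `60` because it is listed earlier in `use`. With the default `interval_length` of 60 seconds, that means one regular host check every two hours.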


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]