Hi Daniel. In my environment I have a lot of hosts that have been down for a
long time. I can't deal with this. One thing that should be clear is that I'm
using gearman and mod_gearman to make the checks. I have 9 workers (virtual
machines) to do the job. The central server, running Nagios 3.2.3, does not
execute any plugin. The central server is physical, with 8 CPUs and 4 GB RAM,
running RHEL 5.4 64-bit. Thanks.
On Wed, Aug 24, 2011 at 11:37 AM, Daniel Wittenberg wrote:
> I noticed from the output you have a high number of unknown and critical
> services. Are those taking a long time to time out? What you might try,
> which I know isn't ideal, is removing certain checks that might be failing:
> start with just the host checks, and when those look good, add a few more
> services, then a few more, and so on, until you notice the time going
> through the roof again. That might help figure out where your threshold is,
> and whether certain checks are causing issues. Is this a physical or
> virtual server?
>
>
> Dan
>
> *From:* Rodney Ramos [mailto:rodneyra@gmail.com]
> *Sent:* Wednesday, August 24, 2011 9:26 AM
>
> *To:* Nagios Developers List
> *Subject:* Re: [Nagios-devel] Nagios and Gearman - huge environment
> performance problem
>
> Hi Sven. Thank you again. I'm pretty sure that my check interval is 15 min
> for both hosts and services. I've set this in the templates.cfg file (see
> below). I am also sending the nagiostats output. I agree with you that
> 100 k checks / 15 min ~ 111 checks/sec, but the problem is that Nagios does
> not spread these checks smoothly over time. That's the problem.
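
To make the arithmetic above concrete, here is a rough sketch in Python. The counts are taken from the nagiostats output further down; the function and variable names are only illustrative and are not part of Nagios or mod_gearman. It assumes every check is perfectly spread over the 15 minute interval, which is exactly what is not happening here.

```python
# Rough check-rate arithmetic; figures copied from the nagiostats output below.
# All names here are illustrative assumptions, not Nagios or mod_gearman APIs.

SERVICES = 68206              # "Total Services"
HOSTS = 34103                 # "Total Hosts"
INTERVAL_MIN = 15             # check_interval / normal_check_interval
SERVICES_LAST_15_MIN = 35932  # third value of "Active Services Last 1/5/15/60 min"


def required_rate(total_checks, interval_min):
    """Checks per second needed if checks were spread evenly over the interval."""
    return total_checks / (interval_min * 60)


total = SERVICES + HOSTS
rate = required_rate(total, INTERVAL_MIN)

# Share of services that were actually checked within one nominal interval.
fraction = SERVICES_LAST_15_MIN / SERVICES

print(f"total active checks: {total}")
print(f"required rate: {rate:.1f} checks/sec")
print(f"services checked in the last 15 min: {fraction:.1%}")
```

If the checks were evenly spread, the fraction printed on the last line would be close to 100%. A value well below that is consistent with the high active service latency reported by nagiostats.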
>
>
> ==========
> templates.cfg
> ==========
> define host{
> name generic-host
> ...
> check_interval 15
> ....
> }
>
> define service{
> name generic-service
> ...
> normal_check_interval 15
> ....
> }
>
> ==============
> nagiostats output
> ==============
> Nagios Stats 3.2.3
> Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
> Last Modified: 10-03-2010
> License: GPL
>
> CURRENT STATUS DATA
> ------------------------------------------------------
> Status File: /usr/local/nagios/var/status.dat
> Status File Age: 0d 0h 0m 17s
> Status File Version: 3.2.3
>
> Program Running Time: 0d 17h 43m 2s
> Nagios PID: 18854
> Used/High/Total Command Buffers: 0 / 0 / 4096
>
> Total Services: 68206
> Services Checked: 68206
> Services Scheduled: 68206
> Services Actively Checked: 68206
> Services Passively Checked: 0
> Total Service State Change: 0.000 / 43.880 / 2.774 %
> Active Service Latency: 40.671 / 503.137 / 234.919 sec
> Active Service Execution Time: 0.003 / 24.737 / 2.527 sec
> Active Service State Change: 0.000 / 43.880 / 2.774 %
> Active Services Last 1/5/15/60 min: 0 / 2897 / 35932 / 68206
> Passive Service Latency: 0.000 / 0.000 / 0.000 sec
> Passive Service State Change: 0.000 / 0.000 / 0.000 %
> Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Services Ok/Warn/Unk/Crit: 46943 / 56 / 7660 / 13547
> Services Flapping: 980
> Services In Downtime: 0
>
> Total Hosts: 34103
> Hosts Checked: 34103
> Hosts Scheduled: 34103
> Hosts Actively Checked: 34103
> Host Passively Checked: 0
> Total Host State Change:
...[email truncated]...
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: rodneyra@gmail.com