Re: [Nagios-devel] Nagios and Gearman - huge environment

Post by Guest »

Hi Daniel. In my environment I have a lot of hosts that have been down for a
long time. I can't deal with this. One thing that should be clear is that I'm
using Gearman and mod_gearman to run the checks. I have 9 workers (virtual
machines) to do the job. The central server, running Nagios 3.2.3, does not
execute any plugins. The central server is physical, with 8 CPUs and 4 GB of
RAM, running 64-bit RHEL 5.4. Thanks.

On Wed, Aug 24, 2011 at 11:37 AM, Daniel Wittenberg wrote:

> I noticed from the output that you have a high number of unknown and
> critical services. Are those taking a long time to time out? What you might
> try, which I know isn't ideal, is removing certain checks that might be
> failing: start with just host checks, and when those look good, add a few
> more services, then a few more, etc., until you notice the time going
> through the roof again. That might help figure out where your threshold is,
> and whether certain checks are causing issues. Is this a physical or virtual
> server?
>
> Dan
>
> *From:* Rodney Ramos [mailto:rodneyra@gmail.com]
> *Sent:* Wednesday, August 24, 2011 9:26 AM
>
> *To:* Nagios Developers List
> *Subject:* Re: [Nagios-devel] Nagios and Gearman - huge environment
> performance problem
>
> Hi Sven. Thank you again. I'm pretty sure that my check interval is 15 min
> for both hosts and services. I've set this in the templates.cfg file (see
> below). I am also sending the nagiostats output. I agree with you that
> 100k checks / 15 min is about 111 checks/sec, but the problem is that
> Nagios does not spread these checks evenly over the interval. That's the
> problem.
>
>
> ==========
> templates.cfg
> ==========
> define host{
> name generic-host
> ...
> check_interval 15
> ....
> }
>
> define service{
> name generic-service
> ...
> normal_check_interval 15
> ....
> }
>
> ==============
> nagiostats output
> ==============
> Nagios Stats 3.2.3
> Copyright (c) 2003-2008 Ethan Galstad (www.nagios.org)
> Last Modified: 10-03-2010
> License: GPL
>
> CURRENT STATUS DATA
> ------------------------------------------------------
> Status File: /usr/local/nagios/var/status.dat
> Status File Age: 0d 0h 0m 17s
> Status File Version: 3.2.3
>
> Program Running Time: 0d 17h 43m 2s
> Nagios PID: 18854
> Used/High/Total Command Buffers: 0 / 0 / 4096
>
> Total Services: 68206
> Services Checked: 68206
> Services Scheduled: 68206
> Services Actively Checked: 68206
> Services Passively Checked: 0
> Total Service State Change: 0.000 / 43.880 / 2.774 %
> Active Service Latency: 40.671 / 503.137 / 234.919 sec
> Active Service Execution Time: 0.003 / 24.737 / 2.527 sec
> Active Service State Change: 0.000 / 43.880 / 2.774 %
> Active Services Last 1/5/15/60 min: 0 / 2897 / 35932 / 68206
> Passive Service Latency: 0.000 / 0.000 / 0.000 sec
> Passive Service State Change: 0.000 / 0.000 / 0.000 %
> Passive Services Last 1/5/15/60 min: 0 / 0 / 0 / 0
> Services Ok/Warn/Unk/Crit: 46943 / 56 / 7660 / 13547
> Services Flapping: 980
> Services In Downtime: 0
>
> Total Hosts: 34103
> Hosts Checked: 34103
> Hosts Scheduled: 34103
> Hosts Actively Checked: 34103
> Host Passively Checked: 0
> Total Host State Change:

...[email truncated]...
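
For anyone reading this thread later: the throughput estimate discussed in the quoted message can be checked against the nagiostats figures. The Python sketch below is only an illustration, not something from the original thread. It assumes Nagios's `interval_length` is left at its default of 60 seconds, so that a check interval of 15 means 15 minutes, and the function name `required_rate` is made up for this example.

```python
# Figures copied from the nagiostats output quoted in this thread.
TOTAL_SERVICES = 68206
TOTAL_HOSTS = 34103
SERVICES_CHECKED_LAST_15_MIN = 35932  # "Active Services Last 1/5/15/60 min"

# check_interval / normal_check_interval from the quoted templates.cfg.
CHECK_INTERVAL = 15
# Assumption: interval_length is at its default of 60 seconds per unit.
INTERVAL_LENGTH_S = 60


def required_rate(total_checks, check_interval, interval_length_s=INTERVAL_LENGTH_S):
    """Checks per second needed to run every check once per interval."""
    return total_checks / (check_interval * interval_length_s)


total_checks = TOTAL_SERVICES + TOTAL_HOSTS
needed = required_rate(total_checks, CHECK_INTERVAL)

# Share of services that were actually checked within one 15-minute window.
fraction_on_time = SERVICES_CHECKED_LAST_15_MIN / TOTAL_SERVICES

print(f"total checks:            {total_checks}")
print(f"required rate:           {needed:.1f} checks/sec")
print(f"services checked in 15m: {fraction_on_time:.1%}")
```

With these numbers the required rate comes out at roughly 114 checks per second, and only about 53% of the services were checked within the last 15 minutes, which is consistent with the high latency reported by nagiostats.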


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: rodneyra@gmail.com