[Nagios-devel] Feedback on Nagios

Post by Guest »

Hi guys,

Just a bit of feedback on Nagios and the issues we have run into, and
are still running into, with it. The issues I raise may be because I do
not understand Nagios or its plugins sufficiently, or because I am
trying to use it in a way it was not intended. I am more than happy to
receive any constructive feedback or criticism on this.

First of all, thanks for a really good product. It does a pretty darn
good job of presenting the status of a service-oriented network.

We previously used HP OpenView Network Node Manager (NNM) exclusively;
however, it is designed specifically for monitoring network devices and
has no real concept of services and service dependencies.

We are in the initial phases of trying Nagios out (about 1-2 months in).
It is important to note that I have not tinkered with the internals
other than a small change for permissions. I have been focused on
getting it to monitor and report on things.

We continue to use NNM for network discovery and also network display
(drill-down and containers), as this is an area where Nagios is weak in
comparison.

** Nagios has poor network discovery facilities, particularly at Layer 2.
** Nagios does not have a network "drill-down" map à la NNM.
** Nagios cannot handle really large layouts: it does not fully draw the
    map. I have not investigated this issue yet, though.
** I have not yet found a way to have Nagios understand complex network
    dependencies. We have a large number of redundant paths, and we
    cannot draw these correctly. This is very probably a lack of
    understanding on my part, though.

I know that there is nmap discovery; however, it does not give you the
Layer 2 dependency information that is important for correctly
configuring dependencies. Additionally, it does not discover SNMP
variables for CPU, memory, storage, or networking. Now I understand
that there are 1001 different SNMP MIBs out there, but there are some
rather obvious ones that hold a fair amount of market share:

Cisco and HP for Networking equipment.
HOST MIBs (Covers MS Win2k, NetSNMP and others)
IF MIBs for network interfaces (IP & Layer 2) and routing.

** Nagios has poor SNMP discovery of common services on MIBs. Is this
    really a problem, though? Perhaps this is the responsibility of
    individual deployments.

Now this isn't a major problem for us: I wrote discovery scripts in
Perl that, given a list of hosts, interrogate their SNMP services and
provide all the goodies (services, service dependencies, and service
extended information, discussed later). But that led us to the next
problem.
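
To illustrate the kind of output such a discovery script has to produce,
here is a minimal sketch in Python (not the Perl scripts mentioned
above) that turns a list of discovered services into Nagios service
definitions. The SNMP querying itself is left out. The host name, the
check command names, and the "generic-service" template are assumptions
for illustration only and would need to match a real configuration.

```python
from typing import Dict, List


def render_object(kind: str, directives: Dict[str, str]) -> str:
    """Render one Nagios object definition block as plain text."""
    lines = [f"define {kind} {{"]
    for name, value in directives.items():
        # Pad the directive name so the values line up in a column.
        lines.append(f"    {name:<32} {value}")
    lines.append("}")
    return "\n".join(lines)


def render_host_services(host: str, services: List[Dict[str, str]]) -> str:
    """Render one service definition per discovered service on a host."""
    blocks = []
    for svc in services:
        blocks.append(render_object("service", {
            "use": "generic-service",  # assumed template name
            "host_name": host,
            "service_description": svc["description"],
            "check_command": svc["command"],
        }))
    return "\n\n".join(blocks)


# Hypothetical discovery result; a real script would build this from
# SNMP queries against each host in the list.
discovered = {
    "web01": [
        {"description": "CPU Load", "command": "check_snmp_cpu"},
        {"description": "Disk /", "command": "check_snmp_storage"},
    ],
}

config_text = "\n\n".join(
    render_host_services(host, services)
    for host, services in discovered.items()
)
print(config_text)
```

Service dependencies could be emitted the same way by calling
render_object with "servicedependency" and the matching directives.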

Currently, with the first sweep of discovery (excluding networking
queries), we ended up with around 300 hosts and 1500 services, with
services checked every 5 minutes. This absolutely hammered the CPU of
the box it was on (a dual-CPU Sun E250 with 2 GB of memory). This was
okay: we used the embedded Perl option, and this got us to just under
100% utilisation. Yes, this is an issue with the plugins, and I'll
discuss this later.
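
To put those numbers in perspective, here is a back-of-the-envelope
calculation in Python. It assumes the checks are spread evenly over the
5-minute interval and ignores host checks and everything else running
on the box, so it is only a rough sketch of the load, not a measurement.

```python
services = 1500          # service count from the first discovery sweep
interval_seconds = 5 * 60  # each service is checked every 5 minutes
cpus = 2                 # dual-CPU machine

# Average number of plugin runs that must start every second.
checks_per_second = services / interval_seconds

# CPU time each check may consume, on average, before both CPUs
# are fully busy.
cpu_seconds_per_check = cpus / checks_per_second

print(f"{checks_per_second:.1f} checks per second")
print(f"{cpu_seconds_per_check:.2f} CPU-seconds available per check")
```

Under these assumptions the box has to start about 5 checks every
second, and the CPUs saturate once a check costs more than roughly 0.4
CPU-seconds on average, which is easy to reach if every check starts a
fresh interpreter.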

One of the problems with embedded Perl is that it has a rather large
memory leak. It has to be reset every four to five days as it creeps up
to 300-400 MB of RAM. I know this is being addressed in the next
version, but I do point it out. NB: we use caching as well.

** Nagios embedded Perl (with caching) leaks a lot of memory. A
    workaround is in the next version.

To compound this, we use performance monitoring to feed data into
RRDtool for further processing. We use RRD (RRDcgi is really neat) to
provide historical trends and also to handle non-gauge collections such
as counters. Admittedly, we run this at maximum nice levels to ensure it
does not impact the primary data collection work.
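
For anyone unfamiliar with the gauge/counter distinction: a counter only
ever increases (until it wraps), so a plugin has to keep the previous
sample and report the difference over time. A minimal sketch in Python,
assuming a 32-bit counter and at most one wrap between samples; the
function name and parameters are made up for illustration.

```python
def counter_rate(prev_value: int, prev_time: float,
                 curr_value: int, curr_time: float,
                 bits: int = 32) -> float:
    """Return the per-second rate between two counter samples.

    Assumes the counter wrapped at most once between the samples.
    A counter reset (for example a device reboot) looks the same as
    a wrap here and would produce a bogus spike.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # Modular subtraction gives the right delta even across a wrap.
    delta = (curr_value - prev_value) % (2 ** bits)
    return delta / elapsed


# Normal case: 3000 units in 300 seconds.
print(counter_rate(1000, 0, 4000, 300))
# Wrapped case: the counter rolled over past 2**32 between samples.
print(counter_rate(4294967000, 0, 704, 100))
```

The previous value and timestamp have to be stored somewhere between
plugin runs, for example in a small state file per service.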

** Nagios does not natively deal with counters. Not really a Nagios
    problem, just an observation, i.e. write your own plugins (we have).
** Nagios does not natively collect data that can be graphically
    displayed "out of the box". Again, not really a problem, just an
    observation. Everyone can roll their own, but it would be nice if
    something was provided.
(I can provide my simple prototype perl

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]