Flapping

MPIvan
Posts: 213
Joined: Thu Nov 22, 2012 6:09 am

Flapping

Post by MPIvan »

Hi all,

I was using Nagios 3.5.0, but when I tried to update it and get it back to normal operation, I managed to destroy the whole OS. :))) So I decided to uninstall and reinstall from scratch. I was able to back up and save the .cfg files, though. I installed a fresh Ubuntu Server 12.04.3 LTS with Nagios 3.5.1 and Plugins 1.4.15. After the successful installation I added some .cfg files (in nagios/etc/objects), such as routers.cfg, where I put all the routers. I deleted the template.cfg file and added my own, and some other .cfg files were also deleted and replaced with the old copies or simply added (whenever I added a file I also set the permissions Nagios needs: chown nagios:nagios and chmod ...). It works OK, but from time to time the status area shows all hosts (not just the routers, all hosts) as down while their services are all up, and some are OK. After a while everything goes back to normal. I used Nagios 3.5.0 for a long time and never saw this kind of situation, where all hosts are down and their services are up.

Image
Image
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Re: Flapping

Post by slansing »

What do the comments say on those hosts? And what is their status information?
MPIvan

Re: Flapping

Post by MPIvan »

Here it is ....
Image

It says the host status is down, but the host is actually up; I can access it.
Also, I don't know how both active and passive checks can be enabled at the same time. Should one of them be disabled?
slansing

Re: Flapping

Post by slansing »

Why are you using /bin/ping? You should be using the check_ping plugin from the plugins package you installed, which should be located in /usr/local/nagios/libexec, assuming you are running CentOS/RHEL. I'd recommend running that check_ping from the command line and seeing whether you get the same output as the web interface shows. If you get a valid response, that should be your solution.
MPIvan

Re: Flapping

Post by MPIvan »

I assumed the ping was doing something, so I removed it from the host, but the hosts still show as down and the status information still shows the same /bin/ping error. I don't know why it is using the /bin/ping command; I haven't changed anything and I'm using the default settings. I did find a ping command in the cgi.cfg file that points to /bin/ping (ping_syntax=/bin/ping -n -U -c 5 $HOSTADDRESS$), so how do I change it? Should I put just "ping_syntax=/$USER1$/check_ping", or the full path, /usr/local/nagios/libexec? I don't know how to change this. I'm using Ubuntu Server 12.04.3 LTS.
slansing

Re: Flapping

Post by slansing »

You should already have a command defined that calls the ping check, as this is a basic feature of Nagios. Look inside your commands.cfg and you will find either check_ping or check_icmp being used in the "check-host-alive" command definition. I would use that command as the check_command in your host definitions so that it performs the host-alive check.
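
As a rough sketch of what that looks like, a host definition calling check-host-alive might be written as below. The host name, alias, address, the 24x7 time period and the admins contact group are placeholders and assumptions, not values from this thread; substitute your own or inherit them from your template.

```
# Hypothetical host entry; every value except check_command is a placeholder.
define host{
        host_name               router-example      ; placeholder name
        alias                   Example Router      ; placeholder alias
        address                 192.0.2.1           ; documentation-range address
        check_command           check-host-alive    ; the command defined in commands.cfg
        max_check_attempts      5
        check_period            24x7                ; assumes a 24x7 timeperiod is defined
        contact_groups          admins              ; assumes an admins contactgroup is defined
        notification_interval   30
        notification_period     24x7
        }
```

If a host has no check_command, or points at a different command, that would be the first thing to compare against the status output in the web interface.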
MPIvan

Re: Flapping

Post by MPIvan »

I have these commands in the commands.cfg file:

Code: Select all

################################################################################
#
# SAMPLE HOST CHECK COMMANDS
#
################################################################################


# This command checks to see if a host is "alive" by pinging it
# The check must result in a 100% packet loss or 5 second (5000ms) round trip
# average time to produce a critical error.
# Note: Five ICMP echo packets are sent (determined by the '-p 5' argument)

# 'check-host-alive' command definition
define command{
        command_name    check-host-alive
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
        }

# 'check-host-alive 2' command definition
define command{
        command_name    check_icmp
        command_line    $USER1$/check_icmp -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 2
}

Now, do I have to comment out (put # in front of) the /bin/ping command in the cgi.cfg file? And yes, I have "check_ping" and "check_icmp" in the libexec directory.
slansing

Re: Flapping

Post by slansing »

Then you should be calling check-host-alive as the check_command in your host definitions and not trying to call /bin/ping. It looks like you have the RTA and packetloss % hardcoded into the command as well, you could switch this to use Arguments in case you need to alter it for systems with traditionally high network latency.
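
For illustration only, an argument-based variant could look something like the sketch below. The command name check-host-alive-args, the generic-host template and the host values are made up for this example; $ARG1$ and $ARG2$ are filled from the values that follow each ! in check_command.

```
# Hypothetical variant of check-host-alive with the thresholds passed as arguments.
define command{
        command_name    check-host-alive-args
        command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
        }

# Each threshold is "RTA in ms,packet loss %": warning is $ARG1$, critical is $ARG2$.
define host{
        use             generic-host          ; assumes a template that supplies the other required directives
        host_name       slow-link-router      ; placeholder name
        alias           Slow Link Router      ; placeholder alias
        address         192.0.2.2             ; placeholder address
        check_command   check-host-alive-args!3000.0,80%!5000.0,100%
        }
```

With this layout a host on a high-latency link can be given looser thresholds in its own check_command without touching the command definition.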
MPIvan

Re: Flapping

Post by MPIvan »

I'm not trying to call /bin/ping; I'm using check-host-alive. In the current setup I'm not using ping at all, and it still shows flapping and the /bin/ping error. Here is what I have for the ping service, and it is commented out:

Code: Select all

#define service{
#       use                     generic-service
#       hostgroup_name          router-bp
#       service_description     PING
#       check_command           check_ping!200.0,20%!600.0,60%
#       normal_check_interval   5
#       retry_check_interval    1
#}

slansing wrote:
"It looks like you have the RTA and packetloss % hardcoded into the command as well, you could switch this to use Arguments in case you need to alter it for systems with traditionally high network latency."

I'm not sure I understand you here. Please explain it to me.

I just installed Nagios and copied router.cfg and template.cfg, and that is all (I set the right permissions and everything else that was needed). I have not changed any of the parameters.
slansing

Re: Flapping

Post by slansing »

Sorry, I was talking about your warning and critical thresholds. In each threshold, the round trip average (RTA) time is the first number, and it is separated by a comma from the packet loss percentage.

Can you call check_ping from the command line manually and see if you receive the /bin/ping error? I'm not sure why the host you showed is showing "/bin/ping -n -U -w 30 -c 5 172.16.20.251" unless that is how you have it defined in the command. Which host's details did you show in the picture? Can you share the config file section for it?