[Nagios-devel] Re: [Nagios-users] Regarding Trends status after Network Outage

Posted: Thu Dec 02, 2004 2:05 am
by Guest

Hi,

I see the same bug (Nagios 1.2). It looks like a race condition that
shows up after a host reboot.

ssh down -> reboot -> host up -> ssh down -> ssh up

[1099042385] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Connection refused
[1099042445] SERVICE ALERT: test;ssh;CRITICAL;SOFT;2;Socket timeout
[1099042525] SERVICE ALERT: test;ssh;CRITICAL;HARD;3;Socket timeout
[1099042715] HOST ALERT: test;DOWN;SOFT;1;CRITICAL
[1099042725] HOST ALERT: test;DOWN;SOFT;2;CRITICAL
[1099042735] HOST ALERT: test;DOWN;SOFT;3;CRITICAL
[1099042745] HOST ALERT: test;DOWN;SOFT;4;CRITICAL
[1099042755] HOST ALERT: test;DOWN;HARD;5;CRITICAL
[1099042755] SERVICE ALERT: test;ping;CRITICAL;HARD;1;CRITICAL
[1099042935] HOST ALERT: test;UP;HARD;1;PING OK
[1099042935] SERVICE ALERT: test;ping;OK;HARD;1;PING OK
[1099042945] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Socket timeout
[1099043005] SERVICE ALERT: test;ssh;OK;SOFT;2;TCP OK

====> BUG: ssh was in a CRITICAL HARD state, but the OK is logged as SOFT!!
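
To put numbers on the excerpt above, here is a minimal sketch that converts the two relevant timestamps (copied from the log lines) to readable UTC times and computes how long ssh sat in the HARD CRITICAL state before the SOFT OK arrived. It assumes the bracketed values are seconds since the Unix epoch; the helper names `fmt_epoch` and `gap_minutes` are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Timestamps copied from the log excerpt above.
my $hard_critical = 1099042525;   # test;ssh;CRITICAL;HARD;3
my $soft_ok       = 1099043005;   # test;ssh;OK;SOFT;2

# Format an epoch timestamp as a UTC date string.
sub fmt_epoch {
    my ($epoch) = @_;
    return strftime('%Y-%m-%d %H:%M:%S UTC', gmtime($epoch));
}

# Whole minutes elapsed between two epoch timestamps.
sub gap_minutes {
    my ($start, $end) = @_;
    return int(($end - $start) / 60);
}

printf "HARD CRITICAL at %s\n", fmt_epoch($hard_critical);
printf "SOFT OK       at %s\n", fmt_epoch($soft_ok);
printf "gap: %d min\n", gap_minutes($hard_critical, $soft_ok);
```

The gap here works out to 8 minutes: the service was in a HARD problem state for that long, yet the recovery is recorded as SOFT.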

[1099043265] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Socket timeout
[1099043335] SERVICE ALERT: test;ssh;CRITICAL;SOFT;2;Socket timeout
[1099043395] SERVICE ALERT: test;ssh;CRITICAL;HARD;3;Socket timeout
[1099043475] HOST ALERT: test;DOWN;SOFT;1;CRITICAL
[1099043485] HOST ALERT: test;DOWN;SOFT;2;CRITICAL
[1099043495] HOST ALERT: test;DOWN;SOFT;3;CRITICAL
[1099043505] HOST ALERT: test;DOWN;SOFT;4;CRITICAL
[1099043515] HOST ALERT: test;DOWN;HARD;5;CRITICAL
[1099043565] SERVICE ALERT: test;ping;CRITICAL;HARD;1;CRITICAL
[1099043715] HOST ALERT: test;UP;HARD;1;PING OK
[1099043715] SERVICE ALERT: test;ping;OK;HARD;1;PING OK
[1099043745] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Socket timeout
[1099043815] SERVICE ALERT: test;ssh;CRITICAL;SOFT;2;Socket timeout
[1099043865] SERVICE ALERT: test;ssh;OK;HARD;3;TCP OK

=====> Here it's fine, because ssh comes back up after 2 failed checks and the OK is logged as HARD

If you want to look for this bug in your own Nagios log files, you can
use my simple Perl script (see attachment).

PS: to use it:

for i in nagios-*2004*
do
    ./mayday_bug_trends.pl "$i"
done

Regards

On Thursday, 02 December 2004 at 10:05 +0530, Nilesh wrote:
> Dear All,
>
> I have noticed a strange behaviour of Trends in Nagios.
> I'm using nagios-1.2.
>
> Whenever there is a network outage, Trends is updated immediately
> to reflect it.
> After network connectivity recovers, all host checks and service checks
> are run again and the availability information for hosts and services
> is updated. But many times Trends keeps on showing hosts with
> "HOST UNREACHABLE" status and services with "CRITICAL" status.
>
> In such cases, rebooting the Nagios server clears it up,
> but that is not a solution.
>
> So how can I resolve this problem?
> What I want is: as soon as a host and/or service check succeeds after a
> network outage, Trends must be updated immediately.
>
> Waiting for a reply.
> With regards,
>
> Linux Admin
>

--
Eric BOLLENGIER, System Administrator - Ext. 1325
SIGMA Informatique http://www.sigma.fr
3 rue Newton, BP 4127, 44241 La Chapelle sur Erdre Cedex
tel : 02.40.37.14.00

Attachment: mayday_bug_trends.pl

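#!/usr/bin/perl -w

use strict ;

my $max_ok_soft_time = 3*60 ;   # 3 minutes, in seconds
my (%alert, %alert_solve) ;     # keyed by "host:service", value = epoch timestamp

my $file = shift or die "Usage : $0 file\n" ;

open(FP, '<', $file) or die "E : cannot open $file: $!" ;

while (my $l = <FP>)
{
    # A service enters a HARD problem state: remember when it happened.
    if ($l =~ /^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);(CRITICAL|WARNING|UNKNOWN);HARD;/o)
    {
        $alert{"$2:$3"} = $1 ;
    }

    # A HARD recovery is the expected case: forget the recorded problem.
    if ($l =~ /^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);(OK|RECOVERY);HARD;/o)
    {
        if (defined $alert{"$2:$3"})
        {
            delete $alert{"$2:$3"} ;
        }
    }

    # A SOFT recovery while a HARD problem is still recorded is the bug.
    if ($l =~ /^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);(OK|RECOVERY);SOFT;/o)
    {
        if (defined $alert{"$2:$3"})
        {
            # Report how long the service had been in the HARD problem state.
            printf "Problem on %s/%s (%i min)\n", $2, $3, ($1 - $alert{"$2:$3"})/60 ;
            $alert_solve{"$2:$3"} = $1 ;
        }
    }
}

close(FP) ;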

#

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]