[Nagios-devel] Re: [Nagios-users] Regarding Trends status after Network Outage

Hi,

I see the same bug (Nagios 1.2). It looks like a race condition that
shows up after a host reboot.

ssh down -> reboot -> host up -> ssh down -> ssh up

[1099042385] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Connection refused
[1099042445] SERVICE ALERT: test;ssh;CRITICAL;SOFT;2;Socket timeout
[1099042525] SERVICE ALERT: test;ssh;CRITICAL;HARD;3;Socket timeout
[1099042715] HOST ALERT: test;DOWN;SOFT;1;CRITICAL
[1099042725] HOST ALERT: test;DOWN;SOFT;2;CRITICAL
[1099042735] HOST ALERT: test;DOWN;SOFT;3;CRITICAL
[1099042745] HOST ALERT: test;DOWN;SOFT;4;CRITICAL
[1099042755] HOST ALERT: test;DOWN;HARD;5;CRITICAL
[1099042755] SERVICE ALERT: test;ping;CRITICAL;HARD;1;CRITICAL
[1099042935] HOST ALERT: test;UP;HARD;1;PING OK
[1099042935] SERVICE ALERT: test;ping;OK;HARD;1;PING OK
[1099042945] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Socket timeout
[1099043005] SERVICE ALERT: test;ssh;OK;SOFT;2;TCP OK

====> BUG: ssh was in a CRITICAL HARD state, but the OK recovery is logged as SOFT!

[1099043265] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Socket timeout
[1099043335] SERVICE ALERT: test;ssh;CRITICAL;SOFT;2;Socket timeout
[1099043395] SERVICE ALERT: test;ssh;CRITICAL;HARD;3;Socket timeout
[1099043475] HOST ALERT: test;DOWN;SOFT;1;CRITICAL
[1099043485] HOST ALERT: test;DOWN;SOFT;2;CRITICAL
[1099043495] HOST ALERT: test;DOWN;SOFT;3;CRITICAL
[1099043505] HOST ALERT: test;DOWN;SOFT;4;CRITICAL
[1099043515] HOST ALERT: test;DOWN;HARD;5;CRITICAL
[1099043565] SERVICE ALERT: test;ping;CRITICAL;HARD;1;CRITICAL
[1099043715] HOST ALERT: test;UP;HARD;1;PING OK
[1099043715] SERVICE ALERT: test;ping;OK;HARD;1;PING OK
[1099043745] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Socket timeout
[1099043815] SERVICE ALERT: test;ssh;CRITICAL;SOFT;2;Socket timeout
[1099043865] SERVICE ALERT: test;ssh;OK;HARD;3;TCP OK

=====> Here it is fine: ssh comes back up after 2 failed checks, and the OK is logged as HARD
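
To make the difference between the two excerpts above concrete, here is a minimal Perl sketch that replays only the ssh lines and flags a SOFT recovery that follows a HARD problem. It is not the attached script: the helper name find_soft_recoveries and the record fields are hypothetical, it matches only the OK state, and it assumes the log line layout shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the attached script): return one record per
# SOFT recovery that follows a HARD problem state for the same host/service.
sub find_soft_recoveries {
    my @lines = @_;
    my %hard_since;    # "host;service" => epoch of the last HARD problem
    my @found;

    for my $line (@lines) {
        # Layout assumed from the excerpts above:
        # [epoch] SERVICE ALERT: host;service;STATE;TYPE;attempt;output
        next unless $line =~
            /^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);([A-Z]+);(HARD|SOFT);(\d+);/;
        my ($time, $host, $service, $state, $type) = ($1, $2, $3, $4, $5);
        my $key = "$host;$service";

        if ($state ne 'OK' && $type eq 'HARD') {
            $hard_since{$key} = $time;
        }
        elsif ($state eq 'OK' && exists $hard_since{$key}) {
            # Only a SOFT recovery after a HARD problem is suspicious.
            push @found, {
                host     => $host,
                service  => $service,
                time     => $time,
                down_for => $time - $hard_since{$key},
            } if $type eq 'SOFT';
            delete $hard_since{$key};    # the problem episode ends either way
        }
    }
    return @found;
}

# The ssh lines from the two excerpts above.
my @log = (
    '[1099042525] SERVICE ALERT: test;ssh;CRITICAL;HARD;3;Socket timeout',
    '[1099042945] SERVICE ALERT: test;ssh;CRITICAL;SOFT;1;Socket timeout',
    '[1099043005] SERVICE ALERT: test;ssh;OK;SOFT;2;TCP OK',
    '[1099043395] SERVICE ALERT: test;ssh;CRITICAL;HARD;3;Socket timeout',
    '[1099043865] SERVICE ALERT: test;ssh;OK;HARD;3;TCP OK',
);

for my $r (find_soft_recoveries(@log)) {
    printf "%s/%s: SOFT recovery at %d, %d s after the HARD problem\n",
        $r->{host}, $r->{service}, $r->{time}, $r->{down_for};
}
```

Only the first excerpt is reported. In the second one the recovery is logged as HARD, so the pending problem is cleared silently.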

If you want to look for this bug in your own Nagios log files, you can use
my simple Perl script (see attachment).

PS: to run it over your archived logs:

for i in nagios-*2004*
do
./mayday_bug_trends.pl "$i"
done

Regards

On Thursday, 2 December 2004 at 10:05 +0530, Nilesh wrote:
> Dear All,
>
> I have noticed strange behaviour in the Nagios Trends report.
> I'm using nagios-1.2.
>
> Whenever there is a network outage, Trends is updated immediately
> to reflect it.
> After network connectivity recovers, all host and service checks
> run again and update the availability information
> for hosts and services. But many times Trends keeps
> showing hosts as "HOST UNREACHABLE" and services as
> "CRITICAL".
>
> In such cases, when I reboot the Nagios server, it recovers,
> but that is not a solution.
>
> How can I resolve this problem?
> What I want is for Trends to be updated immediately, as soon as a host
> and/or service check succeeds after a network outage.
>
> Waiting for a reply.
> With regards,
>
> Linux Admin
>

-- 
Eric BOLLENGIER, System Administrator - Ext. 1325
SIGMA Informatique http://www.sigma.fr
3 rue Newton, BP 4127, 44241 La Chapelle sur Erdre Cedex
tel : 02.40.37.14.00

Attachment: mayday_bug_trends.pl

#!/usr/bin/perl -w

use strict ;

my $max_ok_soft_time = 3*60 ;
my (%alert, %alert_solve) ;

my $file = shift or die "Usage : $0 file" ;

open(FP, '<', $file) or die "E : cannot open $file: $!";

while (my $l = <FP>)
{
    # A service enters a HARD problem state: remember when it happened.
    if ($l =~ /^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);(CRITICAL|WARNING|UNKNOWN);HARD;/)
    {
        $alert{"$2:$3"} = $1 ;
    }

    # A HARD recovery is the expected case: forget the pending alert.
    if ($l =~ /^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);(OK|RECOVERY);HARD;/)
    {
        delete $alert{"$2:$3"} ;
    }

    # A SOFT recovery while a HARD alert is still pending is the bug.
    if ($l =~ /^\[(\d+)\] SERVICE ALERT: ([^;]+);([^;]+);(OK|RECOVERY);SOFT;/)
    {
        if (defined $alert{"$2:$3"})
        {
            printf "Problem on %s/%s (%i min)\n", $2, $3, ($1 - $alert{"$2:$3"})/60 ;
            $alert_solve{"$2:$3"} = $1 ;
        }
    }
}

close(FP) ;

#

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]