Hi there
I'm looking for some help.
Over the last week my mail server and the machine monitoring it with
Nagios have crashed 3 times, both at the same time.
I'm not sure if it is the Nagios machine crashing and taking my mail
server with it somehow or the other way around.
Each time I have seen increased load on my mail server, to the
point of nrpe sending me a socket timeout warning. Shortly after this
the machines become unusable and a hard reboot is the only way to fix it.
When both machines crash (mailserver=Red Hat 9, nagios=Fedora), I've gone
to the console on both machines and they are both filled with messages
saying "status=0". This is on BOTH machines. At this point the console
does not accept a login (you can still type, but it hangs once you put
the username in).
I'm running nrpe on the mailserver checking load, number of processes,
disk space etc. The only anomalous thing is that I run my own plugin,
which I called check_ps, which scans 'ps' for a given process (just so I
know postfix is actually running!).
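
For reference, a plugin of this kind can be sketched roughly as below. This is not the actual check_ps script, only a minimal illustration in Python 3.7+ of scanning 'ps' output for a process name and returning the standard Nagios exit codes (0 = OK, 2 = CRITICAL, 3 = UNKNOWN). The function names, the 5-second timeout and the exact-match rule are assumptions made for the example. Note that the main Postfix daemon normally shows up in 'ps' as "master", not "postfix".

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def count_processes(ps_output, name):
    """Count lines of `ps -e -o comm=` output that exactly match `name`."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def check_ps(ps_output, name):
    """Return (exit_code, message) for the given ps output and process name."""
    n = count_processes(ps_output, name)
    if n > 0:
        return OK, f"PROCS OK: {n} process(es) named {name}"
    return CRITICAL, f"PROCS CRITICAL: no process named {name}"


def main(argv):
    """Hypothetical entry point: argv is [script_name, process_name]."""
    if len(argv) != 2:
        print("PROCS UNKNOWN: usage: check_ps <process-name>")
        return UNKNOWN
    try:
        # The timeout (an assumed value) keeps the check from piling up
        # if the machine is already under heavy load.
        result = subprocess.run(
            ["ps", "-e", "-o", "comm="],
            capture_output=True, text=True, timeout=5, check=True,
        )
    except (OSError, subprocess.SubprocessError) as exc:
        print(f"PROCS UNKNOWN: could not run ps: {exc}")
        return UNKNOWN
    code, message = check_ps(result.stdout, argv[1])
    print(message)
    return code
```

To use it as a plugin, the script would end with something like `sys.exit(main(sys.argv))` (after `import sys`), so that nrpe sees the exit code. Bounding the run time matters here because a check that hangs on a loaded machine can itself add to the load.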
I was wondering if anyone could confirm whether or not it is Nagios that
is crashing my machines?
Kind Regards
Jon
--
Jonathan Soong
Information Services
Institute of Medical and Veterinary Science (IMVS)
Email: [email protected]
Web : http://www.imvs.sa.gov.au
Tel : +61 8 82223095
Fax : +61 8 82223147
This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]