NagiosXI Zombie process troubles
Posted: Thu Aug 17, 2017 7:32 am
Hello,
EDIT: Nagios is running on Red Hat Enterprise Linux 6.9 64-bit (manual install).
We are currently on NagiosXI v5.4.8. Over the weekend we started having some
issues. The Nagios checks, both active and passive, are running very sporadically.
Some of the active checks haven't run in days.
In the web GUI everything looks normal under "process info" and "performance",
except that the performance page shows fewer than a thousand checks in
15 minutes instead of the usual few thousand every 5 minutes.
Nothing unusual appears in the logs, even though the nagios processes that
remain seem really busy. When I run strace on them I get the following over and
over again. Different processes are doing the same thing with different
hosts/services, which explains why they're busy and causing a high load on the system.
write(3, "job_id=209\0type=1\0command=/usr/bin/php /usr/local/nagiosxi/scripts/handle_nagioscore_notification.php --notification-type=service --contact=\"<ommited>\" --contactemail=\"<ommited>\" --type=RECOVERY --escalated=\"0\" --author=\"\" --comments=\"\" --host=\"<ommited>\" --hostaddress=\"<ommited>\" --hostalias=\"esappj223.uits.iupui.edu\" --hostdisplayname=\"<ommited>\" --service=\"Apache Busy Workers\" --hoststate=UP --hoststateid=0 --servicestate=OK --servicestateid=0 --lastservi"..., 1085) = -1 EAGAIN (Resource temporarily unavailable)
write(3, "job_id=209\0type=1\0command=/usr/bin/php /usr/local/nagiosxi/scripts/handle_nagioscore_notification.php --notification-type=service --contact=\"<ommited>\" --contactemail=\"<ommited>\" --type=RECOVERY --escalated=\"0\" --author=\"\" --comments=\"\" --host=\"<ommited>\" --hostaddress=\"<ommited>\" --hostalias=\"<ommited>\" --hostdisplayname=\"<ommited>\" --service=\"Apache Busy Workers\" --hoststate=UP --hoststateid=0 --servicestate=OK --servicestateid=0 --lastservi"..., 1085) = -1 EAGAIN (Resource temporarily unavailable)
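For context on the trace: EAGAIN is what a non-blocking write() returns when the kernel buffer on the receiving side is full, which usually means whatever is on the other end of the descriptor is not reading. Below is a minimal Python sketch of that behaviour. It assumes (this is not confirmed by the trace) that fd 3 is a non-blocking socket or pipe to another process; the socketpair, the function name fill_until_eagain and the dummy payload are hypothetical stand-ins, not anything taken from Nagios.

```python
import errno
import socket


def fill_until_eagain(chunk_size=1085):
    """Write to a non-blocking socket whose peer never reads.

    Returns (bytes_accepted, error), where error is the BlockingIOError
    raised once the kernel buffer is full.
    """
    # Hypothetical stand-in for fd 3 and whatever sits on the other end of it.
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    payload = b"x" * chunk_size  # same length as the write() in the trace
    sent = 0
    try:
        while True:
            # Succeeds while the buffer has room; the reader never drains it.
            sent += writer.send(payload)
    except BlockingIOError as exc:
        # Buffer is full: the kernel refuses the write instead of blocking.
        return sent, exc
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    sent, err = fill_until_eagain()
    name = errno.errorcode.get(err.errno, "?")
    print(f"accepted {sent} bytes, then failed with {name}: {err.strerror}")
```

A process that retries such a write in a tight loop stays busy without making progress, which would match the repeated identical write() calls and the high load, if the assumption about fd 3 holds.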
Thanks,
Eric