Virtual servers reboot too fast
Posted: Wed Aug 04, 2021 2:00 pm
Hello pros,
It seems like some of my Windows virtual servers reboot so quickly they are down and back up again before Nagios notices. I have the NCPA cross-platform agent installed on all of them.
I first tried setting the host checks to run every minute, with the retry value set to 0 and the max number of checks set to 1. The hosts are set for "immediate notification." That didn't help. I then looked to see if there was some kind of "nagios_xi_check_uptime" command, but I couldn't locate one. Is there a best-practices guide to monitoring virtual servers? I've been going through the PDFs in the admin guide but I didn't see one. There is a lot there and I might have missed something.
This is becoming something of a problem. Nagios suppresses service checks when the host is down, but if it never notices that the host went down, I can end up getting a bunch of warnings that a VM's services are "unknown" or critical, even though the server is booting up or even already allowing me to log in. I've tried setting the service checks to run every 5 minutes, with 1 retry at a 2-minute interval, but that didn't help either.
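To make the uptime idea concrete, here is a rough Python sketch of the kind of check I was hoping to find. It is only an illustration: `reboot_detected` and `uptime_check` are hypothetical names, not existing Nagios XI or NCPA commands, and the sketch assumes the host's uptime in seconds has already been fetched somehow. The idea is that if the uptime is shorter than the check interval, the host must have rebooted since the previous check, even if it never looked down.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def reboot_detected(uptime_seconds, check_interval_seconds):
    """Return True if the host booted after the previous check could have run.

    Hypothetical helper: an uptime shorter than the check interval means
    a reboot happened between two consecutive checks.
    """
    return uptime_seconds < check_interval_seconds


def uptime_check(uptime_seconds, check_interval_seconds=60):
    """Map an uptime value to a (exit_code, message) pair.

    Assumption: uptime_seconds was obtained elsewhere (for example from
    an agent); fetching it is outside the scope of this sketch.
    """
    if uptime_seconds is None or uptime_seconds < 0:
        return UNKNOWN, "UNKNOWN - no usable uptime value"
    if reboot_detected(uptime_seconds, check_interval_seconds):
        return WARNING, f"WARNING - host rebooted {uptime_seconds:.0f}s ago"
    return OK, f"OK - host up for {uptime_seconds:.0f}s"


if __name__ == "__main__":
    # Example: a host that came back 30 seconds ago, checked every 60 seconds.
    code, message = uptime_check(30, check_interval_seconds=60)
    print(code, message)
```

Something like this would at least flag the reboot after the fact, so the burst of "unknown" service alerts could be tied back to it.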
Any advice, or pointers to what I can read to solve this, would be appreciated!