Virtual servers reboot too fast

waskinbas
Posts: 16
Joined: Wed Jun 02, 2021 1:59 pm

Post by waskinbas »

Hello pros,

It seems like some of my Windows virtual servers reboot so quickly they are down and back up again before Nagios notices. I have the NCPA cross-platform agent installed on all of them.

I first tried setting the host checks to run every minute, with the retry value set to 0 and the max number of checks set to 1. The hosts are set for "immediate notification." That didn't help. I then looked to see if there was some kind of "nagios_xi_check_uptime" command, but I couldn't locate one. Is there a best-practices guide to monitoring virtual servers? I've been going through the PDFs in the admin guide but I didn't see one. There is a lot there and I might have missed something.

This is becoming something of a problem. Nagios suppresses service checks when the host is down, but if it never notices that the host went down, I can end up getting a bunch of warnings that a VM's services are "unknown" or critical, even though the server is booting up or even already allowing me to log in. I've tried setting the service checks to every 5 minutes, with 1 retry every 2 minutes, but that didn't help either.
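To put a rough number on why a fast reboot can slip past the host check, here is a minimal sketch. It assumes checks fire at a fixed interval with a random offset relative to the outage, and that any check landing inside the outage fails. The function name and the 20-second outage are made up for illustration; this is not how Nagios schedules checks internally.

```python
def detection_probability(downtime_s: float, check_interval_s: float) -> float:
    """Chance that at least one periodic check falls inside the outage window.

    Assumptions (illustrative only): checks fire at a fixed interval with a
    uniformly random phase relative to the start of the outage, and a check
    that lands inside the window always fails (no retries, no check latency).
    """
    if check_interval_s <= 0:
        raise ValueError("check_interval_s must be positive")
    if downtime_s <= 0:
        return 0.0
    # An outage at least as long as the interval is always hit by some check.
    return min(1.0, downtime_s / check_interval_s)


if __name__ == "__main__":
    # Hypothetical numbers: a VM that is unreachable for 20 seconds.
    for interval in (60, 300):
        p = detection_probability(20, interval)
        print(f"interval={interval}s: {p:.0%} chance the host check sees the outage")
```

Under these assumptions, an outage shorter than the check interval is only caught some of the time, which would match the symptoms above.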

Any advice or pointers of what I can read to solve this would be appreciated!
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Virtual servers reboot too fast

Post by pbroste »

Hello @waskinbas

Thanks for reaching out. It sounds like you have looked into NCPA/NRDP active and passive checks and neither is able to detect the unexpected reboot.

Another option here would be to use the built-in Windows Event Log Configuration Wizard in Nagios XI. Go to Configure > Start Monitoring Now and search for Windows Event Log.

This wizard does require NCPA to work, but you can install both agents on the system by following the support article found here.

You can also manually set up checks using NSClient. This is a third-party plugin; the documentation for CheckEventLog is available here.

I see that there is a list in the plugin exchange as well.
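Since the original question asked about an uptime check, here is a rough sketch of how a "recently rebooted" check could be built around an uptime value. It only uses the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name and the 600-second threshold are hypothetical, and how the uptime is read from the agent is deliberately left out; that part would need to be confirmed against the agent's own documentation.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_uptime(uptime_s, min_uptime_s=600):
    """Return (exit_code, message) for an uptime given in seconds.

    min_uptime_s is a hypothetical threshold. It should be longer than the
    check interval so that at least one check lands inside the window in
    which the host still counts as "recently rebooted".
    """
    if uptime_s is None or uptime_s < 0:
        return UNKNOWN, "UNKNOWN - uptime not available"
    if uptime_s < min_uptime_s:
        return CRITICAL, f"CRITICAL - host rebooted {int(uptime_s)}s ago"
    return OK, f"OK - up for {int(uptime_s)}s"


if __name__ == "__main__":
    # Made-up uptime readings, only to show the three outcomes.
    for reading in (45, 7200, None):
        code, message = evaluate_uptime(reading)
        print(code, message)
```

The point of this design is that a reboot stays visible for as long as the threshold says, so the check does not have to catch the host while it is actually down.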

Thanks,
Perry
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Virtual servers reboot too fast

Post by ssax »

Increasing the max_check_attempts or increasing the first_notification_delay on the host/service can help in cases such as these.

max_check_attempts helps with false positives; I would increase it.

EDIT: You should also make sure that the host check_interval is lower than the check_interval of its services so that the suppression works properly.

You can also set host_down_disable_service_checks=1 in your nagios.cfg and restart the nagios service to stop the service checks from even occurring if the host is in a problem state.

We usually don't recommend check intervals under 5 minutes unless it's a critical server or you have a small environment. Running all of your checks at 1-minute check_intervals has a performance impact, which limits the total number of checks you'll be able to get into the system.
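As a back-of-the-envelope illustration of the two points above (host interval below the service intervals, and the cost of 1-minute checks), here is a small sketch. The function names and the figure of 2000 checks are made up for the example; real capacity depends on the server and the plugins in use.

```python
def checks_per_minute(num_checks: int, check_interval_min: float) -> float:
    """Average number of check executions per minute for a given interval."""
    if check_interval_min <= 0:
        raise ValueError("check_interval_min must be positive")
    return num_checks / check_interval_min


def host_checked_more_often(host_interval_min: float, service_intervals_min) -> bool:
    """True if the host check_interval is lower than every service check_interval."""
    return all(host_interval_min < s for s in service_intervals_min)


if __name__ == "__main__":
    # Hypothetical environment with 2000 checks in total.
    for interval in (5, 1):
        rate = checks_per_minute(2000, interval)
        print(f"{interval}-minute interval: about {rate:.0f} checks per minute")

    # Host checked every minute, services every 5 minutes.
    print(host_checked_more_often(1, [5, 5, 5]))
```

Dropping every check from 5 minutes to 1 minute multiplies the average load by five, which is why short intervals are usually reserved for a few critical hosts.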
waskinbas
Posts: 16
Joined: Wed Jun 02, 2021 1:59 pm

Re: Virtual servers reboot too fast

Post by waskinbas »

Took me two weeks, but I finally got back to this issue!

After further testing, I found that host checks every minute with 0 retries and 1 max check were reliable after all. Since I have a small monitoring group, this is fine for my purposes, though I might look into using the Windows Event Viewer to detect uptime.

I do still get buried by service warnings after the host has come up. I tried the "host_down_disable_service_checks=1" under CCM -> Core Configs, but that didn't change anything. I left my service checks set for every 5 minutes, with 2 retries at 2-minute intervals. I'm going to see if increasing the timeout in the NCPA agent might help before I try increasing the max_check_attempts on the services. I want quick polling to catch spikes in resource usage, so maybe my being overzealous is causing this.
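For anyone weighing the same trade-off, here is a rough sketch of how the retry settings relate to boot time. It assumes a service reaches a hard state on its last allowed attempt, so the delay after the first failed check is (max_check_attempts - 1) * retry_interval, and it assumes "2 retries" means max_check_attempts = 3. The function names and the boot timings are hypothetical, and first_notification_delay is ignored.

```python
def minutes_to_hard_state(max_check_attempts: int, retry_interval_min: float) -> float:
    """Minutes between the first failed check and the hard state."""
    if max_check_attempts < 1:
        raise ValueError("max_check_attempts must be at least 1")
    return (max_check_attempts - 1) * retry_interval_min


def would_notify(first_failure_min: float, recovery_min: float,
                 max_check_attempts: int, retry_interval_min: float) -> bool:
    """True if the hard state is reached before the service recovers.

    Both times are minutes since the reboot started (illustrative model only).
    """
    hard_at = first_failure_min + minutes_to_hard_state(max_check_attempts, retry_interval_min)
    return hard_at < recovery_min


if __name__ == "__main__":
    # Assumed settings from the post: 2 retries at 2-minute intervals.
    print(minutes_to_hard_state(3, 2))
    # Hypothetical boot: first failed check at minute 1, services back at minute 8.
    print(would_notify(1, 8, 3, 2))
    # Same boot with more attempts allowed before the hard state.
    print(would_notify(1, 8, 5, 2))
```

In this model, a notification goes out whenever the services take longer to come back than the retries allow, which is the case that raising max_check_attempts is meant to cover.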

Everything works for now, thank you both for your advice.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Virtual servers reboot too fast

Post by ssax »

Did you restart the nagios service after setting host_down_disable_service_checks=1? Basically, the host needs to be in a down state before its service checks run for the functionality to work.