Looking for help with Nagios Core 3.5.0 and Windows servers

Posted: Tue Oct 11, 2016 12:04 pm
by lvirden
We are running an old version of Nagios (intentionally) and are seeing an issue with the PING check.

Most of the time the ping works fine. Occasionally we see alerts like these:

[10-10-2016 21:45:21] SERVICE ALERT: mylocalserver;PING;CRITICAL;HARD;1;CRITICAL - 999.999.99.99: rta nan, lost 100%
[10-10-2016 21:45:51] SERVICE ALERT: mylocalserver;PING;OK;HARD;1;OK - 999.999.99.99: rta 2.694ms, lost 0%

This server is a Windows server. Other Windows servers occasionally (maybe once or twice a month) show the same behavior.

When we look at the Nagios details, it reports 100% uptime, apparently ignoring these moments of lost information.

Someone locally has suggested that perhaps too many pings are occurring around the same time.

Does any of this sound familiar to anyone? We don't want to start making changes willy-nilly without some confirmation that the direction makes sense. Is there a good place to begin, or a document we could read to help us troubleshoot this?

People are frustrated by getting calls about server issues, but when they log into their server, there's no sign of an issue.

Help us, Obi-Wan...

Re: Looking for help with Nagios Core 3.5.0 and Windows serv

Posted: Tue Oct 11, 2016 12:56 pm
by dwhitfield
Hi @lvirden,

Can you please go through https://assets.nagios.com/downloads/nag ... uning.html and let us know if the issue persists? Most of those suggestions are directed toward CPU performance, but if you've got too much of anything going on, then it might help the "too many pings" issue.

As for the "too many pings" issue directly: yes, this can certainly cause a problem. How many checks is too many depends on the network (hardware, architecture, etc.) and the types of checks.
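One rough way to test the "too many pings at the same time" idea is to check whether the CRITICAL PING alerts for different hosts cluster in the same minute. The following is only a sketch, not an official Nagios tool: the function names are made up for illustration, it assumes log lines in exactly the format quoted earlier in this thread, and it assumes the timestamp is month-day-year (the sample date 10-10-2016 does not show which order is in use).

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the alert format quoted earlier in this thread, for example:
# [10-10-2016 21:45:21] SERVICE ALERT: host;PING;CRITICAL;HARD;1;output text
ALERT_RE = re.compile(
    r"^\[(?P<ts>\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2})\] SERVICE ALERT: "
    r"(?P<host>[^;]+);(?P<service>[^;]+);(?P<state>[^;]+);"
    r"(?P<state_type>[^;]+);(?P<attempt>\d+);(?P<output>.*)$"
)


def parse_alert(line):
    """Return the alert fields as a dict, or None if the line is not a service alert."""
    match = ALERT_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Assumption: month-day-year order; adjust the format string if yours differs.
    fields["ts"] = datetime.strptime(fields["ts"], "%m-%d-%Y %H:%M:%S")
    fields["attempt"] = int(fields["attempt"])
    return fields


def critical_hosts_per_minute(lines, service="PING"):
    """Map each minute to the set of hosts that logged a CRITICAL alert for `service`."""
    buckets = defaultdict(set)
    for line in lines:
        alert = parse_alert(line)
        if alert is None:
            continue
        if alert["service"] != service or alert["state"] != "CRITICAL":
            continue
        minute = alert["ts"].replace(second=0)
        buckets[minute].add(alert["host"])
    return dict(buckets)


# The two lines from the original post, used here as sample input.
sample = [
    "[10-10-2016 21:45:21] SERVICE ALERT: mylocalserver;PING;CRITICAL;HARD;1;"
    "CRITICAL - 999.999.99.99: rta nan, lost 100%",
    "[10-10-2016 21:45:51] SERVICE ALERT: mylocalserver;PING;OK;HARD;1;"
    "OK - 999.999.99.99: rta 2.694ms, lost 0%",
]

for minute, hosts in sorted(critical_hosts_per_minute(sample).items()):
    print(minute, sorted(hosts))
```

If several hosts show up in the same minute, that would point toward load or the network path rather than any single Windows server; isolated single-host entries would point the other way. The parsed `state_type` and `attempt` fields are also worth a look: the sample shows HARD on attempt 1, which may mean the check is configured to alert on the first failure without retrying.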

I know you said you had reasons for running 3.5.0, but you might want to take a look at the 4.2.0 release notes and see if you can find any reasons to update: https://www.nagios.org/news/2016/08/nag ... -released/ . There are several performance enhancements, in addition to security fixes.

Re: Looking for help with Nagios Core 3.5.0 and Windows serv

Posted: Tue Oct 11, 2016 2:17 pm
by lvirden
Thank you - we will read over that recommended document.

The reason we are "stuck" at the old version is that newer versions changed the interfaces we use, and updating the numerous places that rely on the old interface would be costly.

Now that the expert on the setup has retired, the case for making changes is harder to make, since the expertise needed to even evaluate the situation has gone down significantly.

Re: Looking for help with Nagios Core 3.5.0 and Windows serv

Posted: Tue Oct 11, 2016 3:09 pm
by dwhitfield
Actually, https://www.nagios.org/projects/nagios-core/history/4x/ is a better document for determining whether you should upgrade. There are a total of 5 security fixes in the 4.x series, on top of performance enhancements.

If you do decide to upgrade, you can use the instructions at https://assets.nagios.com/downloads/nag ... ading.html

Since your Core expert left, another option would be to purchase XI. You can demo XI for 60 days before making a decision. There are pre-built VMs at https://www.nagios.com/downloads/nagios-xi/ if you don't want to go through the hassle of installing it yourself. If you can't or don't want to purchase XI, then we'll still be here to help with your Core issues. :)

Re: Looking for help with Nagios Core 3.5.0 and Windows serv

Posted: Wed Oct 12, 2016 6:34 am
by lvirden
Thank you so much for your help. Right now we just want to deal with the weird problem of a small number of our large group of Windows servers reporting occasional ping failures when the servers are running just fine.

Re: Looking for help with Nagios Core 3.5.0 and Windows serv

Posted: Wed Oct 12, 2016 9:50 am
by dwhitfield
What version(s) of Windows are the servers running? That will help us rule out anything happening on the Windows side.

Also, what are the specs of the Nagios Core machine? This will help us determine if load on the Nagios server could be the issue.

Thanks!