Service Check time out


Postby saffer » Sun Mar 11, 2018 1:41 pm

Hi folks,

This is my first post in many years, and it's about a problem that has me truly perplexed.

I am running Nagios 4.3.4 with 10,500+ service checks and about 400 hosts on a SLES 12 SP3 virtual machine running on VMware.

Every now and then, around 1,000 or more service checks report a service check timeout, and host checks occasionally time out as well.

The perplexing part is that there appears to be no resource issue: we have more than enough memory and CPU. I have experimented with the service check timeout and increased it from 60 seconds to 180. This reduced the problem, but it still occurs at random. The kernel has been tuned, as have the TCP buffers and so on. The checks are typically not Perl-based; check_nrpe is our primary service check, but other checks such as check_tcp and check_ssh also fail with a service check timeout.
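One way to narrow this down is to see whether the timeouts are spread out or arrive in bursts across unrelated hosts and check types; a burst hitting check_nrpe, check_tcp and check_ssh in the same minute points more toward the Nagios VM stalling than toward any individual target. The following is a minimal sketch that buckets timeout alerts from the Nagios log per minute. It assumes alert lines of the form `[epoch] SERVICE ALERT: host;service;STATE;TYPE;attempt;output` and that the plugin output of a timed-out check contains the words "timed out" (Nagios 4 typically writes something like "(Service check timed out after 180.01 seconds)"); check your own log and adjust the match if it differs.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed alert line shape, e.g.:
# [1520775600] SERVICE ALERT: web01;Load;CRITICAL;SOFT;1;(Service check timed out after 180.01 seconds)
LINE_RE = re.compile(r"^\[(\d+)\]\s+(SERVICE|HOST) ALERT: (.*)$")


def timeout_bursts(lines: Iterable[str], bucket_seconds: int = 60) -> Counter:
    """Count host/service timeout alerts per time bucket.

    Keys are the epoch second at which each bucket starts; values are the
    number of timeout alerts logged inside that bucket.
    """
    buckets: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if not match:
            continue  # not a SERVICE ALERT / HOST ALERT line
        epoch, _kind, rest = match.groups()
        if "timed out" not in rest.lower():
            continue  # an alert, but not a check timeout
        ts = int(epoch)
        buckets[ts - ts % bucket_seconds] += 1
    return buckets
```

For example, `timeout_bursts(open("/usr/local/nagios/var/nagios.log")).most_common(10)` lists the ten worst minutes; the log path shown is the usual source-install default and is an assumption, since packaged installs on SLES may put it elsewhere. Those timestamps can then be lined up against the VMware host's performance history.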

I have many years of experience with Nagios, have run environments with more than 20,000 service checks and 650+ hosts, and have never seen this issue.

So, any thoughts? It sounds like a kooky Nagios version.


Re: Service Check time out

Postby eloyd » Mon Mar 12, 2018 9:30 am

I'd be more interested in your VMware host's resource usage. Can you look back at network, memory, and processor utilization there and see whether the VM was waiting on anything? Maybe another machine has a higher resource priority?
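The host-side counters (for example CPU Ready in vCenter, or %RDY in esxtop) are the authoritative view of whether the VM was kept waiting. As a rough complement from inside the guest, a VM that the hypervisor deschedules sees its sleeps overrun by much more than normal. Below is a minimal sketch of such a stall detector; the interval and threshold values are hypothetical starting points, not tuned recommendations.

```python
import time
from typing import List, Tuple


def detect_stalls(samples: int, interval: float = 0.1,
                  threshold: float = 0.5) -> List[Tuple[float, float]]:
    """Sleep `samples` times for `interval` seconds and report overruns.

    Returns (wall_clock_epoch, overrun_seconds) for every iteration whose
    sleep took more than `threshold` seconds longer than requested, which
    is a hint that the whole VM was paused or starved of CPU.
    """
    stalls: List[Tuple[float, float]] = []
    for _ in range(samples):
        start = time.monotonic()  # monotonic clock: immune to NTP steps
        time.sleep(interval)
        overrun = time.monotonic() - start - interval
        if overrun > threshold:
            # Record wall-clock time so it can be compared with nagios.log epochs.
            stalls.append((time.time(), overrun))
    return stalls
```

Running `detect_stalls(samples=36000)` (roughly an hour at the default interval) alongside Nagios and comparing the reported epochs with the times the timeouts occurred would show whether the guest itself was stalling at those moments.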
Eric Loyd • 844.240.EVER • @EricLoyd • I'm a Nagios Fanatic!

Re: Service Check time out

Postby cdienger » Tue Mar 13, 2018 2:36 pm

Were you able to check the vmware resources suggested by @eloyd?

Another place to check would be any firewall devices the traffic passes through. Perhaps the frequent ICMP and TCP connections are being flagged as potentially malicious and dropped?
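To test the firewall theory, it helps to distinguish a connection that is silently dropped (it times out) from one that is actively rejected (connection refused). The sketch below repeatedly opens TCP connections to one port, roughly what check_tcp or check_nrpe do before sending any data, and tallies the outcomes. The attempt count, timeout and pause values are hypothetical defaults.

```python
import socket
import time
from typing import Dict


def probe_tcp(host: str, port: int, attempts: int = 20,
              timeout: float = 10.0, pause: float = 0.5) -> Dict[str, int]:
    """Open `attempts` TCP connections to host:port and classify each one."""
    results = {"ok": 0, "timeout": 0, "refused": 0, "error": 0}
    for _ in range(attempts):
        try:
            # The handshake completing is enough; the socket is closed on exit.
            with socket.create_connection((host, port), timeout=timeout):
                results["ok"] += 1
        except socket.timeout:
            results["timeout"] += 1   # no answer at all: consistent with a silent drop
        except ConnectionRefusedError:
            results["refused"] += 1   # RST received: the port is closed or rejected
        except OSError:
            results["error"] += 1     # DNS failure, unreachable network, etc.
        time.sleep(pause)
    return results
```

For example, `probe_tcp("monitored-host.example", 5666)` (a hypothetical host name; 5666 is the default NRPE port) run from the Nagios server during a bad period: timeouts from there, but not from a machine on the same side of the firewall as the target, would support the idea that a middlebox is dropping the traffic.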
