Check Scheduler oddness

Post by **gwakem** » Thu Oct 28, 2021 8:31 am

XI 5.8.1
RHEL 7.7

Hi all,
The check scheduler on my prod XI server regularly clumps large numbers of checks toward the end of the range while leaving some portions without any checks at all (see the attached screenshot). This happens on its own and sometimes smooths out, but it often ends up like this again. Is this normal, or an indication of a potential issue somewhere?

Screenshot(4).png

Post by **pbroste** » Thu Oct 28, 2021 3:16 pm

Hello Griffin,

Thanks for reaching out. There are a number of reasons this could be happening. We would like to take a look at your System Profile so we can see what is going on.

To send us your System Profile:

1. Log in to the Nagios XI GUI using a web browser.
2. Click the "Admin" > "System Profile" menu.
3. Click the "Download Profile" button.
4. Save the profile.zip file and send it via Private Message.

Thanks,
Perry

Post by **gwakem** » Fri Oct 29, 2021 5:48 am

Done!

Post by **pbroste** » Fri Oct 29, 2021 2:51 pm

Hello @gwakem

Thanks for sending over the System Profile. Reviewing it, we see that your 'check-host-alive' checks are timing out:

wproc: Core Worker 9844: job 129303 (pid=32402) timed out. Killing it
CHECK job 129303 from worker Core Worker 9844 timed out after 30.00s
Warning: Check of host 'mxxxxxxx-pxxxxx' timed out after 30.00 seconds
wproc: host=monitoring-pi00494; service=(null);
early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Warning: Check of host 'mxxxxxxxx-pxxxxxxx' timed out after 30.01 seconds
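To see which hosts time out most often, the warnings can be tallied per host. This is a minimal sketch, assuming the log lines have the same "Warning: Check of host '...' timed out after" shape as those quoted above; the function name `count_host_timeouts` is hypothetical, and the log path in the usage comment is the usual default for a source install and may differ on your system.

```shell
#!/usr/bin/env bash
# Count host check timeouts per host in a Nagios log file.
# Assumption: timeout warnings look like the lines quoted in this thread.
count_host_timeouts() {
    local log_file="$1"
    grep "Warning: Check of host '" "$log_file" \
        | grep 'timed out after' \
        | sed -E "s/.*Check of host '([^']+)' timed out.*/\1/" \
        | sort | uniq -c | sort -rn
}

# Usage (path is an assumption, adjust to your install):
# count_host_timeouts /usr/local/nagios/var/nagios.log
```

The output is one line per host, prefixed with the number of timeouts and sorted so the worst offenders come first.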

We recommend increasing the check timeouts:

check_icmp [options] [-H] host1 host2 hostN

Options:
  -t
    timeout value (seconds, currently 10)

check_ping -H <host_address> -w <wrta>,<wpl>% -c <crta>,<cpl>%
  [-p packets] [-t timeout] [-4|-6]

Options:
  -t, --timeout=INTEGER:<timeout state>
    Seconds before connection times out (default: 10)
    Optional ":<timeout state>" can be a state integer (0,1,2,3) or a state STRING

vi /usr/local/nagios/etc/nagios.cfg

Service Check Timeout
Format: service_check_timeout=<seconds>
Example: service_check_timeout=60
Host Check Timeout
Format: host_check_timeout=<seconds>
Example: host_check_timeout=60
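If you prefer not to edit the file by hand, the directives above could be set with a small helper. This is only a sketch: `set_cfg_value` is a hypothetical function, not part of Nagios, and it assumes the directive sits on its own line in `key=value` form. It keeps a `.bak` copy of the file before changing anything.

```shell
#!/usr/bin/env bash
# set_cfg_value FILE KEY VALUE
# Replace an existing "KEY=..." line, or append one if the key is missing.
# A backup of the original file is written to FILE.bak first.
set_cfg_value() {
    local file="$1" key="$2" value="$3"
    cp -p "$file" "$file.bak"
    if grep -q "^${key}=" "$file.bak"; then
        # Rewrite from the backup so no in-place sed extension is needed.
        sed "s/^${key}=.*/${key}=${value}/" "$file.bak" > "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage (values taken from the examples above):
# set_cfg_value /usr/local/nagios/etc/nagios.cfg host_check_timeout 60
# set_cfg_value /usr/local/nagios/etc/nagios.cfg service_check_timeout 60
```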

vi /usr/local/ncpa/etc/ncpa.cfg

host_check_timeout=30
service_check_timeout=60

Bump the timeouts up by 60 seconds, restart ncpa_listener.service and nagios.service so the changes take effect, and then check to see how things look.
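The restart can be sketched as below, with a configuration check first so a typo in nagios.cfg does not take monitoring down. The paths are the usual Nagios XI defaults and are assumptions here, as are the helper names `run` and `apply_timeout_change`. Note that the ncpa_listener unit only exists on machines where the NCPA agent is installed, so that step may need to be run on the monitored host instead. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Assumed default paths; override via the environment if yours differ.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Verify the config, then restart the two services named in this post.
apply_timeout_change() {
    run "$NAGIOS_BIN" -v "$NAGIOS_CFG" || return 1
    run systemctl restart nagios.service || return 1
    run systemctl restart ncpa_listener.service
}

# Usage:
# DRY_RUN=1 apply_timeout_change   # preview
# apply_timeout_change             # run as root
```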

Let us know how things are looking,
Perry

Nagios Support Forum
