Check Scheduler oddness
Posted: Thu Oct 28, 2021 8:31 am
by gwakem
XI 5.8.1
RHEL 7.7
Hi all,
The check scheduler on my prod XI server regularly clumps large numbers of checks at the end of the range while leaving some portions with no checks at all (see the attached screenshot). This happens on its own and sometimes smooths out, but it often ends up like this again. Is this normal, or an indication of a potential issue somewhere?
Screenshot(4).png
Re: Check Scheduler oddness
Posted: Thu Oct 28, 2021 3:16 pm
by pbroste
Hello Griffin,
Thanks for reaching out. There are a number of reasons this could be happening, so we would like to take a look at the System Profile to see what is going on.
To send us your system profile:
- Log in to the Nagios XI GUI using a web browser.
- Click "Admin" > "System Profile" in the menu
- Click the "Download Profile" button
- Save the profile.zip file and send via Private Message
Thanks,
Perry
Re: Check Scheduler oddness
Posted: Fri Oct 29, 2021 5:48 am
by gwakem
Done!
Re: Check Scheduler oddness
Posted: Fri Oct 29, 2021 2:51 pm
by pbroste
Hello @gwakem,
Thanks for sending over the System Profile. On review, we see that your 'check-host-alive' checks are timing out:
wproc: Core Worker 9844: job 129303 (pid=32402) timed out. Killing it
CHECK job 129303 from worker Core Worker 9844 timed out after 30.00s
Warning: Check of host 'mxxxxxxx-pxxxxx' timed out after 30.00 seconds
wproc: host=monitoring-pi00494; service=(null);
early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Warning: Check of host 'mxxxxxxxx-pxxxxxxx' timed out after 30.01 seconds
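To see which hosts time out most often, the "timed out" warnings above can be tallied per host. This is only a rough sketch: the function name `count_host_timeouts` is made up for illustration, and the log path in the usage comment is the usual Nagios XI location, which is an assumption about this particular install.

```shell
#!/bin/sh
# Hypothetical helper: count "Check of host '...' timed out" warnings per host.
# Usage (log path is an assumption, adjust for your install):
#   count_host_timeouts /usr/local/nagios/var/nagios.log
count_host_timeouts() {
    # Split each line on single quotes so that field 2 is the host name,
    # then print "<count> <host>" sorted with the worst offenders first.
    awk -F"'" '/Check of host .* timed out/ { n[$2]++ }
               END { for (h in n) print n[h], h }' "$1" | sort -k1,1nr -k2
}
```

Hosts near the top of the list are the ones worth testing by hand first.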
You will want to increase the check timeouts. The plugin-level timeout options are:
check_icmp [options] [-H] host1 host2 hostN
Options:
-t
timeout value (seconds, currently 10)
check_ping -H <host_address> -w <wrta>,<wpl>% -c <crta>,<cpl>%
[-p packets] [-t timeout] [-4|-6]
Options:
-t, --timeout=INTEGER:<timeout state>
Seconds before connection times out (default: 10)
Optional ":<timeout state>" can be a state integer (0,1,2,3) or a state STRING
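Before changing anything, it can help to run the plugin by hand against one of the slow hosts with a larger `-t` value and see whether it answers. A minimal sketch, assuming the plugins live in /usr/local/nagios/libexec (the usual Nagios XI location, not confirmed in this thread); `run_host_check` and `PLUGIN_DIR` are names invented for this example.

```shell
#!/bin/sh
# Assumption: default plugin directory; override PLUGIN_DIR if yours differs.
PLUGIN_DIR=${PLUGIN_DIR:-/usr/local/nagios/libexec}

# Hypothetical wrapper: run check_icmp against one host with a given timeout.
# Usage: run_host_check <host_address> [timeout_seconds]
run_host_check() {
    host=$1
    timeout=${2:-10}   # 10 seconds is the plugin default shown in the help text
    "$PLUGIN_DIR/check_icmp" -H "$host" -t "$timeout"
}
```

For example, `time run_host_check <host_address> 20` shows both the plugin result and how long the check really took.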
vi /usr/local/nagios/etc/nagios.cfg
Service Check Timeout
Format: service_check_timeout=<seconds>
Example: service_check_timeout=60
Host Check Timeout
Format: host_check_timeout=<seconds>
Example: host_check_timeout=60
host_check_timeout=30
service_check_timeout=60
Increase each timeout by 60 seconds, restart ncpa_listener.service and nagios.service, and then check how things look.
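If you would rather script the change than edit the file in vi, something along these lines should work. Treat it as a sketch: `set_check_timeouts` is a made-up helper name, and the values 90 and 120 in the usage comments are just one reading of "up by 60 seconds" applied to the current 30 and 60.

```shell
#!/bin/sh
# Hypothetical helper: set both timeout directives in a nagios.cfg file.
# Usage: set_check_timeouts <cfg_file> <host_seconds> <service_seconds>
set_check_timeouts() {
    cfg=$1
    host_secs=$2
    svc_secs=$3
    # Keep the original as <cfg_file>.bak and rewrite only the two directive
    # lines; commented "# Format:" / "# Example:" lines are left untouched.
    sed -i.bak \
        -e "s/^host_check_timeout=.*/host_check_timeout=${host_secs}/" \
        -e "s/^service_check_timeout=.*/service_check_timeout=${svc_secs}/" \
        "$cfg"
}

# Example run (values are assumptions, see above):
#   set_check_timeouts /usr/local/nagios/etc/nagios.cfg 90 120
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
#   systemctl restart nagios.service
```

Running the `nagios -v` verification before the restart catches a typo in the config before it takes the service down.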
Let us know how things are looking,
Perry