Check Scheduler oddness

gwakem
Posts: 238
Joined: Mon Jan 23, 2012 2:02 pm
Location: Asheville, NC

Post by gwakem »

XI 5.8.1
RHEL 7.7

Hi all,
The check scheduler on my prod XI server regularly clumps large numbers of checks toward the end of the range while leaving some portions without any checks at all (see the attached screenshot). This happens on its own and sometimes smooths out, but it often ends up like this again. Is this normal, or is it an indication of a potential issue somewhere?
Screenshot(4).png
--
Griffin Wakem
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Check Scheduler oddness

Post by pbroste »

Hello Griffin,

Thanks for reaching out. There are a number of reasons this could be happening, so we would like to take a look at your System Profile to see what is going on.

To send us your System Profile:
  • Log in to the Nagios XI GUI using a web browser.
  • Click "Admin" > "System Profile" in the menu.
  • Click the "Download Profile" button.
  • Save the profile.zip file and send it via Private Message.
Thanks,
Perry

Re: Check Scheduler oddness

Post by gwakem »

Done!
--
Griffin Wakem

Re: Check Scheduler oddness

Post by pbroste »

Hello @gwakem

Thanks for sending over the System Profile. Reviewing it, we see that your 'check-host-alive' checks are timing out:
wproc: Core Worker 9844: job 129303 (pid=32402) timed out. Killing it
CHECK job 129303 from worker Core Worker 9844 timed out after 30.00s
Warning: Check of host 'mxxxxxxx-pxxxxx' timed out after 30.00 seconds
wproc: host=monitoring-pi00494; service=(null);
early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Warning: Check of host 'mxxxxxxxx-pxxxxxxx' timed out after 30.01 seconds
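To see which hosts are affected most often, the warnings above can be tallied per host. The following is only a rough sketch: the function name `timed_out_hosts` is made up for illustration, and it assumes the log lines have the same "Warning: Check of host '...' timed out after" shape as the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count host-check timeout warnings per host.
# $1 = path to a Nagios log file (the caller supplies the path).
timed_out_hosts() {
    log_file="$1"
    # Keep only the host timeout warnings, extract the quoted host name,
    # then count occurrences and list the worst offenders first.
    grep "Warning: Check of host '" "$log_file" \
        | grep "timed out after" \
        | sed "s/.*Check of host '\([^']*\)'.*/\1/" \
        | sort | uniq -c | sort -rn
}
```

For example, `timed_out_hosts /usr/local/nagios/var/nagios.log`, assuming the log lives at that path on your server. If only a handful of hosts show up, the problem may be with those hosts rather than with the timeout setting itself.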
You will want to increase the check timeouts:
check_icmp [options] [-H] host1 host2 hostN

Options:
-t
timeout value (seconds, currently 10)
check_ping -H <host_address> -w <wrta>,<wpl>% -c <crta>,<cpl>%
[-p packets] [-t timeout] [-4|-6]

Options:
-t, --timeout=INTEGER:<timeout state>
Seconds before connection times out (default: 10)
Optional ":<timeout state>" can be a state integer (0,1,2,3) or a state STRING

Code:

vi /usr/local/nagios/etc/nagios.cfg
Service Check Timeout
Format: service_check_timeout=<seconds>
Example: service_check_timeout=60
Host Check Timeout
Format: host_check_timeout=<seconds>
Example: host_check_timeout=60

Code:

vi /usr/local/ncpa/etc/ncpa.cfg
host_check_timeout=30
service_check_timeout=60
Increase the timeouts by 60 seconds, restart ncpa_listener.service and nagios.service so the change takes effect, and then check how things look.
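As a quick way to confirm what is currently set before and after editing, something like the sketch below could be used. The function name `show_timeouts` is hypothetical, and the commands in the comments assume a systemd host (such as RHEL 7) and the default Nagios binary path, so adjust them to your install.

```shell
#!/bin/sh
# Hypothetical helper: print the host/service check timeout lines
# that are active (not commented out) in a config file.
# $1 = path to the config file to inspect.
show_timeouts() {
    cfg_file="$1"
    grep -E '^[[:space:]]*(host|service)_check_timeout[[:space:]]*=' "$cfg_file"
}

# Typical sequence, run as root (paths and unit names are assumptions):
#   show_timeouts /usr/local/nagios/etc/nagios.cfg
#   /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg   # verify the config
#   systemctl restart ncpa_listener.service nagios.service
```

Verifying the configuration before the restart avoids taking monitoring down over a typo in the edited file.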

Let us know how things are looking,
Perry