We are working on a new platform and using Nagios XI for monitoring. Since we started testing with about 11,000 active checks, we have noticed that host and service status updates are delayed. Sometimes the delay reaches 12 minutes, while Nagios Core shows the latest, correct check results.
In the example below we are 622 seconds behind (this is part of the ndo2db debug log):
[1560945052.589490] [002.0] [pid=89705] INSERT INTO nagios_hoststatus SET instance_id='1', host_object_id='6458', status_update_time=FROM_UNIXTIME(1560944430), output='PING OK - Packet loss = 0%, RTA = 46\.94 ms', long_output='', perfdata='rta=46\.94ms;400\.000000;500\.000000;0\.000000 pl=0%;3;100;0 AVL=100%;0', current_state='0'
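For reference, the 622-second figure is the difference between the timestamp at the start of the debug line and the value passed to FROM_UNIXTIME. Below is a minimal sketch of that arithmetic, assuming the leading bracketed number is the Unix time at which ndo2db logged the statement; the function name `ndo2db_lag_seconds` is only illustrative and not part of any Nagios tool.

```python
import re

# Matches the leading "[epoch.micros]" stamp and the status_update_time value.
LINE_RE = re.compile(
    r"^\[(?P<logged>\d+)\.\d+\].*?"
    r"status_update_time=FROM_UNIXTIME\((?P<checked>\d+)\)"
)


def ndo2db_lag_seconds(line):
    """Return seconds between the log stamp and status_update_time, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return int(m.group("logged")) - int(m.group("checked"))


# Shortened copy of the debug line above (output and perfdata fields trimmed).
sample = (
    "[1560945052.589490] [002.0] [pid=89705] INSERT INTO nagios_hoststatus SET "
    "instance_id='1', host_object_id='6458', "
    "status_update_time=FROM_UNIXTIME(1560944430), current_state='0'"
)

print(ndo2db_lag_seconds(sample))  # 1560945052 - 1560944430 = 622
```

Running this over the whole debug log would show whether the lag is constant or grows over time.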
https://support.nagios.com/kb/article.php?id=139
We are using a RAM disk and gearmand with 9 workers.
Nagios Core version is 4.2.4
Nagios XI version is 5.5.11
Nagios server:
RAM = 20 GB
CPUs = 10 cores
SSD disk
CentOS release 6.10 (Final)
The database is offloaded to a separate server with these resources:
RAM = 26 GB
CPUs = 16 cores
SSD disk
CentOS Linux release 7.5.1804 (Core)
We tried to find the cause of the problem but found nothing.
The only error in the messages log file is:
Jun 19 15:42:58 Nagios-XI nagios: job 127 (pid=63189): read() returned error 11
Jun 19 15:42:58 Nagios-XI nagios: job 128 (pid=63194): read() returned error 11
Jun 19 15:43:02 Nagios-XI nagios: job 128 (pid=63355): read() returned error 11
Jun 19 15:43:04 Nagios-XI nagios: job 129 (pid=63393): read() returned error 11
Jun 19 15:43:07 Nagios-XI nagios: job 129 (pid=63413): read() returned error 11
Jun 19 15:43:10 Nagios-XI nagios: job 130 (pid=63445): read() returned error 11
Jun 19 15:43:13 Nagios-XI nagios: job 131 (pid=63494): read() returned error 11
Jun 19 15:43:13 Nagios-XI nagios: job 132 (pid=63502): read() returned error 11
Jun 19 15:43:20 Nagios-XI nagios: job 134 (pid=63572): read() returned error 11
Jun 19 15:43:22 Nagios-XI nagios: job 134 (pid=63603): read() returned error 11
Jun 19 15:43:22 Nagios-XI nagios: job 135 (pid=63600): read() returned error 11
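On Linux, errno 11 is EAGAIN ("Resource temporarily unavailable"). Whether these messages are related to the delay is not established here. The sketch below only counts the error codes in lines of this shape and prints their symbolic names; the helper name `count_read_errors` is hypothetical, and the regular expression assumes the message format shown above.

```python
import errno
import os
import re
from collections import Counter

# Matches e.g. "job 127 (pid=63189): read() returned error 11"
ERR_RE = re.compile(r"job (\d+) \(pid=(\d+)\): read\(\) returned error (\d+)")


def count_read_errors(lines):
    """Count occurrences of each errno value reported by read()."""
    counts = Counter()
    for line in lines:
        m = ERR_RE.search(line)
        if m:
            counts[int(m.group(3))] += 1
    return counts


sample = [
    "Jun 19 15:42:58 Nagios-XI nagios: job 127 (pid=63189): read() returned error 11",
    "Jun 19 15:42:58 Nagios-XI nagios: job 128 (pid=63194): read() returned error 11",
    "Jun 19 15:43:00 Nagios-XI nagios: some unrelated message",
]

for code, n in count_read_errors(sample).items():
    # errno.errorcode maps the number to its name on the current platform.
    name = errno.errorcode.get(code, "unknown")
    print(code, name, os.strerror(code), "count:", n)
```

The names printed depend on the platform where the script runs, so it should be run on the Nagios server itself or another Linux host.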