Nagios XI crashes 2 to 3 times a week
Posted: Mon Jul 09, 2018 10:03 am
We are running Nagios XI version 5.4.13 on Red Hat Enterprise Linux 7.4. I noticed that Apache is running avail.cgi processes that consume system resources and end up crashing the Nagios database. How do we address this issue? It is impacting our system monitoring.
[root@usbbnag1001 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)
[root@usbbnag1001 ~]#
Output of top:
top - 07:58:45 up 2 days, 21:01, 2 users, load average: 39.77, 38.80, 38.14
Tasks: 537 total, 41 running, 496 sleeping, 0 stopped, 0 zombie
%Cpu(s): 98.3 us, 1.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 32781632 total, 303792 free, 30922944 used, 1554896 buff/cache
KiB Swap: 2097148 total, 1661656 free, 435492 used. 969900 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13221 apache 20 0 2863416 934764 37572 R 23.9 2.9 33:04.63 avail.cgi
5866 apache 20 0 2787536 855436 34552 R 23.6 2.6 28:00.73 avail.cgi
5917 apache 20 0 2780012 846988 33836 R 23.6 2.6 27:51.85 avail.cgi
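To put a number on what the avail.cgi rows above add up to, here is a minimal sketch that totals them. It assumes the default top column layout shown above (RES in KiB as the 6th column, %CPU as the 9th, COMMAND as the 12th); the `summarize` helper and the `TOP_ROWS` name are made up for illustration and are not part of Nagios XI. It only counts the three rows pasted here, not every process on the box.

```python
# Process rows copied verbatim from the top output above.
TOP_ROWS = """\
13221 apache 20 0 2863416 934764 37572 R 23.9 2.9 33:04.63 avail.cgi
5866 apache 20 0 2787536 855436 34552 R 23.6 2.6 28:00.73 avail.cgi
5917 apache 20 0 2780012 846988 33836 R 23.6 2.6 27:51.85 avail.cgi
"""


def summarize(rows, command):
    """Return (process count, summed %CPU, summed RES in KiB) for one command.

    Assumes the default top layout:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    with RES printed as a plain KiB integer (no m/g suffix).
    """
    count = 0
    cpu = 0.0
    res_kib = 0
    for line in rows.splitlines():
        fields = line.split()
        # Skip blank lines, headers, and rows for other commands.
        if len(fields) < 12 or fields[11] != command:
            continue
        count += 1
        res_kib += int(fields[5])
        cpu += float(fields[8])
    return count, round(cpu, 1), res_kib


count, cpu, res_kib = summarize(TOP_ROWS, "avail.cgi")
print(f"{count} avail.cgi processes, {cpu}% CPU, "
      f"{res_kib / 1024 / 1024:.2f} GiB resident")
```

Note that top reports %CPU per core by default, so the summed figure is relative to a single core rather than to the whole machine.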
Output of journalctl -xe:
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: early_timeout=0; exited_ok=1; wait
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: stdout line 01: UNABLE TO CONNECT
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: GLOBAL SERVICE EVENTHANDLER job 1257
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: early_timeout=0; exited_ok=1; wait
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: stdout line 01: UNABLE TO CONNECT
Jul 09 08:01:15 usbbnag1001 nagios[20295]: job 969 (pid=5308): read() returned error 1
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: Core Worker 20295: job 969 (pid=5308
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: CHECK job 969 from worker Core Worke
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: host=uswvesx1004; service=UCS_MEMO
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: early_timeout=1; exited_ok=0; wait
Jul 09 08:01:15 usbbnag1001 nagios[20290]: Warning: Check of service 'UCS_MEMORYARRAY'
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: Core Worker 20295: job 969 (pid=5308
Jul 09 08:01:16 usbbnag1001 nagios[20290]: SERVICE ALERT: usbbctxsf1002;HANDLES;WARNIN
Jul 09 08:01:17 usbbnag1001 nagios[20290]: wproc: GLOBAL SERVICE EVENTHANDLER job 1263
Jul 09 08:01:17 usbbnag1001 nagios[20290]: wproc: early_timeout=0; exited_ok=1; wait
Jul 09 08:01:17 usbbnag1001 nagios[20290]: wproc: stdout line 01: UNABLE TO CONNECT
Jul 09 08:01:17 usbbnag1001 nagios[20290]: SERVICE ALERT: usbbsqlm1001;HANDLES;