NagiosXI crashes 2 to 3 times a week

Posted: Mon Jul 09, 2018 10:03 am
by khouryi
We run Nagios XI 5.4.13 on Red Hat Enterprise Linux 7.4. I noticed that Apache is running avail.cgi, which consumes system resources and ends up crashing the Nagios database. How do we address this issue? It is impacting our system monitoring.
[root@usbbnag1001 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)
[root@usbbnag1001 ~]#

Output of TOP:
top - 07:58:45 up 2 days, 21:01, 2 users, load average: 39.77, 38.80, 38.14
Tasks: 537 total, 41 running, 496 sleeping, 0 stopped, 0 zombie
%Cpu(s): 98.3 us, 1.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 32781632 total, 303792 free, 30922944 used, 1554896 buff/cache
KiB Swap: 2097148 total, 1661656 free, 435492 used. 969900 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13221 apache 20 0 2863416 934764 37572 R 23.9 2.9 33:04.63 avail.cgi
5866 apache 20 0 2787536 855436 34552 R 23.6 2.6 28:00.73 avail.cgi
5917 apache 20 0 2780012 846988 33836 R 23.6 2.6 27:51.85 avail.cgi

Output of journalctl -xe:
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: early_timeout=0; exited_ok=1; wait
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: stdout line 01: UNABLE TO CONNECT
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: GLOBAL SERVICE EVENTHANDLER job 1257
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: early_timeout=0; exited_ok=1; wait
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: stdout line 01: UNABLE TO CONNECT
Jul 09 08:01:15 usbbnag1001 nagios[20295]: job 969 (pid=5308): read() returned error 1
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: Core Worker 20295: job 969 (pid=5308
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: CHECK job 969 from worker Core Worke
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: host=uswvesx1004; service=UCS_MEMO
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: early_timeout=1; exited_ok=0; wait
Jul 09 08:01:15 usbbnag1001 nagios[20290]: Warning: Check of service 'UCS_MEMORYARRAY'
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: Core Worker 20295: job 969 (pid=5308
Jul 09 08:01:16 usbbnag1001 nagios[20290]: SERVICE ALERT: usbbctxsf1002;HANDLES;WARNIN
Jul 09 08:01:17 usbbnag1001 nagios[20290]: wproc: GLOBAL SERVICE EVENTHANDLER job 1263
Jul 09 08:01:17 usbbnag1001 nagios[20290]: wproc: early_timeout=0; exited_ok=1; wait
Jul 09 08:01:17 usbbnag1001 nagios[20290]: wproc: stdout line 01: UNABLE TO CONNECT
Jul 09 08:01:17 usbbnag1001 nagios[20290]: SERVICE ALERT: usbbsqlm1001;HANDLES;

Re: NagiosXI crashes 2 to 3 times a week

Posted: Mon Jul 09, 2018 1:16 pm
by khouryi
I upgraded to Nagios XI 5.5.0, and so far it seems to have addressed the issue. I will keep monitoring to see how it performs overnight.

Re: NagiosXI crashes 2 to 3 times a week

Posted: Tue Jul 10, 2018 10:02 am
by tmcdonald
We'll leave this open for a bit then. Please let us know the results.

Re: NagiosXI crashes 2 to 3 times a week

Posted: Wed Jul 11, 2018 11:19 am
by khouryi
The issue with Apache running avail.cgi is back; the upgrade did not fix it. If I leave avail.cgi running, after a while several instances are running and the server CPU is at 100%.

1648 apache 20 0 2043884 1.143g 1.081g R 100.0 3.7 1:17.01 avail.cgi

Re: NagiosXI crashes 2 to 3 times a week

Posted: Wed Jul 11, 2018 11:59 am
by khouryi
The upgrade to Nagios XI 5.5.0 also negatively affected the reporting feature.

Re: NagiosXI crashes 2 to 3 times a week

Posted: Wed Jul 11, 2018 3:03 pm
by khouryi
More logs:
PHP Warning: pg_set_client_encoding() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres7.inc.php on line 255
tail cmdsubsys.log
PROCESSED 0 COMMANDS
...........................................................
PROCESSED 0 COMMANDS
PHP Warning: Invalid argument supplied for foreach() in /usr/local/nagiosxi/html/includes/components/nagiosim/nagiosim.inc.php on line 491
PHP Warning: pg_set_client_encoding() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres7.inc.php on line 255
............................................................
PROCESSED 0 COMMANDS
PHP Warning: Invalid argument supplied for foreach() in /usr/local/nagiosxi/html/includes/components/nagiosim/nagiosim.inc.php on line 491
PHP Warning: pg_set_client_encoding() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres7.inc.php on line 255

Re: NagiosXI crashes 2 to 3 times a week

Posted: Fri Jul 13, 2018 12:41 pm
by jomann
That postgres error is resolved in Nagios XI 5.5.1, which was released yesterday. Upgrading should take care of it.

As for avail.cgi maxing out your CPU: unfortunately, this report is very heavy in both processing and I/O reads, since it reads the Nagios log files directly to check availability. When it is run over a long time period or on a lot of hosts/services, it can take a long time to finish. We are actively working on developing better reports to help make this (and other reports) faster in the future.
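
In the meantime, it may help to spot avail.cgi processes that have been running for a long time before they pile up. The following is only a minimal Python sketch, not part of Nagios XI: it parses lines in the format of the top output posted earlier in this thread and lists avail.cgi processes whose accumulated CPU time (the TIME+ column) exceeds a limit. The function names and the 10-minute limit are assumptions chosen for illustration, and the parser assumes TIME+ is in the MM:SS.hh form seen above.

```python
# Sample lines copied from the top output earlier in this thread.
TOP_LINES = """\
13221 apache 20 0 2863416 934764 37572 R 23.9 2.9 33:04.63 avail.cgi
5866 apache 20 0 2787536 855436 34552 R 23.6 2.6 28:00.73 avail.cgi
5917 apache 20 0 2780012 846988 33836 R 23.6 2.6 27:51.85 avail.cgi
"""


def cpu_minutes(time_plus):
    """Convert a TIME+ value in MM:SS.hh form to minutes (assumed format)."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) + float(seconds) / 60.0


def long_running(top_text, command="avail.cgi", limit_minutes=10.0):
    """Return (pid, cpu_minutes) for matching processes over the limit.

    limit_minutes is an arbitrary illustrative threshold, not a Nagios setting.
    """
    hits = []
    for line in top_text.splitlines():
        fields = line.split()
        # A top process line has 12 columns; COMMAND is the last one.
        if len(fields) != 12 or fields[11] != command:
            continue
        minutes = cpu_minutes(fields[10])
        if minutes > limit_minutes:
            hits.append((int(fields[0]), minutes))
    return hits


if __name__ == "__main__":
    for pid, minutes in long_running(TOP_LINES):
        print(f"PID {pid}: {minutes:.1f} CPU minutes")
```

Whether to stop such processes, or to narrow the report's time range and host/service selection instead, is a judgment call for your environment; the sketch only identifies candidates.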