NagiosXI crashes 2 to 3 times a week

khouryi
Posts: 13
Joined: Wed Aug 30, 2017 2:28 pm

NagiosXI crashes 2 to 3 times a week

Post by khouryi »

We are running Nagios XI 5.4.13 on Red Hat Enterprise Linux 7.4. I noticed Apache running several avail.cgi processes that consume system resources and end up crashing the Nagios database. How do we address this issue? It is impacting our system monitoring.
[root@usbbnag1001 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)
[root@usbbnag1001 ~]#

Output of top:
top - 07:58:45 up 2 days, 21:01, 2 users, load average: 39.77, 38.80, 38.14
Tasks: 537 total, 41 running, 496 sleeping, 0 stopped, 0 zombie
%Cpu(s): 98.3 us, 1.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 32781632 total, 303792 free, 30922944 used, 1554896 buff/cache
KiB Swap: 2097148 total, 1661656 free, 435492 used. 969900 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13221 apache 20 0 2863416 934764 37572 R 23.9 2.9 33:04.63 avail.cgi
5866 apache 20 0 2787536 855436 34552 R 23.6 2.6 28:00.73 avail.cgi
5917 apache 20 0 2780012 846988 33836 R 23.6 2.6 27:51.85 avail.cgi

Output of journalctl -xe:
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: early_timeout=0; exited_ok=1; wait
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: stdout line 01: UNABLE TO CONNECT
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: GLOBAL SERVICE EVENTHANDLER job 1257
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: early_timeout=0; exited_ok=1; wait
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: stdout line 01: UNABLE TO CONNECT
Jul 09 08:01:15 usbbnag1001 nagios[20295]: job 969 (pid=5308): read() returned error 1
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: Core Worker 20295: job 969 (pid=5308
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: CHECK job 969 from worker Core Worke
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: host=uswvesx1004; service=UCS_MEMO
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: early_timeout=1; exited_ok=0; wait
Jul 09 08:01:15 usbbnag1001 nagios[20290]: Warning: Check of service 'UCS_MEMORYARRAY'
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: Core Worker 20295: job 969 (pid=5308
Jul 09 08:01:16 usbbnag1001 nagios[20290]: SERVICE ALERT: usbbctxsf1002;HANDLES;WARNIN
Jul 09 08:01:17 usbbnag1001 nagios[20290]: wproc: GLOBAL SERVICE EVENTHANDLER job 1263
Jul 09 08:01:17 usbbnag1001 nagios[20290]: wproc: early_timeout=0; exited_ok=1; wait
Jul 09 08:01:17 usbbnag1001 nagios[20290]: wproc: stdout line 01: UNABLE TO CONNECT
Jul 09 08:01:17 usbbnag1001 nagios[20290]: SERVICE ALERT: usbbsqlm1001;HANDLES;
khouryi
Posts: 13
Joined: Wed Aug 30, 2017 2:28 pm

Re: NagiosXI crashes 2 to 3 times a week

Post by khouryi »

I upgraded to Nagios XI 5.5.0, and so far it seems to have addressed the issue. I will keep monitoring to see how it performs overnight.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: NagiosXI crashes 2 to 3 times a week

Post by tmcdonald »

We'll leave this open for a bit then. Please let us know the results.
Former Nagios employee
khouryi
Posts: 13
Joined: Wed Aug 30, 2017 2:28 pm

Re: NagiosXI crashes 2 to 3 times a week

Post by khouryi »

The issue with Apache running avail.cgi is back; the upgrade did not fix it. If I leave avail.cgi running, more instances pile up over time and the server CPU reaches 100%.

1648 apache 20 0 2043884 1.143g 1.081g R 100.0 3.7 1:17.01 avail.cgi
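
For tracking how many avail.cgi processes are running and how much CPU they use together, here is a minimal Python sketch that tallies rows copied from top. It assumes the default 12-column top layout shown in this thread; the function name `summarize_avail_cgi` is hypothetical, and the sample rows are the ones from the first post.

```python
def summarize_avail_cgi(top_lines, command="avail.cgi"):
    """Return (process_count, total_cpu_percent) for rows whose COMMAND matches."""
    count = 0
    total_cpu = 0.0
    for line in top_lines:
        fields = line.split()
        # Assumed default top columns:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12 or fields[-1] != command:
            continue
        try:
            cpu = float(fields[8])  # %CPU column
        except ValueError:
            continue  # skip rows where %CPU is not numeric
        count += 1
        total_cpu += cpu
    return count, total_cpu


# Sample rows taken from the top output in the first post.
sample = [
    "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND",
    "13221 apache 20 0 2863416 934764 37572 R 23.9 2.9 33:04.63 avail.cgi",
    "5866 apache 20 0 2787536 855436 34552 R 23.6 2.6 28:00.73 avail.cgi",
    "5917 apache 20 0 2780012 846988 33836 R 23.6 2.6 27:51.85 avail.cgi",
]
count, total = summarize_avail_cgi(sample)
print(count, round(total, 1))
```

Running this against periodic top snapshots would show whether the process count keeps climbing before the CPU saturates.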
khouryi
Posts: 13
Joined: Wed Aug 30, 2017 2:28 pm

Re: NagiosXI crashes 2 to 3 times a week

Post by khouryi »

The upgrade to Nagios XI 5.5.0 also negatively affected the reporting feature.
khouryi
Posts: 13
Joined: Wed Aug 30, 2017 2:28 pm

Re: NagiosXI crashes 2 to 3 times a week

Post by khouryi »

More logs:
PHP Warning: pg_set_client_encoding() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres7.inc.php on line 255
tail cmdsubsys.log
PROCESSED 0 COMMANDS
...........................................................
PROCESSED 0 COMMANDS
PHP Warning: Invalid argument supplied for foreach() in /usr/local/nagiosxi/html/includes/components/nagiosim/nagiosim.inc.php on line 491
PHP Warning: pg_set_client_encoding() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres7.inc.php on line 255
............................................................
PROCESSED 0 COMMANDS
PHP Warning: Invalid argument supplied for foreach() in /usr/local/nagiosxi/html/includes/components/nagiosim/nagiosim.inc.php on line 491
PHP Warning: pg_set_client_encoding() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres7.inc.php on line 255
jomann
Development Lead
Posts: 611
Joined: Mon Apr 22, 2013 10:06 am
Location: Nagios Enterprises

Re: NagiosXI crashes 2 to 3 times a week

Post by jomann »

That Postgres error is resolved in Nagios XI 5.5.1, which was released yesterday. Upgrading should take care of it.

As for avail.cgi maxing out your CPU: unfortunately, this report is heavy in both processing and I/O because it reads the Nagios log files directly to calculate availability. Running it over a long time period or across many hosts/services can take a long time. We are actively working on better reports to make this (and other reports) faster in the future.
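
To get a rough feel for how much log data a report window covers, here is a hedged Python sketch that counts archived log files stamped inside a date range and adds up their sizes. It is not how avail.cgi works internally. It assumes archives follow the Nagios Core naming pattern `nagios-MM-DD-YYYY-HH.log`, and the helper names `archive_stamp` and `archives_in_window` are hypothetical. The stamp is the rotation time, so the window match is only approximate.

```python
import os
import re
from datetime import datetime

# Assumed archive naming: nagios-MM-DD-YYYY-HH.log
ARCHIVE_NAME = re.compile(r"^nagios-(\d{2})-(\d{2})-(\d{4})-(\d{2})\.log$")


def archive_stamp(name):
    """Return the datetime encoded in an archive file name, or None if it does not match."""
    match = ARCHIVE_NAME.match(name)
    if not match:
        return None
    month, day, year, hour = (int(part) for part in match.groups())
    try:
        return datetime(year, month, day, hour)
    except ValueError:
        return None  # name looked right but encoded an impossible date


def archives_in_window(archive_dir, start, end):
    """Return (file_count, total_bytes) for archives stamped within [start, end]."""
    count = 0
    total_bytes = 0
    for name in os.listdir(archive_dir):
        stamp = archive_stamp(name)
        if stamp is not None and start <= stamp <= end:
            count += 1
            total_bytes += os.path.getsize(os.path.join(archive_dir, name))
    return count, total_bytes


# Example call (the directory is an assumption; adjust to your log_archive_path):
# archives_in_window("/usr/local/nagios/var/archives",
#                    datetime(2018, 6, 1), datetime(2018, 7, 1))
```

A large file count or byte total for the chosen window suggests the report will be slow, so narrowing the time range or the host/service selection is worth trying first.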