NagiosXI crashes 2 to 3 times a week


Post by khouryi » Mon Jul 09, 2018 10:03 am

We are running Nagios XI version 5.4.13 on Red Hat Enterprise Linux 7.4. I noticed Apache running several avail.cgi processes that consume system resources and end up crashing the Nagios database. How do we address this issue? It is impacting our system monitoring.
[root@usbbnag1001 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.4 (Maipo)
[root@usbbnag1001 ~]#

Output of top:
top - 07:58:45 up 2 days, 21:01, 2 users, load average: 39.77, 38.80, 38.14
Tasks: 537 total, 41 running, 496 sleeping, 0 stopped, 0 zombie
%Cpu(s): 98.3 us, 1.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 32781632 total, 303792 free, 30922944 used, 1554896 buff/cache
KiB Swap: 2097148 total, 1661656 free, 435492 used. 969900 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13221 apache 20 0 2863416 934764 37572 R 23.9 2.9 33:04.63 avail.cgi
5866 apache 20 0 2787536 855436 34552 R 23.6 2.6 28:00.73 avail.cgi
5917 apache 20 0 2780012 846988 33836 R 23.6 2.6 27:51.85 avail.cgi
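
A rough way to see how much of the box a single command is taking is to add up the process rows from top. The sketch below is only an illustration: it assumes the default 12-column top layout shown above, uses the three rows from this post as sample input, and the `summarize` helper is a made-up name, not part of Nagios XI.

```python
from collections import defaultdict

# Sample rows copied from the top output above.
TOP_ROWS = """\
13221 apache 20 0 2863416 934764 37572 R 23.9 2.9 33:04.63 avail.cgi
5866 apache 20 0 2787536 855436 34552 R 23.6 2.6 28:00.73 avail.cgi
5917 apache 20 0 2780012 846988 33836 R 23.6 2.6 27:51.85 avail.cgi
"""


def summarize(top_text):
    """Sum %CPU and %MEM per command from top's process rows (hypothetical helper)."""
    totals = defaultdict(lambda: {"count": 0, "cpu": 0.0, "mem": 0.0})
    for line in top_text.splitlines():
        fields = line.split()
        # Assumed columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # skip headers and anything that is not a process row
        entry = totals[fields[11]]
        entry["count"] += 1
        entry["cpu"] += float(fields[8])
        entry["mem"] += float(fields[9])
    return dict(totals)


summary = summarize(TOP_ROWS)
print(summary["avail.cgi"])
```

Only the rows pasted here are counted, so the totals cover these three processes and not everything that was running at the time.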

Output of journalctl -xe:
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: early_timeout=0; exited_ok=1; wait
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: stdout line 01: UNABLE TO CONNECT
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: GLOBAL SERVICE EVENTHANDLER job 1257
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: early_timeout=0; exited_ok=1; wait
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: stdout line 01: UNABLE TO CONNECT
Jul 09 08:01:15 usbbnag1001 nagios[20295]: job 969 (pid=5308): read() returned error 1
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: Core Worker 20295: job 969 (pid=5308
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: CHECK job 969 from worker Core Worke
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: host=uswvesx1004; service=UCS_MEMO
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: early_timeout=1; exited_ok=0; wait
Jul 09 08:01:15 usbbnag1001 nagios[20290]: Warning: Check of service 'UCS_MEMORYARRAY'
Jul 09 08:01:15 usbbnag1001 nagios[20290]: wproc: Core Worker 20295: job 969 (pid=5308
Jul 09 08:01:16 usbbnag1001 nagios[20290]: SERVICE ALERT: usbbctxsf1002;HANDLES;WARNIN
Jul 09 08:01:17 usbbnag1001 nagios[20290]: wproc: GLOBAL SERVICE EVENTHANDLER job 1263
Jul 09 08:01:17 usbbnag1001 nagios[20290]: wproc: early_timeout=0; exited_ok=1; wait
Jul 09 08:01:17 usbbnag1001 nagios[20290]: wproc: stdout line 01: UNABLE TO CONNECT
Jul 09 08:01:17 usbbnag1001 nagios[20290]: SERVICE ALERT: usbbsqlm1001;HANDLES;
khouryi
 
Posts: 13
Joined: Wed Aug 30, 2017 2:28 pm

Re: NagiosXI crashes 2 to 3 times a week

Post by khouryi » Mon Jul 09, 2018 1:16 pm

I upgraded to Nagios XI 5.5.0, and so far it seems to have addressed the issue. I will keep monitoring to see how it performs overnight.

Re: NagiosXI crashes 2 to 3 times a week

Post by tmcdonald » Tue Jul 10, 2018 10:02 am

We'll leave this open for a bit then. Please let us know the results.
Former Nagios employee
tmcdonald
 
Posts: 9118
Joined: Mon Sep 23, 2013 8:40 am

Re: NagiosXI crashes 2 to 3 times a week

Post by khouryi » Wed Jul 11, 2018 11:19 am

The issue with Apache running avail.cgi is back. The upgrade did not fix it. If I leave avail.cgi running, after a while several instances pile up and the server CPU sits at 100%.

1648 apache 20 0 2043884 1.143g 1.081g R 100.0 3.7 1:17.01 avail.cgi
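
To spot which avail.cgi processes have been burning CPU the longest, the TIME+ column can be converted to seconds and compared against a limit. This is a minimal sketch: it assumes TIME+ is in the `minutes:seconds.hundredths` form seen in this thread, the 600-second limit and the helper names are made up for illustration, and it only reports PIDs without stopping anything.

```python
# Rows copied from the top output posted in this thread.
ROWS = """\
13221 apache 20 0 2863416 934764 37572 R 23.9 2.9 33:04.63 avail.cgi
5866 apache 20 0 2787536 855436 34552 R 23.6 2.6 28:00.73 avail.cgi
5917 apache 20 0 2780012 846988 33836 R 23.6 2.6 27:51.85 avail.cgi
1648 apache 20 0 2043884 1.143g 1.081g R 100.0 3.7 1:17.01 avail.cgi
"""


def cpu_seconds(time_plus):
    """Convert a TIME+ value such as '33:04.63' to seconds of CPU time."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)


def over_limit(top_text, command, limit_seconds):
    """Return PIDs of `command` whose accumulated CPU time exceeds the limit."""
    pids = []
    for line in top_text.splitlines():
        fields = line.split()
        if len(fields) < 12 or fields[11] != command:
            continue
        if cpu_seconds(fields[10]) > limit_seconds:
            pids.append(int(fields[0]))
    return pids


# 600 seconds is an arbitrary example limit, not a recommended value.
flagged = over_limit(ROWS, "avail.cgi", 600)
print(flagged)
```

Note that TIME+ is accumulated CPU time, not wall-clock run time, so a report that mostly waits on disk could be older than this number suggests.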

Re: NagiosXI crashes 2 to 3 times a week

Post by khouryi » Wed Jul 11, 2018 11:59 am

The upgrade to Nagios XI 5.5.0 also negatively affected the reporting feature.

Re: NagiosXI crashes 2 to 3 times a week

Post by khouryi » Wed Jul 11, 2018 3:03 pm

More logs:
PHP Warning: pg_set_client_encoding() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres7.inc.php on line 255
tail cmdsubsys.log
PROCESSED 0 COMMANDS
...........................................................
PROCESSED 0 COMMANDS
PHP Warning: Invalid argument supplied for foreach() in /usr/local/nagiosxi/html/includes/components/nagiosim/nagiosim.inc.php on line 491
PHP Warning: pg_set_client_encoding() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres7.inc.php on line 255
............................................................
PROCESSED 0 COMMANDS
PHP Warning: Invalid argument supplied for foreach() in /usr/local/nagiosxi/html/includes/components/nagiosim/nagiosim.inc.php on line 491
PHP Warning: pg_set_client_encoding() expects parameter 1 to be resource, boolean given in /usr/local/nagiosxi/html/db/adodb/drivers/adodb-postgres7.inc.php on line 255

Re: NagiosXI crashes 2 to 3 times a week

Post by jomann » Fri Jul 13, 2018 12:41 pm

That postgres error is resolved in Nagios XI 5.5.1, which was released yesterday. Upgrading should take care of it.

As for avail.cgi maxing out your CPU: unfortunately this report is very heavy in both processing and I/O reads, since it reads the Nagios log files directly to work out availability. Running it over a long time period, or across a lot of hosts/services, can take a long time. We are actively working on better reports to make this one (and others) faster in the future.
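
To give a feel for why a log-driven availability report gets expensive, here is a minimal sketch. It is not how avail.cgi is actually implemented. It assumes the usual Nagios Core log line shape `[epoch] HOST ALERT: host;state;state_type;attempt;output`, assumes the host is UP at the start of the window, and uses invented hosts (`web01`, `db01`) and timestamps.

```python
# Invented sample log lines in the assumed Nagios Core format.
LOG = [
    "[1000] HOST ALERT: web01;DOWN;HARD;3;CRITICAL",
    "[1300] HOST ALERT: web01;UP;HARD;1;OK",
    "[1400] SERVICE ALERT: web01;HTTP;WARNING;SOFT;1;slow",
    "[1500] HOST ALERT: db01;DOWN;HARD;3;CRITICAL",
]


def host_uptime_fraction(log_lines, host, start, end):
    """Scan every line and add up the time `host` spent UP within [start, end]."""
    state, since, up = "UP", start, 0  # assumption: host is UP when the window opens
    for line in log_lines:  # every line is read, even those for other hosts
        if "] HOST ALERT: " not in line:
            continue
        stamp, _, rest = line.partition("] HOST ALERT: ")
        ts = int(stamp.lstrip("["))
        name, new_state, state_type = rest.split(";")[:3]
        if name != host or state_type != "HARD" or not (start <= ts <= end):
            continue
        if state == "UP":
            up += ts - since  # close the UP interval that just ended
        state, since = new_state, ts
    if state == "UP":
        up += end - since  # host was still UP at the end of the window
    return up / (end - start)


fraction = host_uptime_fraction(LOG, "web01", 0, 2000)
print(fraction)
```

Even in this toy version the whole log is walked once per host, so the work grows with both the length of the reporting window and the number of hosts or services included.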
jomann
Development Lead / Senior Developer
 
Posts: 556
Joined: Mon Apr 22, 2013 10:06 am
Location: Nagios Enterprises
