Avail.cgi hangs forever

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Avail.cgi hangs forever

Post by gormank »

This is a continuation of the thread I started a few weeks ago, which I thought was resolved by restoring a backup.
https://support.nagios.com/forum/viewto ... 16&t=64000
The issue seems to have started around the time we noticed the DB needed repair (and it was repaired).
I don't see this as a performance issue since it's also happening on the standby (002) NXI host that isn't running any checks. The standby is updated by restoring a backup of the primary NXI host. Normally, the reports page opens in a few seconds.
I disabled running the report when the reports page opens, but running any report starts avail.cgi, which never exits until it's killed.
A profile was attached to the other case and the issue is identical. Here's a snippet of top showing the cgi running for 26 minutes.
Any ideas?

Code: Select all

[root@qa4am2mlnagx001 ~]# top -n 1
top - 17:04:44 up 16 days, 22:33,  1 user,  load average: 3.09, 4.01, 4.39
Tasks: 258 total,   4 running, 254 sleeping,   0 stopped,   0 zombie
%Cpu(s): 59.4 us,  3.1 sy,  0.0 ni, 37.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 65808944 total,  3870144 free,  2337852 used, 59600948 buff/cache
KiB Swap: 33554428 total, 33168892 free,   385536 used. 61060184 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29284 apache    20   0  525424 367848 102408 R  93.8  0.6  26:14.60 avail.cgi
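Note that TIME+ in top is accumulated CPU time rather than wall-clock time. A minimal sketch for spotting avail.cgi processes by elapsed time follows; the function name `find_stuck_cgi` and the 600-second threshold are illustrative assumptions, and the live usage assumes a procps-ng `ps` that supports the `etimes` column.

```shell
# Print PIDs of avail.cgi processes that have been alive longer than a limit.
# Input: lines of "PID ELAPSED_SECONDS COMMAND" on stdin.
# find_stuck_cgi is a hypothetical helper name, not part of Nagios XI.
find_stuck_cgi() {
    awk -v limit="${1:-300}" \
        '$3 == "avail.cgi" && $2 + 0 > limit + 0 { print $1 }'
}

# On a live host the input could come from ps, for example:
#   ps -eo pid=,etimes=,comm= | find_stuck_cgi 600
```

Reading the process list from stdin keeps the filter easy to try against saved output before pointing it at a production host.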
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Avail.cgi hangs forever

Post by gormank »

Following a KB article on report performance, I tailed the logs below and ran the report.
Nothing was logged; the tail output is from before the report was started.

Code: Select all

[root@qa4am2mlnagx002 ~]# tail -f /var/log/httpd/error_log /var/log/httpd/ssl_error_log
==> /var/log/httpd/error_log <==
Undefined identifier: mgmt near line 16 of /usr/share/snmp/mibs/rfc1213.mib
Did not find 'ifIndex' in module RFC1213-MIB (/usr/share/snmp/mibs/etherlike.mib)
Did not find 'transmission' in module RFC1213-MIB (/usr/share/snmp/mibs/etherlike.mib)
Unlinked OID in EtherLike-MIB: dot3 ::= { transmission 7 }
Undefined identifier: transmission near line 26 of /usr/share/snmp/mibs/etherlike.mib
Did not find 'zeroDotZero' in module SNMPv2-SMI (/usr/share/snmp/mibs/IP-MIB.txt)
Did not find 'zeroDotZero' in module SNMPv2-SMI (/usr/share/snmp/mibs/DISMAN-EVENT-MIB.txt)
Did not find 'zeroDotZero' in module SNMPv2-SMI (/usr/share/snmp/mibs/DISMAN-SCHEDULE-MIB.txt)
[Sun Nov 28 03:13:03.034356 2021] [mpm_prefork:notice] [pid 17205] AH00163: Apache/2.4.6 (Red Hat Enterprise Linux) OpenSSL/1.0.2k-fips PHP/5.4.16 configured -- resuming normal operations
[Sun Nov 28 03:13:03.034369 2021] [core:notice] [pid 17205] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'

==> /var/log/httpd/ssl_error_log <==
[Sun Nov 28 03:13:02.983554 2021] [ssl:warn] [pid 17205] AH01909: RSA certificate configured for qa4am2mlnagx002.m2mqa.local:443 does NOT include an ID which matches the server name
[Thu Dec 02 16:25:05.477694 2021] [:error] [pid 10371] [client 172.30.139.192:52513] PHP Warning:  ldap_bind(): Unable to bind to server: Invalid credentials in /usr/local/nagiosxi/html/includes/components/ldap_ad_integration/adLDAP/src/adLDAP.php on line 714, referer: https://qa4am2mlnagx002/nagiosxi/login.php?redirect=/nagiosxi/index.php%3f&noauth=1
[Thu Dec 02 16:25:12.478569 2021] [:error] [pid 10360] [client 172.30.139.192:52531] PHP Notice:  Undefined index: scheme in /usr/local/nagiosxi/html/includes/pageparts.inc.php on line 607, referer: https://qa4am2mlnagx002/nagiosxi/index.php?
[Thu Dec 02 16:25:27.240876 2021] [:error] [pid 1205] [client 172.30.139.192:52616] PHP Notice:  Undefined index: scheme in /usr/local/nagiosxi/html/includes/pageparts.inc.php on line 607, referer: https://qa4am2mlnagx002/nagiosxi/admin/
[Thu Dec 02 17:22:26.546588 2021] [:error] [pid 1205] [client 172.30.139.192:51645] PHP Warning:  ldap_bind(): Unable to bind to server: Can't contact LDAP server in /usr/local/nagiosxi/html/includes/components/ldap_ad_integration/adLDAP/src/adLDAP.php on line 714, referer: https://qa4am2mlnagx002/nagiosxi/login.php
[Thu Dec 02 17:41:41.030021 2021] [:error] [pid 10377] [client 172.30.139.192:54908] PHP Notice:  Undefined index: scheme in /usr/local/nagiosxi/html/includes/pageparts.inc.php on line 607, referer: https://qa4am2mlnagx002/nagiosxi/index.php
[Thu Dec 02 17:44:08.286170 2021] [:error] [pid 1203] [client 172.30.139.192:55353] PHP Notice:  Undefined index: scheme in /usr/local/nagiosxi/html/includes/pageparts.inc.php on line 607, referer: https://qa4am2mlnagx002/nagiosxi/
[Thu Dec 02 17:45:17.349660 2021] [:error] [pid 10371] [client 172.30.139.192:55630] PHP Notice:  Undefined index: scheme in /usr/local/nagiosxi/html/includes/pageparts.inc.php on line 607, referer: https://qa4am2mlnagx002/nagiosxi/
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Avail.cgi hangs forever

Post by pbroste »

Hello @gormank

Thanks for reaching out. The availability reporting pulls from the 'nagios' logs. We want to make sure that the logged-in user (the one running the reports) has enough permissions to view all availability data for all hosts and services. Verify this by logging in with the default 'nagiosadmin' account and running the Availability Report.

Next, verify that the logs exist: 'log_archive_path=/usr/local/nagios/var/archives/'. Review the size of the log directory, as it may be too large to run the report. Also review the log rotation configuration and verify that the cron service is running.

Code: Select all

systemctl status crond
Verify rotation by checking for date-stamped logs and the total size of nagios.log:

Code: Select all

ls -la /usr/local/nagios/var/archives/
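As a rough complement to the listing above, the file count and total size of the archive directory can be summarized in one line. This is only a sketch: `archive_summary` is a hypothetical helper name, and it assumes a `find` that supports `-maxdepth` (GNU or BSD).

```shell
# Print the number of regular files and the total size (KiB) of a directory.
# archive_summary is a hypothetical helper name; the default path is the
# log_archive_path quoted above.
archive_summary() {
    dir="${1:-/usr/local/nagios/var/archives}"
    count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    kib=$(du -sk "$dir" | awk '{ print $1 }')
    echo "files=$count kib=$kib"
}

# Example: archive_summary /usr/local/nagios/var/archives
```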
Let us know what you find,
Perry
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Avail.cgi hangs forever

Post by gormank »

I'm running the report as nagiosadmin and it hangs.
Cron is running and rotating nagios.log.
Archived logs aren't removed. I don't see this as an issue, since nothing has ever automatically removed archived logs on any of the many Nagios systems I run.
The archive dir had ~2400 files so I removed all but the 2021 files--no change in the report hanging.
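For anyone repeating this pruning step, a cautious sketch is to list the candidates first and delete nothing. It assumes archive names of the form nagios-MM-DD-YYYY-HH.log, as seen in directory listings of this archive path; `archives_not_from_year` is a hypothetical helper name.

```shell
# Print archive logs in DIR whose file name does not carry the given year.
# Expects names like nagios-MM-DD-YYYY-HH.log. Nothing is deleted here.
# archives_not_from_year is a hypothetical helper name.
archives_not_from_year() {
    dir="$1"
    year="$2"
    for f in "$dir"/nagios-*.log; do
        [ -e "$f" ] || continue                 # glob matched nothing
        case "${f##*/}" in
            nagios-??-??-"$year"-??.log) ;;     # name carries the year: keep
            *) printf '%s\n' "$f" ;;            # candidate for moving away
        esac
    done
}

# Example: archives_not_from_year /usr/local/nagios/var/archives 2021
```

Reviewing the printed list and moving the files to a holding directory, rather than removing them, keeps the step reversible.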

An interesting thing is that if I run the legacy availability report with defaults (hostgroups), it completes ok. If I run it on services, it hangs.

One thing to note is that when running the Availability report under Available Reports, I'm not selecting anything. It runs with defaults, and it hangs. If I select a host or hostgroup, it completes.
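Since some report variants complete and others never return, it can help to reproduce each one under a time limit instead of waiting and killing the process by hand. The sketch below wraps any command with coreutils `timeout`, which exits with status 124 when the limit expires; `run_with_limit` is a hypothetical helper name, and the command to run (for example, a request for the report URL copied from the browser) is left to the reader.

```shell
# Run a command with a time limit and say on stderr when it was cut off.
# Relies on coreutils `timeout`, which exits with status 124 on expiry.
# run_with_limit is a hypothetical helper name.
run_with_limit() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s" >&2
    fi
    return "$status"
}

# Example: run_with_limit 120 <command that requests the report>
```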

I removed all the archived logs on the standby host, and the report still hangs, so it isn't related to archived log files.

Strangely, I now see that the default report page (the one shown when clicking the reports tab) times out. This started a few minutes ago on both the primary and standby systems. Fun.
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Avail.cgi hangs forever

Post by gormank »

If I kill avail.cgi when it's hanging, it shows the following, which suggests that it's trying to do more than parse logs, and failing to talk to the monitoring engine.

Availability Summary
Report covers from: 2021-12-02 20:31:06 to 2021-12-03 20:31:06
Availability data is not available when monitoring engine is not running.
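Because that message mentions the monitoring engine, a quick scripted check of the service state may help rule it in or out. This sketch inspects `systemctl status` text read from stdin; `unit_is_running` is a hypothetical helper name, and on a live host `systemctl is-active --quiet nagios` gives the same answer directly.

```shell
# Succeed if `systemctl status` text on stdin reports an active, running unit.
# unit_is_running is a hypothetical helper name.
unit_is_running() {
    grep -q '^ *Active: active (running)'
}

# Example on a live host:
#   systemctl status nagios -l | unit_is_running && echo "engine is running"
```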
pbroste
Posts: 1288
Joined: Tue Jun 01, 2021 1:27 pm

Re: Avail.cgi hangs forever

Post by pbroste »

Hello @gormank

Thanks for following up with the details. You quoted this message:
Availability data is not available when monitoring engine is not running.
This indicates that 'nagios.service' is no longer running, so the next step is to investigate why the service stopped.

Please PM your updated system profile so we can see what is going on.

To send us your system profile:
  • Log in to the Nagios XI GUI using a web browser.
  • Click the "Admin" > "System Profile" Menu
  • Click the "Download Profile" button
  • Save the profile.zip file and send via Private Message
Thanks,
Perry
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Avail.cgi hangs forever

Post by gormank »

The nagios service is running and reliable.
The issue is that avail.cgi, when run with defaults, hangs until it's killed.
The profile is attached because the PMs I send hang in the outbox. Please remove it from the post.

Code: Select all

# systemctl status nagios -l
● nagios.service - Nagios Core 4.4.6
   Loaded: loaded (/usr/lib/systemd/system/nagios.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2021-12-06 21:18:11 UTC; 33min ago
     Docs: https://www.nagios.org/documentation
  Process: 31414 ExecStopPost=/bin/rm -f /usr/local/nagios/var/rw/nagios.cmd (code=exited, status=0/SUCCESS)
  Process: 31404 ExecStop=/bin/kill -s TERM ${MAINPID} (code=exited, status=0/SUCCESS)
  Process: 31418 ExecStart=/usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
  Process: 31417 ExecStartPre=/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg (code=exited, status=0/SUCCESS)
 Main PID: 31420 (nagios)
   CGroup: /system.slice/nagios.service
           ├─ 5640 /usr/local/nagios/libexec/check_ping -H 172.30.134.124 -w 3000.0,80% -c 5000.0,100% -p 5
           ├─ 5641 /bin/ping -n -U -w 30 -c 5 172.30.134.124
           ├─ 5642 /usr/local/nagios/libexec/check_ping -H 172.28.133.118 -w 3000.0,80% -c 5000.0,100% -p 5
           ├─ 5643 /bin/ping -n -U -w 30 -c 5 172.28.133.118
           ├─ 5644 /usr/local/nagios/libexec/check_ping -H 172.28.132.168 -w 3000.0,80% -c 5000.0,100% -p 5
           ├─ 5645 /bin/ping -n -U -w 30 -c 5 172.28.132.168
           ├─ 5696 /usr/local/nagios/libexec/check_ping -H 172.28.132.212 -w 3000.0,80% -c 5000.0,100% -p 5
           ├─ 5697 /bin/ping -n -U -w 30 -c 5 172.28.132.212
           ├─ 5698 /usr/local/nagios/libexec/check_nrpe -H 172.30.135.77 --v2-packets-only -u -t 45 3 -c check_net_int
           ├─ 5716 /usr/bin/perl -w /usr/local/nagios/libexec/check_hp -H 172.28.132.114 -C sp1der -x cpqFcaHostCntlrStatus -t 30 -w cpqNicIfLogMapStatus=5
           ├─ 5717 /usr/bin/perl /usr/local/nagios/libexec/check_uws_connection3.pl -f uws.m2m.myvzw.com -i 69.78.82.131 -c -t 45 -p http://172.30.130.24:3128
           ├─ 5718 sh -c /usr/bin/curl --verbose --insecure --proxy http://172.30.130.24:3128 --header 'Host:uws.m2m.myvzw.com' --header 'Content-Type: text/xml;charset=UTF-8' --header 'SOAPAction: http://nphase.com/unifiedwebservice/v2/ISessionService/LogIn' --data @/usr/local/nagios/libexec/check_uws_connection.pl.request.xml https://69.78.82.131/api/v2/SessionService.svc 2>&1
           ├─ 5719 /usr/bin/curl --verbose --insecure --proxy http://172.30.130.24:3128 --header Host:uws.m2m.myvzw.com --header Content-Type: text/xml;charset=UTF-8 --header SOAPAction: http://nphase.com/unifiedwebservice/v2/ISessionService/LogIn --data @/usr/local/nagios/libexec/check_uws_connection.pl.request.xml https://69.78.82.131/api/v2/SessionService.svc
           ├─31420 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
           ├─31421 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─31422 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─31423 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           ├─31424 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
           └─31562 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Dec 06 21:46:19 qa4am2mlnagx001.m2mqa.local sudo[31671]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_mailq -w 2 -c 3 -t 10 --mailserver sendmail
Dec 06 21:46:36 qa4am2mlnagx001.m2mqa.local sudo[31948]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_mailq -w 2 -c 3 -t 10 --mailserver sendmail
Dec 06 21:47:46 qa4am2mlnagx001.m2mqa.local sudo[1050]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_mailq -w 2 -c 3 -t 10 --mailserver sendmail
Dec 06 21:47:51 qa4am2mlnagx001.m2mqa.local check_nrpe[1153]: Remote 172.30.130.50 does not support version 3/4 packets
Dec 06 21:48:13 qa4am2mlnagx001.m2mqa.local nagios[31423]: job 4029 (pid=990): read() returned error 11
Dec 06 21:50:37 qa4am2mlnagx001.m2mqa.local check_nrpe[4529]: Remote 172.29.130.54 does not support version 3/4 packets
Dec 06 21:50:37 qa4am2mlnagx001.m2mqa.local check_nrpe[4529]: Remote 172.29.130.54 accepted a version 2 packet
Dec 06 21:50:52 qa4am2mlnagx001.m2mqa.local sudo[4793]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_mailq -w 2 -c 3 -t 10 --mailserver sendmail
Dec 06 21:51:23 qa4am2mlnagx001.m2mqa.local sudo[5381]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_mailq -w 2 -c 3 -t 10 --mailserver sendmail
Dec 06 21:51:40 qa4am2mlnagx001.m2mqa.local sudo[5656]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/local/nagios/libexec/check_mailq -w 2 -c 3 -t 10 --mailserver sendmail
Moderator's Note: The profile has been shared with the support team but has been removed from the public forum.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Avail.cgi hangs forever

Post by ssax »

What is the output of these commands:

Code: Select all

ls -lh /usr/local/nagios/var/archives
Run this command as root/sudo (and leave it running):

Code: Select all

tail -Fn0 /var/log/httpd/error_log /var/log/httpd/ssl_error_log /usr/local/nagiosxi/var/wkhtmltox.log /usr/local/nagiosxi/var/load_url.log
Then replicate the report failure and, after it has failed, please PM the full output of the still-running tail command.

Check your PMs as well.
gormank
Posts: 1114
Joined: Tue Dec 02, 2014 12:00 pm

Re: Avail.cgi hangs forever

Post by gormank »

Nothing is logged to the files that exist when the report fails to run, on either of the NXI hosts that have trouble with the report. I do see that the encoding of the archived log files changed a while back, but I don't suppose that's the issue.

Code: Select all

# ls -lh /usr/local/nagios/var/archives
total 657M
-rw-r--r-- 1 nagios nagios 163M Dec  4 00:00 nagios-12-04-2021-00.log
-rw-r--r-- 1 nagios nagios 164M Dec  5 00:00 nagios-12-05-2021-00.log
-rw-r--r-- 1 nagios nagios 165M Dec  6 00:00 nagios-12-06-2021-00.log
-rw-r--r-- 1 nagios nagios 166M Dec  7 00:00 nagios-12-07-2021-00.log

# tail -Fn0 /var/log/httpd/error_log /var/log/httpd/ssl_erorr_log /usr/local/nagiosxi/var/wkhtmltox.log /usr/local/nagiosxi/var/load_url.log
==> /var/log/httpd/error_log <==
tail: cannot open ‘/var/log/httpd/ssl_erorr_log’ for reading: No such file or directory

==> /usr/local/nagiosxi/var/wkhtmltox.log <==

==> /usr/local/nagiosxi/var/load_url.log <==
^C
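The tail output above reports that /var/log/httpd/ssl_erorr_log could not be opened, while an earlier post in this thread tails /var/log/httpd/ssl_error_log, so the SSL error log was probably not being watched during this run. A small pre-check along these lines would flag a mistyped path before starting the tail; `existing_logs` is a hypothetical helper name.

```shell
# Print the paths that exist; warn on stderr about the ones that do not.
# existing_logs is a hypothetical helper name.
existing_logs() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            printf '%s\n' "$path"
        else
            echo "missing: $path" >&2
        fi
    done
}

# Example (relies on the paths containing no whitespace):
#   tail -Fn0 $(existing_logs /var/log/httpd/error_log /var/log/httpd/ssl_error_log)
```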

Code: Select all

/usr/local/nagios/var/archives/nagios-07-29-2021-00.log: ASCII text, with very long lines
/usr/local/nagios/var/archives/nagios-07-30-2021-00.log: ASCII text, with very long lines
/usr/local/nagios/var/archives/nagios-07-31-2021-00.log: UTF-8 Unicode text, with very long lines
/usr/local/nagios/var/archives/nagios-08-01-2021-00.log: UTF-8 Unicode text, with very long lines
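To find where the reported encoding changes without reading the whole listing, the output of `file` can be filtered so that only the transition points are printed. This is a sketch that assumes the "path: description" layout shown above; `encoding_changes` is a hypothetical helper name.

```shell
# Read `file` output ("path: description") on stdin and print each path whose
# description differs from the one on the previous line.
# encoding_changes is a hypothetical helper name.
encoding_changes() {
    awk -F':[ \t]+' 'NR > 1 && $2 != prev { print $1 } { prev = $2 }'
}

# Example: file /usr/local/nagios/var/archives/*.log | encoding_changes
```

Whether the change matters to the report is a separate question, but it narrows down which day's log to inspect for unusual characters.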
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Avail.cgi hangs forever

Post by ssax »

Please create a ticket for this and include a link back to this forum thread so we can get a remote session set up:

https://support.nagios.com/tickets/

Thank you!
Locked