Latency Accessing Host/Service Status Details

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
drcentner
Posts: 20
Joined: Tue Jun 14, 2016 11:39 am
Location: Virginia, USA

Latency Accessing Host/Service Status Details

Post by drcentner »

Users of our production Nagios XI environment with the "Can see all objects" and "Can access advanced features" permissions are experiencing high latency (25+ seconds) when accessing Host/Service Status Details pages. Unfortunately, affected users did not report the issue immediately. Based on the timeline of our previous upgrades, and the recollections of affected users, I believe this issue started on April 9th when we upgraded from 2014R2.7 to 5.2.7. We are currently on 5.2.9.

The full permission set of affected users:
  • Authorization Level: User
  • Can see all objects: Checked
  • Can (re)configure hosts and services: Unchecked
  • Can control all objects: Checked
  • Can see/control monitoring engine: Unchecked
  • Can access advanced features: Checked
  • Has read-only access: Unchecked
Troubleshooting steps taken:
  • Verified that the latency is affecting multiple users/workstations/browsers
  • Verified that the latency can be reproduced by masquerading as an affected user
  • Verified that the latency cannot be reproduced in our non-production Nagios XI environment, which is configured the same as the production environment.
  • Removed "Can see all objects" permission: Latency ceased, but these users need to be able to see all objects (NOC personnel).
  • Restored "Can see all objects" permission, and removed "Can control all objects": Latency returned.
  • Restored "Can control all objects" permission, and removed "Can access advanced features": Latency ceased, but these users need to be able to schedule downtime (NOC personnel).
  • Elevated Authorization Level to Admin: Latency ceased, but these users cannot have Admin permissions.
  • Restored original permissions: Latency returned.
  • Discovered that if an affected user clicks the Stop button in the web browser, most of the Status Detail content will appear. However, graph and history data do not.
  • Verified Nagios XI server is not stressed: System Status page shows all green, and no admins/users with different permission sets are experiencing performance issues.
  • Added the XI server's hostname to the loopback line in /etc/hosts
  • Affected users do not experience latency when going directly to the Scheduled Downtime page
  • Running "host monitoringserver.domainname.edu" displays its internal domain name and IP address (it is not externally accessible)
Tailed /var/log/http/error_log during troubleshooting. None of the errors were from clients of affected users. The only entries that occurred (each example below occurred multiple times):
[Fri Jul 01 10:39:51 2016] [error] [client 10.X.X.X] PHP Notice: Undefined offset: 1000 in /usr/local/nagiosxi/html/includes/components/helpsystem/helpsystem.inc.php on line 252
[Fri Jul 01 10:41:09 2016] [error] [client 10.X.X.X] PHP Notice: Undefined index: flash_msg in /usr/local/nagiosxi/html/login.php on line 79
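
For anyone repeating this check, here is a minimal sketch of how the log could be filtered for one workstation's entries. It assumes the Apache 2.2 style "[client IP]" field shown in the lines above; the function name errors_from_client and the example IP are hypothetical, and the log path is passed in rather than assumed.

```shell
# errors_from_client IP LOGFILE
# Print error_log entries whose "[client ...]" field is exactly IP.
# -F treats the pattern as a fixed string, so "[" and "." are literal,
# and the closing "]" stops 10.0.0.5 from also matching 10.0.0.50.
errors_from_client() {
    grep -F -- "[client $1]" "$2"
}

# Example (hypothetical workstation IP):
#   errors_from_client 10.0.0.5 /path/to/error_log
```

An empty result for an affected user's workstation would be consistent with the observation above that none of the logged errors came from affected clients.
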
Similar behavior as: https://support.nagios.com/forum/viewto ... 16&t=38445

Example of the latency viewed using the Firebug Firefox extension (~30 seconds):
nagios_latency_user.PNG
Viewing the same exact page as an admin (~5 seconds):
nagios_latency_admin.PNG
System Profile:
System:
Nagios XI Version : 5.2.9
Manual Install of XI
monitoringserver.domainname.edu 2.6.32-642.1.1.el6.x86_64 x86_64
Red Hat Enterprise Linux Server release 6.8 (Santiago)
Gnome is not installed

Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.84 Safari/537.36
Server Name: monitoringserver.domainname.edu
Server Address: 10.X.X.X
Server Port: 443

Proxy: no
Configurations:
LDAP/AD Authentication
Large Installation Tweaks
RAM Disk

Monitored Hosts: 1680
Monitored Services: 11849
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia
Contact:

Re: Latency Accessing Host/Service Status Details

Post by Box293 »

We commonly see this issue, which appeared in XI 5.x, caused by name resolution or router/firewall problems.

Can you go to Admin > System Config > Manage System Config

What is in your "Program URL" and "External URL" fields?

Does your Program URL resolve to an internal IP address?

How are you accessing XI, using the "Program URL" or "External URL" ?

Please run this command to make sure the troubleshooting tool I am going to use is installed:

yum install -y bind-utils

Then from the command line execute:

host xxx

Where xxx is the DNS name in the "Program URL" field.

Can it resolve? Does it resolve to an internal or external IP address?
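
As a rough aid for the internal/external question, the sketch below classifies an IPv4 address by prefix. It treats the RFC 1918 private ranges and loopback as "internal" and everything else as "external"; the function name classify_ipv4 is hypothetical, and the input is not validated as a well-formed address.

```shell
# classify_ipv4 ADDRESS
# Print "internal" for RFC 1918 private ranges (10/8, 172.16/12,
# 192.168/16) and loopback (127/8), otherwise "external".
classify_ipv4() {
    case $1 in
        10.*|192.168.*|127.*)                  echo internal ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) echo internal ;;
        *)                                     echo external ;;
    esac
}

# Example, feeding it the first address reported by "host":
#   classify_ipv4 "$(host xxx | awk '/has address/ {print $4; exit}')"
```

An address that your site routes internally but that falls outside these ranges would be reported as external, so treat the result as a hint only.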

We've found that when you have XI published on a public IP address and you access it from an internal machine, it has to go through a router/firewall. There are a lot of AJAX calls going on when you go to the Host/Service Status Details pages, and we have seen firewalls limit the number of connections, which in turn slows down page loads. A good diagnostic for this is to access XI from a machine in the same subnet as the XI server. If you don't experience the issue there, it's likely you are being limited by the number of concurrent connections.

Does any of this help?
drcentner
Posts: 20
Joined: Tue Jun 14, 2016 11:39 am
Location: Virginia, USA

Re: Latency Accessing Host/Service Status Details

Post by drcentner »

What is in your "Program URL" and "External URL" fields?
Program URL: https://internal.domain.name/nagiosxi/
External URL: blank (this host is not externally accessible)


Does your Program URL resolve to an internal IP address?
Yes


How are you accessing XI, using the "Program URL" or "External URL" ?
Program URL


Can it resolve? Does it resolve to an internal or external IP address?
Yes
Internal


We've found that when you have XI published on a public IP address and you access it from an internal machine, it has to go through a router/firewall. There are a lot of ajax calls going on when you go to the Host/Service Status Details pages, we have seen firewalls limit the amount of connections which in turn slows down the page loads. A good diagnostic of this is to access XI from a machine in the same subnet as the XI server. If you don't experience the issue then it's likely you are being limited by the number of concurrent connections.
Many networks, such as ours, have one or more routers and/or firewalls between internal workstations and internal servers. I used an SSH tunnel to our Nagios XI server, configured workstation browser proxy settings, and was able to confirm that accessing Nagios XI "locally" on the Nagios XI server results in normal latency for users with the affected permission sets. However, as far as the network is concerned, there is no difference between my administrative access to Nagios XI and our NOC's user-level access. Why would a router/firewall only cause latency for users with a specific, limited permission set? Does Nagios XI generate excessive Ajax calls for certain user level permissions, but not for admins?
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Latency Accessing Host/Service Status Details

Post by rkennedy »

Program URL: https://internal.domain.name/nagiosxi/
Does Nagios XI generate excessive Ajax calls for certain user level permissions, but not for admins?
Yes, there are AJAX calls made, but it shouldn't be related to DNS. From the XI machine, what is the output of nslookup internal.domain.name? Is it resolving to 127.0.0.1 or the internal IP address? For the internal AJAX calls, it needs to resolve to 127.0.0.1, or you could see these longer load times. If it does not resolve to 127.0.0.1, please modify /etc/hosts accordingly and see if that resolves it.
Former Nagios Employee
drcentner
Posts: 20
Joined: Tue Jun 14, 2016 11:39 am
Location: Virginia, USA

Re: Latency Accessing Host/Service Status Details

Post by drcentner »

The question was whether Nagios XI generates excessive Ajax calls for certain user-level permissions, but not for admins.


From the XI machine, what is the output of nslookup internal.domain.name?
Returns its internal IP address, which is a 10.* address.

The nslookup command does not utilize the /etc/hosts file.
From nslookup man page "Nslookup is a program to query Internet domain name servers."
http://bencane.com/2013/10/29/managing- ... -etchosts/
http://unix.ittoolbox.com/groups/techni ... sts-431201
etc

However, I confirmed that /etc/hosts has an entry for 127.0.0.1 with internal.domain.name, and that pinging internal.domain.name successfully reads the /etc/hosts file and pings 127.0.0.1.
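
To make the hosts-file check repeatable without relying on ping, here is a minimal sketch that reads a hosts-format file directly and prints the address mapped to a name. The function name hosts_file_lookup is hypothetical. On a live system, "getent hosts internal.domain.name" is the closer equivalent of what applications see, since it follows the lookup order in /etc/nsswitch.conf, whereas nslookup and host query DNS servers only.

```shell
# hosts_file_lookup NAME [FILE]
# Print the first address that FILE (default /etc/hosts) maps to NAME.
# Comments are stripped first, and names are compared case-insensitively.
hosts_file_lookup() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # drop trailing comments; awk re-splits fields
        {
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(name)) { print $1; exit }
        }
    ' "${2:-/etc/hosts}"
}

# Example:
#   hosts_file_lookup internal.domain.name
```

If this prints 127.0.0.1 while nslookup returns the 10.* address, both results are expected, because the two tools consult different sources.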
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Latency Accessing Host/Service Status Details

Post by ssax »

I did see an issue like this where it was the comments/acknowledgements that were taking so long to parse through.

Do you see any crashed tables messages in your /var/log/mysqld.log or /var/log/mariadb/mariadb.log?
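
A quick way to check could look like the sketch below. It assumes the usual MySQL/MariaDB wording "Table '...' is marked as crashed and should be repaired"; the function name crashed_tables is hypothetical, and other kinds of corruption messages would not be caught.

```shell
# crashed_tables LOGFILE
# List each table named in a "marked as crashed" message, once.
crashed_tables() {
    grep -i 'marked as crashed' "$1" |
        sed -n "s/.*Table '\([^']*\)'.*/\1/p" |
        sort -u
}

# Example:
#   crashed_tables /var/log/mysqld.log
```

No output means no such messages were found in that log file.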

I think it would be easier to identify the cause if we set up a remote session. Please send an email to [email protected] with a descriptive subject and a detailed body, including a link back to this thread, so that we can get a remote session set up.

Thank you
drcentner
Posts: 20
Joined: Tue Jun 14, 2016 11:39 am
Location: Virginia, USA

Re: Latency Accessing Host/Service Status Details

Post by drcentner »

The last entry in /var/log/mysqld.log is from the most recent service restart on June 18th. Not seeing anything in there that is indicative of a database issue.

Emailed [email protected] as requested.

Thanks
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: Latency Accessing Host/Service Status Details

Post by ssax »

Received, locking as we'll continue in the ticket.