HIGH Host and service check latency

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
erkanerturk
Posts: 53
Joined: Wed Jan 16, 2019 4:35 am

HIGH Host and service check latency

Post by erkanerturk »

Hi

I have a server running as a VM, with almost 30K checks on a 15-minute interval. Most of the checks are SNMP checks.

The average host check execution time is 0.26 sec,
but the average host check latency is 165 sec.

Similarly:
service check execution time: 0.81 sec (avg), BUT
service check latency: 160 sec (avg)
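As a rough back-of-the-envelope check of the load described above, the sketch below converts the posted figures into a check rate. It assumes the 30K checks are spread evenly over the 15-minute interval, which the post does not state, and the function names are illustrative only.

```shell
# Rough sizing from the figures in the post.
# Assumption: checks are evenly distributed across the interval.
checks=30000
interval_min=15
avg_exec_sec=0.81

# Checks that must start each second to cover all checks once per interval.
check_rate() {
  awk -v c="$1" -v m="$2" 'BEGIN { printf "%.1f\n", c / (m * 60) }'
}

# Average number of checks running at once (rate multiplied by execution time).
in_flight() {
  awk -v c="$1" -v m="$2" -v e="$3" 'BEGIN { printf "%.1f\n", c / (m * 60) * e }'
}

echo "checks per second: $(check_rate "$checks" "$interval_min")"
echo "average checks in flight: $(in_flight "$checks" "$interval_min" "$avg_exec_sec")"
```

With these inputs the estimate is about 33 checks per second and about 27 checks in flight on average, which only gives a sense of scale and does not by itself explain the latency.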

I have also noticed that, from time to time, the last check times are not updated. I suspect this is caused by the latency issue.

How can I correct the problem?

TIA
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: HIGH Host and service check latency

Post by ssax »

Please PM me a copy of your profile.zip, you can download it from Admin > System Profile by clicking the Download Profile button.

Additionally, please send the output of these commands:
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root mysql password

Code: Select all

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
This next command may fail. That's okay, since not all systems run PostgreSQL. Send the output anyway:

Code: Select all

echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi

Post by erkanerturk »

Hi

I could not upload our profile.zip because of our new organizational policy, sorry about that. I was not able to remove the organizational data from it.

I have seen no errors in the mariadb.log file,
and I see at least 30% CPU idle time.
If you want me to send specific data, please tell me what you need.

The query results are the following:

PostgreSQL Query Result:

Code: Select all

psql: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?



MySQL Query Result:
+--------------------------------------------+------------+
| Table                                      | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements                    |       0.00 |
| nagios_commands                            |       0.02 |
| nagios_commenthistory                      |      23.40 |
| nagios_comments                            |       0.00 |
| nagios_configfiles                         |       0.01 |
| nagios_configfilevariables                 |       0.01 |
| nagios_conninfo                            |       0.27 |
| nagios_contact_addresses                   |       0.00 |
| nagios_contact_notificationcommands        |       0.01 |
| nagios_contactgroup_members                |       0.01 |
| nagios_contactgroups                       |       0.00 |
| nagios_contactnotificationmethods          |     478.19 |
| nagios_contactnotifications                |     505.58 |
| nagios_contacts                            |       0.01 |
| nagios_contactstatus                       |       0.01 |
| nagios_customvariables                     |       1.36 |
| nagios_customvariablestatus                |       1.32 |
| nagios_dbversion                           |       0.00 |
| nagios_downtimehistory                     |       0.00 |
| nagios_eventhandlers                       |       0.20 |
| nagios_externalcommands                    |       0.01 |
| nagios_flappinghistory                     |       9.75 |
| nagios_host_contactgroups                  |       0.08 |
| nagios_host_contacts                       |       0.10 |
| nagios_host_parenthosts                    |       0.00 |
| nagios_hostchecks                          |       0.56 |
| nagios_hostdependencies                    |       0.00 |
| nagios_hostescalation_contactgroups        |       0.00 |
| nagios_hostescalation_contacts             |       0.00 |
| nagios_hostescalations                     |       0.00 |
| nagios_hostgroup_members                   |       0.08 |
| nagios_hostgroups                          |       0.00 |
| nagios_hosts                               |       0.46 |
| nagios_hoststatus                          |       0.97 |
| nagios_instances                           |       0.00 |
| nagios_logentries                          |    2428.10 |
| nagios_notifications                       |     426.48 |
| nagios_objects                             |       5.96 |
| nagios_processevents                       |       0.22 |
| nagios_programstatus                       |       0.00 |
| nagios_runtimevariables                    |       0.00 |
| nagios_scheduleddowntime                   |       0.00 |
| nagios_service_contactgroups               |       1.21 |
| nagios_service_contacts                    |       0.41 |
| nagios_service_parentservices              |       0.00 |
| nagios_servicechecks                       |       6.56 |
| nagios_servicedependencies                 |       0.00 |
| nagios_serviceescalation_contactgroups     |       0.00 |
| nagios_serviceescalation_contacts          |       0.00 |
| nagios_serviceescalations                  |       0.00 |
| nagios_servicegroup_members                |       0.00 |
| nagios_servicegroups                       |       0.00 |
| nagios_services                            |       7.61 |
| nagios_servicestatus                       |      15.91 |
| nagios_statehistory                        |     462.82 |
| nagios_systemcommands                      |       0.03 |
| nagios_timedeventqueue                     |       0.00 |
| nagios_timedevents                         |       0.00 |
| nagios_timeperiod_timeranges               |       0.03 |
| nagios_timeperiods                         |       0.01 |
| tbl_command                                |       0.06 |
| tbl_contact                                |       0.03 |
| tbl_contactgroup                           |       0.03 |
| tbl_contacttemplate                        |       0.03 |
| tbl_domain                                 |       0.03 |
| tbl_host                                   |       0.50 |
| tbl_hostdependency                         |       0.03 |
| tbl_hostescalation                         |       0.03 |
| tbl_hostextinfo                            |       0.03 |
| tbl_hostgroup                              |       0.03 |
| tbl_hosttemplate                           |       0.03 |
| tbl_info                                   |       0.17 |
| tbl_lnkContactToCommandHost                |       0.02 |
| tbl_lnkContactToCommandService             |       0.02 |
| tbl_lnkContactToContactgroup               |       0.02 |
| tbl_lnkContactToContacttemplate            |       0.02 |
| tbl_lnkContactToVariabledefinition         |       0.02 |
| tbl_lnkContactgroupToContact               |       0.02 |
| tbl_lnkContactgroupToContactgroup          |       0.02 |
| tbl_lnkContacttemplateToCommandHost        |       0.02 |
| tbl_lnkContacttemplateToCommandService     |       0.02 |
| tbl_lnkContacttemplateToContactgroup       |       0.02 |
| tbl_lnkContacttemplateToContacttemplate    |       0.02 |
| tbl_lnkContacttemplateToVariabledefinition |       0.02 |
| tbl_lnkHostToContact                       |       0.09 |
| tbl_lnkHostToContactgroup                  |       0.08 |
| tbl_lnkHostToHost                          |       0.02 |
| tbl_lnkHostToHostgroup                     |       0.02 |
| tbl_lnkHostToHosttemplate                  |       0.09 |
| tbl_lnkHostToVariabledefinition            |       0.08 |
| tbl_lnkHostdependencyToHost_DH             |       0.02 |
| tbl_lnkHostdependencyToHost_H              |       0.02 |
| tbl_lnkHostdependencyToHostgroup_DH        |       0.02 |
| tbl_lnkHostdependencyToHostgroup_H         |       0.02 |
| tbl_lnkHostescalationToContact             |       0.02 |
| tbl_lnkHostescalationToContactgroup        |       0.02 |
| tbl_lnkHostescalationToHost                |       0.02 |
| tbl_lnkHostescalationToHostgroup           |       0.02 |
| tbl_lnkHostgroupToHost                     |       0.06 |
| tbl_lnkHostgroupToHostgroup                |       0.02 |
| tbl_lnkHosttemplateToContact               |       0.02 |
| tbl_lnkHosttemplateToContactgroup          |       0.02 |
| tbl_lnkHosttemplateToHost                  |       0.02 |
| tbl_lnkHosttemplateToHostgroup             |       0.02 |
| tbl_lnkHosttemplateToHosttemplate          |       0.02 |
| tbl_lnkHosttemplateToVariabledefinition    |       0.02 |
| tbl_lnkServiceToContact                    |       0.14 |
| tbl_lnkServiceToContactgroup               |       0.31 |
| tbl_lnkServiceToHost                       |       1.52 |
| tbl_lnkServiceToHostgroup                  |       0.02 |
| tbl_lnkServiceToServicegroup               |       0.02 |
| tbl_lnkServiceToServicetemplate            |       1.48 |
| tbl_lnkServiceToVariabledefinition         |       0.42 |
| tbl_lnkServicedependencyToHost_DH          |       0.02 |
| tbl_lnkServicedependencyToHost_H           |       0.02 |
| tbl_lnkServicedependencyToHostgroup_DH     |       0.02 |
| tbl_lnkServicedependencyToHostgroup_H      |       0.02 |
| tbl_lnkServicedependencyToService_DS       |       0.02 |
| tbl_lnkServicedependencyToService_S        |       0.02 |
| tbl_lnkServicedependencyToServicegroup_DS  |       0.02 |
| tbl_lnkServicedependencyToServicegroup_S   |       0.02 |
| tbl_lnkServiceescalationToContact          |       0.02 |
| tbl_lnkServiceescalationToContactgroup     |       0.02 |
| tbl_lnkServiceescalationToHost             |       0.02 |
| tbl_lnkServiceescalationToHostgroup        |       0.02 |
| tbl_lnkServiceescalationToService          |       0.02 |
| tbl_lnkServiceescalationToServicegroup     |       0.02 |
| tbl_lnkServicegroupToService               |       0.02 |
| tbl_lnkServicegroupToServicegroup          |       0.02 |
| tbl_lnkServicetemplateToContact            |       0.02 |
| tbl_lnkServicetemplateToContactgroup       |       0.02 |
| tbl_lnkServicetemplateToHost               |       0.02 |
| tbl_lnkServicetemplateToHostgroup          |       0.02 |
| tbl_lnkServicetemplateToServicegroup       |       0.02 |
| tbl_lnkServicetemplateToServicetemplate    |       0.02 |
| tbl_lnkServicetemplateToVariabledefinition |       0.02 |
| tbl_lnkTimeperiodToTimeperiod              |       0.02 |
| tbl_logbook                                |       0.27 |
| tbl_mainmenu                               |       0.02 |
| tbl_permission                             |       0.02 |
| tbl_permission_inactive                    |       0.02 |
| tbl_service                                |       7.52 |
| tbl_servicedependency                      |       0.03 |
| tbl_serviceescalation                      |       0.03 |
| tbl_serviceextinfo                         |       0.03 |
| tbl_servicegroup                           |       0.03 |
| tbl_servicetemplate                        |       0.03 |
| tbl_session                                |       0.02 |
| tbl_session_locks                          |       0.02 |
| tbl_settings                               |       0.03 |
| tbl_submenu                                |       0.02 |
| tbl_timedefinition                         |       0.06 |
| tbl_timeperiod                             |       0.03 |
| tbl_user                                   |       0.03 |
| tbl_variabledefinition                     |       1.52 |
| xi_auditlog                                |    3198.27 |
| xi_auth_tokens                             |      35.70 |
| xi_cmp_ccm_backups                         |       0.02 |
| xi_cmp_favorites                           |       0.03 |
| xi_cmp_nagiosbpi_backups                   |       0.06 |
| xi_cmp_trapdata                            |       0.16 |
| xi_cmp_trapdata_log                        |       0.03 |
| xi_commands                                |       0.05 |
| xi_deploy_agents                           |       0.02 |
| xi_deploy_jobs                             |       0.02 |
| xi_eventqueue                              |       0.03 |
| xi_events                                  |       3.31 |
| xi_incidents                               |       0.00 |
| xi_meta                                    |      41.06 |
| xi_mibs                                    |       0.05 |
| xi_options                                 |       0.03 |
| xi_sessions                                |       0.03 |
| xi_sysstat                                 |       0.03 |
| xi_usermeta                                |       3.84 |
| xi_users                                   |       0.08 |
+--------------------------------------------+------------+
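To make the largest tables in the output above easier to spot, the listing can be filtered and sorted by size. This is a minimal sketch that parses the `mysql --table` output as pasted; the function name `largest_tables` and the 100 MB cutoff are assumptions for illustration, not part of Nagios XI.

```shell
# Print tables at or above a size cutoff (in MB), largest first.
# Reads `mysql --table` style rows ("| name | size |") on stdin.
largest_tables() {
  awk -F'|' -v min="$1" '
    # Border lines contain no "|" and the header row has a non-numeric
    # size column, so both are skipped by these conditions.
    NF >= 4 && $3 ~ /^ *[0-9.]+ *$/ && $3 + 0 >= min {
      gsub(/ /, "", $2)
      printf "%s %.2f\n", $2, $3
    }' | LC_ALL=C sort -k2,2nr
}

# A few rows copied from the result above.
largest_tables 100 <<'EOF'
+--------------------------------------------+------------+
| Table                                      | Size in MB |
+--------------------------------------------+------------+
| nagios_logentries                          |    2428.10 |
| nagios_objects                             |       5.96 |
| nagios_statehistory                        |     462.82 |
| xi_auditlog                                |    3198.27 |
+--------------------------------------------+------------+
EOF
```

Run against the full result from this post, the same cutoff would list xi_auditlog (3198.27 MB), nagios_logentries (2428.10 MB), nagios_contactnotifications (505.58 MB), nagios_contactnotificationmethods (478.19 MB), nagios_statehistory (462.82 MB) and nagios_notifications (426.48 MB).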

Post by erkanerturk »

Hi, I have noticed something.

Under Monitoring Engine Status > Monitoring Engine Check Statistics, the counters decreased and then showed 0. At the same time, I noticed that the last check time stopped at 10.44 and stayed at that value for 30 minutes.
From the Linux CLI, I can see that Nagios continues to run checks, but in the GUI the last check times stay the same.

Should I increase the NPCD load threshold?
Please advise.

Post by ssax »

Please send me your /usr/local/nagios/etc/nagios.cfg.

What XI version are you running? You can find it on the bottom left hand side of the web interface.

Are you seeing any errors in /var/log/messages, /var/log/httpd/error_log, /var/log/httpd/ssl_error_log, or /var/log/dmesg?

Include the output of these commands as root:

Code: Select all

sar
ps aux
ulimit -a
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql
Attach your /etc/php.ini file as well.

Post by erkanerturk »

Hi
I have sent the files via PM

waiting for your reply..

Post by ssax »

Please edit your

Code: Select all

/etc/php.ini
and change these:

Code: Select all

max_execution_time = 60
max_input_vars = 5000
memory_limit = 256M
To these:

Code: Select all

max_execution_time = 600
max_input_vars = 50000
memory_limit = 1024M
Then restart apache:

Code: Select all

systemctl restart httpd
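After editing, it can help to confirm that the three directives actually carry the new values. The sketch below only reads the file. The helper name `show_php_limits` is made up for this example, and it assumes the directives are uncommented and each sits on a line of its own.

```shell
# Print the three directives discussed above from a php.ini file (read-only).
show_php_limits() {
  grep -E '^[[:space:]]*(max_execution_time|max_input_vars|memory_limit)[[:space:]]*=' "$1" \
    || echo "no matching directives found in $1"
}

# Only run against the real file if it is present and readable.
if [ -r /etc/php.ini ]; then
  show_php_limits /etc/php.ini
fi
```

After the change, the output should match the three lines in the "To these" block above.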
Then take the attached zip file, transfer it to your XI server, and run these commands as root against it:
- Or you can upgrade to XI 5.8.1 and it will update your NDO3 as well

Code: Select all

unzip ndo-master.zip
cd ndo-master
./configure
make all
make install
If you have an offloaded database or changed the default MySQL passwords you will need to edit your /usr/local/nagios/etc/ndo.cfg file and update these before running the next command to start it up:
- You can get the info for the ndoutils database from your /usr/local/nagiosxi/html/config.inc.php

Code: Select all

db_host
db_port
db_user
db_pass
Then restart the nagios service:

Code: Select all

systemctl restart nagios
Then apply configuration and see if that alleviates the issue.

If it doesn't, please create a ticket for this and include a link back to this forum thread so we can get a remote session set up:

https://support.nagios.com/tickets/

Post by erkanerturk »

Hi

Firstly, thanks for your response, but part of the problem persists.

Our service/host check latencies have definitely decreased, from >500 seconds to 150 seconds (service check latency) and 60 seconds (host check latency).

However, when I apply the configuration, our last check times stop updating for a while (approx. 30 minutes). The active service check counters (1-min, 5-min and 15-min) drop to 0, while the active host checks show (I think) correct numbers. This was also the case before applying your changes.

When I see the following log entries, the GUI returns to normal and the last check times update again (this may be just a coincidence):

[1611215866] NDO-3: Ended downtime thread
[1611215866] NDO-3: Ended acknowledgement thread
[1611215866] NDO-3: Ended comment thread
[1611215866] NDO-3: Ended flapping thread
[1611215866] NDO-3: Ended statechange thread
[1611215866] NDO-3: Ended event_handler thread
[1611215867] NDO-3: Ended notification thread
[1611215929] NDO-3: Ended service_check thread
[1611215929] NDO-3: Ended timed_event thread
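The bracketed values in these entries are Unix epoch timestamps, so the spacing between the thread shutdowns can be read off directly. A small sketch, assuming the log format shown above; `thread_end_offsets` is a hypothetical helper name.

```shell
# For each "NDO-3: Ended <name> thread" line on stdin, print the offset in
# seconds from the first such line, followed by the thread name.
thread_end_offsets() {
  awk '/NDO-3: Ended .* thread/ {
    ts = $1
    gsub(/[^0-9]/, "", ts)        # strip the surrounding brackets
    if (first == "") first = ts
    printf "+%ds %s\n", ts - first, $4
  }'
}

# A subset of the entries from this post.
thread_end_offsets <<'EOF'
[1611215866] NDO-3: Ended downtime thread
[1611215867] NDO-3: Ended notification thread
[1611215929] NDO-3: Ended service_check thread
[1611215929] NDO-3: Ended timed_event thread
EOF
```

For the excerpt in this post, the service_check and timed_event threads ended 63 seconds after the first entries.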

Anyway, if you want, we can continue with the ticket.

Post by ssax »

Please create a ticket for this and include a link back to this forum thread so we can get a remote session set up:

https://support.nagios.com/tickets/

Post by ssax »

Locking thread, ticket received, we will continue support through the ticket.

Thank you!