HIGH Host and service check latency
erkanerturk
- Posts: 53
- Joined: Wed Jan 16, 2019 4:35 am
HIGH Host and service check latency
Hi,
I have a server (a VM) running almost 30K checks at a 15-minute interval. Most of the checks are SNMP checks.
Average host check execution time is 0.26 sec, BUT average host check latency is 165 sec.
Similarly, average service check execution time is 0.81 sec, BUT average service check latency is 160 sec.
I have also noticed that, from time to time, the last check times are not updated. I suspect this is caused by the latency issue.
How can I correct the problem?
TIA
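A rough way to see what this load implies: at about 30,000 checks per 15 minutes, the engine has to start roughly 33 checks every second, and by Little's law the average number of checks running at once is that rate multiplied by the average execution time. The sketch below assumes the checks are spread evenly over the interval and uses the 0.81 s service average for every check (the host/service split isn't given); `estimate_load` is a hypothetical helper name. Nagios reports latency as the gap between when a check was scheduled and when it actually started, so ~27 checks in flight at under a second each, against ~160 s of latency, suggests the time is lost before the checks start rather than in the checks themselves.

```shell
#!/bin/sh
# estimate_load CHECKS INTERVAL_S AVG_EXEC_S
# Prints how many checks must start per second and, by Little's law,
# the average number of checks running at the same time.
# Assumes checks are spread evenly and all share the same average runtime.
estimate_load() {
    awk -v n="$1" -v t="$2" -v e="$3" 'BEGIN {
        rate = n / t     # checks that must be started per second
        busy = rate * e  # average checks in flight: rate x average runtime
        printf "checks per second: %.1f\n", rate
        printf "average concurrent checks: %.1f\n", busy
    }'
}

# Figures from the post: ~30K checks, 900 s interval, 0.81 s average runtime.
estimate_load 30000 900 0.81
```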
Re: HIGH Host and service check latency
Please PM me a copy of your profile.zip. You can download it from Admin > System Profile by clicking the Download Profile button.
Additionally, please send the output of these commands:
- NOTE: You may need to adjust -h 127.0.0.1, -uroot, and -pnagiosxi in the first command if your DB is offloaded to another server and/or you've changed the root MySQL password.
Code:
echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
This next command may fail. That's okay, not all systems run PostgreSQL; send the output anyway:
Code:
echo "SELECT relname as Table, pg_size_pretty(pg_total_relation_size(relid)) As Size, pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) as ExternalSize FROM pg_catalog.pg_statio_user_tables ORDER BY pg_total_relation_size(relid) DESC;" | psql nagiosxi nagiosxi
erkanerturk
Re: HIGH Host and service check latency
Hi,
I could not upload our profile.zip because of our new organizational policy, sorry about that; I could not remove the organizational data from it.
I have seen no errors in the mariadb.log file,
and I see at least 30% CPU idle time.
If you want me to send you any other data, please specify.
The query results are below.
PostgreSQL query result:
Code:
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
MySQL query result:
+--------------------------------------------+------------+
| Table | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements | 0.00 |
| nagios_commands | 0.02 |
| nagios_commenthistory | 23.40 |
| nagios_comments | 0.00 |
| nagios_configfiles | 0.01 |
| nagios_configfilevariables | 0.01 |
| nagios_conninfo | 0.27 |
| nagios_contact_addresses | 0.00 |
| nagios_contact_notificationcommands | 0.01 |
| nagios_contactgroup_members | 0.01 |
| nagios_contactgroups | 0.00 |
| nagios_contactnotificationmethods | 478.19 |
| nagios_contactnotifications | 505.58 |
| nagios_contacts | 0.01 |
| nagios_contactstatus | 0.01 |
| nagios_customvariables | 1.36 |
| nagios_customvariablestatus | 1.32 |
| nagios_dbversion | 0.00 |
| nagios_downtimehistory | 0.00 |
| nagios_eventhandlers | 0.20 |
| nagios_externalcommands | 0.01 |
| nagios_flappinghistory | 9.75 |
| nagios_host_contactgroups | 0.08 |
| nagios_host_contacts | 0.10 |
| nagios_host_parenthosts | 0.00 |
| nagios_hostchecks | 0.56 |
| nagios_hostdependencies | 0.00 |
| nagios_hostescalation_contactgroups | 0.00 |
| nagios_hostescalation_contacts | 0.00 |
| nagios_hostescalations | 0.00 |
| nagios_hostgroup_members | 0.08 |
| nagios_hostgroups | 0.00 |
| nagios_hosts | 0.46 |
| nagios_hoststatus | 0.97 |
| nagios_instances | 0.00 |
| nagios_logentries | 2428.10 |
| nagios_notifications | 426.48 |
| nagios_objects | 5.96 |
| nagios_processevents | 0.22 |
| nagios_programstatus | 0.00 |
| nagios_runtimevariables | 0.00 |
| nagios_scheduleddowntime | 0.00 |
| nagios_service_contactgroups | 1.21 |
| nagios_service_contacts | 0.41 |
| nagios_service_parentservices | 0.00 |
| nagios_servicechecks | 6.56 |
| nagios_servicedependencies | 0.00 |
| nagios_serviceescalation_contactgroups | 0.00 |
| nagios_serviceescalation_contacts | 0.00 |
| nagios_serviceescalations | 0.00 |
| nagios_servicegroup_members | 0.00 |
| nagios_servicegroups | 0.00 |
| nagios_services | 7.61 |
| nagios_servicestatus | 15.91 |
| nagios_statehistory | 462.82 |
| nagios_systemcommands | 0.03 |
| nagios_timedeventqueue | 0.00 |
| nagios_timedevents | 0.00 |
| nagios_timeperiod_timeranges | 0.03 |
| nagios_timeperiods | 0.01 |
| tbl_command | 0.06 |
| tbl_contact | 0.03 |
| tbl_contactgroup | 0.03 |
| tbl_contacttemplate | 0.03 |
| tbl_domain | 0.03 |
| tbl_host | 0.50 |
| tbl_hostdependency | 0.03 |
| tbl_hostescalation | 0.03 |
| tbl_hostextinfo | 0.03 |
| tbl_hostgroup | 0.03 |
| tbl_hosttemplate | 0.03 |
| tbl_info | 0.17 |
| tbl_lnkContactToCommandHost | 0.02 |
| tbl_lnkContactToCommandService | 0.02 |
| tbl_lnkContactToContactgroup | 0.02 |
| tbl_lnkContactToContacttemplate | 0.02 |
| tbl_lnkContactToVariabledefinition | 0.02 |
| tbl_lnkContactgroupToContact | 0.02 |
| tbl_lnkContactgroupToContactgroup | 0.02 |
| tbl_lnkContacttemplateToCommandHost | 0.02 |
| tbl_lnkContacttemplateToCommandService | 0.02 |
| tbl_lnkContacttemplateToContactgroup | 0.02 |
| tbl_lnkContacttemplateToContacttemplate | 0.02 |
| tbl_lnkContacttemplateToVariabledefinition | 0.02 |
| tbl_lnkHostToContact | 0.09 |
| tbl_lnkHostToContactgroup | 0.08 |
| tbl_lnkHostToHost | 0.02 |
| tbl_lnkHostToHostgroup | 0.02 |
| tbl_lnkHostToHosttemplate | 0.09 |
| tbl_lnkHostToVariabledefinition | 0.08 |
| tbl_lnkHostdependencyToHost_DH | 0.02 |
| tbl_lnkHostdependencyToHost_H | 0.02 |
| tbl_lnkHostdependencyToHostgroup_DH | 0.02 |
| tbl_lnkHostdependencyToHostgroup_H | 0.02 |
| tbl_lnkHostescalationToContact | 0.02 |
| tbl_lnkHostescalationToContactgroup | 0.02 |
| tbl_lnkHostescalationToHost | 0.02 |
| tbl_lnkHostescalationToHostgroup | 0.02 |
| tbl_lnkHostgroupToHost | 0.06 |
| tbl_lnkHostgroupToHostgroup | 0.02 |
| tbl_lnkHosttemplateToContact | 0.02 |
| tbl_lnkHosttemplateToContactgroup | 0.02 |
| tbl_lnkHosttemplateToHost | 0.02 |
| tbl_lnkHosttemplateToHostgroup | 0.02 |
| tbl_lnkHosttemplateToHosttemplate | 0.02 |
| tbl_lnkHosttemplateToVariabledefinition | 0.02 |
| tbl_lnkServiceToContact | 0.14 |
| tbl_lnkServiceToContactgroup | 0.31 |
| tbl_lnkServiceToHost | 1.52 |
| tbl_lnkServiceToHostgroup | 0.02 |
| tbl_lnkServiceToServicegroup | 0.02 |
| tbl_lnkServiceToServicetemplate | 1.48 |
| tbl_lnkServiceToVariabledefinition | 0.42 |
| tbl_lnkServicedependencyToHost_DH | 0.02 |
| tbl_lnkServicedependencyToHost_H | 0.02 |
| tbl_lnkServicedependencyToHostgroup_DH | 0.02 |
| tbl_lnkServicedependencyToHostgroup_H | 0.02 |
| tbl_lnkServicedependencyToService_DS | 0.02 |
| tbl_lnkServicedependencyToService_S | 0.02 |
| tbl_lnkServicedependencyToServicegroup_DS | 0.02 |
| tbl_lnkServicedependencyToServicegroup_S | 0.02 |
| tbl_lnkServiceescalationToContact | 0.02 |
| tbl_lnkServiceescalationToContactgroup | 0.02 |
| tbl_lnkServiceescalationToHost | 0.02 |
| tbl_lnkServiceescalationToHostgroup | 0.02 |
| tbl_lnkServiceescalationToService | 0.02 |
| tbl_lnkServiceescalationToServicegroup | 0.02 |
| tbl_lnkServicegroupToService | 0.02 |
| tbl_lnkServicegroupToServicegroup | 0.02 |
| tbl_lnkServicetemplateToContact | 0.02 |
| tbl_lnkServicetemplateToContactgroup | 0.02 |
| tbl_lnkServicetemplateToHost | 0.02 |
| tbl_lnkServicetemplateToHostgroup | 0.02 |
| tbl_lnkServicetemplateToServicegroup | 0.02 |
| tbl_lnkServicetemplateToServicetemplate | 0.02 |
| tbl_lnkServicetemplateToVariabledefinition | 0.02 |
| tbl_lnkTimeperiodToTimeperiod | 0.02 |
| tbl_logbook | 0.27 |
| tbl_mainmenu | 0.02 |
| tbl_permission | 0.02 |
| tbl_permission_inactive | 0.02 |
| tbl_service | 7.52 |
| tbl_servicedependency | 0.03 |
| tbl_serviceescalation | 0.03 |
| tbl_serviceextinfo | 0.03 |
| tbl_servicegroup | 0.03 |
| tbl_servicetemplate | 0.03 |
| tbl_session | 0.02 |
| tbl_session_locks | 0.02 |
| tbl_settings | 0.03 |
| tbl_submenu | 0.02 |
| tbl_timedefinition | 0.06 |
| tbl_timeperiod | 0.03 |
| tbl_user | 0.03 |
| tbl_variabledefinition | 1.52 |
| xi_auditlog | 3198.27 |
| xi_auth_tokens | 35.70 |
| xi_cmp_ccm_backups | 0.02 |
| xi_cmp_favorites | 0.03 |
| xi_cmp_nagiosbpi_backups | 0.06 |
| xi_cmp_trapdata | 0.16 |
| xi_cmp_trapdata_log | 0.03 |
| xi_commands | 0.05 |
| xi_deploy_agents | 0.02 |
| xi_deploy_jobs | 0.02 |
| xi_eventqueue | 0.03 |
| xi_events | 3.31 |
| xi_incidents | 0.00 |
| xi_meta | 41.06 |
| xi_mibs | 0.05 |
| xi_options | 0.03 |
| xi_sessions | 0.03 |
| xi_sysstat | 0.03 |
| xi_usermeta | 3.84 |
| xi_users | 0.08 |
+--------------------------------------------+------------+
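To pick out the largest tables from a listing like this, the same query can be run in batch mode and sorted by size. A minimal sketch, assuming the same MySQL host and credentials as the command earlier in the thread; `top_tables` is a hypothetical helper name, while `--batch` and `--skip-column-names` are standard mysql client options that produce tab-separated rows without a header line.

```shell
#!/bin/sh
# top_tables [N]: read "table<TAB>size_mb" lines on stdin and print the
# N largest (default 5), biggest first. LC_ALL=C keeps "." as the decimal point.
top_tables() {
    LC_ALL=C sort -t "$(printf '\t')" -k2,2 -rn | head -n "${1:-5}"
}

# Example (adjust -h/-u/-p as noted for the original query):
# echo "SELECT table_name, round(((data_length + index_length) / 1024 / 1024), 2) FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" \
#     | mysql -h 127.0.0.1 -uroot -pnagiosxi --batch --skip-column-names | top_tables 5
```

Applied to the result above, the top three would be xi_auditlog (3198.27 MB), nagios_logentries (2428.10 MB) and nagios_contactnotifications (505.58 MB).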
erkanerturk
Re: HIGH Host and service check latency
Hi, I have noticed something.
Under Monitoring Engine Status > Monitoring Engine Check Statistics, the counters decreased and then showed 0. At the same time, I noticed that the last check time stopped at 10:44 and stayed at that value for 30 minutes.
From the Linux CLI, I can see that Nagios continues to run checks, but in the GUI the last check times stay the same.
Should I increase the NPCD load threshold?
Please advise.
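One way to tell whether the engine stopped checking or only the GUI stopped updating is to compare the newest `last_check` timestamp in the engine's own status file with the current time; the XI web interface reads check results from the database that NDO populates, so the two views can diverge. A sketch, assuming the default XI location /usr/local/nagios/var/status.dat (check the status_file setting in nagios.cfg if yours differs); `newest_last_check` is a hypothetical helper name.

```shell
#!/bin/sh
# newest_last_check FILE: print the largest last_check epoch found in a
# Nagios status file (entries look like "last_check=<epoch>").
newest_last_check() {
    grep -o 'last_check=[0-9]*' "$1" | cut -d= -f2 | sort -n | tail -n 1
}

# Assumed default Nagios XI path; adjust to your status_file setting.
status_file=/usr/local/nagios/var/status.dat
if [ -r "$status_file" ]; then
    newest=$(newest_last_check "$status_file")
    if [ -n "$newest" ]; then
        now=$(date +%s)
        echo "newest last_check in status.dat: $newest ($((now - newest)) s ago)"
    fi
fi
```

If status.dat keeps advancing while the GUI's last check times stay frozen, the engine is still checking and the delay is on the database/NDO side.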
Re: HIGH Host and service check latency
Please send me your /usr/local/nagios/etc/nagios.cfg.
What XI version are you running? You can find it at the bottom left-hand side of the web interface.
Are you seeing any errors in /var/log/messages, /var/log/httpd/error_log, /var/log/httpd/ssl_error_log, or /var/log/dmesg?
Include the output of these commands as root:
Code:
sar
ps aux
ulimit -a
su -s /bin/bash -c 'ulimit -a' nagios
su -s /bin/bash -c 'ulimit -a' mysql
Attach your /etc/php.ini file as well.
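If it's easier to send everything as a single attachment, the commands can be run in sequence with a header line before each output. A minimal sketch; `run_and_label` and the output filename are hypothetical, and it needs to be run as root for the su commands to work.

```shell
#!/bin/sh
# run_and_label CMD...: run each command string with sh and print a
# "== CMD" header line before its output (stderr included).
run_and_label() {
    for cmd in "$@"; do
        printf '== %s\n' "$cmd"
        sh -c "$cmd" 2>&1
    done
}

# Example (as root), collecting everything into one file to attach:
# run_and_label sar 'ps aux' 'ulimit -a' \
#     "su -s /bin/bash -c 'ulimit -a' nagios" \
#     "su -s /bin/bash -c 'ulimit -a' mysql" > /tmp/xi-diagnostics.txt
```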
erkanerturk
Re: HIGH Host and service check latency
Hi,
I have sent the files via PM.
Waiting for your reply.
Re: HIGH Host and service check latency
Please edit your /etc/php.ini and change these:
Code:
max_execution_time = 60
max_input_vars = 5000
memory_limit = 256M
To these:
Code:
max_execution_time = 600
max_input_vars = 50000
memory_limit = 1024M
Then restart Apache:
Code:
systemctl restart httpd
Then take the attached zip file, transfer it to your XI server, and run these commands as root against it:
- Or you can upgrade to XI 5.8.1, which will update your NDO3 as well.
Code:
unzip ndo-master.zip
cd ndo-master
./configure
make all
make install
If you have an offloaded database or have changed the default MySQL passwords, you will need to edit your /usr/local/nagios/etc/ndo.cfg file and update these settings before running the next command to start it up:
- You can get the values for the ndoutils database from /usr/local/nagiosxi/html/config.inc.php.
Code:
db_host
db_port
db_user
db_pass
Then restart the nagios service:
Code:
systemctl restart nagios
Then apply configuration and see if that alleviates the issue.
If it doesn't, please create a ticket for this and include a link back to this forum thread so we can get a remote session set up:
https://support.nagios.com/tickets/
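After editing, it can help to confirm what the files actually contain. A small sketch that prints only the named, uncommented settings from a "key = value" style config file; `show_settings` is a hypothetical helper, and the example paths are the ones mentioned in the post above. Note this only checks the file; Apache still has to be restarted for php.ini changes to take effect.

```shell
#!/bin/sh
# show_settings FILE KEY...: print the uncommented lines of FILE that set
# one of the given keys, in either "key = value" or "key=value" form.
show_settings() {
    file=$1
    shift
    keys=$(printf '%s|' "$@")
    keys=${keys%|}               # drop the trailing "|"
    grep -E "^[[:space:]]*($keys)[[:space:]]*=" "$file"
}

# Examples:
# show_settings /etc/php.ini max_execution_time max_input_vars memory_limit
# show_settings /usr/local/nagios/etc/ndo.cfg db_host db_port db_user db_pass
```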
erkanerturk
Re: HIGH Host and service check latency
Hi,
First, thanks for your response, but part of the problem persists.
Our service/host check latencies definitely decreased, from >500 seconds to 150 seconds for service checks and 60 seconds for host checks.
However, I have seen that when I apply configuration, our last check times stop updating for a while (approx. 30 minutes). Active service checks (1-, 5- and 15-min) drop to 0, while active host checks show (I think) correct numbers. This also happened before applying your changes.
When I see the following log entries, the GUI returns to normal and the last check times update again (this may just be a coincidence):
[1611215866] NDO-3: Ended downtime thread
[1611215866] NDO-3: Ended acknowledgement thread
[1611215866] NDO-3: Ended comment thread
[1611215866] NDO-3: Ended flapping thread
[1611215866] NDO-3: Ended statechange thread
[1611215866] NDO-3: Ended event_handler thread
[1611215867] NDO-3: Ended notification thread
[1611215929] NDO-3: Ended service_check thread
[1611215929] NDO-3: Ended timed_event thread
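The bracketed numbers in these log lines are Unix epoch timestamps. To line them up with the times shown in the GUI, they can be converted to dates; a minimal sketch, assuming GNU date (which accepts `-d @EPOCH`) and printing UTC (drop `-u` for local time). `nagios_ts` is a hypothetical helper name.

```shell
#!/bin/sh
# nagios_ts: read log lines of the form "[EPOCH] message" on stdin and print
# them with the epoch replaced by a UTC date; other lines pass through as-is.
# Requires GNU date for "-d @EPOCH".
nagios_ts() {
    while IFS= read -r line; do
        case $line in
            "["[0-9]*"]"*)
                epoch=${line%%"]"*}        # "[1611215866"
                epoch=${epoch#"["}         # "1611215866"
                msg=${line#*"] "}          # text after "] "
                printf '%s %s\n' "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S')" "$msg"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}

# Example: nagios_ts < /usr/local/nagios/var/nagios.log | grep NDO-3
```

For the lines above, this shows most threads ending at 2021-01-21 07:57:46 to 07:57:47 UTC, and the service_check and timed_event threads about a minute later, at 07:58:49 UTC.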
Anyway, if you want, we can continue via a ticket.
Re: HIGH Host and service check latency
Please create a ticket for this and include a link back to this forum thread so we can get a remote session set up:
https://support.nagios.com/tickets/
Re: HIGH Host and service check latency
Locking thread, ticket received, we will continue support through the ticket.
Thank you!