nagios XI Memory usage issue
Posted: Tue Oct 16, 2018 5:02 am
by salami
I have Nagios XI with more than 12K hosts, integrated with mod-gearman for check distribution. The Last Check field in Host Status Detail is not being updated in a timely manner (the check interval is 1 min, and the host availability check is done by a Perl script that pings each host with 60 packets per minute). When I checked /var/log/messages, I found these errors:
Code:
ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may need to be tuned. See README.
ndo2db: Warning: queue send error, retrying...
I found the following article in the Nagios Support Knowledgebase:
https://support.nagios.com/kb/article/n ... d-139.html
and increased the kernel.msgmnb, kernel.msgmax and kernel.msgmni parameters as below:
Code:
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 4194304000
kernel.msgmax = 4194304000
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.msgmni = 4096000
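A quick unit check may help when reading the values above. kernel.msgmnb is the maximum number of bytes a single message queue may hold, so converting it to MiB shows how far one queue is allowed to grow. This is only a sketch; the helper name bytes_to_mib is made up for illustration.

```shell
# Hypothetical helper: convert a byte count to whole MiB (integer division).
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Values taken from the sysctl listing above:
#   bytes_to_mib 4194304000     # kernel.msgmnb and kernel.msgmax
#   bytes_to_mib 68719476736    # kernel.shmmax
# The value currently in effect can be read with: sysctl -n kernel.msgmnb
```

With these settings a single queue is permitted to hold 4000 MiB, which is a large share of the memory on a 10 GB server.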
Now the error has disappeared, but the Nagios message queue grows continuously until it uses all allocated memory. When I clear the message queue with the following command, the memory is freed:
Code:
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
However, the "Last Check" field in Host Status Detail is not updated until I restart the ndo2db service.
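Before clearing the queues it can be useful to see how much is sitting in them. The following is a minimal sketch, assuming the util-linux layout of `ipcs -q` (columns: key, msqid, owner, perms, used-bytes, messages); the function name summarize_queues is hypothetical.

```shell
# Hypothetical helper: read `ipcs -q` output on stdin and total the
# queues, bytes and messages belonging to one owner (default: nagios).
# Assumes the owner is column 3, used-bytes column 5, messages column 6.
summarize_queues() {
    awk -v owner="${1:-nagios}" '
        $3 == owner { queues++; bytes += $5; msgs += $6 }
        END { printf "queues=%d used_bytes=%.0f messages=%.0f\n", queues, bytes, msgs }
    '
}

# Usage:
#   ipcs -q | summarize_queues nagios
```

Running it a few times a minute apart should show whether the backlog keeps growing or drains between checks.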
The hardware resources of the main Nagios XI server are as below:
10 Core CPU
10 GB RAM
The database has been offloaded to a remote server, and a ramdisk is installed.
Would you please help me find the root cause and resolve the issue?
Thanks
Re: nagios XI Memory usage issue
Posted: Tue Oct 16, 2018 3:39 pm
by cdienger
What version of XI is this? Have there been any recent changes or problems? This can occur due to db problems and you can run the repair script to help repair it:
https://assets.nagios.com/downloads/nag ... tabase.pdf
Re: nagios XI Memory usage issue
Posted: Wed Oct 17, 2018 6:20 am
by salami
I installed version 5.4.13, then installed mod-gearman, and then upgraded to version 5.5.5. So the installation is the latest version, but Core is an old version (4.2.4, I think).
There have been no recent changes or problems.
I offloaded the DB to a remote server, and resource usage on the DB server is very low. I ran the repair database script as you suggested, and the output is as follows:
Code:
DATABASE: nagios
TABLE:
note : The storage engine for the table doesn't support repair
nagios.nagios_acknowledgements OK
nagios.nagios_commands OK
nagios.nagios_commenthistory OK
nagios.nagios_comments OK
nagios.nagios_configfiles OK
nagios.nagios_configfilevariables OK
nagios.nagios_conninfo OK
nagios.nagios_contact_addresses OK
nagios.nagios_contact_notificationcommands OK
nagios.nagios_contactgroup_members OK
nagios.nagios_contactgroups OK
nagios.nagios_contactnotificationmethods OK
nagios.nagios_contactnotifications OK
nagios.nagios_contacts OK
nagios.nagios_contactstatus OK
nagios.nagios_customvariables OK
nagios.nagios_customvariablestatus OK
nagios.nagios_dbversion OK
nagios.nagios_downtimehistory OK
nagios.nagios_eventhandlers OK
nagios.nagios_externalcommands OK
nagios.nagios_flappinghistory OK
nagios.nagios_host_contactgroups OK
nagios.nagios_host_contacts OK
nagios.nagios_host_parenthosts OK
nagios.nagios_hostchecks OK
nagios.nagios_hostdependencies OK
nagios.nagios_hostescalation_contactgroups OK
nagios.nagios_hostescalation_contacts OK
nagios.nagios_hostescalations OK
nagios.nagios_hostgroup_members OK
nagios.nagios_hostgroups OK
nagios.nagios_hosts OK
nagios.nagios_hoststatus OK
nagios.nagios_instances OK
nagios.nagios_logentries OK
nagios.nagios_notifications OK
nagios.nagios_objects OK
nagios.nagios_processevents OK
nagios.nagios_programstatus OK
nagios.nagios_runtimevariables OK
nagios.nagios_scheduleddowntime OK
nagios.nagios_service_contactgroups OK
nagios.nagios_service_contacts OK
nagios.nagios_service_parentservices OK
nagios.nagios_servicechecks OK
nagios.nagios_servicedependencies OK
nagios.nagios_serviceescalation_contactgroups OK
nagios.nagios_serviceescalation_contacts OK
nagios.nagios_serviceescalations OK
nagios.nagios_servicegroup_members OK
nagios.nagios_servicegroups OK
nagios.nagios_services OK
nagios.nagios_servicestatus OK
nagios.nagios_statehistory OK
nagios.nagios_systemcommands OK
nagios.nagios_timedeventqueue OK
nagios.nagios_timedevents OK
nagios.nagios_timeperiod_timeranges OK
nagios.nagios_timeperiods OK
Issued remote command 'mysqlcheck -f -r -u nagios -pnagios -h [DB IP address] --port=3306 --databases nagios'
DATABASE: nagiosql
TABLE:
nagiosql.tbl_command OK
nagiosql.tbl_contact OK
nagiosql.tbl_contactgroup OK
nagiosql.tbl_contacttemplate OK
nagiosql.tbl_domain OK
nagiosql.tbl_host OK
nagiosql.tbl_hostdependency OK
nagiosql.tbl_hostescalation OK
nagiosql.tbl_hostextinfo OK
nagiosql.tbl_hostgroup OK
nagiosql.tbl_hosttemplate OK
nagiosql.tbl_info OK
nagiosql.tbl_lnkContactToCommandHost OK
nagiosql.tbl_lnkContactToCommandService OK
nagiosql.tbl_lnkContactToContactgroup OK
nagiosql.tbl_lnkContactToContacttemplate OK
nagiosql.tbl_lnkContactToVariabledefinition OK
nagiosql.tbl_lnkContactgroupToContact OK
nagiosql.tbl_lnkContactgroupToContactgroup OK
nagiosql.tbl_lnkContacttemplateToCommandHost OK
nagiosql.tbl_lnkContacttemplateToCommandService OK
nagiosql.tbl_lnkContacttemplateToContactgroup OK
nagiosql.tbl_lnkContacttemplateToContacttemplate OK
nagiosql.tbl_lnkContacttemplateToVariabledefinition OK
nagiosql.tbl_lnkHostToContact OK
nagiosql.tbl_lnkHostToContactgroup OK
nagiosql.tbl_lnkHostToHost OK
nagiosql.tbl_lnkHostToHostgroup OK
nagiosql.tbl_lnkHostToHosttemplate OK
nagiosql.tbl_lnkHostToVariabledefinition OK
nagiosql.tbl_lnkHostdependencyToHost_DH OK
nagiosql.tbl_lnkHostdependencyToHost_H OK
nagiosql.tbl_lnkHostdependencyToHostgroup_DH OK
nagiosql.tbl_lnkHostdependencyToHostgroup_H OK
nagiosql.tbl_lnkHostescalationToContact OK
nagiosql.tbl_lnkHostescalationToContactgroup OK
nagiosql.tbl_lnkHostescalationToHost OK
nagiosql.tbl_lnkHostescalationToHostgroup OK
nagiosql.tbl_lnkHostgroupToHost OK
nagiosql.tbl_lnkHostgroupToHostgroup OK
nagiosql.tbl_lnkHosttemplateToContact OK
nagiosql.tbl_lnkHosttemplateToContactgroup OK
nagiosql.tbl_lnkHosttemplateToHost OK
nagiosql.tbl_lnkHosttemplateToHostgroup OK
nagiosql.tbl_lnkHosttemplateToHosttemplate OK
nagiosql.tbl_lnkHosttemplateToVariabledefinition OK
nagiosql.tbl_lnkServiceToContact OK
nagiosql.tbl_lnkServiceToContactgroup OK
nagiosql.tbl_lnkServiceToHost OK
nagiosql.tbl_lnkServiceToHostgroup OK
nagiosql.tbl_lnkServiceToServicegroup OK
nagiosql.tbl_lnkServiceToServicetemplate OK
nagiosql.tbl_lnkServiceToVariabledefinition OK
nagiosql.tbl_lnkServicedependencyToHost_DH OK
nagiosql.tbl_lnkServicedependencyToHost_H OK
nagiosql.tbl_lnkServicedependencyToHostgroup_DH OK
nagiosql.tbl_lnkServicedependencyToHostgroup_H OK
nagiosql.tbl_lnkServicedependencyToService_DS OK
nagiosql.tbl_lnkServicedependencyToService_S OK
nagiosql.tbl_lnkServiceescalationToContact OK
nagiosql.tbl_lnkServiceescalationToContactgroup OK
nagiosql.tbl_lnkServiceescalationToHost OK
nagiosql.tbl_lnkServiceescalationToHostgroup OK
nagiosql.tbl_lnkServiceescalationToService OK
nagiosql.tbl_lnkServicegroupToService OK
nagiosql.tbl_lnkServicegroupToServicegroup OK
nagiosql.tbl_lnkServicetemplateToContact OK
nagiosql.tbl_lnkServicetemplateToContactgroup OK
nagiosql.tbl_lnkServicetemplateToHost OK
nagiosql.tbl_lnkServicetemplateToHostgroup OK
nagiosql.tbl_lnkServicetemplateToServicegroup OK
nagiosql.tbl_lnkServicetemplateToServicetemplate OK
nagiosql.tbl_lnkServicetemplateToVariabledefinition OK
nagiosql.tbl_lnkTimeperiodToTimeperiod OK
nagiosql.tbl_logbook OK
nagiosql.tbl_mainmenu OK
nagiosql.tbl_permission
note : The storage engine for the table doesn't support repair
nagiosql.tbl_permission_inactive
note : The storage engine for the table doesn't support repair
nagiosql.tbl_service OK
nagiosql.tbl_servicedependency OK
nagiosql.tbl_serviceescalation OK
nagiosql.tbl_serviceextinfo OK
nagiosql.tbl_servicegroup OK
nagiosql.tbl_servicetemplate OK
nagiosql.tbl_session OK
nagiosql.tbl_session_locks OK
nagiosql.tbl_settings OK
nagiosql.tbl_submenu OK
nagiosql.tbl_timedefinition OK
nagiosql.tbl_timeperiod OK
nagiosql.tbl_user OK
nagiosql.tbl_variabledefinition OK
Issued remote command 'mysqlcheck -f -r -u nagiosql -pnagiosql -h [DB IP address] --port=3306 --databases nagiosql'
DATABASE: nagiosxi
TABLE:
nagiosxi.xi_auditlog OK
nagiosxi.xi_auth_tokens
note : The storage engine for the table doesn't support repair
nagiosxi.xi_cmp_trapdata
note : The storage engine for the table doesn't support repair
nagiosxi.xi_cmp_trapdata_log
note : The storage engine for the table doesn't support repair
nagiosxi.xi_commands OK
nagiosxi.xi_eventqueue OK
nagiosxi.xi_events OK
nagiosxi.xi_incidents OK
nagiosxi.xi_meta OK
nagiosxi.xi_options OK
nagiosxi.xi_sessions
note : The storage engine for the table doesn't support repair
nagiosxi.xi_sysstat OK
nagiosxi.xi_usermeta OK
nagiosxi.xi_users OK
Issued remote command 'mysqlcheck -f -r -u nagiosxi -pnagiosxi -h [DB IP address] --port=3306 --databases nagiosxi'
Stopping ndo2db: done.
Starting ndo2db: done.
Running configuration check...
Stopping nagios:. done.
Starting nagios: done.
=======================
nagios offloaded database repair succeeded
nagiosql offloaded database repair succeeded
nagiosxi offloaded database repair succeeded
It seems there are no issues on the DB side, because the output reported no problems and nothing changed after running the repair script.
For now I have created the script below, which flushes the message queues and then restarts the ndo2db service, and scheduled it in cron every 5 min.
Code:
#!/bin/bash
# Remove every message queue owned by nagios, then restart ndo2db.
for i in $(ipcs -q | grep nagios | awk '{print $2}')
do
    ipcrm -q "$i"
done
/etc/init.d/ndo2db restart
This temporary workaround resolves the problem but causes other issues. Applying configuration takes too long, and after applying it the Host and Service Status Summary takes a long time to appear (as a reminder, I have more than 12K hosts, as mentioned in the first post of this topic). Also, the Last Check field is now updated every 5 min instead of every 1 min (my script runs every 5 min, but the check interval is 1 min).
Are there any other suggestions?
Thanks
Re: nagios XI Memory usage issue
Posted: Wed Oct 17, 2018 2:37 pm
by cdienger
Disable the workaround and let's collect the following:
- Verify the version of Core with "/usr/local/nagios/bin/nagios --version"
- Edit /usr/local/nagios/etc/ndo2db.cfg and set debug_level to 1. Restart the services as described below and let it run for a few minutes before collecting /usr/local/nagios/var/ndo2db.debug and then disabling the debug log again
- A profile created under Admin > System Config > System Profile > Download Profile.
The data can be PM'd to me and @Nagios Support.
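The debug_level change can also be scripted so that it is easy to revert. This is a rough sketch only, assuming GNU sed and that the file already contains a line of the form debug_level=<n>; the function name set_debug_level is made up for illustration.

```shell
# Hypothetical helper: set debug_level in an ndo2db.cfg-style file,
# keeping a backup copy next to it. Assumes GNU sed (for -i) and an
# existing "debug_level=<n>" line in the file.
set_debug_level() {
    local cfg="$1" level="$2"
    cp -p "$cfg" "$cfg.bak"
    sed -i "s/^debug_level=.*/debug_level=${level}/" "$cfg"
}

# Enable:  set_debug_level /usr/local/nagios/etc/ndo2db.cfg 1
# Revert:  set_debug_level /usr/local/nagios/etc/ndo2db.cfg 0
```

The services still need to be restarted after each change for the new level to take effect.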
Restart the services with:
Note you ***must*** use mariadb instead of mysqld in the commands below, ***if*** you have mariadb.
Code:
service nagios stop
service ndo2db stop
service mysqld stop
service crond stop
service httpd stop
killall -9 nagios
killall -9 ndo2db
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /usr/local/nagiosxi/var/reconfigure_nagios.lock
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
service mysqld start
service ndo2db start
service nagios start
service httpd start
service crond start
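For the mysqld versus mariadb note, the service name can be picked automatically on hosts that use systemd. The sketch below is an illustration under that assumption; pick_db_service is a hypothetical name, and on a non-systemd host the name has to be chosen by hand.

```shell
# Hypothetical helper: read a systemd unit listing on stdin and print
# "mariadb" if a mariadb.service unit is present, otherwise "mysqld".
pick_db_service() {
    if grep -q '^mariadb\.service'; then
        echo mariadb
    else
        echo mysqld
    fi
}

# Usage on a systemd host:
#   db=$(systemctl list-unit-files --type=service | pick_db_service)
#   service "$db" stop
```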
Re: nagios XI Memory usage issue
Posted: Tue Oct 23, 2018 10:01 am
by salami
Nagios Core Version:
Code:
Nagios Core 4.2.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 12-07-2016
License: GPL
Website: https://www.nagios.org
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
published by the Free Software Foundation.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
Also, the profile.zip and ndo2db.debug files have been sent to you by PM.
Thanks
Re: nagios XI Memory usage issue
Posted: Tue Oct 23, 2018 4:12 pm
by cdienger
The profile shows issues connecting to the database. Try increasing the number of allowed connections per
https://support.nagios.com/kb/article/n ... s-513.html.
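To see whether the connection limit is actually being reached, the configured limit can be compared with the peak usage the server has recorded. A minimal sketch follows; the function name is hypothetical, and DB_HOST and DB_USER are placeholders rather than values from this thread.

```shell
# Hypothetical helper: print SQL that compares the configured connection
# limit with the highest number of connections seen since server start.
connection_check_sql() {
    cat <<'SQL'
SHOW VARIABLES LIKE 'max_connections';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SQL
}

# Usage (DB_HOST and DB_USER are placeholders):
#   connection_check_sql | mysql -h DB_HOST -u DB_USER -p
```

If Max_used_connections is close to max_connections, the limit is likely being hit; if it is far below, the connection errors probably have another cause.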
The pinging Perl script seems a bit excessive. Do you need to ping 60 times with each check? The default ping plugins use 5 pings, I believe. I would try lowering this, as well as the frequency, as a test (try pinging 5 times once every 5 minutes, for example).
Re: nagios XI Memory usage issue
Posted: Wed Oct 24, 2018 1:24 am
by salami
Thanks for your reply.
I need 60 pings per check interval (1 min) because of our customer SLA on packet loss and RTA, so I cannot change the ping parameters in the Perl script.
I increased max_connections as you suggested, but the problem is not solved. The my.cnf configuration I set is below:
Code:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
bind-address=[my offloaded DB IP Address]
port=3306
skip_name_resolve
max-allowed-packet = 16M
innodb-file-per-table = 1
innodb-buffer-pool-size = 1G
innodb_additional_mem_pool_size = 128M
innodb_log_buffer_size = 512M
sort_buffer_size = 5M
read_buffer_size = 2M
read_rnd_buffer_size = 1M
join_buffer_size = 1M
thread_stack = 1M
binlog_cache_size = 1M
tmp-table-size = 64M
max-heap-table-size = 32M
query-cache-size = 512M
query_cache_limit = 1M
max-connections = 818
thread-cache-size = 512
open-files-limit = 65535
table-definition-cache = 4096
table-open-cache = 4096
ignore-db-dir = lost+found
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
#
# include all files from the config directory
#
!includedir /etc/my.cnf.d
I checked the ndo2db debug log for SQL queries and found that a very large number of queries are executed because of the Free Variables I define for each host. I define 18 Free Variables per host, so for 11K hosts the system needs to execute 11000*18=198000 queries just for host configuration items. It seems Nagios cannot execute all of these queries in a timely manner; in some cases query execution stops, and some hosts or services are not shown in the web UI. Is it possible to extend the time ndo2db is allowed for running queries, or to exclude Free Variables from processing when applying configuration? Or is there another solution, such as moving the Free Variables data to a custom component? I need this data to be bound to hosts because of integrations with other third-party software.
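The arithmetic above can be turned into a rough time estimate. This is a back-of-envelope sketch only: the per-query cost of 1000 microseconds is an assumed figure, not something measured on this system, and both function names are hypothetical.

```shell
# Hypothetical helpers for a back-of-envelope estimate.
# total_queries HOSTS VARS_PER_HOST -> number of per-variable queries
total_queries() {
    echo $(( $1 * $2 ))
}

# seconds_needed QUERIES MICROSECONDS_PER_QUERY -> whole seconds, rounded up
seconds_needed() {
    echo $(( ($1 * $2 + 999999) / 1000000 ))
}

# Figures from the post: 11000 hosts, 18 Free Variables each.
#   total_queries 11000 18
# Assumed (not measured) cost of 1000 microseconds per query:
#   seconds_needed 198000 1000
```

At the assumed 1 ms per query, 198000 queries run one after another would take about 198 seconds, a little over three minutes, before any other configuration data is written.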
Re: nagios XI Memory usage issue
Posted: Wed Oct 24, 2018 12:53 pm
by cdienger
Use https://support.nagios.com/kb/article/n ... e-611.html to increase some of the timeout and memory settings used by XI. I'd start with the values suggested in the KB, but you may need to increase them further.