nagios XI Memory usage issue

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
salami
Posts: 30
Joined: Tue Jun 26, 2018 4:36 am

nagios XI Memory usage issue

Post by salami »

I have Nagios XI with more than 12K hosts, integrated with mod-gearman for check distribution. The Last Check field in Host Status Detail is not being updated in a timely manner (the check interval is 1 minute, and host availability is checked by a Perl script that sends 60 ping packets per minute). When I checked /var/log/messages, I found these errors:

Code: Select all

ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may need to be tuned. See README.
ndo2db: Warning: queue send error, retrying... 
I found the following article in the Nagios Support Knowledgebase:
https://support.nagios.com/kb/article/n ... d-139.html

and increased the kernel.msgmnb, kernel.msgmax and kernel.msgmni parameters as shown below:

Code: Select all

net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 4194304000
kernel.msgmax = 4194304000
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
kernel.msgmni = 4096000
Now the error has disappeared, but the Nagios message queue grows continuously until it has used all allocated memory. When I clear the message queue with the following command, the memory is freed:

Code: Select all

for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done

However, the "Last Check" field in Host Status Detail is still not updated until I restart the ndo2db service.
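
Before deleting the queues, it can help to see how full they actually are. The following is a minimal sketch, assuming the usual Linux `ipcs -q` column order (key, msqid, owner, perms, used-bytes, messages); the function name `nagios_queue_usage` is made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: count queues and sum used bytes and queued messages
# for all message queues owned by a given user (default: nagios).
# Reads `ipcs -q` style output on stdin; assumes the Linux column order
# key, msqid, owner, perms, used-bytes, messages.
nagios_queue_usage() {
    local owner="${1:-nagios}"
    awk -v owner="$owner" '
        $3 == owner { bytes += $5; msgs += $6; queues++ }
        END { printf "%d %d %d\n", queues, bytes, msgs }
    '
}

# Usage on a live system (prints: queues used-bytes messages):
#   ipcs -q | nagios_queue_usage nagios
```

Running this once a minute and comparing the used-bytes figure with kernel.msgmnb would show whether the queue is steadily filling or only spiking.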

The hardware resources of the main Nagios XI server are as follows:
10 Core CPU
10 GB RAM

The database has been offloaded to a remote server, and a RAM disk is installed.

Would you please help me find the root cause and resolve the issue?

Thanks
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: nagios XI Memory usage issue

Post by cdienger »

What version of XI is this? Have there been any recent changes or problems? This can occur due to db problems and you can run the repair script to help repair it:

https://assets.nagios.com/downloads/nag ... tabase.pdf
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
salami
Posts: 30
Joined: Tue Jun 26, 2018 4:36 am

Re: nagios XI Memory usage issue

Post by salami »

I installed version 5.4.13, then installed mod-gearman, and then upgraded to version 5.5.5. So the XI installation is the latest version, but Core is an old version (4.2.4, I think).

There have been no recent changes or problems.

I offloaded the DB to a remote server, and resource usage on the DB server is very low. I ran the repair database script as you suggested, and the output is as follows:

Code: Select all

DATABASE: nagios
TABLE:    
note     : The storage engine for the table doesn't support repair
nagios.nagios_acknowledgements                     OK
nagios.nagios_commands                             OK
nagios.nagios_commenthistory                       OK
nagios.nagios_comments                             OK
nagios.nagios_configfiles                          OK
nagios.nagios_configfilevariables                  OK
nagios.nagios_conninfo                             OK
nagios.nagios_contact_addresses                    OK
nagios.nagios_contact_notificationcommands         OK
nagios.nagios_contactgroup_members                 OK
nagios.nagios_contactgroups                        OK
nagios.nagios_contactnotificationmethods           OK
nagios.nagios_contactnotifications                 OK
nagios.nagios_contacts                             OK
nagios.nagios_contactstatus                        OK
nagios.nagios_customvariables                      OK
nagios.nagios_customvariablestatus                 OK
nagios.nagios_dbversion                            OK
nagios.nagios_downtimehistory                      OK
nagios.nagios_eventhandlers                        OK
nagios.nagios_externalcommands                     OK
nagios.nagios_flappinghistory                      OK
nagios.nagios_host_contactgroups                   OK
nagios.nagios_host_contacts                        OK
nagios.nagios_host_parenthosts                     OK
nagios.nagios_hostchecks                           OK
nagios.nagios_hostdependencies                     OK
nagios.nagios_hostescalation_contactgroups         OK
nagios.nagios_hostescalation_contacts              OK
nagios.nagios_hostescalations                      OK
nagios.nagios_hostgroup_members                    OK
nagios.nagios_hostgroups                           OK
nagios.nagios_hosts                                OK
nagios.nagios_hoststatus                           OK
nagios.nagios_instances                            OK
nagios.nagios_logentries                           OK
nagios.nagios_notifications                        OK
nagios.nagios_objects                              OK
nagios.nagios_processevents                        OK
nagios.nagios_programstatus                        OK
nagios.nagios_runtimevariables                     OK
nagios.nagios_scheduleddowntime                    OK
nagios.nagios_service_contactgroups                OK
nagios.nagios_service_contacts                     OK
nagios.nagios_service_parentservices               OK
nagios.nagios_servicechecks                        OK
nagios.nagios_servicedependencies                  OK
nagios.nagios_serviceescalation_contactgroups      OK
nagios.nagios_serviceescalation_contacts           OK
nagios.nagios_serviceescalations                   OK
nagios.nagios_servicegroup_members                 OK
nagios.nagios_servicegroups                        OK
nagios.nagios_services                             OK
nagios.nagios_servicestatus                        OK
nagios.nagios_statehistory                         OK
nagios.nagios_systemcommands                       OK
nagios.nagios_timedeventqueue                      OK
nagios.nagios_timedevents                          OK
nagios.nagios_timeperiod_timeranges                OK
nagios.nagios_timeperiods                          OK
Issued remote command 'mysqlcheck -f -r -u nagios -pnagios -h [DB IP address] --port=3306 --databases nagios'
DATABASE: nagiosql
TABLE:    
nagiosql.tbl_command                               OK
nagiosql.tbl_contact                               OK
nagiosql.tbl_contactgroup                          OK
nagiosql.tbl_contacttemplate                       OK
nagiosql.tbl_domain                                OK
nagiosql.tbl_host                                  OK
nagiosql.tbl_hostdependency                        OK
nagiosql.tbl_hostescalation                        OK
nagiosql.tbl_hostextinfo                           OK
nagiosql.tbl_hostgroup                             OK
nagiosql.tbl_hosttemplate                          OK
nagiosql.tbl_info                                  OK
nagiosql.tbl_lnkContactToCommandHost               OK
nagiosql.tbl_lnkContactToCommandService            OK
nagiosql.tbl_lnkContactToContactgroup              OK
nagiosql.tbl_lnkContactToContacttemplate           OK
nagiosql.tbl_lnkContactToVariabledefinition        OK
nagiosql.tbl_lnkContactgroupToContact              OK
nagiosql.tbl_lnkContactgroupToContactgroup         OK
nagiosql.tbl_lnkContacttemplateToCommandHost       OK
nagiosql.tbl_lnkContacttemplateToCommandService    OK
nagiosql.tbl_lnkContacttemplateToContactgroup      OK
nagiosql.tbl_lnkContacttemplateToContacttemplate   OK
nagiosql.tbl_lnkContacttemplateToVariabledefinition OK
nagiosql.tbl_lnkHostToContact                      OK
nagiosql.tbl_lnkHostToContactgroup                 OK
nagiosql.tbl_lnkHostToHost                         OK
nagiosql.tbl_lnkHostToHostgroup                    OK
nagiosql.tbl_lnkHostToHosttemplate                 OK
nagiosql.tbl_lnkHostToVariabledefinition           OK
nagiosql.tbl_lnkHostdependencyToHost_DH            OK
nagiosql.tbl_lnkHostdependencyToHost_H             OK
nagiosql.tbl_lnkHostdependencyToHostgroup_DH       OK
nagiosql.tbl_lnkHostdependencyToHostgroup_H        OK
nagiosql.tbl_lnkHostescalationToContact            OK
nagiosql.tbl_lnkHostescalationToContactgroup       OK
nagiosql.tbl_lnkHostescalationToHost               OK
nagiosql.tbl_lnkHostescalationToHostgroup          OK
nagiosql.tbl_lnkHostgroupToHost                    OK
nagiosql.tbl_lnkHostgroupToHostgroup               OK
nagiosql.tbl_lnkHosttemplateToContact              OK
nagiosql.tbl_lnkHosttemplateToContactgroup         OK
nagiosql.tbl_lnkHosttemplateToHost                 OK
nagiosql.tbl_lnkHosttemplateToHostgroup            OK
nagiosql.tbl_lnkHosttemplateToHosttemplate         OK
nagiosql.tbl_lnkHosttemplateToVariabledefinition   OK
nagiosql.tbl_lnkServiceToContact                   OK
nagiosql.tbl_lnkServiceToContactgroup              OK
nagiosql.tbl_lnkServiceToHost                      OK
nagiosql.tbl_lnkServiceToHostgroup                 OK
nagiosql.tbl_lnkServiceToServicegroup              OK
nagiosql.tbl_lnkServiceToServicetemplate           OK
nagiosql.tbl_lnkServiceToVariabledefinition        OK
nagiosql.tbl_lnkServicedependencyToHost_DH         OK
nagiosql.tbl_lnkServicedependencyToHost_H          OK
nagiosql.tbl_lnkServicedependencyToHostgroup_DH    OK
nagiosql.tbl_lnkServicedependencyToHostgroup_H     OK
nagiosql.tbl_lnkServicedependencyToService_DS      OK
nagiosql.tbl_lnkServicedependencyToService_S       OK
nagiosql.tbl_lnkServiceescalationToContact         OK
nagiosql.tbl_lnkServiceescalationToContactgroup    OK
nagiosql.tbl_lnkServiceescalationToHost            OK
nagiosql.tbl_lnkServiceescalationToHostgroup       OK
nagiosql.tbl_lnkServiceescalationToService         OK
nagiosql.tbl_lnkServicegroupToService              OK
nagiosql.tbl_lnkServicegroupToServicegroup         OK
nagiosql.tbl_lnkServicetemplateToContact           OK
nagiosql.tbl_lnkServicetemplateToContactgroup      OK
nagiosql.tbl_lnkServicetemplateToHost              OK
nagiosql.tbl_lnkServicetemplateToHostgroup         OK
nagiosql.tbl_lnkServicetemplateToServicegroup      OK
nagiosql.tbl_lnkServicetemplateToServicetemplate   OK
nagiosql.tbl_lnkServicetemplateToVariabledefinition OK
nagiosql.tbl_lnkTimeperiodToTimeperiod             OK
nagiosql.tbl_logbook                               OK
nagiosql.tbl_mainmenu                              OK
nagiosql.tbl_permission
note     : The storage engine for the table doesn't support repair
nagiosql.tbl_permission_inactive
note     : The storage engine for the table doesn't support repair
nagiosql.tbl_service                               OK
nagiosql.tbl_servicedependency                     OK
nagiosql.tbl_serviceescalation                     OK
nagiosql.tbl_serviceextinfo                        OK
nagiosql.tbl_servicegroup                          OK
nagiosql.tbl_servicetemplate                       OK
nagiosql.tbl_session                               OK
nagiosql.tbl_session_locks                         OK
nagiosql.tbl_settings                              OK
nagiosql.tbl_submenu                               OK
nagiosql.tbl_timedefinition                        OK
nagiosql.tbl_timeperiod                            OK
nagiosql.tbl_user                                  OK
nagiosql.tbl_variabledefinition                    OK
Issued remote command 'mysqlcheck -f -r -u nagiosql -pnagiosql -h [DB IP address] --port=3306 --databases nagiosql'
DATABASE: nagiosxi
TABLE:    
nagiosxi.xi_auditlog                               OK
nagiosxi.xi_auth_tokens
note     : The storage engine for the table doesn't support repair
nagiosxi.xi_cmp_trapdata
note     : The storage engine for the table doesn't support repair
nagiosxi.xi_cmp_trapdata_log
note     : The storage engine for the table doesn't support repair
nagiosxi.xi_commands                               OK
nagiosxi.xi_eventqueue                             OK
nagiosxi.xi_events                                 OK
nagiosxi.xi_incidents                              OK
nagiosxi.xi_meta                                   OK
nagiosxi.xi_options                                OK
nagiosxi.xi_sessions
note     : The storage engine for the table doesn't support repair
nagiosxi.xi_sysstat                                OK
nagiosxi.xi_usermeta                               OK
nagiosxi.xi_users                                  OK
Issued remote command 'mysqlcheck -f -r -u nagiosxi -pnagiosxi -h [DB IP address] --port=3306 --databases nagiosxi'
Stopping ndo2db: done.
Starting ndo2db: done.
Running configuration check...
Stopping nagios:. done.
Starting nagios: done.

=======================
nagios offloaded database repair succeeded
nagiosql offloaded database repair succeeded
nagiosxi offloaded database repair succeeded

It seems there are no issues on the DB side: nothing changed after running the repair script, and the output reports no problems.
For now I have created the script below, which flushes the message queues and then restarts the ndo2db service, and I run it from cron every 5 minutes.

Code: Select all

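#!/bin/bash

# Remove every message queue whose ipcs line mentions nagios.
for i in $(ipcs -q | grep nagios | awk '{print $2}')
do
    ipcrm -q "$i"
done

# Restart ndo2db so it starts again with a fresh queue.
/etc/init.d/ndo2db restart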
This temporary solution resolves the problem but causes other issues. Applying configuration takes too long, and afterwards it takes too long for my host and service status summary to appear (as a reminder, I have more than 12K hosts, as mentioned in my first post). Also, the Last Check field is now updated every 5 minutes instead of every 1 minute (my script runs every 5 minutes, but the check interval is 1 minute).
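
One possible refinement of this workaround, sketched under assumptions: flush only when a queue is close to its limit, so ndo2db is not restarted every 5 minutes regardless of load. The 80% default and the function name `queue_needs_flush` are illustrative choices, not values from Nagios documentation.

```shell
#!/bin/bash

# Hypothetical helper: succeed (exit 0) when used bytes have reached the
# given percentage of the per-queue limit (kernel.msgmnb).
# Arguments: used_bytes limit_bytes [threshold_percent, default 80]
queue_needs_flush() {
    local used="$1" limit="$2" pct="${3:-80}"
    # Integer arithmetic only: compare used*100 with limit*pct.
    [ $(( used * 100 )) -ge $(( limit * pct )) ]
}

# Usage on a live system (not run here):
#   limit=$(sysctl -n kernel.msgmnb)
#   used=$(ipcs -q | awk '$3 == "nagios" && $5 > m { m = $5 } END { print m + 0 }')
#   if queue_needs_flush "$used" "$limit"; then
#       : # flush the queues and restart ndo2db as in the script above
#   fi
```

This only reduces how often the restart happens; it does not address why the queue fills in the first place.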

Is there any other suggestion?

Thanks
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: nagios XI Memory usage issue

Post by cdienger »

Disable the workaround and let's collect the following:

-Verify the version of Core with "/usr/local/nagios/bin/nagios --version"

-Edit /usr/local/nagios/etc/ndo2db.cfg and set debug_level to 1. Restart the services as described below and let it run for a few minutes before collecting /usr/local/nagios/var/ndo2db.debug, then revert the setting to disable the debug log.

-A profile created under Admin > System Config > System Profile > Download Profile.

The data can be PM'd to me and @Nagios Support.
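
The debug_level change in the list above can be scripted. This is a minimal sketch, assuming ndo2db.cfg already contains a line of the form `debug_level=<n>` and that 0 turns the debug log off; the function name `set_ndo2db_debug` is made up for illustration, and a `.bak` copy of the file is kept.

```shell
#!/bin/bash

# Hypothetical helper: set debug_level in an ndo2db.cfg-style file.
# Arguments: config_path level
# Assumes the file already has a "debug_level=<n>" line; keeps a .bak copy.
set_ndo2db_debug() {
    local cfg="$1" level="$2"
    sed -i.bak -E "s/^debug_level=.*/debug_level=${level}/" "$cfg"
}

# Usage on a live system (not run here):
#   set_ndo2db_debug /usr/local/nagios/etc/ndo2db.cfg 1   # enable
#   (restart the services, wait a few minutes, collect ndo2db.debug)
#   set_ndo2db_debug /usr/local/nagios/etc/ndo2db.cfg 0   # assumed "off" value
```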

Restart the services with:

Note you ***must*** use mariadb instead of mysqld in the commands below, ***if*** you have mariadb.

Code: Select all

service nagios stop
service ndo2db stop
service mysqld stop
service crond stop
service httpd stop
killall -9 nagios
killall -9 ndo2db
rm -f /usr/local/nagios/var/rw/nagios.cmd
rm -f /usr/local/nagios/var/nagios.lock
rm -f /usr/local/nagios/var/ndo.sock
rm -f /usr/local/nagios/var/ndo2db.lock
rm -f /usr/local/nagiosxi/var/reconfigure_nagios.lock
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
service mysqld start
service ndo2db start
service nagios start
service httpd start
service crond start
salami
Posts: 30
Joined: Tue Jun 26, 2018 4:36 am

Re: nagios XI Memory usage issue

Post by salami »

Nagios Core Version:

Code: Select all

Nagios Core 4.2.4
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 12-07-2016
License: GPL

Website: https://www.nagios.org
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License version 2 as
published by the Free Software Foundation.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Also, profile.zip and the ndo2db.debug file have been sent to you by PM.

Thanks
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: nagios XI Memory usage issue

Post by cdienger »

The profile shows issues connecting to the database. Try increasing the number of allowed connections per https://support.nagios.com/kb/article/n ... s-513.html.

The pinging Perl script seems a bit excessive. Do you need to ping 60 times with each check? The default ping plugins use 5 pings, I believe. I would try lowering this as well as the frequency as a test (for example, try pinging 5 times once every 5 minutes).
salami
Posts: 30
Joined: Tue Jun 26, 2018 4:36 am

Re: nagios XI Memory usage issue

Post by salami »

Thanks for your reply.
I need 60 pings per check interval (1 minute) because of our customer SLA on packet loss and RTA, so I cannot change the ping parameters in the Perl script.

I increased max_connections as you suggested, but the problem is not solved. The my.cnf configuration I set is shown below:

Code: Select all

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
bind-address=[my offloaded DB IP Address]
port=3306
skip_name_resolve
max-allowed-packet = 16M
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 1G
innodb_additional_mem_pool_size = 128M
innodb_log_buffer_size         = 512M
sort_buffer_size               = 5M
read_buffer_size               = 2M
read_rnd_buffer_size           = 1M
join_buffer_size               = 1M
thread_stack                   = 1M
binlog_cache_size              = 1M
tmp-table-size                 = 64M
max-heap-table-size            = 32M
query-cache-size               = 512M
query_cache_limit              = 1M
max-connections                = 818
thread-cache-size              = 512
open-files-limit               = 65535
table-definition-cache         = 4096
table-open-cache               = 4096
ignore-db-dir                  = lost+found



[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

#
# include all files from the config directory
#
!includedir /etc/my.cnf.d
I checked the ndo2db debug log for SQL queries and found that a very large number of queries are executed because of the Free Variables I define for each host. I define 18 Free Variables per host, so for 11K hosts the system needs to execute 11000*18=198000 queries just for host configuration items. It seems Nagios cannot execute all the queries in a timely manner; in some cases query execution stops, and some hosts or services are not shown in the web UI. Is it possible to extend the time ndo2db allows for running queries, or to exclude Free Variables from processing when applying configuration? Or is there another solution, such as moving the Free Variables data to a custom component? I need this data to be bound to hosts because of integrations with third-party software.
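
The arithmetic above as a quick check. This is a sketch only: it assumes one query per Free Variable per host, as stated in the post, and the 500 queries-per-second rate is a made-up figure for illustration, not a measurement.

```shell
#!/bin/bash

# Hypothetical helper: number of queries needed for per-host custom variables.
# Arguments: host_count vars_per_host
freevar_queries() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: whole seconds needed at a given queries-per-second
# rate, rounded up. Arguments: query_count queries_per_second
seconds_at_rate() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

total=$(freevar_queries 11000 18)
echo "queries: $total"                                         # queries: 198000
echo "seconds at 500 q/s (assumed rate): $(seconds_at_rate "$total" 500)"
```

Plugging in a rate measured from the ndo2db debug log would show whether a configuration apply can finish within the configured timeouts.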
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: nagios XI Memory usage issue

Post by cdienger »

Use https://support.nagios.com/kb/article/n ... e-611.html to increase some of the timeout and memory settings used by XI. I'd start with the values suggested in the KB but you may need to increase them still.