Nagios Support Forum

Posted: **Mon Jul 02, 2018 7:48 am**

Capture1.PNG

Hi to all,

After upgrading Nagios XI to version 5.5, I have some strange problems.

As you can see in the attachment, when I check the quick config, all values for IP address, hostgroup, contactgroup and others are unchecked, while in CCM everything is OK.

The main problem is that when I apply the configuration, the monitoring engine goes down for about 15-20 minutes.

During those 15-20 minutes there are a lot of these messages:

Code:

Jul  2 12:06:47 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:48 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:48 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:49 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:49 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:50 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:50 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:51 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:51 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:52 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:52 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:53 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:54 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:55 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:55 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:56 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:56 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:57 nagiosxi3 ndo2db: Message sent to queue.
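
To see how densely these warnings arrive, they can be tallied per minute. This is only a minimal sketch, not something posted in the thread: `queue_errors_per_minute` is a hypothetical helper name, and it assumes the syslog line format shown above (month, day, time as the first three fields).

```shell
# Tally ndo2db "queue send error" warnings per minute.
# Reads syslog-style lines on stdin; prints "Mon D HH:MM count" in first-seen order.
# Hypothetical helper, assuming the timestamp layout of the log excerpt above.
queue_errors_per_minute() {
    awk '/ndo2db: Warning: queue send error/ {
        minute = $1 " " $2 " " substr($3, 1, 5)   # drop the seconds
        if (!(minute in count)) order[++n] = minute
        count[minute]++
    }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }'
}
```

It could be run as `queue_errors_per_minute < /var/log/messages`, assuming that is where syslog writes on the system in question.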

The MySQL log looks OK:

Code:

[root@nagiosxi3 ~]# tail -f /var/log/mysqld.log
180701 15:01:50 [Note] /usr/libexec/mysqld: Shutdown complete

180701 15:01:50 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
180701 15:01:50 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
180701 15:01:51  InnoDB: Initializing buffer pool, size = 8.0M
180701 15:01:51  InnoDB: Completed initialization of buffer pool
180701 15:01:51  InnoDB: Started; log sequence number 0 18581314
180701 15:01:51 [Note] Event Scheduler: Loaded 0 events
180701 15:01:51 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distributio

Code:

Repair Complete: Removing Lock File
CREATING: /usr/local/nagiosxi/var/dbmaint.lock
CLEANING ndoutils TABLE 'commenthistory'...
SQL: DELETE FROM nagios_commenthistory WHERE entry_time < FROM_UNIXTIME(1498998902)
CLEANING ndoutils TABLE 'processevents'...
SQL: DELETE FROM nagios_processevents WHERE event_time < FROM_UNIXTIME(1498998902)
CLEANING ndoutils TABLE 'externalcommands'...
SQL: DELETE FROM nagios_externalcommands WHERE entry_time < FROM_UNIXTIME(1529930102)
CLEANING ndoutils TABLE 'logentries'...
SQL: DELETE FROM nagios_logentries WHERE logentry_time < FROM_UNIXTIME(1522758902)
CLEANING ndoutils TABLE 'notifications'...
SQL: DELETE FROM nagios_notifications WHERE start_time < FROM_UNIXTIME(1522758902)
CLEANING ndoutils TABLE 'contactnotifications'...
SQL: DELETE FROM nagios_contactnotifications WHERE start_time < FROM_UNIXTIME(1522758902)
CLEANING ndoutils TABLE 'contactnotificationmethods'...
SQL: DELETE FROM nagios_contactnotificationmethods WHERE start_time < FROM_UNIXTIME(1522758902)
CLEANING ndoutils TABLE 'statehistory'...
SQL: DELETE FROM nagios_statehistory WHERE state_time < FROM_UNIXTIME(1467462902)
CLEANING ndoutils TABLE 'timedevents'...
SQL: DELETE FROM nagios_timedevents WHERE event_time < FROM_UNIXTIME(1530534602)
CLEANING ndoutils TABLE 'systemcommands'...
SQL: DELETE FROM nagios_systemcommands WHERE start_time < FROM_UNIXTIME(1530534602)
CLEANING ndoutils TABLE 'servicechecks'...
SQL: DELETE FROM nagios_servicechecks WHERE start_time < FROM_UNIXTIME(1530534602)
CLEANING ndoutils TABLE 'hostchecks'...
SQL: DELETE FROM nagios_hostchecks WHERE start_time < FROM_UNIXTIME(1530534602)
CLEANING ndoutils TABLE 'eventhandlers'...
SQL: DELETE FROM nagios_eventhandlers WHERE start_time < FROM_UNIXTIME(1530534602)
LASTOPT:  1530533402
INTERVAL: 60
NOW:      1530534902
OPTTIME:  1530537002
CLEANING nagiosxi TABLE 'commands'...
SQL: DELETE FROM xi_commands WHERE processing_time < FROM_UNIXTIME(1530506102) AND status_code = 2
CLEANING nagiosxi TABLE 'events'...
SQL: DELETE FROM xi_events WHERE processing_time < FROM_UNIXTIME(1530506102) AND status_code = 2
CLEANING nagiosxi TABLE 'sessions'...
SQL: DELETE FROM xi_sessions WHERE session_last_active < FROM_UNIXTIME(1530448502)
CLEANING nagiosxi TABLE 'auth_tokens'...
SQL: DELETE FROM xi_auth_tokens WHERE auth_valid_until < FROM_UNIXTIME(1530448502)
CLEANING nagiosxi TABLE 'cmp_trapdata_log'...
SQL: DELETE FROM xi_cmp_trapdata_log WHERE trapdata_log_datetime < FROM_UNIXTIME(1522758902)
SQL1: SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL
SQL2: Deleted 42 (DELETE FROM xi_meta WHERE meta_id IN (SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL))
CLEANING nagiosxi TABLE 'auditlog'...
SQL: DELETE FROM xi_auditlog WHERE log_time < FROM_UNIXTIME(1527942902)
CLEANING nagiosql TABLE 'logbook'...
SQL: DELETE FROM tbl_logbook WHERE time < FROM_UNIXTIME(1530506102)
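
The `FROM_UNIXTIME(...)` cutoffs in this output are easier to read as retention periods. A small sketch, assuming the `NOW` value printed in the log (1530534902) is the reference time for the ndoutils cutoffs; `retention_days` is a hypothetical helper, not part of Nagios XI.

```shell
# Convert a dbmaint cutoff into a retention period in whole days.
# $1 = reference time (the NOW value from the log), $2 = cutoff passed to FROM_UNIXTIME.
# Hypothetical helper for reading the output above.
retention_days() {
    echo $(( ($1 - $2) / 86400 ))
}
```

For example, `retention_days 1530534902 1498998902` (commenthistory) gives 365, and `retention_days 1530534902 1467462902` (statehistory) gives 730.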

Posted: **Mon Jul 02, 2018 1:31 pm**

This looks like your database maintenance script may have been in the middle of performing maintenance, which can lock some tables.

Has this completed now?

Posted: **Wed Jul 04, 2018 4:51 am**

No, the situation is still the same. After I add a host through the Network wizard, XI is unusable for 30 minutes. The monitoring event queue is empty and the Monitoring Engine process shows red, but the nagios process is actually fine: when I access x.x.x.x/nagios the checks are OK, but in XI things do not look OK.

Edit: One more strange thing. When I apply the config and the apply finishes, the Services tab (Monitoring > Services in CCM) shows the message "Changes detected! Apply Configuration for new changes to take effect."

Edit 2: I also get this message in /var/log/messages:

Code:

Jul  4 12:44:04 nagiosxi3 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 128000 of 23795 messages and 131072000 of 131072000 bytes in the queue. See README for kernel tuning options.
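
The byte figures in this warning show how full the queue is. A minimal sketch for pulling them out of the log, assuming the exact wording of the message above; `queue_bytes_usage` is a hypothetical helper name.

```shell
# Extract "used of limit bytes" from ndo2db retry warnings and add a percentage.
# Reads log lines on stdin; lines without the byte figures are ignored.
# Hypothetical helper, assuming the message wording shown above.
queue_bytes_usage() {
    sed -n 's/.* and \([0-9][0-9]*\) of \([0-9][0-9]*\) bytes in the queue.*/\1 \2/p' |
    while read -r used limit; do
        echo "$used of $limit bytes ($(( used * 100 / limit ))%)"
    done
}
```

For the line above this prints `131072000 of 131072000 bytes (100%)`, meaning the queue had reached its byte limit. The live state can also be inspected with `ipcs -q`, which lists each message queue with its used bytes and message count.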

Edit 3:

I increased the message queue byte limits, but the situation is still the same.

Code:

[root@nagiosxi3 ~]# sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 524288000
kernel.msgmax = 524288000
kernel.shmmax = 4294967295
kernel.shmall = 268435456
kernel.msgmni = 512000
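
`sysctl -p` only echoes what it loaded from the file, so it can help to confirm the live values as well. The sketch below is an illustration, not something from the thread: `msg_limit` and `check_msg_limit` are hypothetical helpers, and `PROC_SYS_KERNEL` is a made-up override that exists only so the helper can be pointed at a test directory.

```shell
# Print the live value of a kernel message-queue limit (msgmnb, msgmax, msgmni).
# PROC_SYS_KERNEL is a hypothetical override; by default the real /proc path is read.
msg_limit() {
    cat "${PROC_SYS_KERNEL:-/proc/sys/kernel}/$1"
}

# Report whether the live limit $1 is at least the wanted value $2.
check_msg_limit() {
    live=$(msg_limit "$1") || return 2
    if [ "$live" -ge "$2" ]; then
        echo "kernel.$1 = $live (ok)"
    else
        echo "kernel.$1 = $live (below $2)"
        return 1
    fi
}
```

For example, `check_msg_limit msgmnb 524288000`. As far as I know, on Linux a System V message queue takes its byte limit from `kernel.msgmnb` at the moment it is created, so a queue that already exists may keep its old limit until the processes using it (here ndo2db and Nagios) are restarted.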

Posted: **Thu Jul 05, 2018 9:11 am**

Please post the output of the following:

Code:

ps -ef|grep nagios.cfg

Posted: **Thu Jul 05, 2018 9:15 am**

Code:

[root@nagiosxi3 ~]# ps -ef|grep nagios.cfg
nagios    2575     1  4 15:35 ?        00:01:32 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    2688  2575  0 15:35 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     18814 27326  0 16:13 pts/0    00:00:00 grep nagios.cfg
[root@nagiosxi3 ~]#

Posted: **Thu Jul 05, 2018 1:28 pm**

Please send profile.zip from Admin -> System Profile

Thanks

Posted: **Thu Jul 05, 2018 1:40 pm**

I sent you the profile in a PM.

Posted: **Thu Jul 05, 2018 2:12 pm**

The queue is currently empty, so Nagios must have caught up.

Would it be possible to generate the profile while you are waiting for the monitoring process to finish updating the database?

Also, the delay is likely caused by so much activity happening at the same time on the Nagios server. One of the first recommended performance enhancements for systems as large as yours is to offload the database:

https://assets.nagios.com/downloads/nag ... Server.pdf

Posted: **Fri Jul 06, 2018 6:10 am**

This problem only happens on 5.5; on previous versions everything worked fine with more than 5000 services. Also, as I mentioned, every time I apply the configuration, Services (Monitoring > Services) still shows that the config is not applied.

I sent you a new profile, generated while the problem was actually occurring.

Posted: **Fri Jul 06, 2018 9:27 am**

Looking at the new profile, the load on the server is very high. Much of it is coming from mysql, but you also have a ton of MRTG processes:

Code:

top - 12:41:34 up 4 days, 22:15,  4 users,  load average: 31.70, 35.01, 39.28
Tasks: 337 total,  14 running, 323 sleeping,   0 stopped,   0 zombie
Cpu(s): 49.4%us, 16.2%sy,  0.0%ni, 33.4%id,  0.6%wa,  0.0%hi,  0.4%si,  0.0%st
Mem:  12318660k total, 11558864k used,   759796k free,    82708k buffers
Swap:  8191996k total,   124256k used,  8067740k free,  5136152k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
25491 mysql     20   0 4332m 397m 4800 S 158.3  3.3 765:18.88 mysqld

Much of the load is from MRTG: you have 2193 MRTG configs in /etc/mrtg/conf.d.
If you are not monitoring these, I strongly suggest removing the ones you are not using, as they are slowing down your server.
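
The number of MRTG configs can be re-checked before and after a cleanup. A minimal sketch, assuming the config files in that directory end in `.cfg`; `count_mrtg_configs` is a hypothetical helper name.

```shell
# Count MRTG config files directly inside a directory (default: /etc/mrtg/conf.d).
# Assumes the configs use a .cfg extension; subdirectories are not searched.
count_mrtg_configs() {
    find "${1:-/etc/mrtg/conf.d}" -maxdepth 1 -type f -name '*.cfg' | wc -l | tr -d ' '
}
```

Running `count_mrtg_configs` with no argument checks the default directory; a different path can be passed as the first argument.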

Problems after upgrade to 5.5
