Problems after upgrade to 5.5

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
nik.vu
Posts: 34
Joined: Tue Feb 07, 2017 4:28 pm

Post by nik.vu »

Capture1.PNG
Hi all,

After upgrading Nagios XI to version 5.5, I have some strange problems.

As you can see from the attachment, when I check quick config, all values for IP address, hostgroup, contactgroup and others are unchecked, while in CCM everything is OK.

The main problem is that when I apply the configuration, the monitoring engine goes down for about 15-20 minutes.

During those 15-20 minutes there are a lot of these messages:

Code:

Jul  2 12:06:47 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:48 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:48 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:49 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:49 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:50 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:50 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:51 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:51 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:52 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:52 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:53 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:54 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:55 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:55 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:56 nagiosxi3 ndo2db: Message sent to queue.
Jul  2 12:06:56 nagiosxi3 ndo2db: Warning: queue send error, retrying...
Jul  2 12:06:57 nagiosxi3 ndo2db: Message sent to queue.

The MySQL log looks OK:

Code:

[root@nagiosxi3 ~]# tail -f /var/log/mysqld.log
180701 15:01:50 [Note] /usr/libexec/mysqld: Shutdown complete

180701 15:01:50 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended
180701 15:01:50 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
180701 15:01:51  InnoDB: Initializing buffer pool, size = 8.0M
180701 15:01:51  InnoDB: Completed initialization of buffer pool
180701 15:01:51  InnoDB: Started; log sequence number 0 18581314
180701 15:01:51 [Note] Event Scheduler: Loaded 0 events
180701 15:01:51 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.1.73'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  Source distributio

Code:

Repair Complete: Removing Lock File
CREATING: /usr/local/nagiosxi/var/dbmaint.lock
CLEANING ndoutils TABLE 'commenthistory'...
SQL: DELETE FROM nagios_commenthistory WHERE entry_time < FROM_UNIXTIME(1498998902)
CLEANING ndoutils TABLE 'processevents'...
SQL: DELETE FROM nagios_processevents WHERE event_time < FROM_UNIXTIME(1498998902)
CLEANING ndoutils TABLE 'externalcommands'...
SQL: DELETE FROM nagios_externalcommands WHERE entry_time < FROM_UNIXTIME(1529930102)
CLEANING ndoutils TABLE 'logentries'...
SQL: DELETE FROM nagios_logentries WHERE logentry_time < FROM_UNIXTIME(1522758902)
CLEANING ndoutils TABLE 'notifications'...
SQL: DELETE FROM nagios_notifications WHERE start_time < FROM_UNIXTIME(1522758902)
CLEANING ndoutils TABLE 'contactnotifications'...
SQL: DELETE FROM nagios_contactnotifications WHERE start_time < FROM_UNIXTIME(1522758902)
CLEANING ndoutils TABLE 'contactnotificationmethods'...
SQL: DELETE FROM nagios_contactnotificationmethods WHERE start_time < FROM_UNIXTIME(1522758902)
CLEANING ndoutils TABLE 'statehistory'...
SQL: DELETE FROM nagios_statehistory WHERE state_time < FROM_UNIXTIME(1467462902)
CLEANING ndoutils TABLE 'timedevents'...
SQL: DELETE FROM nagios_timedevents WHERE event_time < FROM_UNIXTIME(1530534602)
CLEANING ndoutils TABLE 'systemcommands'...
SQL: DELETE FROM nagios_systemcommands WHERE start_time < FROM_UNIXTIME(1530534602)
CLEANING ndoutils TABLE 'servicechecks'...
SQL: DELETE FROM nagios_servicechecks WHERE start_time < FROM_UNIXTIME(1530534602)
CLEANING ndoutils TABLE 'hostchecks'...
SQL: DELETE FROM nagios_hostchecks WHERE start_time < FROM_UNIXTIME(1530534602)
CLEANING ndoutils TABLE 'eventhandlers'...
SQL: DELETE FROM nagios_eventhandlers WHERE start_time < FROM_UNIXTIME(1530534602)
LASTOPT:  1530533402
INTERVAL: 60
NOW:      1530534902
OPTTIME:  1530537002
CLEANING nagiosxi TABLE 'commands'...
SQL: DELETE FROM xi_commands WHERE processing_time < FROM_UNIXTIME(1530506102) AND status_code = 2
CLEANING nagiosxi TABLE 'events'...
SQL: DELETE FROM xi_events WHERE processing_time < FROM_UNIXTIME(1530506102) AND status_code = 2
CLEANING nagiosxi TABLE 'sessions'...
SQL: DELETE FROM xi_sessions WHERE session_last_active < FROM_UNIXTIME(1530448502)
CLEANING nagiosxi TABLE 'auth_tokens'...
SQL: DELETE FROM xi_auth_tokens WHERE auth_valid_until < FROM_UNIXTIME(1530448502)
CLEANING nagiosxi TABLE 'cmp_trapdata_log'...
SQL: DELETE FROM xi_cmp_trapdata_log WHERE trapdata_log_datetime < FROM_UNIXTIME(1522758902)
SQL1: SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL
SQL2: Deleted 42 (DELETE FROM xi_meta WHERE meta_id IN (SELECT xi_meta.meta_id FROM xi_meta LEFT JOIN xi_events ON xi_meta.metaobj_id=xi_events.event_id WHERE metatype_id='1' AND event_id IS NULL))
CLEANING nagiosxi TABLE 'auditlog'...
SQL: DELETE FROM xi_auditlog WHERE log_time < FROM_UNIXTIME(1527942902)
CLEANING nagiosql TABLE 'logbook'...
SQL: DELETE FROM tbl_logbook WHERE time < FROM_UNIXTIME(1530506102)
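
The FROM_UNIXTIME cutoffs in the maintenance log are easier to read as retention windows. The following is a minimal sketch, assuming each cutoff was computed relative to the NOW value printed in the same log (1530534902); the `retention` helper is a hypothetical name introduced here, not part of Nagios XI.

```shell
#!/bin/sh
# Hypothetical helper: turn a dbmaint cutoff (the FROM_UNIXTIME argument)
# into a retention window relative to the NOW value from the same log.
# Usage: retention NOW CUTOFF
retention() {
    secs=$(( $1 - $2 ))
    if [ $(( secs % 86400 )) -eq 0 ]; then
        echo "$(( secs / 86400 )) days"
    elif [ $(( secs % 3600 )) -eq 0 ]; then
        echo "$(( secs / 3600 )) hours"
    else
        echo "$(( secs / 60 )) minutes"
    fi
}

now=1530534902                  # NOW line from the log
retention "$now" 1498998902     # commenthistory, processevents
retention "$now" 1467462902     # statehistory
retention "$now" 1522758902     # logentries, notifications
retention "$now" 1530534602     # servicechecks, hostchecks, eventhandlers
```

Under that assumption the four calls print 365 days, 730 days, 90 days and 5 minutes, so the short-lived check tables are being trimmed to a few minutes while history tables keep one to two years.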
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Post by scottwilkerson »

It looks like your database maintenance script may have been in the middle of performing maintenance, which can lock some tables.

Has this completed now?
Former Nagios employee

Post by nik.vu »

No, the situation is still the same. After I add a host through the Network wizard, XI is unusable for 30 minutes. The monitoring event queue is empty and the Monitoring Engine process is shown in a red state, but the nagios process is actually running, and when I access x.x.x.x/nagios the checks are OK. Only XI does not look OK.

Edit: One more strange thing: after I apply the config and the configuration finishes, the Services tab (Monitoring > Services in CCM) shows the message "Changes detected! Apply Configuration for new changes to take effect."

Edit 2: I also get this message in /var/log/messages:

Code:

Jul  4 12:44:04 nagiosxi3 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 128000 of 23795 messages and 131072000 of 131072000 bytes in the queue. See README for kernel tuning options.
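
To make the byte counters in that warning easier to read, here is a small sketch that pulls them out of the message text and prints how full the queue is. The `queue_fill` function is a hypothetical helper written for this thread, and it only parses the wording shown above; it is not an ndo2db tool.

```shell
#!/bin/sh
# Hypothetical helper: extract "<used> of <limit> bytes in the queue" from an
# ndo2db "Retrying message send" warning and print the fill level in percent.
queue_fill() {
    printf '%s\n' "$1" |
    sed -n 's/.* and \([0-9][0-9]*\) of \([0-9][0-9]*\) bytes in the queue.*/\1 \2/p' |
    while read -r used limit; do
        echo "$(( used * 100 / limit ))% of $limit bytes used"
    done
}

msg='ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 128000 of 23795 messages and 131072000 of 131072000 bytes in the queue. See README for kernel tuning options.'
queue_fill "$msg"
```

For the message above this prints "100% of 131072000 bytes used", i.e. the byte limit of the queue is exhausted. On a live system, `ipcs -q` lists the kernel message queues with their current used bytes and message counts.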

Edit 3: I increased the message queue byte limits (kernel.msgmnb and kernel.msgmax), but the situation is still the same.

Code:

[root@nagiosxi3 ~]# sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 524288000
kernel.msgmax = 524288000
kernel.shmmax = 4294967295
kernel.shmall = 268435456
kernel.msgmni = 512000
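
For scale, the byte limits in that output can be converted to MiB. This is a minimal sketch; `msg_limits_mib` is a hypothetical helper that reads sysctl-style "key = value" lines on standard input and only looks at kernel.msgmnb and kernel.msgmax.

```shell
#!/bin/sh
# Hypothetical helper: print kernel.msgmnb / kernel.msgmax in MiB from
# "key = value" lines such as those printed by sysctl.
msg_limits_mib() {
    awk -F' *= *' '
        $1 == "kernel.msgmnb" || $1 == "kernel.msgmax" {
            printf "%s = %d MiB\n", $1, $2 / 1048576
        }'
}

# Values copied from the sysctl -p output above.
printf '%s\n' \
    'kernel.msgmnb = 524288000' \
    'kernel.msgmax = 524288000' \
    'kernel.msgmni = 512000' | msg_limits_mib
```

Both limits come out as 500 MiB, compared with the 131072000-byte (125 MiB) limit reported in the ndo2db warning. On a running system the same function can be fed from `sysctl kernel.msgmnb kernel.msgmax` to confirm which values the kernel is actually using.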

Post by scottwilkerson »

Please post the output of the following

Code:

ps -ef|grep nagios.cfg

Post by nik.vu »

Code:

[root@nagiosxi3 ~]# ps -ef|grep nagios.cfg
nagios    2575     1  4 15:35 ?        00:01:32 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    2688  2575  0 15:35 ?        00:00:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
root     18814 27326  0 16:13 pts/0    00:00:00 grep nagios.cfg
[root@nagiosxi3 ~]#

Post by scottwilkerson »

Please send profile.zip from Admin -> System Profile

Thanks

Post by nik.vu »

I sent you the profile in a PM.

Post by scottwilkerson »

The queue is currently empty, so Nagios must have caught up.

Would it be possible to generate the profile while you are waiting for the monitoring process to finish updating the database?

Also, the delay is likely caused by so much activity happening at the same time on the Nagios server. One of the first recommended performance enhancements for systems as large as yours is to offload the database:

https://assets.nagios.com/downloads/nag ... Server.pdf

Post by nik.vu »

This problem only happens on 5.5; on previous versions everything worked fine with more than 5000 services. Also, as I mentioned, every time I apply the configuration, Services (Monitoring > Services) still shows that the config is not applied.

I sent you a new profile, taken while the problem is actually happening.

Post by scottwilkerson »

Looking at the new profile, the load on the server is very high. Much of it is coming from mysql, but you also have a ton of MRTG processes:

Code:

top - 12:41:34 up 4 days, 22:15,  4 users,  load average: 31.70, 35.01, 39.28
Tasks: 337 total,  14 running, 323 sleeping,   0 stopped,   0 zombie
Cpu(s): 49.4%us, 16.2%sy,  0.0%ni, 33.4%id,  0.6%wa,  0.0%hi,  0.4%si,  0.0%st
Mem:  12318660k total, 11558864k used,   759796k free,    82708k buffers
Swap:  8191996k total,   124256k used,  8067740k free,  5136152k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
25491 mysql     20   0 4332m 397m 4800 S 158.3  3.3 765:18.88 mysqld  

Much of the load is from MRTG; you have 2193 MRTG configs in /etc/mrtg/conf.d.
If you are not monitoring these, I strongly suggest removing the ones you are not using, as they are slowing down your server.
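
A quick way to check the number of MRTG configs yourself is sketched below. It assumes the per-device files in that directory end in `.cfg`, which is not stated in this thread; `count_mrtg_cfgs` is a hypothetical helper name, and the directory argument is optional so the function can be pointed at a copy.

```shell
#!/bin/sh
# Hypothetical helper: count MRTG config files directly inside a directory.
# Assumption: config files are named *.cfg. Default directory is the one
# mentioned in the post above.
count_mrtg_cfgs() {
    find "${1:-/etc/mrtg/conf.d}" -maxdepth 1 -type f -name '*.cfg' |
        wc -l | tr -d ' '
}

# Only query the real directory when it exists on this machine.
if [ -d /etc/mrtg/conf.d ]; then
    count_mrtg_cfgs
fi
```

Comparing the number before and after cleaning up unused configs gives a simple measure of how much MRTG work has been removed.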
Locked