Nagiosxi HA version upgrade

mejokj
Posts: 353
Joined: Mon Jul 22, 2013 10:31 pm

Re: Nagiosxi HA version upgrade

Post by mejokj »

Hello,

We have successfully upgraded the cluster to the latest version, but since the upgrade Livestatus has been having issues. Telnet to the Livestatus port shows the error below:

Couldn't connect to UNIX-socket at /usr/local/nagios/var/rw/live: Connection refused.
Connection closed by foreign host.

[root@qy-nagios-a ~]# netstat -tulpn | grep 655
tcp6 0 0 :::6557 :::* LISTEN 22572/xinetd



Trying 10.25.32.13...
Connected to 10.25.32.13.
Escape character is '^]'.
Couldn't connect to UNIX-socket at /usr/local/nagios/var/rw/live: Connection refused.
Connection closed by foreign host.
You have new mail in /var/spool/mail/root

Below is the configuration for livestatus.

service livestatus
{
type = UNLISTED
port = 6557
socket_type = stream
protocol = tcp
wait = no
# limit to 100 connections per second. Disable 3 secs if above.
cps = 1000 3
# set the number of maximum allowed parallel instances of unixcat.
# Please make sure that this values is at least as high as
# the number of threads defined with num_client_threads in
# etc/mk-livestatus/nagios.cfg
instances = 5000
# limit the maximum number of simultaneous connections from
# one source IP address
per_source = 2500
# Disable TCP delay, makes connection more responsive
flags = NODELAY
user = nagios
server = /usr/local/nagios/bin/unixcat
server_args = /usr/local/nagios/var/rw/live
# configure the IP address(es) of your Nagios server here:
# only_from = 127.0.0.1 10.0.20.1 10.0.20.2
disable = no
}


++++++++++++++++++++++++++++++++++

[root@qy-nagios-a ~]# systemctl status xinetd
● xinetd.service - Xinetd A Powerful Replacement For Inetd
Loaded: loaded (/usr/lib/systemd/system/xinetd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2020-08-14 07:19:13 PDT; 29min ago
Process: 16916 ExecReload=/usr/bin/kill -HUP $MAINPID (code=exited, status=0/SUCCESS)
Process: 22571 ExecStart=/usr/sbin/xinetd -stayalive -pidfile /var/run/xinetd.pid $EXTRAOPTIONS (code=exited, status=0/SUCCESS)
Main PID: 22572 (xinetd)
CGroup: /system.slice/xinetd.service
└─22572 /usr/sbin/xinetd -stayalive -pidfile /var/run/xinetd.pid

Aug 14 07:48:22 qy-nagios-a.i xinetd[22572]: EXIT: nrpe status=0 pid=12121 duration=0(sec)
Aug 14 07:48:22 qy-nagios-a.i xinetd[22572]: START: nrpe pid=12132 from=::ffff:10.34.32.150
Aug 14 07:48:23 qy-nagios-a.i sudo[12134]: nagios : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/opt/nagios/libexec/check_init_service crond
Aug 14 07:48:23 qy-nagios-a.i xinetd[22572]: EXIT: nrpe status=0 pid=12132 duration=1(sec)
Aug 14 07:48:33 qy-nagios-a.i xinetd[22572]: START: livestatus pid=12371 from=::ffff:10.65.32.79
Aug 14 07:48:33 qy-nagios-a.i xinetd[22572]: EXIT: livestatus status=4 pid=12371 duration=0(sec)
Aug 14 07:48:34 qy-nagios-a.i xinetd[22572]: START: livestatus pid=12375 from=::ffff:10.65.32.79
Aug 14 07:48:34 qy-nagios-a.i xinetd[22572]: EXIT: livestatus status=4 pid=12375 duration=0(sec)
Aug 14 07:48:46 qy-nagios-a.i xinetd[22572]: START: livestatus pid=12604 from=::ffff:10.65.32.79
Aug 14 07:48:46 qy-nagios-a.i xinetd[22572]: EXIT: livestatus status=4 pid=12604 duration=0(sec)

++++++++++++++++++++++++++++++++++
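For what it's worth, on Linux a "Connection refused" from a UNIX socket usually means the socket file is still present but no process is listening on it, which would be consistent with Nagios not running or not having loaded the Livestatus module. A minimal sketch for checking this on the Nagios host follows. The socket and unixcat paths are the ones from the xinetd configuration above; `socket_state` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report what is present at the Livestatus socket path.
# The default path is taken from server_args in the xinetd config above.
LIVE_SOCKET="${LIVE_SOCKET:-/usr/local/nagios/var/rw/live}"

socket_state() {
    # $1: path to inspect
    if [ -S "$1" ]; then
        echo "socket"          # a socket file exists (it may still be stale)
    elif [ -e "$1" ]; then
        echo "not-a-socket"    # something else is in the way
    else
        echo "missing"         # the module never created the socket
    fi
}

# Example, run as root or nagios on the Nagios host:
#   socket_state "$LIVE_SOCKET"
# If it prints "socket", bypass xinetd and query the socket directly:
#   printf 'GET status\nColumns: program_version\n\n' | \
#       /usr/local/nagios/bin/unixcat "$LIVE_SOCKET"
```

If the direct query is refused as well, the problem is on the Nagios side rather than in xinetd.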
Last edited by mejokj on Thu Aug 20, 2020 5:14 am, edited 1 time in total.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Post by benjaminsmith »

Hi @mejokj,
We have successfully upgraded the cluster to the latest version, but since the upgrade Livestatus has been having issues.
Happy to hear you've made good progress upgrading the cluster. Regarding the Livestatus issues: this is not our product, so it is out of scope for XI support. Since it's a broker module for Nagios, though, I would check the ndo.cfg configuration to make sure the module is being loaded in the configuration file.
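Nagios loads broker modules through `broker_module=` lines in its main configuration file, so one quick check is to list the active ones. This is only a sketch: the default path below is an assumption about a standard XI install, and `list_broker_modules` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list the broker modules Nagios is configured to load.
# The default path is an assumption; pass your own nagios.cfg if it differs.
list_broker_modules() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Print only active (uncommented) broker_module lines.
    grep -E '^[[:space:]]*broker_module[[:space:]]*=' "$cfg"
}

# Example:
#   list_broker_modules /usr/local/nagios/etc/nagios.cfg
```

If no line mentioning Livestatus shows up, the upgrade may have replaced the configuration entry that loaded it.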

Benjamin
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!

Post by mejokj »

Hello,

After upgrading Nagios XI, the nagios service is exiting intermittently, which causes the cluster resource to switch between nodes.

The nagios service goes down, and nagios.log shows the errors below.
++++++++++++++++++++++++++++++++++++++++++
[1597725675] NDO-3: Unable to prepare statement for query (24): Lost connection to MySQL server during query
[1597725675] Caught SIGSEGV, shutting down...
[1597725675] Caught SIGTERM, shutting down...
++++++++++++++++++++++++++++++++++++++++++


+++++++++++++++++++
[1597661425] NDO-3: Query failed in ndo_empty_queue_notification (handle_notification)
[1597661425] NDO-3: Query failed in ndo_empty_queue_notification (handle_notification)
[1597661425] NDO-3: Query failed in ndo_empty_queue_notification (handle_notification)
[1597661425] NDO-3: Query failed in ndo_empty_queue_notification (handle_notification)
[1597661425] NDO-3: Query failed in ndo_empty_queue_notification (handle_notification)
[1597661425] NDO-3: Query failed in ndo_empty_queue_notification (handle_notification)
[1597661425] NDO-3: Query failed in ndo_empty_queue_notification (handle_notification)
[1597661425] NDO-3: Query failed in ndo_empty_queue_notification (handle_notification)
[1597661425] NDO-3: Query failed in ndo_empty_queue_notification (handle_notification)
[1597661425] NDO-3: NDO startup thread failed at ndo_empty_startup_queues() - some data may be inaccurate.
+++++++++++++++++++
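The bracketed numbers in nagios.log are Unix epoch seconds, which makes them awkward to line up against the MySQL log. A small sketch for converting them to local time is shown here; it assumes GNU `date` (for `-d @SECONDS`), and `humanize_nagios_log` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: rewrite "[epoch] message" lines with a readable local timestamp.
# Assumes GNU date; lines that do not start with "[digits]" pass through unchanged.
humanize_nagios_log() {
    while IFS= read -r line; do
        case "$line" in
            \[[0-9]*\]*)
                ts=${line#\[}       # drop the leading "["
                ts=${ts%%\]*}       # keep everything before the first "]"
                case "$ts" in
                    *[!0-9]*|'')    # not a plain integer: leave the line alone
                        printf '%s\n' "$line"
                        continue
                        ;;
                esac
                rest=${line#*\] }   # message text after "] "
                printf '[%s] %s\n' "$(date -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}

# Example:
#   grep 'NDO-3' /usr/local/nagios/var/nagios.log | humanize_nagios_log
```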

Below are the nagios service status and the nagios service configuration.
+++++++++++++++
https://ibb.co/bLrNF3Y
https://ibb.co/jwZ3LWK
+++++++++++++++


The MySQL log shows the warning below. It is logged continuously.
+++++++++++++++++

200817 11:48:43 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. INSERT... ON DUPLICATE KEY UPDATE on a table with more than one UNIQUE KEY is unsafe Statement: INSERT INTO nagios_servicestatus (instance_id, service_object_id, status_update_time, output, long_output, perfdata, current_state, has_been_checked, should_be_scheduled, current_check_attempt, max_check_attempts, last_check, next_check, check_type, check_options, last_state_change, last_hard_state_change, last_hard_state, last_time_ok, last_time_warning, last_time_unknown, last_time_critical, state_type, last_notification, next_notification, no_more_notifications, notifications_enabled, problem_has_been_acknowledged, acknowledgement_type, current_notification_number, passive_checks_enabled, active_checks_enabled, event_handler_enabled, flap_detection_enabled, is_flapping, percent_state_change, latency, execution_time, scheduled_downtime_depth, failure_prediction_enabled, process_performance_data, obsess_over_servic

+++++++++++++++++


I have increased max_connections on the MySQL server.

Checks also stop intermittently for all hosts. After I restart the nagios service, the checks run again.

I have sent you the profile of the server via chat.
Last edited by mejokj on Thu Aug 20, 2020 4:32 am, edited 1 time in total.

Post by benjaminsmith »

Hi,

Got the profile, thank you. One thing I noticed is that there are a number of fatal PHP errors in the log from the history tab component. This component is not compatible with 5.7.2, so I would recommend disabling or removing it.

Code: Select all

[Mon Aug 17 01:37:10.136678 2020] [:error] [pid 39135] [client 10.74.224.131:55379] PHP Fatal error:  Call to undefined function get_backend_xml_data() in /drbd/nagiosxi/html/includes/components/historytab/historytab_do_stuff.php on line 49, referer:
Looking over the following warning:
200817 11:48:43 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT
This warning is related to replication; take a look at the following exchange. You might want to try setting the binary log format (binlog_format) to MIXED.
https://stackoverflow.com/questions/434 ... og-message
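As a rough sketch of how the binary log format could be checked and changed: the helper below reads the setting from an option file, and the comments show the equivalent runtime statements. The `/etc/my.cnf` default and the helper name `configured_binlog_format` are assumptions; the actual option file location depends on the install.

```shell
#!/bin/sh
# Sketch: report the binlog_format configured in a my.cnf-style file.
# /etc/my.cnf is an assumed default; the effective file may live elsewhere.
configured_binlog_format() {
    cnf="${1:-/etc/my.cnf}"
    # Option files accept both binlog_format and binlog-format; the last one wins.
    sed -n -E 's/^[[:space:]]*binlog[_-]format[[:space:]]*=[[:space:]]*([A-Za-z]+).*/\1/p' "$cnf" | tail -n 1
}

# The running server can be inspected or changed from the mysql client:
#   mysql -e "SHOW VARIABLES LIKE 'binlog_format';"
#   mysql -e "SET GLOBAL binlog_format = 'MIXED';"
# SET GLOBAL only applies to connections opened afterwards and is lost on
# restart, so the persistent setting goes under [mysqld] in the option file:
#   binlog_format = MIXED
```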
Checks also stop intermittently for all hosts. After I restart the nagios service, the checks run again.
After you restart, how long does that correct the issue before it returns? The lag in check results is a known issue affecting some systems in 5.7.2, and we are currently working to resolve it. If it persists, let me know and I can provide instructions on downgrading to the previous version of NDO.

Also, are there any other errors in the DB logs? If so, please upload the log to the thread. Since the DB is offloaded, the profile does not contain the log.

Benjamin

Post by mejokj »

Hello,

We have made the following changes.

1. Removed the history tab.
2. Changed the MySQL configuration to mixed; the error mentioned above is now gone.


After some time, the nagios log shows the errors below and the checks stop. The MySQL service is fine during that time, and I can access MySQL from the Nagios server. After a restart the checks run fine again, but the issue keeps recurring intermittently.
Please provide the NDO downgrade steps.
++++++++++++++++++++++++++++
[1597820412] NDO-3: ndo_handle_service_status(ndo-handlers.c:953): Could not reconnect to MySQL database
[1597820412] NDO-3: ndo_get_object_id_name1(ndo.c:1132): Could not reconnect to MySQL database
++++++++++++++++++++++++++++

The previous MySQL warning is gone since switching to mixed mode, and no new errors are appearing in the MySQL logs. I have attached the Nagios profile to your chat.
Last edited by mejokj on Thu Aug 20, 2020 12:27 pm, edited 1 time in total.

Post by benjaminsmith »

Hi,

Glad to hear that cleared the error, and thanks for the new profile. I noticed the following in the logs:

Code: Select all

 82: Aug 19 05:59:06 qy-nagios-a kernel: type=1105 audit(1597841946.459:6496354): pid=56747 uid=0 auid=1003 ses=440866 msg='op=PAM:session_open grantors=pam_selinux,pam_loginuid,pam_selinux,pam_namespace,pam_keyinit,pam_keyinit,pam_limits,pam_systemd,pam_unix,pam_lastlog acct="nagios" exe="/usr/sbin/sshd" hostname=.....terminal=ssh res=success'
Do you have SELinux enabled on this server? If so, please disable it, as it is known to cause issues with Nagios XI. To check, run the following command:

Code: Select all

getenforce
Also, I would recommend turning off Livestatus until these issues are worked out. I'm not sure whether it is a factor, but it's best to rule it out for now. You can comment out its broker module line, and it will not load when Nagios is restarted.
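One way to comment the module out is sketched below. It assumes the Livestatus module is loaded from a `broker_module=` line whose path contains the word "livestatus"; the configuration path in the example and the helper name `disable_livestatus_module` are assumptions as well.

```shell
#!/bin/sh
# Sketch: comment out any active broker_module line that mentions livestatus.
# A backup of the original file is kept next to it as <file>.bak.
disable_livestatus_module() {
    cfg="${1:?usage: disable_livestatus_module /path/to/nagios.cfg}"
    sed -i.bak -E '/^[[:space:]]*broker_module[[:space:]]*=.*livestatus/ s/^/#/' "$cfg"
}

# Example (path is an assumption about a default XI install):
#   disable_livestatus_module /usr/local/nagios/etc/nagios.cfg
#   service nagios restart
```

Other broker modules, such as NDO, are left untouched because the pattern only matches lines containing "livestatus".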

Lastly, are you running a VM or a physical server?

Benjamin

Post by mejokj »

We faced the same issue again today: checks stopped.

The following errors appear in the MySQL log.
++++++++++++++
200820 06:00:11 mysqld_safe mysqld from pid file /var/lib/mysql/mysql.pid ended
200820 06:00:13 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
200820 6:00:13 [Note] /usr/libexec/mysqld (mysqld 5.5.50-MariaDB) starting as process 3853 ...
200820 6:00:13 InnoDB: The InnoDB memory heap is disabled
200820 6:00:13 InnoDB: Mutexes and rw_locks use GCC atomic builtins
200820 6:00:13 InnoDB: Compressed tables use zlib 1.2.7
200820 6:00:13 InnoDB: Using Linux native AIO
200820 6:00:13 InnoDB: Initializing buffer pool, size = 128.0M
200820 6:00:13 InnoDB: Completed initialization of buffer pool
200820 6:00:13 InnoDB: highest supported file format is Barracuda.
200820 6:00:13 InnoDB: Waiting for the background threads to start
200820 6:00:14 Percona XtraDB (http://www.percona.com) 5.5.49-MariaDB-37.9 started; log sequence number 12758082539709
200820 6:00:14 [Note] Plugin 'FEEDBACK' is disabled.
200820 6:00:14 [Note] Server socket created on IP: '0.0.0.0'.
200820 6:00:14 [ERROR] Failed to open the relay log './mysql-relay-bin.000001' (relay_log_pos 4)
200820 6:00:14 [ERROR] Could not find target log during relay log initialization
200820 6:00:14 [ERROR] Failed to initialize the master info structure
200820 6:00:14 [Note] Event Scheduler: Loaded 0 events
200820 6:00:14 [Note] /usr/libexec/mysqld: ready for connections.
Version: '5.5.50-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server

++++++++++++++


The following errors appear in the nagios log:
++++++++++++++
Aug 19 15:25:14 qy-nagios-a. nagios[9792]: job 1538 (pid=24090): read() returned error 11
Aug 19 15:25:15 qy-nagios-a. nagios[9777]: job 1541 (pid=25008): read() returned error 11
Aug 19 15:25:15 qy-nagios-a. nagios[9792]: job 1538 (pid=24090): read() returned error 11
Aug 19 15:25:16 qy-nagios-a. nagios[9777]: job 1541 (pid=25008): read() returned error 11
Aug 19 15:25:16 qy-nagios-a. nagios[9792]: job 1538 (pid=24090): read() returned error 11
Aug 19 15:25:17 qy-nagios-a. nagios[9777]: job 1541 (pid=25008): read() returned error 11
Aug 19 15:25:17 qy-nagios-a. nagios[9792]: job 1538 (pid=24090): read() returned error 11
Aug 19 15:25:17 qy-nagios-a. nagios[9777]: job 1541 (pid=25008): read() returned error 11
Aug 19 15:28:59 qy-nagios-a. nagios[9757]: job 1505 (pid=20328): read() returned error 11



[1597875918.979420] [016.2] [pid=9746] Scheduling new service check event.
[1597875918.979422] [001.0] [pid=9746] add_event()
[1597875918.979424] [064.1] [pid=9746] Making callbacks (type 13)...
[1597890868.934300] [064.1] [pid=9746] Making callbacks (type 2)...
[1597890868.934322] [064.2] [pid=9746] Callback #1 (type 2) return code = 0
[1597890868.934736] [064.1] [pid=9746] Making callbacks (type 2)...
[1597890868.934750] [064.2] [pid=9746] Callback #1 (type 2) return code = 0
[1597890868.934755] [064.2] [pid=9746] Callback #2 (type 2) return code = 0
[1597890868.934872] [064.2] [pid=9746] Callback #2 (type 2) return code = 0



++++++++++++++
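In the debug log excerpt above, the timestamps jump from 1597875918 to 1597890868, roughly four hours with no entries, which may mark the window in which checks were stalled. A minimal sketch for spotting such gaps in a log whose lines start with `[epoch...]` follows; `find_log_gaps` is a hypothetical helper name and the threshold is whatever you consider suspicious.

```shell
#!/bin/sh
# Sketch: report silent periods in a log whose lines start with "[<epoch seconds>".
# Usage: find_log_gaps MAX_GAP_SECONDS FILE
find_log_gaps() {
    max_gap="$1"
    file="$2"
    awk -v max="$max_gap" '
        # Only lines that begin with "[" followed by digits carry a timestamp.
        match($0, /^\[[0-9]+/) {
            ts = substr($0, 2, RLENGTH - 1) + 0   # whole seconds, fraction ignored
            if (seen && ts - prev > max)
                printf "gap of %d s between %d and %d\n", ts - prev, prev, ts
            prev = ts
            seen = 1
        }
    ' "$file"
}

# Example: list every pause longer than five minutes in the debug log
# (the path is an assumption):
#   find_log_gaps 300 /usr/local/nagios/var/nagios.debug
```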

SELinux is disabled on the server:

[root@nagios-a ~]# getenforce
Disabled

We are using a physical server.

Kindly provide the steps to downgrade NDO. Also, if we downgrade to the old version, will Livestatus and the checks work fine?

Post by benjaminsmith »

Hi,

Okay. Here are the steps to downgrade ndo3 to the previous version (ndo2db), including the extra steps needed if you have an offloaded database.

Let me know how it's working after the downgrade.

Code: Select all

service nagios stop
cd /tmp
rm -rf /tmp/nagiosxi
wget https://assets.nagios.com/downloads/nagiosxi/5/xi-5.6.14.tar.gz
tar zxf xi-5.6.14.tar.gz
cd /tmp/nagiosxi
If you have an offloaded DB you'll need to update the files with the database credentials.

Code: Select all

Edit /tmp/nagiosxi/xi-sys.cfg and update 'mysqlpass' value.
Edit /tmp/nagiosxi/subcomponents/ndoutils/mods/cfg/ndo2db.cfg and update 'db_host', 'db_user', and 'db_pass' values.
Edit /tmp/nagiosxi/subcomponents/ndoutils/install and /tmp/nagiosxi/subcomponents/ndoutils/post-install to update all calls to mysql to include -h <db_ip>
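The three edits above could be scripted roughly as follows. This is only a sketch: `set_cfg_value` and `add_mysql_host` are hypothetical helpers, every value in the example is a placeholder, and the `mysql` pattern is a guess at how the install scripts call the client, so review the result (for example with `diff file.bak file`) before running `./install`.

```shell
#!/bin/sh
# Sketch: apply the offloaded-DB edits with sed. Each call keeps a <file>.bak copy.

set_cfg_value() {
    # set_cfg_value FILE KEY VALUE -> rewrites lines of the form "KEY=..."
    # VALUE must not contain "|" or "&", which are special to this sed command.
    sed -i.bak -E "s|^($2)=.*|\1=$3|" "$1"
}

add_mysql_host() {
    # add_mysql_host FILE DB_IP -> turns "mysql -u ..." into "mysql -h DB_IP -u ..."
    # Only touches "mysql" when it is a whole word followed by an option, so
    # mysqladmin, mysqldump and paths like mysql.sock are left alone.
    sed -i.bak -E "s/(^|[^[:alnum:]_])mysql([[:space:]]+-)/\1mysql -h $2\2/g" "$1"
}

# Example with placeholder values:
#   cfg=/tmp/nagiosxi/subcomponents/ndoutils/mods/cfg/ndo2db.cfg
#   set_cfg_value "$cfg" db_host 192.0.2.10
#   set_cfg_value "$cfg" db_user ndoutils
#   add_mysql_host /tmp/nagiosxi/subcomponents/ndoutils/install 192.0.2.10
#   add_mysql_host /tmp/nagiosxi/subcomponents/ndoutils/post-install 192.0.2.10
```

Running `add_mysql_host` twice on the same file would insert `-h` twice, so restore from the `.bak` copy before retrying.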
Then compile and enable ndo2db.

Code: Select all

cd /tmp/nagiosxi/subcomponents/ndoutils
./install
chkconfig ndo2db on
service ndo2db start

Post by mejokj »

Hello,

We didn't try this because we could not arrange downtime. We restored the VM snapshot.

Post by benjaminsmith »

Hi,
We didn't try this because we could not arrange downtime. We restored the VM snapshot.
Are you back to running the previous version? You can always deploy a test server and upgrade the production instance once you're ready.

Let me know how you plan to proceed.