NDO3 problems after mariadb upgrade (5.5.68 - 10.6)

jmsanesteban.sgre
Posts: 51
Joined: Thu Apr 23, 2020 6:46 am

Post by jmsanesteban.sgre »

Good morning all,

I have a weird situation. I have probably made a mistake somewhere, and I hope the community can help.

We have a problem with table locks during the optimize process. After some research we decided to upgrade MySQL to a version compatible with 5.7 because, as far as I know, that makes it possible to avoid locking tables during the optimize process.

I'm running some tests in my INT environment:

Nagios Core 4.4.6
NagiosXI 5.8.7
MariaDB 5.5.68
NDO 3.0.7

I've upgraded the MariaDB component to 10.6.10 and now I'm getting these messages in the nagios.log file:

grep NDO-3 /usr/local/nagios/var/nagios.log

Code:

[1668164808] NDO-3: Callbacks deregistered
[1668164808] NDO-3: NDO - Shutdown complete
[1668164810] NDO-3: NDO 3.0.7 (c) Copyright 2009-2020 Nagios - Nagios Core Development Team
[1668164810] NDO-3: Unable to connect to mysql. Configuration may be incorrect or database may have temporarily disconnected.
[1668164810] NDO-3: NDO was not able to initialize the database (main context) and will not start.
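The bracketed number at the start of each nagios.log line is a Unix epoch timestamp, which makes it hard to see at a glance when NDO stopped and restarted. Below is a minimal Python sketch that does the same filtering as the grep above and prints readable UTC times. The function name `ndo_messages` is made up for illustration, and the line layout is assumed to match the excerpt above.

```python
import re
from datetime import datetime, timezone

# Assumed layout, taken from the excerpt above: "[<epoch>] NDO-3: <message>"
NDO_LINE = re.compile(r"^\[(\d+)\] NDO-3: (.*)$")


def ndo_messages(lines):
    """Yield (utc_time, message) for every NDO-3 line in an iterable of log lines."""
    for line in lines:
        match = NDO_LINE.match(line.rstrip("\n"))
        if match:
            when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
            yield when.strftime("%Y-%m-%d %H:%M:%S UTC"), match.group(2)


if __name__ == "__main__":
    sample = [
        "[1668164808] NDO-3: NDO - Shutdown complete",
        "[1668164810] NDO-3: NDO was not able to initialize the database "
        "(main context) and will not start.",
    ]
    for when, message in ndo_messages(sample):
        print(when, message)
```

For the real file, an open file object can be passed instead of the sample list, since the function only iterates over lines.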

Result of select * from mysql.global_priv where host = 'localhost' and user = 'ndoutils':

Code:

| localhost | ndoutils | {"access":549755813887,"version_id":100610,"plugin":"mysql_native_password","authentication_string":"*244733929909A95DDF1A7F78DD067589B4092EE7","password_last_changed":1667467358}

I've also tried removing the plugin, with the same results.

/usr/local/nagios/etc/ndo.cfg with the "default" configuration:

Default NDO config for Nagios XI

Code:

db_user=ndoutils
db_pass=*************
db_name=nagios
db_host=localhost
db_port=3306
#db_socket=/var/lib/mysql.sock
db_max_reconnect_attempts=5

acknowledgement_data=1
comment_data=1
contact_status_data=1
downtime_data=1
event_handler_data=1
external_command_data=1
flapping_data=1
host_check_data=1
host_status_data=1
log_data=1
main_config_data=1
notification_data=1
object_config_data=1
process_data=1
program_status_data=1
retention_data=1
service_check_data=1
service_status_data=1
state_change_data=1
system_command_data=1
timed_event_data=1
config_output_options=2
max_object_insert_count=250
mysql_set_charset_name=utf8
log_failed_queries=1
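
One detail the settings above leave implicit is the transport. With `db_host=localhost`, the MySQL and MariaDB client libraries connect through a Unix socket and ignore the port; with `db_socket` commented out, the socket path falls back to whatever default the library was built with. Whether that default matches the socket the upgraded server listens on is an assumption to verify, not something this thread confirms. Below is a minimal Python sketch, with made-up function names, that parses settings in this format and reports the transport they imply.

```python
def parse_ndo_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def implied_transport(settings):
    """Describe the transport implied by db_host, db_socket and db_port.

    Assumes standard client-library behaviour: host "localhost" means a
    Unix socket, any other host means TCP. A missing db_host is treated
    as "localhost" here, which is an assumption of this sketch.
    """
    host = settings.get("db_host", "localhost")
    if host == "localhost":
        socket_path = settings.get("db_socket")
        return ("unix socket", socket_path or "client library default")
    return ("tcp", f"{host}:{settings.get('db_port', '3306')}")


if __name__ == "__main__":
    sample = """\
db_host=localhost
db_port=3306
#db_socket=/var/lib/mysql.sock
"""
    print(implied_transport(parse_ndo_cfg(sample)))
```

If the result is a Unix socket with the library default, comparing that default with the path the server reports for `SHOW VARIABLES LIKE 'socket'` would confirm or rule out a mismatch.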

/usr/local/nagios/etc/nagios.cfg:

Code:

...
# NDOUtils module
# Commented out by NDO 'make install-broker-line' on Tue Feb  8 12:09:40 CET 2022
#broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
...
# Added by NDO 'make install-broker-line' on Tue Feb  8 12:09:40 CET 2022
broker_module=/usr/local/nagios/bin/ndo.so /usr/local/nagios/etc/ndo.cfg

I can log in to the database using the credentials from the ndo.cfg file:

Code:

mysql -undoutils -D nagios -p
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 827
Server version: 10.6.10-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [nagios]>


Why can't NDO3 log in to the database?

Thanks in advance.

BR,
Juanma.

Re: NDO3 problems after mariadb upgrade (5.5.68 - 10.6)

Post by jmsanesteban.sgre »

Hi all,

There is a small update on this. The problem is not related to the DB upgrade: it also happened this weekend on the PROD server, which still runs the old DB version, so it seems to be a problem with NDO3 regardless of the DB version. The NDO logs flooded the system and the server ran out of disk space.

Could anyone please help me debug this?

We can't offload the DB or downgrade NDO to 2.x due to poor performance.

Thanks in advance.

BR,
Juanma.

Re: NDO3 problems after mariadb upgrade (5.5.68 - 10.6)

Post by jmsanesteban.sgre »

Hi all,

I want to elaborate on this a bit more. The problem on our PROD server is not about the DB context; it is about NDO3 failing to insert data into the nagios_servicestatus table:

Code:

[1669199184] NDO-3: The following query failed while MySQL appears to be connected:
[1669199184] NDO-3: INSERT INTO nagios_servicestatus (instance_id, service_object_id, status_update_time, output, long_output, perfdata, current_state, has_been_checked, should_be_scheduled, current_check_attempt, max_check_attempts, last_check, next_check, check_type, check_options, last_state_change, last_hard_state_change, last_hard_state, last_time_ok, last_time_warning, last_time_unknown, last_time_critical, state_type, last_notification, next_notification, no_more_notifications, notifications_enabled, problem_has_been_acknowledged, acknowledgement_type, current_notification_number, passive_checks_enabled, active_checks_enabled, event_handler_enabled, flap_detection_enabled, is_flapping, percent_state_change, latency, execution_time, scheduled_downtime_depth, failure_prediction_enabled, process_performance_data, obsess_over_service, modified_service_attributes, event_handler, check_command, normal_check_interval, retry_check_interval, check_timeperiod_object_id) VALUES (1,24302,FROM_UNIXTIME(1669199184),'CHECK_NRPE: Receive header underflow - only 0 bytes received (4 expected).','','',3,1,1,5,5,FROM_UNIXTIME(1669198922),FROM_UNIXTIME(1669199221),0,0,FROM_UNIXTIME(1663651268),FROM_UNIXTIME(1663651268),3,FROM_UNIXTIME(1661943887),FROM_UNIXTIME(0),FROM_UNIXTIME(1669198922),FROM_UNIXTIME(1662738723),1,FROM_UNIXTIME(0),FROM_UNIXTIME(3600),0,1,0,0,0,1,1,1,1,0,0.000000,0.000000,0.585218,0,0,1,1,0,'','sgre_plt_disk_usage_nrpe!30!75!85!!!!!',5.000000,1.000000,157) ON DUPLICATE KEY UPDATE instance_id = VALUES(instance_id), service_object_id = VALUES(service_object_id), status_update_time = VALUES(status_update_time), output = VALUES(output), long_output = VALUES(long_output), perfdata = VALUES(perfdata), current_state = VALUES(current_state), has_been_checked = VALUES(has_been_checked), should_be_scheduled = VALUES(should_be_scheduled), current_check_attempt = VALUES(current_check_attempt), max_check_attempts = VALUES(max_check_attempts), last_check = 
VALUES(last_check), next_check = VALUES(next_check), check_type = VALUES(check_type), check_options = VALUES(check_options), last_state_change = VALUES(last_state_change), last_hard_state_change = VALUES(last_hard_state_change), last_hard_state = VALUES(last_hard_state), last_time_ok = VALUES(last_time_ok), last_time_warning = VALUES(last_time_warning), last_time_unknown = VALUES(last_time_unknown), last_time_critical = VALUES(last_time_critical), state_type = VALUES(state_type), last_notification = VALUES(last_notification), next_notification = VALUES(next_notification), no_more_notifications = VALUES(no_more_notifications), notifications_enabled = VALUES(notifications_enabled), problem_has_been_acknowledged = VALUES(problem_has_been_acknowledged), acknowledgement_type = VALUES(acknowledgement_type), current_notification_number = VALUES(current_notification_number), passive_checks_enabled = VALUES(passive_checks_enabled), active_checks_enabled = VALUES(active_checks_enabled), event_handler_enabled = VALUES(event_handler_enabled), flap_detection_enabled = VALUES(flap_detection_enabled), is_flapping = VALUES(is_flapping), percent_state_change = VALUES(percent_state_change), latency = VALUES(latency), execution_time = VALUES(execution_time), scheduled_downtime_depth = VALUES(scheduled_downtime_depth), failure_prediction_enabled = VALUES(failure_prediction_enabled), process_performance_data = VALUES(process_performance_data), obsess_over_service = VALUES(obsess_over_service), modified_service_attributes = VALUES(modified_service_attributes), event_handler = VALUES(event_handler), check_command = VALUES(check_command), normal_check_interval = VALUES(normal_check_interval), retry_check_interval = VALUES(retry_check_interval), check_timeperiod_object_id = VALUES(check_timeperiod_object_id)
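
With dozens of columns in one statement, it is hard to see by eye which value belongs to which column. Below is a rough Python sketch that pairs the column list with the value list and decodes every `FROM_UNIXTIME(n)` argument to UTC. The function names are invented for illustration, the parser is only meant for statements shaped like the logged one, and it says nothing about why the server rejects the statement.

```python
import re
from datetime import datetime, timezone

INSERT_RE = re.compile(
    r"INSERT INTO (\w+) \(([^)]*)\) VALUES \((.*?)\) ON DUPLICATE KEY", re.S
)
EPOCH_RE = re.compile(r"^FROM_UNIXTIME\((\d+)\)$")


def split_top_level(text):
    """Split on commas that are outside quotes and outside parentheses."""
    parts, buf, depth, in_quote = [], [], 0, False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quote:
            buf.append(ch)
            if ch == "\\" and i + 1 < len(text):
                buf.append(text[i + 1])  # keep the escaped character as-is
                i += 1
            elif ch == "'":
                in_quote = False
        elif ch == "'":
            in_quote = True
            buf.append(ch)
        elif ch == "," and depth == 0:
            parts.append("".join(buf).strip())
            buf = []
        else:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            buf.append(ch)
        i += 1
    parts.append("".join(buf).strip())
    return parts


def pair_columns(statement):
    """Return (table, {column: raw SQL value}) for one logged INSERT."""
    match = INSERT_RE.search(statement)
    if match is None:
        raise ValueError("not an INSERT ... ON DUPLICATE KEY statement")
    table, columns, values = match.groups()
    names = [name.strip() for name in columns.split(",")]
    vals = split_top_level(values)
    if len(names) != len(vals):
        raise ValueError(f"{len(names)} columns but {len(vals)} values")
    return table, dict(zip(names, vals))


def decode_epochs(pairs):
    """Map each FROM_UNIXTIME(n) column to an ISO 8601 UTC timestamp."""
    decoded = {}
    for name, value in pairs.items():
        match = EPOCH_RE.match(value)
        if match:
            when = datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)
            decoded[name] = when.isoformat()
    return decoded
```

Calling `decode_epochs(pair_columns(statement)[1])` on the logged statement lists the time columns side by side, which makes unusual values easier to spot.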

I'm looking for a way to switch off debug-style logging for this kind of error, or at least to reduce the number of these entries in the log: today alone the log contains 5804826 failed inserts and has grown to 20 GB. We have been suffering from this problem for some days. The biggest log was about 54 GB, so we ran out of disk space and the application collapsed.
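
To see how the failures are distributed before changing anything, the log can be summarized by day and by target table. Below is a minimal Python sketch under the assumption that every failure is logged as in the excerpt above, a marker line followed by the statement. The function name `summarize_failures` is made up for illustration. It reads line by line, so memory use stays flat even on a multi-gigabyte file.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Both patterns assume the layout shown in the excerpt above.
FAILED_RE = re.compile(r"^\[(\d+)\] NDO-3: The following query failed")
TABLE_RE = re.compile(r"^\[\d+\] NDO-3: INSERT INTO (\w+)")


def summarize_failures(lines):
    """Count failed-query markers per UTC day and logged INSERTs per table."""
    per_day, per_table = Counter(), Counter()
    for line in lines:
        failed = FAILED_RE.match(line)
        if failed:
            when = datetime.fromtimestamp(int(failed.group(1)), tz=timezone.utc)
            per_day[when.date().isoformat()] += 1
            continue
        table = TABLE_RE.match(line)
        if table:
            per_table[table.group(1)] += 1
    return per_day, per_table


if __name__ == "__main__":
    sample = [
        "[1669199184] NDO-3: The following query failed while MySQL appears to be connected:",
        "[1669199184] NDO-3: INSERT INTO nagios_servicestatus (instance_id) VALUES (1)",
    ]
    print(summarize_failures(sample))
```

The ndo.cfg posted earlier in this thread sets log_failed_queries=1. Judging by the name alone, that option may be what controls these entries, but the thread does not confirm it.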

The problem is that, as far as I know, we don't have access to the NDO3 code or to any debug options, so there is nothing we can do from the customer side except downgrade to NDO2. In our case that is not possible, because we had to upgrade to NDO3 in the first place due to performance problems.

BR,
Juanma.