NDO-3 problem

Posted: Mon Oct 04, 2021 3:08 pm
by cbeattie-unitrends
Hello,

Recently I noticed that one of my Nagios servers has a lot of stale passive check results. Everything had been running well as far as I know. I have offloaded the databases onto an external server, and the CPU load does not seem too high on either the Nagios server or the database server. Looking in /usr/local/nagios/var/nagios.log I see lines like these, sometimes interspersed with the normal log entries and other times in whole blocks:

Code: Select all

[1633357264] NDO-3: ndo_return = 1 (Statement not prepared)
[1633357264] NDO-3: ndo_handle_notification(ndo-handlers.c:1264): Unable to bind parameters
Nagios is up to date and the database tables are all InnoDB.

Thanks.
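
To gauge how widespread the errors are, it can help to tally the NDO-3 lines by the handler function they name. The following is a minimal sketch, not part of Nagios: the function name ndo3_summary is hypothetical, and it assumes every handler line follows the "NDO-3: ndo_handle_...(file:line)" shape shown above.

```shell
# Tally NDO-3 log lines per handler function (hypothetical helper).
# Usage: ndo3_summary /usr/local/nagios/var/nagios.log
ndo3_summary() {
    log_file="$1"
    # Keep only NDO-3 lines that name a handler, e.g.
    # "NDO-3: ndo_handle_notification(ndo-handlers.c:1264): ..."
    grep 'NDO-3: ndo_handle_' "$log_file" \
        | sed -E 's/.*NDO-3: (ndo_handle_[A-Za-z0-9_]+)\(.*/\1/' \
        | sort | uniq -c \
        | awk '{ print $2 "=" $1 }'   # print as handler=count
}
```

If only ndo_handle_notification shows up in the tally, that points at a single statement or table rather than a general database problem.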

Re: NDO-3 problem

Posted: Tue Oct 05, 2021 4:16 pm
by benjaminsmith
Hi,

We believe one of the tables did not get properly updated during the upgrade. Please run the following command to dump the database schema (the --no-data option leaves out the row data), and we'll take a closer look.

Code: Select all

mysqldump --no-data --databases nagios -u username -ppassword -h xxx.xxx.xxx.xxx > nagios.sql
Adjust the username, the password, and xxx.xxx.xxx.xxx (the address of the database server) to match your setup.

Thanks,
Benjamin

Re: NDO-3 problem

Posted: Wed Oct 06, 2021 4:37 pm
by cbeattie-unitrends
Hello,

I wasn't sure if that file contains anything secret, so I sent it to you in a PM.

Thank you.

Re: NDO-3 problem

Posted: Thu Oct 07, 2021 3:45 pm
by ssax
It would contain sensitive info.

Please PM me a copy of your profile.zip so I can review your logs/settings. You can download it from Admin > System Profile by clicking the Download Profile button.

What is the output of this command?

Code: Select all

strings /usr/local/nagios/bin/ndo.so | grep Copyright

Re: NDO-3 problem

Posted: Fri Oct 08, 2021 11:39 am
by cbeattie-unitrends
Hello,
I sent the profile via PM. Here's the output of the command:

Code: Select all

[root@den-nagios certs]# strings /usr/local/nagios/bin/ndo.so | grep Copyright
NDO 3.0.7 (c) Copyright 2009-2020 Nagios - Nagios Core Development Team
Thanks.

Re: NDO-3 problem

Posted: Mon Oct 11, 2021 5:39 pm
by ssax
I do not see a nagios_notifications table in your nagios dump file. Do you have crashed tables?

Please send the output of this command:
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the command if your DB is offloaded to another server and/or you've changed the root mysql password

Code: Select all

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
Is this the correct number of hosts/services?

Code: Select all

Total Hosts: 2379
Total Services: 72696
This is almost full as well:

Code: Select all

tmpfs                1.0G  989M   36M  97% /var/nagiosramdisk
What is the output of this command?

Code: Select all

du -sh /var/nagiosramdisk/*

Re: NDO-3 problem

Posted: Tue Oct 12, 2021 9:20 am
by cbeattie-unitrends
I don't think I have any crashed tables. I looked at /var/log/mariadb/mariadb.log and although grepping for crash didn't return anything, the other contents weren't what I expected. The pattern seems to be that the warning gets repeated a bunch of times, then the monitor output.

Code: Select all

2021-10-12 13:47:54 7365508 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=128M. Starting the InnoDB Monitor to print diagnostics.

=====================================
2021-10-12 13:48:04 0x7fe2f6546700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 47 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 936964 srv_active, 0 srv_shutdown, 179 srv_idle
srv_master_thread log flush and writes: 937141
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 2011986257
OS WAIT ARRAY INFO: signal count 3328546024
RW-shared spins 24329468616, rounds 167038144395, OS waits 655528858
RW-excl spins 917673661, rounds 9405051947, OS waits 105404570
RW-sx spins 29169304, rounds 513076929, OS waits 8575783
Spin rounds per wait: 6.87 RW-shared, 10.25 RW-excl, 17.59 RW-sx
FAIL TO OBTAIN LOCK MUTEX, SKIP LOCK INFO PRINTING
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
55271980514 OS file reads, 1088887784 OS file writes, 341751569 OS fsyncs
1 pending reads, 0 pending writes
58595.94 reads/s, 16383 avg bytes/read, 648.88 writes/s, 345.04 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 8, free list len 2426, seg size 2435, 19612541 merges
merged operations:
 insert 83963947, delete mark 299194323, delete 7328246
discarded operations:
 insert 51392, delete mark 1879, delete 76
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 16 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 8 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 3 buffer(s)
322536.58 hash searches/s, 76575.65 non-hash searches/s
---
LOG
---
Log sequence number 23336846891287
Log flushed up to   23336846891287
Pages flushed up to 23336846546811
Last checkpoint at  23336844674792
0 pending log flushes, 0 pending chkp writes
230604756 log i/o's done, 288.43 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 170590208
Dictionary memory allocated 457776
Buffer pool size   8192
Free buffers       0
Database pages     2239
Old database pages 835
Modified db pages  134
Percent of dirty pages(LRU & free pages): 5.982
Max dirty pages percent: 75.000
Pending reads 2
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 9451354, not young 1264073857478
11.43 youngs/s, 2066569.39 non-youngs/s
Pages read 55271620385, created 468481754, written 806957859
58596.52 reads/s, 9.70 creates/s, 336.16 writes/s
Buffer pool hit rate 976 / 1000, young-making rate 0 / 1000 not 843 / 1000
Pages read ahead 920.92/s, evicted without access 857.71/s, Random read ahead 0.00/s
LRU len: 2239, unzip_LRU len: 0
I/O sum[4582394]:cur[7981], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
14 read views open inside InnoDB
Process ID=1434, Main thread ID=140612746884864, state: sleeping
Number of rows inserted 93580553, updated 99124466, deleted 78924030, read 1287409090672
132.12 inserts/s, 117.34 updates/s, 10.36 deletes/s, 2028622.73 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
Here's the output from the first command:

Code: Select all

+--------------------------------------------+------------+
| Table                                      | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements                    |       0.44 |
| nagios_commands                            |       0.06 |
| nagios_commenthistory                      |    1091.58 |
| nagios_comments                            |       3.67 |
| nagios_configfiles                         |       0.03 |
| nagios_configfilevariables                 |       0.02 |
| nagios_conninfo                            |       0.02 |
| nagios_contact_addresses                   |       0.03 |
| nagios_contact_notificationcommands        |       0.03 |
| nagios_contactgroup_members                |       0.03 |
| nagios_contactgroups                       |       0.03 |
| nagios_contactnotificationmethods          |       1.91 |
| nagios_contactnotifications                |       1.05 |
| nagios_contacts                            |       0.03 |
| nagios_contactstatus                       |       0.03 |
| nagios_customvariables                     |       0.63 |
| nagios_customvariablestatus                |       0.63 |
| nagios_dbversion                           |       0.02 |
| nagios_downtimehistory                     |      20.03 |
| nagios_eventhandlers                       |       4.30 |
| nagios_externalcommands                    |       0.05 |
| nagios_flappinghistory                     |     450.88 |
| nagios_host_contactgroups                  |       0.17 |
| nagios_host_contacts                       |       0.03 |
| nagios_host_parenthosts                    |       0.03 |
| nagios_hostchecks                          |       1.67 |
| nagios_hostdependencies                    |       0.03 |
| nagios_hostescalation_contactgroups        |       0.30 |
| nagios_hostescalation_contacts             |       0.16 |
| nagios_hostescalations                     |       0.42 |
| nagios_hostgroup_members                   |       0.17 |
| nagios_hostgroups                          |       0.03 |
| nagios_hosts                               |       1.63 |
| nagios_hoststatus                          |       2.48 |
| nagios_instances                           |       0.02 |
| nagios_objects                             |      30.61 |
| nagios_processevents                       |       0.38 |
| nagios_programstatus                       |       0.03 |
| nagios_runtimevariables                    |       0.03 |
| nagios_scheduleddowntime                   |       0.03 |
| nagios_service_contactgroups               |       3.03 |
| nagios_service_contacts                    |       4.03 |
| nagios_service_parentservices              |       0.03 |
| nagios_servicechecks                       |      13.06 |
| nagios_servicedependencies                 |       0.03 |
| nagios_serviceescalation_contactgroups     |       3.03 |
| nagios_serviceescalation_contacts          |       0.03 |
| nagios_serviceescalations                  |       4.03 |
| nagios_servicegroup_members                |       0.27 |
| nagios_servicegroups                       |       0.03 |
| nagios_services                            |      24.58 |
| nagios_servicestatus                       |      66.59 |
| nagios_statehistory                        |     395.83 |
| nagios_systemcommands                      |       0.05 |
| nagios_timedeventqueue                     |       0.09 |
| nagios_timedevents                         |       0.09 |
| nagios_timeperiod_timeranges               |       0.03 |
| nagios_timeperiods                         |       0.03 |
| tbl_command                                |       0.06 |
| tbl_contact                                |       0.03 |
| tbl_contactgroup                           |       0.03 |
| tbl_contacttemplate                        |       0.03 |
| tbl_domain                                 |       0.03 |
| tbl_host                                   |       0.50 |
| tbl_hostdependency                         |       0.03 |
| tbl_hostescalation                         |       0.03 |
| tbl_hostextinfo                            |       0.03 |
| tbl_hostgroup                              |       0.03 |
| tbl_hosttemplate                           |       0.03 |
| tbl_info                                   |       0.17 |
| tbl_lnkContactToCommandHost                |       0.02 |
| tbl_lnkContactToCommandService             |       0.02 |
| tbl_lnkContactToContactgroup               |       0.02 |
| tbl_lnkContactToContacttemplate            |       0.02 |
| tbl_lnkContactToVariabledefinition         |       0.02 |
| tbl_lnkContactgroupToContact               |       0.02 |
| tbl_lnkContactgroupToContactgroup          |       0.02 |
| tbl_lnkContacttemplateToCommandHost        |       0.02 |
| tbl_lnkContacttemplateToCommandService     |       0.02 |
| tbl_lnkContacttemplateToContactgroup       |       0.02 |
| tbl_lnkContacttemplateToContacttemplate    |       0.02 |
| tbl_lnkContacttemplateToVariabledefinition |       0.02 |
| tbl_lnkHostToContact                       |       0.02 |
| tbl_lnkHostToContactgroup                  |       0.02 |
| tbl_lnkHostToHost                          |       0.02 |
| tbl_lnkHostToHostgroup                     |       0.09 |
| tbl_lnkHostToHosttemplate                  |       0.11 |
| tbl_lnkHostToVariabledefinition            |       0.02 |
| tbl_lnkHostdependencyToHost_DH             |       0.02 |
| tbl_lnkHostdependencyToHost_H              |       0.02 |
| tbl_lnkHostdependencyToHostgroup_DH        |       0.02 |
| tbl_lnkHostdependencyToHostgroup_H         |       0.02 |
| tbl_lnkHostescalationToContact             |       0.02 |
| tbl_lnkHostescalationToContactgroup        |       0.02 |
| tbl_lnkHostescalationToHost                |       0.02 |
| tbl_lnkHostescalationToHostgroup           |       0.02 |
| tbl_lnkHostgroupToHost                     |       0.02 |
| tbl_lnkHostgroupToHostgroup                |       0.02 |
| tbl_lnkHosttemplateToContact               |       0.02 |
| tbl_lnkHosttemplateToContactgroup          |       0.02 |
| tbl_lnkHosttemplateToHost                  |       0.02 |
| tbl_lnkHosttemplateToHostgroup             |       0.02 |
| tbl_lnkHosttemplateToHosttemplate          |       0.02 |
| tbl_lnkHosttemplateToVariabledefinition    |       0.02 |
| tbl_lnkServiceToContact                    |       0.02 |
| tbl_lnkServiceToContactgroup               |       0.02 |
| tbl_lnkServiceToHost                       |       0.05 |
| tbl_lnkServiceToHostgroup                  |       0.02 |
| tbl_lnkServiceToServicegroup               |       0.02 |
| tbl_lnkServiceToServicetemplate            |       0.06 |
| tbl_lnkServiceToVariabledefinition         |       0.02 |
| tbl_lnkServicedependencyToHost_DH          |       0.02 |
| tbl_lnkServicedependencyToHost_H           |       0.02 |
| tbl_lnkServicedependencyToHostgroup_DH     |       0.02 |
| tbl_lnkServicedependencyToHostgroup_H      |       0.02 |
| tbl_lnkServicedependencyToService_DS       |       0.02 |
| tbl_lnkServicedependencyToService_S        |       0.02 |
| tbl_lnkServicedependencyToServicegroup_DS  |       0.02 |
| tbl_lnkServicedependencyToServicegroup_S   |       0.02 |
| tbl_lnkServiceescalationToContact          |       0.02 |
| tbl_lnkServiceescalationToContactgroup     |       0.02 |
| tbl_lnkServiceescalationToHost             |       0.02 |
| tbl_lnkServiceescalationToHostgroup        |       0.02 |
| tbl_lnkServiceescalationToService          |       0.02 |
| tbl_lnkServiceescalationToServicegroup     |       0.02 |
| tbl_lnkServicegroupToService               |       0.02 |
| tbl_lnkServicegroupToServicegroup          |       0.02 |
| tbl_lnkServicetemplateToContact            |       0.02 |
| tbl_lnkServicetemplateToContactgroup       |       0.02 |
| tbl_lnkServicetemplateToHost               |       0.02 |
| tbl_lnkServicetemplateToHostgroup          |       0.02 |
| tbl_lnkServicetemplateToServicegroup       |       0.02 |
| tbl_lnkServicetemplateToServicetemplate    |       0.02 |
| tbl_lnkServicetemplateToVariabledefinition |       0.02 |
| tbl_lnkTimeperiodToTimeperiod              |       0.02 |
| tbl_logbook                                |       0.02 |
| tbl_mainmenu                               |       0.02 |
| tbl_permission                             |       0.02 |
| tbl_permission_inactive                    |       0.02 |
| tbl_service                                |       0.13 |
| tbl_servicedependency                      |       0.03 |
| tbl_serviceescalation                      |       0.03 |
| tbl_serviceextinfo                         |       0.03 |
| tbl_servicegroup                           |       0.03 |
| tbl_servicetemplate                        |       0.03 |
| tbl_session                                |       0.02 |
| tbl_session_locks                          |       0.02 |
| tbl_settings                               |       0.03 |
| tbl_submenu                                |       0.02 |
| tbl_timedefinition                         |       0.02 |
| tbl_timeperiod                             |       0.03 |
| tbl_user                                   |       0.03 |
| tbl_variabledefinition                     |       0.02 |
| xi_auditlog                                |     162.72 |
| xi_auth_tokens                             |       0.03 |
| xi_cmp_ccm_backups                         |       0.02 |
| xi_cmp_favorites                           |       0.03 |
| xi_cmp_nagiosbpi_backups                   |       1.50 |
| xi_cmp_scheduledreports_log                |       0.02 |
| xi_cmp_trapdata                            |       0.50 |
| xi_cmp_trapdata_log                        |       0.03 |
| xi_commands                                |       0.02 |
| xi_deploy_agents                           |       0.02 |
| xi_deploy_jobs                             |       0.02 |
| xi_eventqueue                              |       0.03 |
| xi_events                                  |    1209.45 |
| xi_incidents                               |       0.02 |
| xi_meta                                    |   20400.00 |
| xi_mibs                                    |       0.05 |
| xi_options                                 |       0.06 |
| xi_sessions                                |       0.03 |
| xi_sysstat                                 |       0.03 |
| xi_usermeta                                |       0.17 |
| xi_users                                   |       0.03 |
+--------------------------------------------+------------+
nagios_notifications does indeed appear to be missing, and the files are gone too:

Code: Select all

[root@******** ~]# ls -lh /var/lib/mysql/nagios | grep notifications
-rw-rw----. 1 mysql mysql 2.6K Oct 12 13:56 nagios_contactnotifications.frm
-rw-rw----. 1 mysql mysql 7.0M Oct 12 13:56 nagios_contactnotifications.ibd
I have no idea how that could have happened.

Those do appear to be the correct numbers for hosts and services, at least.

/var/nagiosramdisk does get full when Nagios isn't processing passive check results fast enough. There are about 4000 checks that are stale right now:

Code: Select all

[nagios@********~]$ du -sh /var/nagiosramdisk/*
16K     /var/nagiosramdisk/host-perfdata
71M     /var/nagiosramdisk/objects.cache
184K    /var/nagiosramdisk/service-perfdata
464M    /var/nagiosramdisk/spool
106M    /var/nagiosramdisk/status.dat
0       /var/nagiosramdisk/tmp
I have a couple of hourly cron jobs to help keep it from getting completely full when things get slow. I figured it was better to have stale checks than run out of space on the filesystem:

Code: Select all

/bin/find /var/nagiosramdisk/spool/perfdata -type f -mmin +60 -delete
/bin/find /var/nagiosramdisk/spool/checkresults -type f -mmin +60 -delete
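
Before letting a cleanup like this delete anything, it can be useful to see how many files each run would discard. A minimal sketch follows, assuming GNU find; the function names stale_spool_files and count_stale_spool_files are hypothetical and the 60-minute default simply mirrors the cron jobs above.

```shell
# List spool files older than a given age without deleting them.
# Arguments: directory, age in minutes (defaults to 60).
stale_spool_files() {
    dir="$1"
    minutes="${2:-60}"
    # Same match as the cron jobs, but -print instead of -delete.
    find "$dir" -type f -mmin +"$minutes" -print
}

# Count them, e.g. to log how much an hourly run would throw away.
count_stale_spool_files() {
    stale_spool_files "$1" "${2:-60}" | wc -l | tr -d ' '
}
```

For example, count_stale_spool_files /var/nagiosramdisk/spool/checkresults prints the number of check result files that the next hourly run would remove, which gives a rough measure of how far behind Nagios is.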

Re: NDO-3 problem

Posted: Wed Oct 13, 2021 9:28 am
by ssax
You're hitting a bug:
- These are temp tables that should be cleaned out automatically

Code: Select all

| xi_eventqueue                              |       0.03 |
| xi_events                                  |    1209.45 |
| xi_meta                                    |   20400.00 |
Please run these commands, which should resolve the issue:

Code: Select all

echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -uroot -pnagiosxi nagiosxi
systemctl restart mariadb nagios httpd crond
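
To confirm that the truncation took effect, the size query from earlier in the thread can be narrowed to just these tables. This is a sketch under assumptions: the function name table_size_query is hypothetical, it only builds the SQL text, and the column alias size_mb is illustrative rather than anything Nagios XI defines.

```shell
# Build a size query for specific tables in one schema (hypothetical helper).
# Usage: table_size_query SCHEMA TABLE [TABLE...]
table_size_query() {
    schema="$1"
    shift
    list=""
    # Turn the remaining arguments into a quoted, comma-separated list.
    for t in "$@"; do
        list="${list:+$list, }'$t'"
    done
    printf "SELECT table_name, ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb FROM information_schema.TABLES WHERE table_schema = '%s' AND table_name IN (%s);\n" "$schema" "$list"
}
```

The output can be piped to the client in the same way as before, for example: table_size_query nagiosxi xi_events xi_meta xi_eventqueue | mysql -uroot -pnagiosxi --table. Adjust the host and credentials if the database is offloaded.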

Re: NDO-3 problem

Posted: Wed Oct 13, 2021 4:30 pm
by cbeattie-unitrends
That seems to have allowed Nagios to deal with most of the stale service checks, but I'm still seeing a lot of

Code: Select all

[1634160563] NDO-3: ndo_return = 1 (Statement not prepared)
[1634160563] NDO-3: ndo_handle_notification(ndo-handlers.c:1264): Unable to bind parameters
in nagios.log.

Also, how should I re-create the missing nagios_notifications table?

Re: NDO-3 problem

Posted: Wed Oct 13, 2021 5:10 pm
by ssax
Please run these commands to recreate the table and validate after:

Code: Select all

mysql -uroot -pnagiosxi nagios -e "CREATE TABLE nagios_notifications (notification_id int(11) NOT NULL AUTO_INCREMENT, instance_id smallint(6) NOT NULL DEFAULT '0', notification_type smallint(6) NOT NULL DEFAULT '0', notification_reason smallint(6) NOT NULL DEFAULT '0', object_id int(11) NOT NULL DEFAULT '0', start_time datetime NOT NULL DEFAULT '1970-01-01 00:00:01', start_time_usec int(11) NOT NULL DEFAULT '0', end_time datetime NOT NULL DEFAULT '1970-01-01 00:00:01', end_time_usec int(11) NOT NULL DEFAULT '0', state smallint(6) NOT NULL DEFAULT '0', output text NOT NULL, long_output text NOT NULL, escalated smallint(6) NOT NULL DEFAULT '0', contacts_notified smallint(6) NOT NULL DEFAULT '0', PRIMARY KEY (notification_id), UNIQUE KEY instance_id (instance_id,object_id,start_time,start_time_usec), KEY start_time (start_time), KEY object_id (object_id) ) ENGINE=MyISAM AUTO_INCREMENT=17635 DEFAULT CHARSET=utf8 COMMENT='Historical record of host and service notifications'"
systemctl restart nagios
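
One way to validate afterwards is to check that no new NDO-3 lines have been logged since the restart. A minimal sketch, assuming the log keeps the "[epoch] message" layout shown in this thread; the function name ndo3_errors_since is hypothetical.

```shell
# Count NDO-3 lines logged at or after a given epoch second (hypothetical helper).
# Usage: ndo3_errors_since /usr/local/nagios/var/nagios.log 1634160563
ndo3_errors_since() {
    log_file="$1"
    since="$2"
    awk -v since="$since" '
        /NDO-3:/ {
            # The first field looks like "[1634160563]"; strip the brackets.
            ts = substr($1, 2, length($1) - 2)
            if (ts + 0 >= since + 0) n++
        }
        END { print n + 0 }
    ' "$log_file"
}
```

Note the time of the restart with date +%s and pass it as the second argument a few minutes later; a result of 0 suggests the notification handler can prepare its statement again. Running SHOW TABLES LIKE 'nagios_notifications' in the nagios database confirms the table itself exists.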