NDO-3 problem

cbeattie-unitrends
Posts: 84
Joined: Mon Oct 10, 2016 2:51 pm


Post by cbeattie-unitrends »

Hello,

Recently I noticed that one of my Nagios servers has a lot of stale passive check results. Everything had been running well as far as I know. I have offloaded the databases onto an external server, and the CPU load does not seem too high on either the Nagios or the database server. In /usr/local/nagios/var/nagios.log I see lines like these, sometimes interspersed with the normal log entries and other times in whole blocks:

Code:

[1633357264] NDO-3: ndo_return = 1 (Statement not prepared)
[1633357264] NDO-3: ndo_handle_notification(ndo-handlers.c:1264): Unable to bind parameters
Nagios is up to date and the database tables are all InnoDB.
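
To gauge how often the error occurs, the matching lines can be counted. This is only a minimal sketch, assuming the log path quoted above; `count_ndo_errors` is a hypothetical helper name, not part of Nagios:

```shell
#!/bin/sh
# Count NDO-3 "Statement not prepared" lines in a Nagios log.
# Hypothetical helper; the default path is the one quoted in this post.
count_ndo_errors() {
    log="${1:-/usr/local/nagios/var/nagios.log}"
    # -c prints the number of matching lines; -F treats the pattern as a literal string.
    grep -cF 'NDO-3: ndo_return = 1 (Statement not prepared)' "$log"
}
```

Calling `count_ndo_errors` with no argument reads the default log; pass another path to check a rotated log.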

Thanks.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: NDO-3 problem

Post by benjaminsmith »

Hi,

We believe one of the tables was not updated properly during the upgrade. Please run the following command to dump the database schema (no row data), and we'll take a closer look.

Code:

mysqldump --no-data --databases nagios -u username -ppassword -h xxx.xxx.xxx.xxx > nagios.sql

Adjust the username, password, and host address (xxx.xxx.xxx.xxx) to match your environment.
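
If you would rather not put the password on the command line, the standard MySQL client option `--defaults-extra-file` can read it from a file instead. A rough sketch, assuming a POSIX shell; `write_client_cnf` and the file name are hypothetical, and passwords containing characters such as `#` may need quoting in the option file:

```shell
#!/bin/sh
# Write a MySQL client option file so the password is not passed as a command-line argument.
# write_client_cnf is a hypothetical helper; arguments: file, user, password, host.
write_client_cnf() {
    cnf="$1"
    # Create or empty the file and restrict it to the owner before writing the secret.
    : > "$cnf"
    chmod 600 "$cnf"
    printf '[client]\nuser=%s\npassword=%s\nhost=%s\n' "$2" "$3" "$4" >> "$cnf"
}
```

After `write_client_cnf "$HOME/.nagios-dump.cnf" username password xxx.xxx.xxx.xxx`, the dump becomes `mysqldump --defaults-extra-file="$HOME/.nagios-dump.cnf" --no-data --databases nagios > nagios.sql`. The `--defaults-extra-file` option has to come first on the command line.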

Thanks,
Benjamin

Re: NDO-3 problem

Post by cbeattie-unitrends »

Hello,

I wasn't sure if that file contains anything secret, so I sent it to you in a PM.

Thank you.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: NDO-3 problem

Post by ssax »

It would contain sensitive info.

Please PM me a copy of your profile.zip so I can review your logs/settings. You can download it from Admin > System Profile by clicking the Download Profile button.

What is the output of this command?

Code:

strings /usr/local/nagios/bin/ndo.so | grep Copyright

Re: NDO-3 problem

Post by cbeattie-unitrends »

Hello,
I sent the profile via PM. Here's the output of the command:

Code:

[root@den-nagios certs]# strings /usr/local/nagios/bin/ndo.so | grep Copyright
NDO 3.0.7 (c) Copyright 2009-2020 Nagios - Nagios Core Development Team
Thanks.

Re: NDO-3 problem

Post by ssax »

I do not see a nagios_notifications table in your nagios dump file. Do you have crashed tables?
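
For reference, a quick way to check a schema dump for a given table is sketched below. It assumes the dump was produced by mysqldump, which writes table names in backticks; `dump_has_table` is a hypothetical helper name:

```shell
#!/bin/sh
# Report whether a schema dump contains a CREATE TABLE statement for a table.
# dump_has_table is a hypothetical helper; arguments: dump file, table name.
dump_has_table() {
    # Matching the backtick-quoted name avoids false hits on longer names
    # such as nagios_contactnotifications.
    grep -qF "CREATE TABLE \`$2\`" "$1"
}
```

For example, `dump_has_table nagios.sql nagios_notifications || echo "table missing"`.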

Please send the output of this command:
- NOTE: You may need to adjust the -h 127.0.0.1, the -uroot, and -pnagiosxi in the command if your DB is offloaded to another server and/or you've changed the root mysql password

Code:

echo "SELECT table_name AS 'Table', round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB' FROM information_schema.TABLES WHERE table_schema IN ('nagios', 'nagiosql', 'nagiosxi');" | mysql -h 127.0.0.1 -uroot -pnagiosxi --table
Is this the correct number of hosts/services?

Code:

Total Hosts: 2379
Total Services: 72696
This is almost full as well:

Code:

tmpfs                1.0G  989M   36M  97% /var/nagiosramdisk
What is the output of this command?

Code:

du -sh /var/nagiosramdisk/*
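
If you want to keep an eye on how full the ramdisk is, something along these lines could work. It is a sketch that assumes POSIX `df -P` output is piped in; `mount_use_pct` is a hypothetical helper name:

```shell
#!/bin/sh
# Print the use percentage (digits only) for a mount point from `df -P` output on stdin.
# mount_use_pct is a hypothetical helper; argument: mount point.
mount_use_pct() {
    # In df -P output, field 5 is the capacity (e.g. 97%) and field 6 is the mount point.
    awk -v m="$1" '$6 == m { sub(/%/, "", $5); print $5 }'
}
```

Usage: `df -P /var/nagiosramdisk | mount_use_pct /var/nagiosramdisk`.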

Re: NDO-3 problem

Post by cbeattie-unitrends »

I don't think I have any crashed tables. I looked at /var/log/mariadb/mariadb.log, and although grepping for "crash" didn't return anything, the other contents weren't what I expected. The pattern seems to be that the warning is repeated many times, followed by the monitor output.

Code:

2021-10-12 13:47:54 7365508 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=128M. Starting the InnoDB Monitor to print diagnostics.

=====================================
2021-10-12 13:48:04 0x7fe2f6546700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 47 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 936964 srv_active, 0 srv_shutdown, 179 srv_idle
srv_master_thread log flush and writes: 937141
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 2011986257
OS WAIT ARRAY INFO: signal count 3328546024
RW-shared spins 24329468616, rounds 167038144395, OS waits 655528858
RW-excl spins 917673661, rounds 9405051947, OS waits 105404570
RW-sx spins 29169304, rounds 513076929, OS waits 8575783
Spin rounds per wait: 6.87 RW-shared, 10.25 RW-excl, 17.59 RW-sx
FAIL TO OBTAIN LOCK MUTEX, SKIP LOCK INFO PRINTING
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
55271980514 OS file reads, 1088887784 OS file writes, 341751569 OS fsyncs
1 pending reads, 0 pending writes
58595.94 reads/s, 16383 avg bytes/read, 648.88 writes/s, 345.04 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 8, free list len 2426, seg size 2435, 19612541 merges
merged operations:
 insert 83963947, delete mark 299194323, delete 7328246
discarded operations:
 insert 51392, delete mark 1879, delete 76
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 16 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 2 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 8 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 3 buffer(s)
322536.58 hash searches/s, 76575.65 non-hash searches/s
---
LOG
---
Log sequence number 23336846891287
Log flushed up to   23336846891287
Pages flushed up to 23336846546811
Last checkpoint at  23336844674792
0 pending log flushes, 0 pending chkp writes
230604756 log i/o's done, 288.43 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 170590208
Dictionary memory allocated 457776
Buffer pool size   8192
Free buffers       0
Database pages     2239
Old database pages 835
Modified db pages  134
Percent of dirty pages(LRU & free pages): 5.982
Max dirty pages percent: 75.000
Pending reads 2
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 9451354, not young 1264073857478
11.43 youngs/s, 2066569.39 non-youngs/s
Pages read 55271620385, created 468481754, written 806957859
58596.52 reads/s, 9.70 creates/s, 336.16 writes/s
Buffer pool hit rate 976 / 1000, young-making rate 0 / 1000 not 843 / 1000
Pages read ahead 920.92/s, evicted without access 857.71/s, Random read ahead 0.00/s
LRU len: 2239, unzip_LRU len: 0
I/O sum[4582394]:cur[7981], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
14 read views open inside InnoDB
Process ID=1434, Main thread ID=140612746884864, state: sleeping
Number of rows inserted 93580553, updated 99124466, deleted 78924030, read 1287409090672
132.12 inserts/s, 117.34 updates/s, 10.36 deletes/s, 2028622.73 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
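
The monitor output reports the buffer pool in pages rather than bytes. As a worked check, assuming the InnoDB default page size of 16 KiB (the actual `innodb_page_size` on this server is not shown), the 8192 pages above match the 128M quoted in the warning. `pages_to_mib` is a hypothetical helper name:

```shell
#!/bin/sh
# Convert an InnoDB page count to MiB.
# pages_to_mib is a hypothetical helper; arguments: page count, page size in bytes.
# The 16384-byte default is an assumption (the InnoDB default page size).
pages_to_mib() {
    page_size="${2:-16384}"
    # Integer arithmetic: pages * bytes per page, then bytes to MiB.
    echo $(( $1 * page_size / 1024 / 1024 ))
}
```

With the "Buffer pool size 8192" figure from the output, `pages_to_mib 8192` prints 128.
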
Here's the output from the first command:

Code:

+--------------------------------------------+------------+
| Table                                      | Size in MB |
+--------------------------------------------+------------+
| nagios_acknowledgements                    |       0.44 |
| nagios_commands                            |       0.06 |
| nagios_commenthistory                      |    1091.58 |
| nagios_comments                            |       3.67 |
| nagios_configfiles                         |       0.03 |
| nagios_configfilevariables                 |       0.02 |
| nagios_conninfo                            |       0.02 |
| nagios_contact_addresses                   |       0.03 |
| nagios_contact_notificationcommands        |       0.03 |
| nagios_contactgroup_members                |       0.03 |
| nagios_contactgroups                       |       0.03 |
| nagios_contactnotificationmethods          |       1.91 |
| nagios_contactnotifications                |       1.05 |
| nagios_contacts                            |       0.03 |
| nagios_contactstatus                       |       0.03 |
| nagios_customvariables                     |       0.63 |
| nagios_customvariablestatus                |       0.63 |
| nagios_dbversion                           |       0.02 |
| nagios_downtimehistory                     |      20.03 |
| nagios_eventhandlers                       |       4.30 |
| nagios_externalcommands                    |       0.05 |
| nagios_flappinghistory                     |     450.88 |
| nagios_host_contactgroups                  |       0.17 |
| nagios_host_contacts                       |       0.03 |
| nagios_host_parenthosts                    |       0.03 |
| nagios_hostchecks                          |       1.67 |
| nagios_hostdependencies                    |       0.03 |
| nagios_hostescalation_contactgroups        |       0.30 |
| nagios_hostescalation_contacts             |       0.16 |
| nagios_hostescalations                     |       0.42 |
| nagios_hostgroup_members                   |       0.17 |
| nagios_hostgroups                          |       0.03 |
| nagios_hosts                               |       1.63 |
| nagios_hoststatus                          |       2.48 |
| nagios_instances                           |       0.02 |
| nagios_objects                             |      30.61 |
| nagios_processevents                       |       0.38 |
| nagios_programstatus                       |       0.03 |
| nagios_runtimevariables                    |       0.03 |
| nagios_scheduleddowntime                   |       0.03 |
| nagios_service_contactgroups               |       3.03 |
| nagios_service_contacts                    |       4.03 |
| nagios_service_parentservices              |       0.03 |
| nagios_servicechecks                       |      13.06 |
| nagios_servicedependencies                 |       0.03 |
| nagios_serviceescalation_contactgroups     |       3.03 |
| nagios_serviceescalation_contacts          |       0.03 |
| nagios_serviceescalations                  |       4.03 |
| nagios_servicegroup_members                |       0.27 |
| nagios_servicegroups                       |       0.03 |
| nagios_services                            |      24.58 |
| nagios_servicestatus                       |      66.59 |
| nagios_statehistory                        |     395.83 |
| nagios_systemcommands                      |       0.05 |
| nagios_timedeventqueue                     |       0.09 |
| nagios_timedevents                         |       0.09 |
| nagios_timeperiod_timeranges               |       0.03 |
| nagios_timeperiods                         |       0.03 |
| tbl_command                                |       0.06 |
| tbl_contact                                |       0.03 |
| tbl_contactgroup                           |       0.03 |
| tbl_contacttemplate                        |       0.03 |
| tbl_domain                                 |       0.03 |
| tbl_host                                   |       0.50 |
| tbl_hostdependency                         |       0.03 |
| tbl_hostescalation                         |       0.03 |
| tbl_hostextinfo                            |       0.03 |
| tbl_hostgroup                              |       0.03 |
| tbl_hosttemplate                           |       0.03 |
| tbl_info                                   |       0.17 |
| tbl_lnkContactToCommandHost                |       0.02 |
| tbl_lnkContactToCommandService             |       0.02 |
| tbl_lnkContactToContactgroup               |       0.02 |
| tbl_lnkContactToContacttemplate            |       0.02 |
| tbl_lnkContactToVariabledefinition         |       0.02 |
| tbl_lnkContactgroupToContact               |       0.02 |
| tbl_lnkContactgroupToContactgroup          |       0.02 |
| tbl_lnkContacttemplateToCommandHost        |       0.02 |
| tbl_lnkContacttemplateToCommandService     |       0.02 |
| tbl_lnkContacttemplateToContactgroup       |       0.02 |
| tbl_lnkContacttemplateToContacttemplate    |       0.02 |
| tbl_lnkContacttemplateToVariabledefinition |       0.02 |
| tbl_lnkHostToContact                       |       0.02 |
| tbl_lnkHostToContactgroup                  |       0.02 |
| tbl_lnkHostToHost                          |       0.02 |
| tbl_lnkHostToHostgroup                     |       0.09 |
| tbl_lnkHostToHosttemplate                  |       0.11 |
| tbl_lnkHostToVariabledefinition            |       0.02 |
| tbl_lnkHostdependencyToHost_DH             |       0.02 |
| tbl_lnkHostdependencyToHost_H              |       0.02 |
| tbl_lnkHostdependencyToHostgroup_DH        |       0.02 |
| tbl_lnkHostdependencyToHostgroup_H         |       0.02 |
| tbl_lnkHostescalationToContact             |       0.02 |
| tbl_lnkHostescalationToContactgroup        |       0.02 |
| tbl_lnkHostescalationToHost                |       0.02 |
| tbl_lnkHostescalationToHostgroup           |       0.02 |
| tbl_lnkHostgroupToHost                     |       0.02 |
| tbl_lnkHostgroupToHostgroup                |       0.02 |
| tbl_lnkHosttemplateToContact               |       0.02 |
| tbl_lnkHosttemplateToContactgroup          |       0.02 |
| tbl_lnkHosttemplateToHost                  |       0.02 |
| tbl_lnkHosttemplateToHostgroup             |       0.02 |
| tbl_lnkHosttemplateToHosttemplate          |       0.02 |
| tbl_lnkHosttemplateToVariabledefinition    |       0.02 |
| tbl_lnkServiceToContact                    |       0.02 |
| tbl_lnkServiceToContactgroup               |       0.02 |
| tbl_lnkServiceToHost                       |       0.05 |
| tbl_lnkServiceToHostgroup                  |       0.02 |
| tbl_lnkServiceToServicegroup               |       0.02 |
| tbl_lnkServiceToServicetemplate            |       0.06 |
| tbl_lnkServiceToVariabledefinition         |       0.02 |
| tbl_lnkServicedependencyToHost_DH          |       0.02 |
| tbl_lnkServicedependencyToHost_H           |       0.02 |
| tbl_lnkServicedependencyToHostgroup_DH     |       0.02 |
| tbl_lnkServicedependencyToHostgroup_H      |       0.02 |
| tbl_lnkServicedependencyToService_DS       |       0.02 |
| tbl_lnkServicedependencyToService_S        |       0.02 |
| tbl_lnkServicedependencyToServicegroup_DS  |       0.02 |
| tbl_lnkServicedependencyToServicegroup_S   |       0.02 |
| tbl_lnkServiceescalationToContact          |       0.02 |
| tbl_lnkServiceescalationToContactgroup     |       0.02 |
| tbl_lnkServiceescalationToHost             |       0.02 |
| tbl_lnkServiceescalationToHostgroup        |       0.02 |
| tbl_lnkServiceescalationToService          |       0.02 |
| tbl_lnkServiceescalationToServicegroup     |       0.02 |
| tbl_lnkServicegroupToService               |       0.02 |
| tbl_lnkServicegroupToServicegroup          |       0.02 |
| tbl_lnkServicetemplateToContact            |       0.02 |
| tbl_lnkServicetemplateToContactgroup       |       0.02 |
| tbl_lnkServicetemplateToHost               |       0.02 |
| tbl_lnkServicetemplateToHostgroup          |       0.02 |
| tbl_lnkServicetemplateToServicegroup       |       0.02 |
| tbl_lnkServicetemplateToServicetemplate    |       0.02 |
| tbl_lnkServicetemplateToVariabledefinition |       0.02 |
| tbl_lnkTimeperiodToTimeperiod              |       0.02 |
| tbl_logbook                                |       0.02 |
| tbl_mainmenu                               |       0.02 |
| tbl_permission                             |       0.02 |
| tbl_permission_inactive                    |       0.02 |
| tbl_service                                |       0.13 |
| tbl_servicedependency                      |       0.03 |
| tbl_serviceescalation                      |       0.03 |
| tbl_serviceextinfo                         |       0.03 |
| tbl_servicegroup                           |       0.03 |
| tbl_servicetemplate                        |       0.03 |
| tbl_session                                |       0.02 |
| tbl_session_locks                          |       0.02 |
| tbl_settings                               |       0.03 |
| tbl_submenu                                |       0.02 |
| tbl_timedefinition                         |       0.02 |
| tbl_timeperiod                             |       0.03 |
| tbl_user                                   |       0.03 |
| tbl_variabledefinition                     |       0.02 |
| xi_auditlog                                |     162.72 |
| xi_auth_tokens                             |       0.03 |
| xi_cmp_ccm_backups                         |       0.02 |
| xi_cmp_favorites                           |       0.03 |
| xi_cmp_nagiosbpi_backups                   |       1.50 |
| xi_cmp_scheduledreports_log                |       0.02 |
| xi_cmp_trapdata                            |       0.50 |
| xi_cmp_trapdata_log                        |       0.03 |
| xi_commands                                |       0.02 |
| xi_deploy_agents                           |       0.02 |
| xi_deploy_jobs                             |       0.02 |
| xi_eventqueue                              |       0.03 |
| xi_events                                  |    1209.45 |
| xi_incidents                               |       0.02 |
| xi_meta                                    |   20400.00 |
| xi_mibs                                    |       0.05 |
| xi_options                                 |       0.06 |
| xi_sessions                                |       0.03 |
| xi_sysstat                                 |       0.03 |
| xi_usermeta                                |       0.17 |
| xi_users                                   |       0.03 |
+--------------------------------------------+------------+
nagios_notifications does indeed appear to be missing, and the files are gone too:

Code:

[root@******** ~]# ls -lh /var/lib/mysql/nagios | grep notifications
-rw-rw----. 1 mysql mysql 2.6K Oct 12 13:56 nagios_contactnotifications.frm
-rw-rw----. 1 mysql mysql 7.0M Oct 12 13:56 nagios_contactnotifications.ibd
I have no idea how that could have happened.

Those do appear to be the correct numbers for hosts and services, at least.

/var/nagiosramdisk does get full when Nagios isn't processing passive check results fast enough. There are about 4000 checks that are stale right now:

Code:

[nagios@********~]$ du -sh /var/nagiosramdisk/*
16K     /var/nagiosramdisk/host-perfdata
71M     /var/nagiosramdisk/objects.cache
184K    /var/nagiosramdisk/service-perfdata
464M    /var/nagiosramdisk/spool
106M    /var/nagiosramdisk/status.dat
0       /var/nagiosramdisk/tmp
I have a couple of hourly cron jobs to help keep it from getting completely full when things get slow. I figured it was better to have stale checks than run out of space on the filesystem:

Code:

/bin/find /var/nagiosramdisk/spool/perfdata -type f -mmin +60 -delete
/bin/find /var/nagiosramdisk/spool/checkresults -type f -mmin +60 -delete
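
The two cron commands could also be wrapped in one small function. This is just a sketch of the same idea, assuming a `find` that supports `-mmin` and `-delete` (as the commands above already do); `prune_old_files` is a hypothetical helper name:

```shell
#!/bin/sh
# Delete regular files older than a given number of minutes under a directory.
# prune_old_files is a hypothetical helper; arguments: directory, age in minutes (default 60).
prune_old_files() {
    dir="$1"
    minutes="${2:-60}"
    # Return quietly if the directory does not exist, so a cron run does not fail.
    [ -d "$dir" ] || return 0
    find "$dir" -type f -mmin "+$minutes" -delete
}
```

For example, `prune_old_files /var/nagiosramdisk/spool/checkresults 60` does the same as the second cron line.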

Re: NDO-3 problem

Post by ssax »

You're hitting a bug:
- These are temp tables that should be cleaned out automatically

Code:

| xi_eventqueue                              |       0.03 |
| xi_events                                  |    1209.45 |
| xi_meta                                    |   20400.00 |
Please run these commands, which should resolve the issue:

Code:

echo "truncate table xi_events; truncate table xi_meta; truncate table xi_eventqueue;" | mysql -uroot -pnagiosxi nagiosxi
systemctl restart mariadb nagios httpd crond
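
To confirm which tables are oversized before truncating anything, the size listing from the earlier query can be filtered. A minimal sketch, assuming the `mysql --table` output is piped in; `largest_tables` is a hypothetical helper name:

```shell
#!/bin/sh
# Print the N largest tables from a `mysql --table` size listing on stdin.
# largest_tables is a hypothetical helper; argument: number of rows (default 5).
largest_tables() {
    n="${1:-5}"
    # Data rows look like "| name | 12.34 |"; split on "|" and keep numeric sizes only,
    # which skips the border lines and the header row.
    awk -F'|' 'NF >= 4 {
        gsub(/ /, "", $2); gsub(/ /, "", $3)
        if ($3 ~ /^[0-9.]+$/) print $3, $2
    }' | sort -rn | head -n "$n"
}
```

Piping the size query into `largest_tables 3` lists the three biggest tables, one per line, as size followed by name.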

Re: NDO-3 problem

Post by cbeattie-unitrends »

That seems to have allowed Nagios to deal with most of the stale service checks, but I'm still seeing a lot of

Code:

[1634160563] NDO-3: ndo_return = 1 (Statement not prepared)
[1634160563] NDO-3: ndo_handle_notification(ndo-handlers.c:1264): Unable to bind paramete
in nagios.log.

Also, how should I re-create the missing nagios_notifications table?

Re: NDO-3 problem

Post by ssax »

Please run these commands to recreate the table, then validate afterward:

Code:

mysql -uroot -pnagiosxi nagios -e "CREATE TABLE nagios_notifications (notification_id int(11) NOT NULL AUTO_INCREMENT, instance_id smallint(6) NOT NULL DEFAULT '0', notification_type smallint(6) NOT NULL DEFAULT '0', notification_reason smallint(6) NOT NULL DEFAULT '0', object_id int(11) NOT NULL DEFAULT '0', start_time datetime NOT NULL DEFAULT '1970-01-01 00:00:01', start_time_usec int(11) NOT NULL DEFAULT '0', end_time datetime NOT NULL DEFAULT '1970-01-01 00:00:01', end_time_usec int(11) NOT NULL DEFAULT '0', state smallint(6) NOT NULL DEFAULT '0', output text NOT NULL, long_output text NOT NULL, escalated smallint(6) NOT NULL DEFAULT '0', contacts_notified smallint(6) NOT NULL DEFAULT '0', PRIMARY KEY (notification_id), UNIQUE KEY instance_id (instance_id,object_id,start_time,start_time_usec), KEY start_time (start_time), KEY object_id (object_id) ) ENGINE=MyISAM AUTO_INCREMENT=17635 DEFAULT CHARSET=utf8 COMMENT='Historical record of host and service notifications'"
systemctl restart nagios
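
One way to validate is to look for NDO-3 lines logged after the restart. A minimal sketch, assuming the bracketed epoch timestamps seen in the log excerpts in this thread; `ndo_errors_since` is a hypothetical helper name:

```shell
#!/bin/sh
# Print NDO-3 log lines whose epoch timestamp is at or after a given time.
# ndo_errors_since is a hypothetical helper; arguments: epoch seconds, log file.
ndo_errors_since() {
    # Log lines start with "[<epoch>] "; splitting on brackets puts the epoch in field 2.
    awk -F'[][]' -v since="$1" '$2 + 0 >= since + 0 && /NDO-3:/' "$2"
}
```

Record `date +%s` just before the restart, wait a few minutes, then run `ndo_errors_since <that value> /usr/local/nagios/var/nagios.log`. No output means no new NDO-3 lines were logged.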