
Service checks not performed

Posted: Thu Feb 27, 2025 2:16 pm
by Nagiusz
Greetings!

We have a problem with our Nagios instance. It currently has 18987 hosts and 73504 services (and, by the way, it runs smoothly thanks to Gearman). Everything is fine until we try to add more hosts. After applying the configuration, service checks are no longer performed (host checks still run normally). The Nagios log is full of entries like these:

NDO-3: The following query failed while MySQL appears to be connected:
NDO-3: Errno was 1062; message was Duplicate entry '104589' for key 'nagios_servicestatus.object_id'
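To see how widespread the collisions are, the `Duplicate entry` messages can be tallied by key and id. This is a minimal shell sketch, assuming the message format quoted above; `summarize_dupes` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read Nagios log lines on stdin and count the
# "Duplicate entry '<id>' for key '<key>'" messages per key/id pair.
# Assumes the NDO-3 message format quoted in the post above.
summarize_dupes() {
  # Keep only matching lines, rewritten as "<key> <id>".
  sed -n "s/.*Duplicate entry '\([0-9]*\)' for key '\([^']*\)'.*/\2 \1/p" |
    sort | uniq -c | sort -rn   # most frequent pairs first
}
```

Usage: `summarize_dupes < /usr/local/nagios/var/nagios.log`. That log path is an assumption (the default for a Nagios Core source install); adjust it for your system. Each output line is a count followed by the key and the duplicated id.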


We use MariaDB/InnoDB as the underlying database. Have we hit some database limit here? Of course, active checks are enabled, and manually forcing a service check also fails.

Any ideas REALLY appreciated :-)

Re: Service checks not performed

Posted: Fri Feb 28, 2025 1:54 am
by Nagiusz
I think I have found it:

ALTER TABLE nagios_servicestatus MODIFY servicestatus_id BIGINT AUTO_INCREMENT;


on the "nagios" database.

Re: Service checks not performed

Posted: Fri Feb 28, 2025 11:39 am
by DoubleDoubleA
Hi @Nagiusz,

I am taking a look at this. These days the forum is staffed by the Nagios devs, so this might be a good one for a support ticket if you have one available; the support team can do more thorough troubleshooting.

In any case, the error seems to be that we're trying to make a new row in a table with a primary key that already exists. I'm not sure that you have hit any inherent limits of any of the software components, though we do know that at some point any one Nagios server can have more checks than the hardware can keep up with, even with Mod Gearman in place.

I'm not sure your SQL command will address the issue. That table should have as many rows as you have services to check, so in your case 73504 rows. The type of servicestatus_id is already INT, which, even signed, will get you up to 2,147,483,647 ids, and it is (or should be) AUTO_INCREMENT already. The issue isn't that you've run out of available ids.
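The limit mentioned above is easy to check with shell arithmetic. A minimal sketch, assuming `servicestatus_id` is a signed 32-bit INT as described; `int_headroom` is a hypothetical helper name.

```shell
#!/bin/sh
# Largest value a signed 32-bit INT column can hold: 2^31 - 1.
INT_MAX=$(( (1 << 31) - 1 ))

# Hypothetical helper: print how many ids remain above a given value
# before a signed INT auto-increment column would run out.
int_headroom() {
  echo $(( INT_MAX - $1 ))
}
```

Usage: `int_headroom 104589` prints `2147379058`, so even the id from the error leaves over two billion values free, which is consistent with the point that widening the column to BIGINT is unlikely to be the fix.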

There is a collision on at least one servicestatus_id, which is what the original error is about. Your system is trying to add a new row using an already-used primary key. The error points to 104589, which is of course larger than your 73504 service count, but servicestatus_id is meant to be unique, so whatever services have been deleted or made "inactive" account for the difference between (at least) 104589 and 73504.
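The gap described above can be worked out directly. A minimal sketch, assuming ids start at 1 and that ids of deleted rows are not handed out again; `id_gap` is a hypothetical helper name. The two inputs could come from a query such as `SELECT MAX(servicestatus_id), COUNT(*) FROM nagios_servicestatus;` on the `nagios` database.

```shell
#!/bin/sh
# Hypothetical helper: given the highest id seen and the current row
# count, print how many ids up to that highest one are not held by a
# current row (deleted rows, or ids that were skipped).
id_gap() {
  max_id=$1
  row_count=$2
  echo $(( max_id - row_count ))
}
```

Usage: `id_gap 104589 73504` prints `31085`, the "at least" difference mentioned in the reply.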

Why is it doing this? I'm not sure yet. But I will keep looking into it.

What XI version are you on?

Aaron

Re: Service checks not performed

Posted: Fri Feb 28, 2025 2:48 pm
by DoubleDoubleA
After talking with some more people here, it looks like the database was corrupted somehow. You'll want to clear out the NDO tables entirely, which is fine since they hold ephemeral data anyway.

echo 'truncate nagios_hoststatus; truncate nagios_hosts; truncate nagios_services; truncate nagios_servicestatus; truncate nagios_servicechecks; truncate nagios_hostchecks; truncate nagios_downtimehistory; truncate nagios_commenthistory;' | mysql -u root -p<yourMysqlPassword> nagios

Then run:

/usr/local/nagiosxi/scripts/repair_databases.sh

You should be back up and running.
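The truncate step above can also be written so the table list is easy to review and the password is not typed on the command line. A minimal sketch; `NDO_TABLES` and `build_truncate_sql` are hypothetical names, and the table list is copied from the reply above.

```shell
#!/bin/sh
# NDO tables listed in the reply above (they hold ephemeral status data).
NDO_TABLES="nagios_hoststatus nagios_hosts nagios_services nagios_servicestatus
nagios_servicechecks nagios_hostchecks nagios_downtimehistory nagios_commenthistory"

# Hypothetical helper: print one TRUNCATE statement per table, ready to
# pipe into the mysql client.
build_truncate_sql() {
  for t in $NDO_TABLES; do
    printf 'TRUNCATE TABLE %s;\n' "$t"
  done
}
```

Usage: `build_truncate_sql | mysql -u root -p nagios`, then run `repair_databases.sh` as above. With a bare `-p`, the mysql client prompts for the password instead of taking it from the command line, where it could show up in the process list, and there is no `<yourMysqlPassword>` placeholder that the shell would treat as a redirection if pasted unchanged.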