Page 1 of 1

Re: [Nagios-devel] [RFC] - NDOUtils duplicate lines problem

Posted: Mon Nov 16, 2009 9:57 pm
by Guest
Hi,

Hendrik Baecker wrote:
> a few days ago Øyvind Nordang, Julien Mathis and Michael Friedrich,
> discussed about duplicate rows in the servicecheck table.
> Michael dropped me a few lines off-list showing up the problem exists
> in some more tables.
I wrote that here too but noone replied on that. I did a little research
where the faulty patch was coming from but I could only see the initial
commit by Ethan on that.
>
> The underlying problem:
> MySQL queries like
> "insert into... on duplicate key update..."
> needs primary and/or unique keys to be hit by the query to trigger the
> UPDATE instead of the INSERT.
> Most of the relevant keys for those queries are actually primary nor
> unique keys, they are just KEYs.
Jep. Just as the MySQL manual points out.
>
> So we have(98%) / might have (2%) duplicate entries in the following
> tables:
>
> nagios_servicechecks
> nagios_systemcommands
> nagios_timedeventqueue
> nagios_timedevents
Those users who are using the 2 years old 1.4b7 are not affected - the
commit was after that release. So if they read this, do not upgrade and
wait for the next version (or do not apply the unique key changes).
>
> Now to the problem from the point of development: I see no good
> solution to handle this.
Well regarding the fact that NDOUtils doesn't seem to come out of beta,
you might just change the db scheme using the unique keys again.
>
> One part of the solution is to change the keys to define the mandatory
> UNIQUE keys, but this can't be done cleanly on existing data since you
> can't define some UNIQUE keys on rows that would be duplicates.
>
> Solutions to be discussed:
>
> 1. Big-Bang-Solution
> Define the user have to truncate above tables to be able to set the
> right keys.
> This will result in data loss.
You might be able to write a query which extracts duplicated rows on the
unique constraints you want to redefine (group by, order by) - and then
redo the 2x on duplicate key insert or update statements. But this is
only a solution for people who need exactly those tables for reporting.
But if you want to learn procedures this would be the right place for
starting correcting this *pardon* big fuckup.
>
> 2.
> It might be possible to write some nifty upgrade scripts to dump the
> data out of the database, set the unique keys and try to re-import the
> data with some fault tolerance.
> Anyone out there with some experience on this level?
> Contra:
> Depending on the amount of data, such a job might be: a long runner
> (time); complicated; not realy error resistant...
I would not recommend that on a production server since the table will
be locked then and ndo2db blocks and Nagios becomes a dead daemon from
the blocking neb module.
>
> What thinks this community about the cost-benefit ratio?
Regarding the blocking NEB module i would recommend data backup and
truncate table.
>
> IMO:
> nagios_timedevent* table are not so important and data loss is irrelevant.
> nagios_systemcommands might be just more interesting for someone
> but nagios_servicechecks seems to me like a treasure of gold, data
> loss in here might be realy painful
systemcommands are issued e.g. if perfdata is enabled. servicechecks
need to be re-organised in case. but using 2 backups - one to work on
and the other as is.

>
> Any comments?
> May be someone is able to adress some voodoo data repair query/script
> to eliminate those duplicates before key changing?
Some kind of the following script (best practical would be a perl script)

0. create a new table after servicechecks scheme but apply the drop
key/add unique key statement
1. query to get all the data: select * from icinga_servicechecks order
by instance_id, service_object_id, start_time, start_time_usec;
2. step through and fetch odd and even results, check if constraints
from above are the same. if not continue;
3. all data from the first row should be kept, only the ones to be
updated are:
check_type, current_check_attempt, max_check_attempts, state,
state_type, start_time, start_time_usec, end_time, end_time_usec,
timeout='%d', early_timeout, execut

...[email truncated]...


This post was automatically imported from historical nagios-devel mailing list archives
Original poster: [email protected]