Checks stop running randomly

mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Checks stop running randomly

Post by mguthrie »

That would make my day if you guys found it ; )
KevinD
Posts: 26
Joined: Thu Mar 29, 2012 10:26 am

Re: Checks stop running randomly

Post by KevinD »

OK, so I found what may be an issue in the upgrade.

The NDO DB upgrade would never have been able to run.

In subcomponents/ndoutils/ndoutils-1.5.1/db/post-upgrade, line 20:

Code: Select all

/"$pkgname"/db/upgradedb -u ndoutils -p "$ndopass" -h "$ndohost" -d nagios
The ndoutils user is hardcoded.
We have changed our default user/pass for pretty much everything.
These credentials are normally taken from basedir/html/config.inc.php, which has an entry for each DB.
* Side note: the same script also gets the pass/host info from /usr/local/nagios/etc/ndo2db.cfg rather than from config.inc.php.

To make matters worse, the logic in the script makes it impossible for me to know whether any upgrade would have been done in the first place.

Code: Select all

eval {$dbh->do("SELECT * FROM nagios_dbversion LIMIT 1") };
if ($@) {
  print "*** Creating table nagios_dbversion",$/;
  $dbh->do("CREATE TABLE nagios_dbversion (name VARCHAR(10) NOT NULL, version VARCHAR(10) NOT NULL);");
};
Line 48 checks whether the table nagios_dbversion exists (if it's not there, the script creates it).

Code: Select all

# Get current database version
my $version;
my $thisversion="1.5.1";
my $lastversion="1.5.1";
my $legacyversion="1.5.1";

$version = $dbh->selectrow_array("SELECT version FROM nagios_dbversion WHERE name='ndoutils'");
if ($version eq "") {
  # Assume last legacy release (didn't have version table)
  print "*** Assuming version $legacyversion of nodutils installed",$/;
  $dbh->do("INSERT nagios_dbversion SET name='ndoutils', version='$legacyversion';");
  $version = $legacyversion
};
So we try to determine the version that was previously installed, and if we can't, we set it to 1.5.1.

Code: Select all

print "Current database version: $version",$/;

if ($version eq $thisversion){
    print "Database already upgraded.",$/;
    exit 0;
}
So with this, if the table didn't exist or the version row was missing, the version gets set to 1.5.1, matches $thisversion, and we exit without upgrading anything.
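
To make the short-circuit concrete, here is a minimal shell sketch of the same decision. It is not the actual upgradedb code: the function name upgrade_decision and the skip/continue labels are made up for illustration, and only the two version strings come from the quoted script.

```shell
#!/bin/sh
# Version strings as set in the quoted upgradedb script.
thisversion="1.5.1"
legacyversion="1.5.1"

# upgrade_decision STORED_VERSION
# Prints "skip" when the script would exit early, "continue" otherwise.
# An empty argument stands for "no row found in nagios_dbversion".
upgrade_decision() {
    version="$1"
    if [ -z "$version" ]; then
        # No stored version: the script assumes the legacy release.
        version="$legacyversion"
    fi
    if [ "$version" = "$thisversion" ]; then
        echo "skip"        # corresponds to "Database already upgraded."
    else
        echo "continue"    # the script would carry on past the early exit
    fi
}
```

Because legacyversion and thisversion are the same string, `upgrade_decision ""` prints skip, so an empty version table can never lead to an upgrade.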

Code: Select all

mysql> select * from nagios_dbversion;
Empty set (0.00 sec)

mysql> desc nagios_dbversion
    -> ;
+---------+-------------+------+-----+---------+-------+
| Field   | Type        | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| name    | varchar(10) | NO   |     |         |       | 
| version | varchar(10) | NO   |     |         |       | 
+---------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
Our current table has no data in it... so I have no idea what it would have run for the upgrade.

It should have been trying to run one of the following:

Code: Select all

[kdandrid@sidhqmonm0 db]$ ll mysql-upgrade-*.sql
-rwxrwxrwx 1 root root  5127 Oct 31  2007 mysql-upgrade-1.4b6.sql
-rwxrwxrwx 1 root root  5127 Oct 31  2007 mysql-upgrade-1.4b5.sql
-rwxrwxrwx 1 root root 11279 Oct 31  2007 mysql-upgrade-1.4b4.sql
-rwxrwxrwx 1 root root 11279 Oct 31  2007 mysql-upgrade-1.4b3.sql
-rwxrwxrwx 1 root root 11368 Oct 31  2007 mysql-upgrade-1.4b2.sql
-rwxrwxrwx 1 root root 12246 Oct 31  2007 mysql-upgrade-1.4b1.sql
-rwxrwxrwx 1 root root 24546 Oct 31  2007 mysql-upgrade-1.3.sql
-rwxrwxrwx 1 root root  1082 Jan  5  2010 mysql-upgrade-1.4b8.sql
[kdandrid@sidhqmonm0 db]$
But I have no way of knowing which one.
Little help please?
KevinD
Posts: 26
Joined: Thu Mar 29, 2012 10:26 am

Re: Checks stop running randomly

Post by KevinD »

I can also verify that the DB is missing changes that would have been in 1.4b6 & 1.4b5.

So we obviously have discrepancies...
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Checks stop running randomly

Post by mguthrie »

OK, so let's re-run the upgrade process for ndoutils. Go to the /tmp/nagiosxi/subcomponents/ndoutils directory and update the post-upgrade script to the following:

Code: Select all

#!/bin/sh -e

pkgname="$1"

echo "NDOUTILS POST-UPGRADE..."
##parse values in case mysql is offloaded 
ndopass=$(sed -n '/^db_pass=/ s///p' /usr/local/nagios/etc/ndo2db.cfg)
ndohost=$(sed -n '/^db_host=/ s///p' /usr/local/nagios/etc/ndo2db.cfg)
ndouser=$(sed -n '/^db_user=/ s///p' /usr/local/nagios/etc/ndo2db.cfg)
# Post-install modifications

# New init file
cp -f mods/ndo2db.init /etc/init.d/ndo2db

# Change some settings in /etc/sysctl.conf
sed -i -e '/^kernel\.msgmnb/ s/.*/kernel.msgmnb = 131072000/' \
	-e '/^kernel\.msgmax/ s/.*/kernel.msgmax = 131072000/' /etc/sysctl.conf

# Upgrade the database
./"$pkgname"/db/upgradedb -u "$ndouser" -p "$ndopass" -h "$ndohost" -d nagios

# Restart ndo2db daemon
service ndo2db restart

# Restart Nagios Core (to use new ndomod module)
service nagios restart
Then run the following script:

Code: Select all

cd /tmp/nagiosxi/subcomponents/ndoutils
./upgrade
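
For anyone adapting the script above, this is a small sketch of how those sed lookups behave and how you might guard against an empty result before upgradedb is called. The helper names cfg_value and require_value are hypothetical and are not part of the XI installer.

```shell
#!/bin/sh
# cfg_value KEY FILE
# Prints the value of a "KEY=value" line, the same way the sed calls above do.
cfg_value() {
    # -n suppresses normal output; /^KEY=/ selects the matching line;
    # the empty pattern in s///p reuses that regex, deletes the "KEY=" prefix
    # and prints what is left.
    sed -n "/^$1=/ s///p" "$2"
}

# require_value NAME VALUE
# Returns non-zero (with a message on stderr) when VALUE is empty.
require_value() {
    if [ -z "$2" ]; then
        echo "missing $1 in config" >&2
        return 1
    fi
}

# Example use (path taken from the script above):
#   ndouser=$(cfg_value db_user /usr/local/nagios/etc/ndo2db.cfg)
#   require_value db_user "$ndouser" || exit 1
```

Commented-out lines such as `#db_host=...` are skipped because the regex is anchored at the start of the line.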
KevinD
Posts: 26
Joined: Thu Mar 29, 2012 10:26 am

Re: Checks stop running randomly

Post by KevinD »

This is where I'm stuck... that won't do anything.

The script we're calling will simply insert 1.5.1 into the DB table and exit.

I'm doing diffs of the different sql files going backwards to see which one we seem to have all of the changes in... but this is long and arduous...

I would LOVE to know if you know of a faster way to determine which version of the schema we are running.
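
One possibly faster approach, offered as a sketch rather than a supported procedure, is to ask MySQL's information_schema whether a column introduced by a given upgrade file is present. The helper name column_exists_sql is hypothetical. Which columns mark which version has to be read out of the upgrade SQL files themselves; the alias column on nagios_hosts is one such change mentioned in this thread.

```shell
#!/bin/sh
# column_exists_sql DB TABLE COLUMN
# Prints a query that returns 1 if the column exists in that table, else 0.
column_exists_sql() {
    printf "SELECT COUNT(*) FROM information_schema.columns WHERE table_schema='%s' AND table_name='%s' AND column_name='%s';\n" "$1" "$2" "$3"
}

# Example use (the credentials variables are assumptions; fill in your own):
#   mysql -N -B -u "$ndouser" -p"$ndopass" -h "$ndohost" \
#         -e "$(column_exists_sql nagios nagios_hosts alias)"
```

Running one such check per marker column, oldest upgrade file first, would show the last version whose changes are fully present.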
KevinD
Posts: 26
Joined: Thu Mar 29, 2012 10:26 am

Re: Checks stop running randomly

Post by KevinD »

OK, I think I have it. Please let me know if this is right.
The SQL files are cumulative going backwards, so the b3 -> current file contains everything that the b4 -> current file does.

So diffing an SQL file against the one for the next version shows the changes that were introduced between those two versions.

Code: Select all

diff mysql-upgrade-1.4b2.sql mysql-upgrade-1.4b4.sql
1,3d0
< ALTER TABLE `nagios_hosts` ADD `alias` VARCHAR( 64 ) NOT NULL AFTER `host_object_id` ;
This exists in the DB, so we are at least current up to 1.4b2 (there were no changes between 1.4b2 and 1.4b3).
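
The pairwise comparison can be scripted instead of done by hand. This is a rough sketch; diff_chain is a made-up helper name, and the files are passed explicitly so the version order is under your control.

```shell
#!/bin/sh
# diff_chain FILE...
# Diffs each file against the next one in the order given and labels each pair.
diff_chain() {
    prev=""
    for f in "$@"; do
        if [ -n "$prev" ]; then
            echo "=== $prev -> $f"
            # Ignore diff's non-zero exit status (1 just means the files differ).
            diff "$prev" "$f" || true
        fi
        prev="$f"
    done
}

# Example use, oldest file first:
#   diff_chain mysql-upgrade-1.3.sql mysql-upgrade-1.4b1.sql mysql-upgrade-1.4b2.sql
```

Lines prefixed with `<` under each header exist only in the older file, i.e. they are the statements that stopped being needed once you reached the newer version.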

The differences between 1.4b4 and 1.4b5 look like they are there, with one exception:
it looks like all tables were altered to use InnoDB, while mine are still showing MyISAM.

There are no differences between 1.4b5 and 1.4b6,
and there are numerous changes in 1.4b6 -> 1.4b8 that are not in my DB (most of which appear to be around changing VARCHAR to TEXT).
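
To see at a glance which tables are still not on InnoDB, something like the following could work. It is a sketch that assumes tab-separated "table<TAB>engine" input, which is what the mysql client prints in batch mode; non_innodb_tables is a hypothetical helper name.

```shell
#!/bin/sh
# non_innodb_tables
# Reads "table<TAB>engine" lines on stdin and prints the names of tables
# whose engine is anything other than InnoDB (compared case-insensitively).
non_innodb_tables() {
    awk -F'\t' 'tolower($2) != "innodb" { print $1 }'
}

# Example use (credentials variables are assumptions):
#   mysql -N -B -u "$ndouser" -p"$ndopass" -h "$ndohost" \
#         -e "SELECT table_name, engine FROM information_schema.tables WHERE table_schema='nagios';" \
#     | non_innodb_tables
```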


Please advise on which to run
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Checks stop running randomly

Post by mguthrie »

I would go by the following scripts:

/tmp/nagiosxi/subcomponents/ndoutils/mods/mysql-mods-1.4b8.sql

/usr/local/nagiosxi/scripts/patch_ndoutils.php


Now here's the other kicker to all of this. From talking with Ethan about this issue: if the last_check/next_check times are getting stuck in status.dat and the Core interface, then the issue is most likely in the Core engine, not ndoutils. However, let's make sure your ndoutils is upgraded properly, then we'll dive into this some more.

Can you verify whether XI and the Core interface are both showing a "stuck" check for the same services?
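
For checking the Core side directly, here is a minimal sketch that scans status.dat for services whose next_check is well in the past. It is not the show_frozen script referred to elsewhere in this thread. The helper name stale_checks, the grace period, and the output format are illustrative assumptions, and the status.dat path in the usage comment is the usual default and may differ on your system.

```shell
#!/bin/sh
# stale_checks FILE NOW GRACE
# Prints "host;service;last_check=...;next_check=..." for every servicestatus
# block whose next_check is more than GRACE seconds before NOW (epoch seconds).
stale_checks() {
    awk -v now="$2" -v grace="$3" '
        /^[ \t]*servicestatus[ \t]*[{]/ {
            inblk = 1; host = ""; svc = ""; lastc = ""; nextc = ""; next
        }
        inblk && /^[ \t]*host_name=/           { sub(/^[ \t]*host_name=/, "");           host = $0;  next }
        inblk && /^[ \t]*service_description=/ { sub(/^[ \t]*service_description=/, ""); svc = $0;   next }
        inblk && /^[ \t]*last_check=/          { sub(/^[ \t]*last_check=/, "");          lastc = $0; next }
        inblk && /^[ \t]*next_check=/          { sub(/^[ \t]*next_check=/, "");          nextc = $0; next }
        inblk && /^[ \t]*[}]/ {
            # next_check=0 means nothing is scheduled, so it is not reported.
            if ((nextc + 0) > 0 && now - nextc > grace)
                printf "%s;%s;last_check=%s;next_check=%s\n", host, svc, lastc, nextc
            inblk = 0
        }
    ' "$1"
}

# Example use (path and 600-second grace period are assumptions):
#   stale_checks /usr/local/nagios/var/status.dat "$(date +%s)" 600
```

If the same services show up here and as stuck in XI, that points at the Core engine rather than ndoutils.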
KevinD
Posts: 26
Joined: Thu Mar 29, 2012 10:26 am

Re: Checks stop running randomly

Post by KevinD »

I can verify that the last_check and next_check times are the same in core and XI.

I'll update the DB and see what that does.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Checks stop running randomly

Post by mguthrie »

Ok, I'm working on setting up a test with your show_frozen script on a test box with an offloaded DB to see if I can reproduce what you guys have.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: Checks stop running randomly

Post by mguthrie »

Grr, no luck on the local test box. Can I private message you a link for a remote session?