Production Instance of Nagios gone
Re: Production Instance of Nagios gone
Here's the CLI reference for all core commands.
http://old.nagios.org/developerinfo/ext ... ndlist.php
You can also access the Core interface in the meantime at http://<yourserver>/nagios.
Let me do some digging on how to repair a postgres database externally. If we need to we'll do a remote session. We'll do our best to get you back up and running ASAP.
Re: Production Instance of Nagios gone
Thank you for all your help.
Re: Production Instance of Nagios gone
Ok, I put together a quick script to re-install postgres, and it will leave your XI data untouched. The issue appears to be with the actual postgres table. Note that this script will download the latest xi-tarball, but it will only run the scripts related to postgresql setup. So if you don't have external access on your server, you'll have to get the latest tarball onto your XI server, and then run the steps in the script. Let me know how it goes.
Make sure to chmod +x the script before running ; )
Re: Production Instance of Nagios gone
Hi mguthrie,
I ran the script but ran into the error shown below:
PostgresQL installed OK - continuing...
Initializing PostgresQL...
Starting PostgresQL...
Starting postgresql service: [ OK ]
Copying PostgresQL trust-based authentication configuration...
Restarting PostgresQL...
Stopping postgresql service: [ OK ]
Starting postgresql service: [ OK ]
Checking PostgresQL status...
PostgresQL initialized OK
Checking PostgresQL status...
PostgresQL running - continuing...
Creating role and database...
psql: FATAL: database is not accepting commands to avoid wraparound data loss in database "template1"
HINT: Stop the postmaster and use a standalone backend to vacuum database "template1".
psql: FATAL: database is not accepting commands to avoid wraparound data loss in database "template1"
HINT: Stop the postmaster and use a standalone backend to vacuum database "template1".
psql: FATAL: database is not accepting commands to avoid wraparound data loss in database "template1"
HINT: Stop the postmaster and use a standalone backend to vacuum database "template1".
./init-xidb: line 28: [: -eq: unary operator expected
ERROR: PostgresQL user 'nagiosxi' was not created - exiting.
Thanks.
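The `[: -eq: unary operator expected` line is the message the shell prints when an unquoted variable inside a `[ ... -eq ... ]` test expands to nothing. That would fit this output if the psql query checking for the role printed nothing because of the wraparound FATAL. The contents of init-xidb are not shown in this thread, so the following is only a sketch of a guarded test; `role_exists` and `count` are hypothetical names, not taken from the script.

```shell
#!/bin/sh
# Hypothetical sketch; this is NOT the actual init-xidb code.
# role_exists COUNT: succeed only if COUNT is the number 1.
role_exists() {
    count="$1"
    # Quoting the variable and defaulting it to 0 keeps the test well-formed
    # when count is empty, e.g. when the query failed and printed nothing.
    [ "${count:-0}" -eq 1 ]
}

# An empty result is treated as "role missing" instead of a syntax error.
if role_exists ""; then
    echo "role present"
else
    echo "role missing"
fi
```

With the guard in place, a failed query is reported as a missing role without the extra shell error, which leaves the psql FATAL lines as the only thing to investigate.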
Re: Production Instance of Nagios gone
Ok, so it looks like we will have to do the external vacuum. Here's what I'll have you do.
Code:
service postgresql stop
su postgres -c 'postgres -D /var/lib/pgsql/data postgres'
vacuum full analyze verbose
Ctrl+D to close the SQL prompt
Let's also do that for the template1 database, and if that error shows up for any other databases, let's run it for those as well.
Code:
su postgres -c 'postgres -D /var/lib/pgsql/data template1'
vacuum full analyze verbose
Ctrl+D to close the SQL prompt
Code:
service postgresql start
After that, try re-running that script I sent to make sure everything is installed and set up correctly.
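To see which databases need the standalone vacuum, the names can be pulled out of the FATAL lines that the script printed. This is a minimal sketch, assuming the output was captured as text; `wraparound_dbs` is a hypothetical helper name, and the sample lines are copied from the error output earlier in this thread.

```shell
#!/bin/sh
# wraparound_dbs: read psql/script output on stdin and print each database
# named in a "wraparound data loss" error, once per database.
wraparound_dbs() {
    sed -n 's/.*wraparound data loss in database "\([^"]*\)".*/\1/p' | sort -u
}

# Example using lines copied from the error output in this thread.
# The HINT line is ignored because it does not contain the FATAL wording.
printf '%s\n' \
  'psql: FATAL: database is not accepting commands to avoid wraparound data loss in database "template1"' \
  'HINT: Stop the postmaster and use a standalone backend to vacuum database "template1".' \
  | wraparound_dbs
```

Each name it prints is a candidate for the same stop, standalone vacuum, start sequence described in this post.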
Re: Production Instance of Nagios gone
Thank you for all your help. NagiosXI is up now.
I ran the script again, but it failed with the following error because postgres was not running.
Installed:
postgresql.i386 0:8.1.23-1.el5_7.3
postgresql-devel.i386 0:8.1.23-1.el5_7.3
postgresql-server.i386 0:8.1.23-1.el5_7.3
Complete!
--2012-01-03 13:07:38-- http://assets.nagios.com/downloads/nagiosxi/2011/xi-2011r1.9.tar.gz
Resolving assets.nagios.com... 72.14.181.71
Connecting to assets.nagios.com|72.14.181.71|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 31271162 (30M) [application/x-gzip]
Saving to: `xi-2011r1.9.tar.gz.1'
100%[======================================>] 31,271,162 4.15M/s in 6.9s
2012-01-03 13:07:45 (4.30 MB/s) - `xi-2011r1.9.tar.gz.1' saved [31271162/31271162]
PostgresQL already initialized - skipping.
Checking PostgresQL status...
ERROR: PostgresQL not running - exiting.
Thanks again.
Re: Production Instance of Nagios gone
Hi mguthrie,
I noticed the following once logged on to NagiosXI:
Under XI System Component Status, Database Maintenance has a 'Red Exclamation' symbol.
What does this mean?
Thanks.
Re: Production Instance of Nagios gone
The database maintenance script probably hasn't completed recently because of the postgresql error. It will probably fix itself, but just to be safe, let's do the following.
Code:
psql nagiosxi nagiosxi
vacuum;
vacuum analyze;
vacuum full;
\q
cd /usr/local/nagiosxi/cron
rm -f ../var/dbmaint.lock
./dbmaint.php
Re: Production Instance of Nagios gone
I saw the following errors when running dbmaint.php:
OPTIMIZING NAGIOSXI TABLE: xi_commands
SQL: VACUUM ANALYZE xi_commands;
SQL: SQL Error [nagiosxi] : ERROR: tuple concurrently updated
OPTIMIZING NAGIOSXI TABLE: xi_events
SQL: VACUUM ANALYZE xi_events;
SQL: SQL Error [nagiosxi] : ERROR: tuple concurrently updated
OPTIMIZING NAGIOSXI TABLE: xi_notifications
SQL: VACUUM ANALYZE xi_notifications;
SQL: SQL Error [nagiosxi] : ERROR: relation "xi_notifications" does not exist
OPTIMIZING NAGIOSXI TABLE: xi_meta
SQL: VACUUM ANALYZE xi_meta;
SQL: SQL Error [nagiosxi] : ERROR: tuple concurrently updated
OPTIMIZING NAGIOSXI TABLE: xi_options
SQL: VACUUM ANALYZE xi_options;
OPTIMIZING NAGIOSXI TABLE: xi_sysstat
SQL: VACUUM ANALYZE xi_sysstat;
SQL: SQL Error [nagiosxi] : ERROR: tuple concurrently updated
OPTIMIZING NAGIOSXI TABLE: xi_usermeta
SQL: VACUUM ANALYZE xi_usermeta;
OPTIMIZING NAGIOSXI TABLE: xi_users
SQL: VACUUM ANALYZE xi_users;
CLEANING nagiosql TABLE 'logbook'...
SQL: DELETE FROM tbl_logbook WHERE time < FROM_UNIXTIME(1325590571)
OPTIMIZING NAGIOSQL TABLE: tbl_contact
SQL: OPTIMIZE TABLE tbl_contact
OPTIMIZING NAGIOSQL TABLE: tbl_host
SQL: OPTIMIZE TABLE tbl_host
OPTIMIZING NAGIOSQL TABLE: tbl_lnkHostToHost
SQL: OPTIMIZE TABLE tbl_lnkHostToHost
OPTIMIZING NAGIOSQL TABLE: tbl_lnkHostdependencyToHost_DH
SQL: OPTIMIZE TABLE tbl_lnkHostdependencyToHost_DH
OPTIMIZING NAGIOSQL TABLE: tbl_lnkHostdependencyToHost_H
SQL: OPTIMIZE TABLE tbl_lnkHostdependencyToHost_H
OPTIMIZING NAGIOSQL TABLE: tbl_lnkServiceToHost
SQL: OPTIMIZE TABLE tbl_lnkServiceToHost
OPTIMIZING NAGIOSQL TABLE: tbl_lnkServicedependencyToService_DS
SQL: OPTIMIZE TABLE tbl_lnkServicedependencyToService_DS
OPTIMIZING NAGIOSQL TABLE: tbl_lnkServicedependencyToService_S
SQL: OPTIMIZE TABLE tbl_lnkServicedependencyToService_S
OPTIMIZING NAGIOSQL TABLE: tbl_lnkServiceToHostgroup
SQL: OPTIMIZE TABLE tbl_lnkServiceToHostgroup
OPTIMIZING NAGIOSQL TABLE: tbl_logbook
SQL: OPTIMIZE TABLE tbl_logbook
OPTIMIZING NAGIOSQL TABLE: tbl_service
SQL: OPTIMIZE TABLE tbl_service
OPTIMIZING NAGIOSQL TABLE: tbl_timeperiod
SQL: OPTIMIZE TABLE tbl_timeperiod
OPTIMIZING NAGIOSQL TABLE: tbl_timedefinition
SQL: OPTIMIZE TABLE tbl_timedefinition
OPTIMIZING NAGIOSQL TABLE: tbl_user
SQL: OPTIMIZE TABLE tbl_user
PHP Warning: unlink(/usr/local/nagiosxi/var/dbmaint.lock): No such file or directory in /usr/local/nagiosxi/cron/dbmaint.php on line 325
Repair Complete: Removing Lock File
If these errors are nothing to be concerned about, then I think we are done with this case. Database Maintenance now has a 'Green' status.
Again, thank you for all your help.
Re: Production Instance of Nagios gone
Those errors may have been a result of two concurrent maintenance runs on that table. I'm guessing we're in good shape, but let us know if you run into any more issues.
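If the errors come back on a later run, it can help to reduce the dbmaint.php output to just the tables that failed and the message for each. The following is a rough sketch that assumes the output format shown earlier in this thread ("OPTIMIZING ... TABLE: name" headers followed by "ERROR: ..." lines); `dbmaint_errors` is a hypothetical helper name and is not part of Nagios XI.

```shell
#!/bin/sh
# dbmaint_errors: read dbmaint.php output on stdin and print
# "table: error message" for every table whose maintenance step failed.
dbmaint_errors() {
    awk '
        # An ERROR line belongs to the most recently announced table.
        /ERROR: / {
            msg = $0
            sub(/.*ERROR: /, "", msg)
            sub(/OPTIMIZING .*/, "", msg)   # tolerate a fused next header
            print table ": " msg
        }
        # Remember the table named in an "OPTIMIZING ... TABLE: name" header.
        /OPTIMIZING [A-Z]+ TABLE: / {
            table = $0
            sub(/.*TABLE: /, "", table)
        }
    '
}

# Example using lines taken from the output posted in this thread.
printf '%s\n' \
  'OPTIMIZING NAGIOSXI TABLE: xi_commands' \
  'SQL: VACUUM ANALYZE xi_commands;' \
  'SQL: SQL Error [nagiosxi] : ERROR: tuple concurrently updated' \
  'OPTIMIZING NAGIOSXI TABLE: xi_options' \
  'SQL: VACUUM ANALYZE xi_options;' \
  | dbmaint_errors
```

If the same tables show "tuple concurrently updated" on a run where nothing else is vacuuming, that would point away from the concurrent-run explanation and be worth reporting back.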