Re: Database Backend Is Not Running

Posted: Wed Jul 15, 2015 3:09 pm
by mcwhorts
I'm starting to see this now

Re: Database Backend Is Not Running

Posted: Wed Jul 15, 2015 3:17 pm
by jolson
In your ndo2db configuration file:

lock_file=/usr/local/nagiosxi/var/subsys/ndo2db.lock
Let's change the debug level to -1:

debug_level=-1
And restart it:

service ndo2db restart
Is any log generated at /usr/local/nagios/var/ndo2db.debug? If so, what are the contents?

cat /usr/local/nagios/var/ndo2db.debug
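A minimal sketch of the two steps above (set debug_level, then check whether anything reached the debug file), written as two hypothetical shell helpers, set_cfg_key and show_debug_log; neither is part of Nagios. It assumes ndo2db.cfg uses plain key=value lines, like the lock_file and debug_level lines quoted above, and that debug output goes to /usr/local/nagios/var/ndo2db.debug as in the post.

```shell
#!/bin/bash
# Sketch only: set_cfg_key and show_debug_log are hypothetical helpers, not Nagios tools.

# Replace the active "key=..." line in a config file, or append one if the key is absent.
# When a line is replaced, sed keeps the previous file as "<file>.bak".
# Values containing |, & or \ would need escaping before use in the sed expression.
set_cfg_key() {
    local cfg=$1 key=$2 value=$3
    if grep -q "^${key}=" "$cfg"; then
        sed -i.bak "s|^${key}=.*|${key}=${value}|" "$cfg"
    else
        printf '%s=%s\n' "$key" "$value" >> "$cfg"
    fi
}

# Print the last N lines (default 20) of a debug log, or report that nothing was logged.
show_debug_log() {
    local file=$1 lines=${2:-20}
    if [ -s "$file" ]; then
        tail -n "$lines" "$file"
    else
        echo "nothing logged in $file"
    fi
}

# Example on a real system (not executed here):
#   set_cfg_key /usr/local/nagios/etc/ndo2db.cfg debug_level -1
#   service ndo2db restart
#   show_debug_log /usr/local/nagios/var/ndo2db.debug
```

Because sed keeps a .bak copy, the original debug_level line can be restored once troubleshooting is done.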

Re: Database Backend Is Not Running

Posted: Wed Jul 15, 2015 3:34 pm
by mcwhorts
I made those changes to the ndo2db config and restarted. So far nothing has been logged at /usr/local/nagios/var/ndo2db.debug

Re: Database Backend Is Not Running

Posted: Wed Jul 15, 2015 4:14 pm
by mcwhorts
I'm seeing this in the logs. I don't know if it's useful.

[1436994795.390529] [002.0] [pid=9722] INSERT INTO nagios_hoststatus SET instance_id='1', host_object_id='5604', status_update_time=FROM_UNIXTIME(1436994795), output='PING OK - Packet loss = 0%, RTA = 0\.85 ms', long_output='', perfdata='rta=0\.853000ms;3000\.000000;5000\.000000;0\.000000 pl=0%;80;100;0', current_state='0', has_been_checked='1', should_be_scheduled='1', current_check_attempt='1', max_check_attempts='3', last_check=FROM_UNIXTIME(1436994790), next_check=FROM_UNIXTIME(1436994975), check_type='0', last_state_change=FROM_UNIXTIME(1435803964), last_hard_state_change=FROM_UNIXTIME(1434561932), last_hard_state='0', last_time_up=FROM_UNIXTIME(1436994795), last_time_down=FROM_UNIXTIME(1430592573), last_time_unreachable=FROM_UNIXTIME(1435803922), state_type='1', last_notification=FROM_UNIXTIME(0), next_notification=FROM_UNIXTIME(0), no_more_notifications='0', notifications_enabled='1', problem_has_been_acknowledged='0', acknowledgement_type='0', current_notification_number='0', passive_checks_enabled='1', active_checks_enabled='1', event_handler_enabled='1', flap_detection_enabled='1', is_flapping='0', percent_state_change='0.000000', latency='0.000000', execution_time='4.502390', scheduled_downtime_depth='0', failure_prediction_enabled='0', process_performance_data='1', obsess_over_host='1', modified_host_attributes='0', event_handler='', check_command='check_nrpe_jpnmon!check_alive_nsin!!!!!!!', normal_check_interval='3.000000', retry_check_interval='1.000000', check_timeperiod_object_id='2' ON DUPLICATE KEY UPDATE instance_id='1', host_object_id='5604', status_update_time=FROM_UNIXTIME(1436994795), output='PING OK - Packet loss = 0%, RTA = 0\.85 ms', long_output='', perfdata='rta=0\.853000ms;3000\.000000;5000\.000000;0\.000000 pl=0%;80;100;0', current_state='0', has_been_checked='1', should_be_scheduled='1', current_check_attempt='1', max_check_attempts='3', last_check=FROM_UNIXTIME(1436994790), next_check=FROM_UNIXTIME(1436994975), check_type='0', last_state_change=FROM_UNIXTIME(1435803964), last_hard_state_change=FROM_UNIXTIME(1434561932), last_hard_state='0', last_time_up=FROM_UNIXTIME(1436994795), last_time_down=FROM_UNIXTIME(1430592573), last_time_unreachable=FROM_UNIXTIME(1435803922), state_type='1', last_notification=FROM_UNIXTIME(0), next_notification=FROM_UNIXTIME(0), no_more_notifications='0', notifications_enabled='1', problem_has_been_acknowledged='0', acknowledgement_type='0', current_notification_number='0', passive_checks_enabled='1', active_checks_enabled='1', event_handler_enabled='1', flap_detection_enabled='1', is_flapping='0', percent_state_change='0.000000', latency='0.000000', execution_time='4.502390', scheduled_downtime_depth='0', failure_prediction_enabled='0', process_performance_data='1', obsess_over_host='1', modified_host_attributes='0', event_handler='', check_command='check_nrpe_jpnmon!check_alive_nsin!!!!!!!', normal_check_interval='3.000000', retry_check_interval='1.000000', check_timeperiod_object_id='2'
[1436994795.391147] [002.0] [pid=9722] INSERT INTO nagios_hoststatus SET instance_id='1', host_object_id='5604', status_update_time=FROM_UNIXTIME(1436994795), output='PING OK - Packet loss = 0%, RTA = 0\.85 ms', long_output='', perfdata='rta=0\.853000ms;3000\.000000;5000\.000000;0\.000000 pl=0%;80;100;0', current_state='0', has_been_checked='1', should_be_scheduled='1', current_check_attempt='1', max_check_attempts='3', last_check=FROM_UNIXTIME(1436994790), next_check=FROM_UNIXTIME(1436994975), check_type='0', last_state_change=FROM_UNIXTIME(1435803964), last_hard_state_change=FROM_UNIXTIME(1434561932), last_hard_state='0', last_time_up=FROM_UNIXTIME(1436994795), last_time_down=FROM_UNIXTIME(1430592573), last_time_unreachable=FROM_UNIXTIME(1435803922), state_type='1', last_notification=FROM_UNIXTIME(0), next_notification=FROM_UNIXTIME(0), no_more_notifications='0', notifications_enabled='1', problem_has_been_acknowledged='0', acknowledgement_type='0', current_notification_number='0', passive_checks_enabled='1', active_checks_enabled='1', event_handler_enabled='1', flap_detection_enabled='1', is_flapping='0', percent_state_change='0.000000', latency='0.000000', execution_time='4.502390', scheduled_downtime_depth='0', failure_prediction_enabled='0', process_performance_data='1', obsess_over_host='1', modified_host_attributes='0', event_handler='', check_command='check_nrpe_jpnmon!check_alive_nsin!!!!!!!', normal_check_interval='3.000000', retry_check_interval='1.000000', check_timeperiod_object_id='2' ON DUPLICATE KEY UPDATE instance_id='1', host_object_id='5604', status_update_time=FROM_UNIXTIME(1436994795), output='PING OK - Packet loss = 0%, RTA = 0\.85 ms', long_output='', perfdata='rta=0\.853000ms;3000\.000000;5000\.000000;0\.000000 pl=0%;80;100;0', current_state='0', has_been_checked='1', should_be_scheduled='1', current_check_attempt='1', max_check_attempts='3', last_check=FROM_UNIXTIME(1436994790), next_check=FROM_UNIXTIME(1436994975), check_type='0', last_state_change=FROM_UNIXTIME(1435803964), last_hard_state_change=FROM_UNIXTIME(1434561932), last_hard_state='0', last_time_up=FROM_UNIXTIME(1436994795), last_time_down=FROM_UNIXTIME(1430592573), last_time_unreachable=FROM_UNIXTIME(1435803922), state_type='1', last_notification=FROM_UNIXTIME(0), next_notification=FROM_UNIXTIME(0), no_more_notifications='0', notifications_enabled='1', problem_has_been_acknowledged='0', acknowledgement_type='0', current_notification_number='0', passive_checks_enabled='1', active_checks_enabled='1', event_handler_enabled='1', flap_detection_enabled='1', is_flapping='0', percent_state_change='0.000000', latency='0.000000', execution_time='4.502390', scheduled_downtime_depth='0', failure_prediction_enabled='0', process_performance_data='1', obsess_over_host='1', modified_host_attributes='0', event_handler='', check_command='check_nrpe_jpnmon!check_alive_nsin!!!!!!!', normal_check_interval='3.000000', retry_check_interval='1.000000', check_timeperiod_object_id='2'


Is there something specific that I need to look for?

Re: Database Backend Is Not Running

Posted: Wed Jul 15, 2015 4:33 pm
by tgriep
The ndo2db lock file is being created in the wrong folder, and that is causing the Database Backend Status to be reported incorrectly.

Stop the ndo2db process:

service ndo2db stop
Edit /usr/local/nagios/etc/ndo2db.cfg and change

lock_file=/usr/local/nagiosxi/var/subsys/ndo2db.lock
to:

lock_file=/usr/local/nagios/var/ndo2db.lock
Now delete the old files:

rm /usr/local/nagiosxi/var/subsys/ndo2db*
rm /usr/local/nagios/var/ndo2db.lock
Start ndo2db:

service ndo2db start
Also run this and post the output:

tail -50 /var/log/cron
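The fix above hinges on where the lock file lives. One way to check that kind of problem before and after the change is to read the lock_file path out of the config and confirm that the file exists and names a live process. This is a sketch with a hypothetical helper, check_lock_file; it assumes the lock file holds the daemon's PID on its first line and that the last lock_file= line in the config is the one that takes effect. The thread does not say exactly how Nagios XI derives the Database Backend Status from this file.

```shell
#!/bin/bash
# Sketch only: check_lock_file is a hypothetical helper, not part of Nagios or Nagios XI.
# Return codes: 0 = lock held by a running process, 1 = missing or stale lock,
# 2 = no lock_file setting found in the config.
check_lock_file() {
    local cfg=$1 lock pid
    # Assumption: if lock_file= appears more than once, the last occurrence wins.
    lock=$(sed -n 's/^lock_file=//p' "$cfg" | tail -n 1)
    if [ -z "$lock" ]; then
        echo "no lock_file setting in $cfg"
        return 2
    fi
    if [ ! -f "$lock" ]; then
        echo "lock file $lock does not exist"
        return 1
    fi
    # Assumption: the first line of the lock file is the daemon's PID; keep digits only.
    pid=$(head -n 1 "$lock" | tr -dc '0-9')
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    # Run as root: for another user's process it fails with EPERM and looks "stale".
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "lock file $lock held by running pid $pid"
        return 0
    fi
    echo "stale lock file $lock (pid '$pid' not running)"
    return 1
}

# Example on a real system (not executed here):
#   check_lock_file /usr/local/nagios/etc/ndo2db.cfg
```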

Re: Database Backend Is Not Running

Posted: Wed Jul 15, 2015 4:44 pm
by mcwhorts
Jul 15 12:47:01 niteowl CROND[366]: (root) CMD (ps -ef|grep -v grep |grep vmtoolsd > /dev/null || echo " To restart: vmtoolsd -b /var/run/vmtoolsd.pid" | mailer.sh -s "VMware tools down" -system root)
Jul 15 12:47:01 niteowl CROND[368]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Jul 15 12:47:01 niteowl CROND[369]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)
Jul 15 12:47:01 niteowl CROND[370]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Jul 15 12:47:01 niteowl CROND[371]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Jul 15 12:47:01 niteowl CROND[372]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Jul 15 12:47:01 niteowl CROND[380]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)
Jul 15 12:48:01 niteowl CROND[4148]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Jul 15 12:48:01 niteowl CROND[4149]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Jul 15 12:48:01 niteowl CROND[4150]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)
Jul 15 12:48:01 niteowl CROND[4151]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1)
Jul 15 12:48:01 niteowl CROND[4152]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Jul 15 12:48:01 niteowl CROND[4153]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Jul 15 12:48:01 niteowl CROND[4155]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)
Jul 15 12:48:01 niteowl CROND[4154]: (root) CMD (/etc/webmin/sysstats/sysstats.pl)
Jul 15 12:48:01 niteowl CROND[4163]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)
Jul 15 12:49:01 niteowl CROND[8070]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1)
Jul 15 12:49:01 niteowl CROND[8071]: (root) CMD (/etc/webmin/sysstats/sysstats.pl)
Jul 15 12:49:01 niteowl CROND[8072]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)
Jul 15 12:49:01 niteowl CROND[8073]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Jul 15 12:49:01 niteowl CROND[8075]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Jul 15 12:49:01 niteowl CROND[8078]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Jul 15 12:49:01 niteowl CROND[8079]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)
Jul 15 12:49:01 niteowl CROND[8080]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Jul 15 12:49:01 niteowl CROND[8082]: (root) CMD (differ.sh -a -f /var/adm/alert 2>&1 | mailer.sh -s "alert" -system [email protected],root`[ -f /.text ] && ( echo -n ,; cat /.text; )`)
Jul 15 12:49:01 niteowl CROND[8081]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)
Jul 15 12:50:01 niteowl CROND[11902]: (root) CMD (/etc/webmin/sysstats/sysstats.pl)
Jul 15 12:50:01 niteowl CROND[11903]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)
Jul 15 12:50:01 niteowl CROND[11904]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/dbmaint.php > /usr/local/nagiosxi/var/dbmaint.log 2>&1)
Jul 15 12:50:01 niteowl CROND[11905]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Jul 15 12:50:01 niteowl CROND[11906]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Jul 15 12:50:01 niteowl CROND[11909]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/deadpool.php > /usr/local/nagiosxi/var/deadpool.log 2>&1)
Jul 15 12:50:01 niteowl CROND[11912]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Jul 15 12:50:01 niteowl CROND[11914]: (root) CMD (LANG=C LC_ALL=C /usr/bin/mrtg /etc/mrtg/mrtg.cfg --lock-file /var/lock/mrtg/mrtg_l --confcache-file /var/lib/mrtg/mrtg.ok)
Jul 15 12:50:01 niteowl CROND[11911]: (root) CMD (/usr/lib64/sa/sa1 1 1)
Jul 15 12:50:01 niteowl CROND[11910]: (root) CMD (topper.sh)
Jul 15 12:50:01 niteowl CROND[11915]: (root) CMD (access-mon.sh -rebuild 30 | egrep -v 'itchy|10.33.34.95|dbaud|toptrack' >> /var/log/access.log 2>>/tmp/access-mon-err)
Jul 15 12:50:01 niteowl CROND[11921]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Jul 15 12:50:01 niteowl CROND[11919]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)
Jul 15 12:50:01 niteowl CROND[11916]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1)
Jul 15 12:50:01 niteowl CROND[11926]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)
Jul 15 12:51:01 niteowl CROND[13619]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cleaner.php > /usr/local/nagiosxi/var/cleaner.log 2>&1)
Jul 15 12:51:01 niteowl CROND[13620]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/sysstat.php > /usr/local/nagiosxi/var/sysstat.log 2>&1)
Jul 15 12:51:01 niteowl CROND[13621]: (root) CMD (/etc/webmin/sysstats/sysstats.pl)
Jul 15 12:51:01 niteowl CROND[13622]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/cmdsubsys.php > /usr/local/nagiosxi/var/cmdsubsys.log 2>&1)
Jul 15 12:51:01 niteowl CROND[13623]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/reportengine.php > /usr/local/nagiosxi/var/reportengine.log 2>&1)
Jul 15 12:51:01 niteowl CROND[13625]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/perfdataproc.php > /usr/local/nagiosxi/var/perfdataproc.log 2>&1)
Jul 15 12:51:01 niteowl CROND[13626]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/feedproc.php > /usr/local/nagiosxi/var/feedproc.log 2>&1)
Jul 15 12:51:01 niteowl CROND[13628]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/nom.php > /usr/local/nagiosxi/var/nom.log 2>&1)
Jul 15 12:51:01 niteowl CROND[13629]: (nagios) CMD (/usr/bin/php -q /usr/local/nagiosxi/cron/eventman.php > /usr/local/nagiosxi/var/eventman.log 2>&1)

Re: Database Backend Is Not Running

Posted: Wed Jul 15, 2015 8:19 pm
by mcwhorts
After running these commands again:
service nagios stop
killall -9 nagios
service ndo2db stop
service mysqld stop
service crond stop
service mysqld start
service ndo2db start
service nagios start
service crond start

It seems as though ndo2db finally came to life. Everything appears to be running as it should.

Thanks for all your help guys!
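The restart sequence that eventually worked can be wrapped in a small script so it runs in the same order every time. This is a sketch using the same SysV service names as the post (nagios, ndo2db, mysqld, crond); the function names and the DRY_RUN switch are illustrative additions. Stopping Nagios before ndo2db, and starting MySQL before ndo2db and ndo2db before Nagios, means the database is up before ndo2db connects to it and ndo2db is running before Nagios begins sending it data. Stopping crond presumably keeps the Nagios XI cron jobs seen in /var/log/cron above from running mid-restart.

```shell
#!/bin/bash
# Sketch only: restart_xi_stack and run are hypothetical helpers.
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Same order as the commands in the post. No "set -e": a failing step
# (for example killall finding no nagios process) does not abort the rest.
restart_xi_stack() {
    run service nagios stop
    run killall -9 nagios      # kill any nagios processes left after the stop
    run service ndo2db stop
    run service mysqld stop
    run service crond stop
    run service mysqld start
    run service ndo2db start
    run service nagios start
    run service crond start
}

# Example: preview the sequence, then run it for real as root.
#   DRY_RUN=1 restart_xi_stack
#   restart_xi_stack
```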