Re: NdoUtils stop working

Posted: Thu Jun 30, 2016 12:29 pm
by algomas123
Thanks again for your help.
You may want to enable debugging in the ndo2db.cfg file and see what error shows up there when the issue happens again.
It is already enabled. With "tail -F" I can see lots of queries running, and it looks OK. But when the issue happens, it just stops writing queries. It doesn't write any errors, just nothing.
We might get more details on what is failing which the developers could use.
Of course, I will help however I can.
Nagios Core only uses the MySQL database to store its information / status for other 3rd party tools to use, it doesn't use it to run.
Yes, sorry. What I meant was that the database contained data written while the issue was happening. I had no message queue, ndo2db was using 90% CPU (I guess because of the infinite for loop), and /var/log/messages was printing errors, yet data was still going into the database (I am not sure it was inserting, but at least I had recent data). It makes no sense!!

Re: NdoUtils stop working

Posted: Thu Jun 30, 2016 1:14 pm
by tgriep
Thanks for your help. The next time it happens, could you post the last 50 lines of the ndo2db error log, the output of ipcs -q, and anything else you find?
Another thing you can look at is the settings for the MySQL server. Increasing buffers or connections may help with this issue.
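
The diagnostics requested above could be gathered in one go with something like the following. This is only a minimal sketch: the function name collect_ndo_diag is made up for illustration, and the log path is an assumption that has to be replaced with wherever your ndo2db debug log actually lives.

```shell
#!/bin/sh
# Hypothetical helper (not part of NDOUtils): print the tail of the
# ndo2db log followed by the current System V message queues.
# $1 = path to the ndo2db log (assumption: adjust to your install)
# $2 = number of lines to keep (defaults to 50, as requested above)
collect_ndo_diag() {
    log_file=$1
    lines=${2:-50}
    echo "=== last $lines lines of $log_file ==="
    tail -n "$lines" "$log_file"
    echo "=== ipcs -q ==="
    if command -v ipcs >/dev/null 2>&1; then
        ipcs -q
    else
        echo "ipcs not found"
    fi
}
```

For example, `collect_ndo_diag /path/to/ndo2db.debug > ndo_diag.txt` would produce a single file that can be attached to a post.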

Re: NdoUtils stop working

Posted: Thu Jun 30, 2016 2:40 pm
by algomas123
Ok, I will do that.

Meanwhile, I observed in the ndo2db log that I have LOTS of queries (like 10 or 15 times more than any other query) similar to:

Code:

DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND event_type='8' AND scheduled_time=FROM_UNIXTIME(1467305288) AND recurring_event='1' AND object_id='0'
Sometimes the event_type changes, other times the object_id changes, but most of them are exactly that query...

I have queried that table and it is always empty...

Is that normal?
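
To put a number on "10 or 15 times more", the debug log could be summarized per statement and table. A rough sketch, assuming the log lines look like the DELETE shown above preceded by three bracketed fields (timestamp, level, pid); summarize_queries is a hypothetical name, and only DELETE FROM and INSERT INTO statements are counted.

```shell
#!/bin/sh
# Hypothetical helper: count DELETE/INSERT statements per table in an
# ndo2db debug log. Reads the files given as arguments, or stdin.
# Assumed line layout: [timestamp] [level] [pid=N] VERB FROM|INTO table ...
summarize_queries() {
    awk '($4 == "DELETE" && $5 == "FROM") || ($4 == "INSERT" && $5 == "INTO") {
             count[$4 " " $6]++
         }
         END { for (k in count) print count[k], k }' "$@" | sort -rn
}
```

Running `summarize_queries /path/to/ndo2db.debug` (path is an assumption) lists the busiest statement and table pairs first.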

Re: NdoUtils stop working

Posted: Thu Jun 30, 2016 3:01 pm
by tgriep
Yes, those log entries are normal; it is just the server removing unneeded data.

Re: NdoUtils stop working

Posted: Fri Jul 01, 2016 5:55 am
by algomas123
Ok, now it is crashing:

ndo2db.debug:

Code:

[1467361331.389468] [002.0] [pid=10649] DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND event_type='12' AND scheduled_time=FROM_UNIXTIME(1467361331) AND recurring_event='0' AND object_id='398'
[1467361335.403985] [002.0] [pid=10649] DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND event_type='0' AND scheduled_time=FROM_UNIXTIME(1467361335) AND recurring_event='0' AND object_id='428'
[1467361335.404330] [002.0] [pid=10649] DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND scheduled_time<FROM_UNIXTIME(1467361335)
[1467361335.404529] [002.0] [pid=10649] DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND event_type='0' AND scheduled_time=FROM_UNIXTIME(1467361335) AND recurring_event='0' AND object_id='428'
[1467361335.404758] [002.0] [pid=10649] INSERT INTO nagios_programstatus SET instance_id='1', status_update_time=FROM_UNIXTIME(1467361335), program_start_time=FROM_UNIXTIME(1467282948), is_currently_running='1', process_id='10640', daemon_mode='1', last_command_check=FROM_UNIXTIME(0), last_log_rotation=FROM_UNIXTIME(1467323999), notifications_enabled='1', active_service_checks_enabled='1', passive_service_checks_enabled='1', active_host_checks_enabled='1', passive_host_checks_enabled='1', event_handlers_enabled='1', flap_detection_enabled='1', failure_prediction_enabled='0', process_performance_data='1', obsess_over_hosts='0', obsess_over_services='0', modified_host_attributes='0', modified_service_attributes='0', global_host_event_handler='', global_service_event_handler='' ON DUPLICATE KEY UPDATE instance_id='1', status_update_time=FROM_UNIXTIME(1467361335), program_start_time=FROM_UNIXTIME(1467282948), is_currently_running='1', process_id='10640', daemon_mode='1', last_command_check=FROM_UNIXTIME(0), last_log_rotation=FROM_UNIXTIME(1467323999), notifications_enabled='1', active_service_checks_enabled='1', passive_service_checks_enabled='1', active_host_checks_enabled='1', passive_host_checks_enabled='1', event_handlers_enabled='1', flap_detection_enabled='1', failure_prediction_enabled='0', process_performance_data='1', obsess_over_hosts='0', obsess_over_services='0', modified_host_attributes='0', modified_service_attributes='0', global_host_event_handler='', global_service_event_handler=''
[1467361337.976287] [002.0] [pid=10649] DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND event_type='99' AND scheduled_time=FROM_UNIXTIME(1467361338) AND recurring_event='1' AND object_id='0'
[1467361337.976852] [002.0] [pid=10649] INSERT INTO nagios_systemcommands SET instance_id='1', start_time=FROM_UNIXTIME(1467361337), start_time_usec='976182', end_time=FROM_UNIXTIME(0), end_time_usec='0', command_line='/bin/mv /usr/local/pnp4nagios/var/service-perfdata /usr/local/pnp4nagios/var/spool/service-perfdata\.1467361337', timeout='5', early_timeout='0', execution_time='0.000000', return_code='0', output='', long_output='' ON DUPLICATE KEY UPDATE instance_id='1', start_time=FROM_UNIXTIME(1467361337), start_time_usec='976182', end_time=FROM_UNIXTIME(0), end_time_usec='0', command_line='/bin/mv /usr/local/pnp4nagios/var/service-perfdata /usr/local/pnp4nagios/var/spool/service-perfdata\.1467361337', timeout='5', early_timeout='0', execution_time='0.000000', return_code='0', output='', long_output=''
[1467361337.990155] [002.0] [pid=10649] INSERT INTO nagios_systemcommands SET instance_id='1', start_time=FROM_UNIXTIME(1467361337), start_time_usec='976182', end_time=FROM_UNIXTIME(1467361337), end_time_usec='989896', command_line='/bin/mv /usr/local/pnp4nagios/var/service-perfdata /usr/local/pnp4nagios/var/spool/service-perfdata\.1467361337', timeout='5', early_timeout='0', execution_time='0.013000', return_code='0', output='', long_output='' ON DUPLICATE KEY UPDATE instance_id='1', start_time=FROM_UNIXTIME(1467361337), start_time_usec='976182', end_time=FROM_UNIXTIME(1467361337), end_time_usec='989896', command_line='/bin/mv /usr/local/pnp4nagios/var/service-perfdata /usr/local/pnp4nagios/var/spool/service-perfdata\.1467361337', timeout='5', early_timeout='0', execution_time='0.013000', return_code='0', output='', long_output=''
[1467361337.990651] [002.0] [pid=10649] DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND event_type='99' AND scheduled_time=FROM_UNIXTIME(1467361338) AND recurring_event='1' AND object_id='0'
[1467361337.990871] [002.0] [pid=10649] DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND event_type='99' AND scheduled_time=FROM_UNIXTIME(1467361338) AND recurring_event='1' AND object_id='0'
[1467361337.991078] [002.0] [pid=10649] INSERT INTO nagios_systemcommands SET instance_id='1', start_time=FROM_UNIXTIME(1467361337), start_time_usec='990446', end_time=FROM_UNIXTIME(0), end_time_usec='0', command_line='/bin/mv /usr/local/pnp4nagios/var/host-perfdata /usr/local/pnp4nagios/var/spool/host-perfdata\.1467361337', timeout='5', early_timeout='0', execution_time='0.000000', return_code='0', output='', long_output='' ON DUPLICATE KEY UPDATE instance_id='1', start_time=FROM_UNIXTIME(1467361337), start_time_usec='990446', end_time=FROM_UNIXTIME(0), end_time_usec='0', command_line='/bin/mv /usr/local/pnp4nagios/var/host-perfdata /usr/local/pnp4nagios/var/spool/host-perfdata\.1467361337', timeout='5', early_timeout='0', execution_time='0.000000', return_code='0', output='', long_output=''
[1467361338.004438] [002.0] [pid=10649] INSERT INTO nagios_systemcommands SET instance_id='1', start_time=FROM_UNIXTIME(1467361337), start_time_usec='990446', end_time=FROM_UNIXTIME(1467361338), end_time_usec='3', command_line='/bin/mv /usr/local/pnp4nagios/var/host-perfdata /usr/local/pnp4nagios/var/spool/host-perfdata\.1467361337', timeout='5', early_timeout='0', execution_time='0.014000', return_code='0', output='', long_output='' ON DUPLICATE KEY UPDATE instance_id='1', start_time=FROM_UNIXTIME(1467361337), start_time_usec='990446', end_time=FROM_UNIXTIME(1467361338), end_time_usec='3', command_line='/bin/mv /usr/local/pnp4nagios/var/host-perfdata /usr/local/pnp4nagios/var/spool/host-perfdata\.1467361337', timeout='5', early_timeout='0', execution_time='0.014000', return_code='0', output='', long_output=''
[1467361338.008017] [002.0] [pid=10649] DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND event_type='99' AND scheduled_time=FROM_UNIXTIME(1467361338) AND recurring_event='1' AND object_id='0'
[1467361338.008422] [002.0] [pid=10649] DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND event_type='8' AND scheduled_time=FROM_UNIXTIME(1467361338) AND recurring_event='1' AND object_id='0'
[1467361338.010277] [002.0] [pid=10649] DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND event_type='8' AND scheduled_time=FROM_UNIXTIME(1467361338) AND recurring_event='1' AND object_id='0'
[1467361338.010531] [002.0] [pid=10649] DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND event_type='5' AND scheduled_time=FROM_UNIXTIME(1467361338) AND recurring_event='1' AND object_id='0'
[1467361338.010855] [002.0] [pid=10649] DELETE FROM nagios_timedeventqueue WHERE instance_id='1' AND event_type='5' AND scheduled_time=FROM_UNIXTIME(1467361338) AND recurring_event='1' AND object_id='0'
and it is not querying anymore...


top output:
Capture.JPG
mysqld does not appear there, but it is running. I can query the database without problems, and it runs fast.

ipcs -q
Capture2.JPG
just nothing!!!

/var/log/messages
Capture3.JPG
and lots of "queue recv error: Invalid argument"...


Hope this helps!!

Thanks!!

EDIT: I usually fix the issue just by restarting nagios, so I guess the problem is not with ndo2db...
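
Since the debug log simply goes quiet when this happens, the age of the last logged query is one possible way to notice the stall. A minimal sketch, assuming every log line starts with an epoch timestamp in square brackets as in the excerpt above; last_query_age is a hypothetical name and the log path is whatever your setup uses.

```shell
#!/bin/sh
# Hypothetical helper: print how many seconds have passed since the
# last line of the ndo2db debug log was written.
# $1 = path to the debug log (assumption: adjust to your install)
# $2 = "now" as epoch seconds (defaults to the current time)
last_query_age() {
    log_file=$1
    now=${2:-$(date +%s)}
    # Extract the integer part of a leading "[1467361338.010855]" stamp.
    last=$(tail -n 1 "$log_file" | sed -n 's/^\[\([0-9][0-9]*\)\..*/\1/p')
    [ -n "$last" ] || return 1
    echo $((now - last))
}
```

A value that keeps growing while Nagios is still running checks would match the behaviour described here.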

Re: NdoUtils stop working

Posted: Mon Jul 04, 2016 12:43 am
by Box293
Are you still having a problem?

Re: NdoUtils stop working

Posted: Mon Jul 04, 2016 10:21 am
by algomas123
Yes, I do.

Re: NdoUtils stop working

Posted: Mon Jul 04, 2016 8:43 pm
by Box293
Can you please do the following:

Code:

service nagios stop
service ndo2db stop
service mysqld restart
service ndo2db start
service nagios start
After Nagios has started, please run this command:

Code:

ipcs -q
It should only show one nagios queue.

Does this resolve the problem?
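
The "only one nagios queue" check can also be done by counting rows in the ipcs output. A small sketch, assuming the usual Linux ipcs -q layout (key, msqid, owner, ...) and that the queue is owned by a user called nagios; count_queues is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count message queues owned by a given user.
# Reads "ipcs -q" output on stdin; data rows start with a hex key.
# $1 = owner to look for (assumption: defaults to "nagios")
count_queues() {
    owner=${1:-nagios}
    awk -v owner="$owner" \
        '$1 ~ /^0x[0-9a-fA-F]+$/ && $3 == owner { n++ } END { print n + 0 }'
}
```

Used as `ipcs -q | count_queues nagios`, which should print 1 on a healthy system according to the advice above.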

Re: NdoUtils stop working

Posted: Tue Jul 05, 2016 12:30 pm
by algomas123
Hello!

Yes, it solves the problem. In fact, just restarting nagios solves the problem too.

But after some hours it is crashing again...

Currently, I have a crontab entry that restarts nagios every hour, but I think it is not an elegant solution...
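
A slightly less blunt stopgap than an unconditional hourly restart might be to restart only when the message queue has disappeared, which is what ipcs -q showed during the failures. This is just a sketch under assumptions: the function name is hypothetical, the queue owner is assumed to be nagios, the default restart command is assumed to be "service nagios restart", and a missing queue is assumed to be a reliable sign of the problem.

```shell
#!/bin/sh
# Hypothetical watchdog: run a restart command only when no message
# queue owned by the given user appears in "ipcs -q" output (stdin).
# $1 = queue owner (assumption: defaults to "nagios")
# $2 = restart command (assumption: defaults to "service nagios restart")
restart_if_queue_missing() {
    owner=${1:-nagios}
    restart_cmd=${2:-service nagios restart}
    queues=$(awk -v owner="$owner" \
        '$1 ~ /^0x[0-9a-fA-F]+$/ && $3 == owner { n++ } END { print n + 0 }')
    if [ "$queues" -eq 0 ]; then
        # Left unquoted on purpose so the command and its arguments split.
        $restart_cmd
    fi
}
```

From cron this would be called as `ipcs -q | restart_if_queue_missing nagios`. It only hides the symptom, the same as the hourly restart does.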

Re: NdoUtils stop working

Posted: Tue Jul 05, 2016 1:35 pm
by tgriep
Can you post your nagios.cfg and the ndomod.cfg file so we can view them?