Re: manual ndo2db restart required after "apply changes"

Posted: Wed May 06, 2015 11:04 am
by tgriep
I think the problem is that Nagios XI is intermittently losing its connection to the offloaded MySQL server.
Are they both hosted at AWS?

You could use the following command to test the time it takes to show the tables.
Replace the username, password, and remote host IP with the values you are using.

Code:

time echo 'show tables;' | mysql -t -u <username> -p<password> nagios -h <remote host ip>

If you feel comfortable doing this, you can edit both the ndomod.cfg and ndo2db.cfg files to use tcpsocket instead of unixsocket.
Do it in this order: stop the nagios service, stop the ndo2db service, edit the files, start the ndo2db service, and then start the nagios service.
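As a rough sketch of that edit, assuming the stock NDOUtils directive names (output_type and output in ndomod.cfg, socket_type in ndo2db.cfg) and an ndo2db daemon listening on 127.0.0.1, something like the following could rewrite the relevant lines. The helper names set_directive and use_tcp_socket are made up for this example, and the ndomod.cfg path in the usage comment is an assumption, so check your own files first.

```shell
#!/bin/sh
# Sketch only: switch NDOUtils from a Unix socket to a TCP socket.
# Assumes stock directive names; verify them in your own config files.

# set_directive FILE KEY VALUE
# Rewrite an existing "KEY=..." line in FILE, leaving other lines alone.
set_directive() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Write through cat so the original file keeps its owner and mode.
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# use_tcp_socket NDOMOD_CFG NDO2DB_CFG
use_tcp_socket() {
    set_directive "$1" output_type tcpsocket
    set_directive "$1" output 127.0.0.1   # assumption: ndo2db runs locally
    set_directive "$2" socket_type tcp
}

# Example order of operations (paths are assumptions, run as root):
#   service nagios stop
#   service ndo2db stop
#   use_tcp_socket /usr/local/nagios/etc/ndomod.cfg /usr/local/nagios/etc/ndo2db.cfg
#   service ndo2db start
#   service nagios start
```

Both files also carry a tcp_port setting, which has to match on the two sides; the sketch leaves it untouched.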

Re: manual ndo2db restart required after "apply changes"

Posted: Thu May 07, 2015 9:55 am
by kendallchenoweth
The time test doesn't appear to behave any differently whether or not I apply configuration. I turned on debug and continue to see queries even after the problem has occurred.

Perhaps these error messages will help resolve the problem. I can reproduce this problem on other systems in the same Amazon/remote database environment. Do you still think that switching from unix to tcp sockets is the solution? I will try changing this setting and see if it makes a difference.

When I click on apply configuration, the following shows up in nagios.log:

[1431010180] Caught SIGTERM, shutting down...
[1431010180] Successfully shutdown... (PID=1582)
[1431010180] Event broker module 'NERD' deinitialized successfully.
[1431010180] ndomod: Shutdown complete.
[1431010180] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1431010181] Nagios 4.0.8 starting... (PID=6876)
[1431010181] Local time is Thu May 07 10:49:41 EDT 2015
[1431010181] LOG VERSION: 2.0
[1431010181] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1431010181] qh: core query handler registered
[1431010181] nerd: Channel hostchecks registered successfully
[1431010181] nerd: Channel servicechecks registered successfully
[1431010181] nerd: Channel opathchecks registered successfully
[1431010181] nerd: Fully initialized and ready to rock!
[1431010181] wproc: Successfully registered manager as @wproc with query handler
[1431010181] wproc: Registry request: name=Core Worker 6879;pid=6879
[1431010181] wproc: Registry request: name=Core Worker 6881;pid=6881
[1431010181] wproc: Registry request: name=Core Worker 6883;pid=6883
[1431010181] wproc: Registry request: name=Core Worker 6878;pid=6878
[1431010181] wproc: Registry request: name=Core Worker 6880;pid=6880
[1431010181] wproc: Registry request: name=Core Worker 6882;pid=6882
[1431010181] ndomod: NDOMOD 2.0.0 (02-28-2014) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1431010181] ndomod: Successfully connected to data sink. 0 queued items to flush.
[1431010181] ndomod registered for process data
[1431010181] ndomod registered for log data'
[1431010181] ndomod registered for system command data'
[1431010181] ndomod registered for event handler data'
[1431010181] ndomod registered for notification data'
[1431010181] ndomod registered for comment data'
[1431010181] ndomod registered for downtime data'
[1431010181] ndomod registered for flapping data'
[1431010181] ndomod registered for program status data'
[1431010181] ndomod registered for host status data'
[1431010181] ndomod registered for service status data'
[1431010181] ndomod registered for adaptive program data'
[1431010181] ndomod registered for adaptive host data'
[1431010181] ndomod registered for adaptive service data'
[1431010181] ndomod registered for external command data'
[1431010181] ndomod registered for aggregated status data'
[1431010181] ndomod registered for retention data'
[1431010181] ndomod registered for contact data'
[1431010181] ndomod registered for contact notification data'
[1431010181] ndomod registered for acknowledgement data'
[1431010181] ndomod registered for state change data'
[1431010181] ndomod registered for contact status data'
[1431010181] ndomod registered for adaptive contact data'
[1431010181] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1431010182] Successfully launched command file worker with pid 6953

The Monitoring Engine Process on https://host/nagiosxi/admin/?xiwindow=s ... ringengine shows only a red dot for "process state", and the start option will not start the engine until I run "service ndo2db restart". After that, everything starts up with the following output and messages in nagios.log:

service ndo2db restart
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.
Starting ndo2db: done.


[1431010071] ndomod: Error writing to data sink! Some output may get lost...
[1431010071] ndomod: Please check remote ndo2db log, database connection or SSL Parameters
[1431010087] ndomod: Successfully reconnected to data sink! 0 items lost, 69 queued items to flush.
[1431010087] ndomod: Successfully flushed 69 queued items to data sink.
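The bracketed values are Unix epoch seconds, so the gap between two log lines can be computed directly. A minimal sketch, using a made-up helper name (log_epoch) and the two ndomod lines above as sample input:

```shell
#!/bin/sh
# log_epoch LINE: print the leading [epoch] timestamp of a nagios.log line.
log_epoch() {
    printf '%s\n' "$1" | sed -n 's/^\[\([0-9][0-9]*\)].*/\1/p'
}

lost='[1431010071] ndomod: Error writing to data sink! Some output may get lost...'
back='[1431010087] ndomod: Successfully reconnected to data sink! 0 items lost, 69 queued items to flush.'

# Seconds between the write error and the successful reconnect.
gap=$(( $(log_epoch "$back") - $(log_epoch "$lost") ))
echo "write error to reconnect: ${gap}s"
```

Note that both of these lines carry earlier timestamps than the restart excerpt further up (1431010180), so they may belong to a previous occurrence of the problem.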

Re: manual ndo2db restart required after "apply changes"

Posted: Thu May 07, 2015 10:21 am
by kendallchenoweth
I have a workaround: I added /etc/init.d/ndo2db restart to /usr/local/nagiosxi/scripts/restart_nagios_with_export.sh. This also works when I click on apply from the web page. Do this information and the log files in the previous post suggest a root cause?

...
# Restart Nagios
/etc/init.d/nagios restart
ret=$?
if [ $ret -gt 0 ]; then
    # Remove LOCKFILE
    rm -f "$LOCKFILE"
    exit 6
fi
# Workaround: restart ndo2db after Nagios has restarted
/etc/init.d/ndo2db restart
# Make a new NOM checkpoint
./nom_create_nagioscore_checkpoint.sh > /dev/null 2>&1 &

# Remove LOCKFILE
rm -f "$LOCKFILE"
exit 0

Re: manual ndo2db restart required after "apply changes"

Posted: Thu May 07, 2015 11:00 am
by tgriep
The time test was only for us to see how long it takes to query the database; it will not fix anything. Can you run it and post the output?
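Since the connection loss looks intermittent, a single run may not catch it. A small sketch that repeats a command and prints the elapsed whole seconds for each run is below; the helper name time_runs is made up for this example, and the one-second resolution is only good enough for spotting multi-second stalls.

```shell
#!/bin/sh
# time_runs COUNT COMMAND [ARGS...]
# Run COMMAND COUNT times, discarding its output, and print the elapsed
# whole seconds and exit status of each run.
time_runs() {
    count=$1; shift
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        status=$?
        end=$(date +%s)
        echo "run $i: $((end - start))s (exit $status)"
        i=$((i + 1))
    done
}

# Example (replace the placeholders with your own values):
#   time_runs 10 mysql -u <username> -p<password> -h <remote host ip> nagios -e 'show tables;'
```

A run that takes much longer than the others, or exits non-zero, would point at the connection to the remote database.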

Let's get some ndo2db debug logs while the problem is happening.
Remove the /etc/init.d/ndo2db restart line that you added to the /usr/local/nagiosxi/scripts/restart_nagios_with_export.sh file.

Edit this file
/usr/local/nagios/etc/ndo2db.cfg

Change this line from
debug_level=0
to
debug_level=-1

And change this line from
debug_verbosity=0
to
debug_verbosity=1

Save the file and restart ndo2db by running the following:
service ndo2db restart

Then run an Apply Configuration. When it is done, email in the following file:
/usr/local/nagios/var/ndo2db.debug
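If you would rather script the two edits, here is a minimal sketch. It assumes both directives already exist in the file as debug_level=... and debug_verbosity=... lines, and the function name set_ndo2db_debug is made up for this example.

```shell
#!/bin/sh
# set_ndo2db_debug FILE LEVEL VERBOSITY
# Rewrite the existing debug_level and debug_verbosity lines in FILE.
set_ndo2db_debug() {
    cfg=$1 level=$2 verbosity=$3
    tmp=$(mktemp) || return 1
    # Write through cat so the original file keeps its owner and mode.
    sed -e "s/^debug_level=.*/debug_level=${level}/" \
        -e "s/^debug_verbosity=.*/debug_verbosity=${verbosity}/" \
        "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rm -f "$tmp"
}

# Turn debugging on, then restart ndo2db:
#   set_ndo2db_debug /usr/local/nagios/etc/ndo2db.cfg -1 1
#   service ndo2db restart
# Turn it back off afterwards:
#   set_ndo2db_debug /usr/local/nagios/etc/ndo2db.cfg 0 0
#   service ndo2db restart
```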

Re: manual ndo2db restart required after "apply changes"

Posted: Thu May 07, 2015 11:38 am
by kendallchenoweth
With a verbosity level of 1 or 2, I'm not seeing any entries in ndo2db.debug. When I change verbosity to -1, I see lots of data. All of the lines are "insert into" lines. Am I getting less data because the database is remote?

Re: manual ndo2db restart required after "apply changes"

Posted: Thu May 07, 2015 11:56 am
by tgriep
That is strange. Those settings work on my test system, and it shouldn't matter whether the MySQL server is local or remote.

Capture the data when you do an Apply Configuration.
When that is done, turn off the logging for ndo2db, restart ndo2db, and post the debug file.
Thanks