Can no longer ACK alerts
Posted: Thu Nov 15, 2018 12:19 pm
by bomahony
Something seems to have happened to one of the 4 XI installs I am working on at the moment. On that one, I can no longer ACK any alerts; the last successful ACK was around the 5th.
It shows something about connecting to "go.nagios.com"?
I rebooted the system and it is the same.
Re: Can no longer ACK alerts
Posted: Thu Nov 15, 2018 12:21 pm
by bomahony
I have another, almost identical XI node in another DC that works fine. I updated both to 5.5.7 today.
Re: Can no longer ACK alerts
Posted: Thu Nov 15, 2018 3:16 pm
by npolovenko
@bomahony, please run through the following commands and let me know if that resolves the issue:
service nagios stop
service ndo2db stop
rm /usr/local/nagios/var/retention.dat
mv /usr/local/nagios/var/ndo2db.lock /usr/local/nagios/var/ndo2db.lock.bak
mv /usr/local/nagios/var/ndo.sock /usr/local/nagios/var/ndo.sock.bak
service ndo2db start
service nagios start
Re: Can no longer ACK alerts
Posted: Fri Nov 16, 2018 1:12 pm
by bomahony
Will do. I'm on leave until Tuesday and will do it then. FYI, I did reboot the VM, so I don't know whether that would already have done most of that?
Re: Can no longer ACK alerts
Posted: Fri Nov 16, 2018 1:18 pm
by npolovenko
@bomahony, the reboot should do most of it, unless a crashed ndo2db process left the ndo.sock and ndo2db.lock files behind. So if you reboot on Tuesday and the problem is still there, go ahead and run these commands anyway.
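A minimal sketch of how to check for those leftovers before moving anything. The function name `stale_ndo_files` is hypothetical, and it assumes `pgrep` is available and that the files live under /usr/local/nagios/var as in the commands above:

```shell
#!/bin/sh
# Hypothetical helper: list ndo2db leftover files that exist while no
# ndo2db process is running. Directory defaults to the path used above.
stale_ndo_files() {
    var_dir="${1:-/usr/local/nagios/var}"
    # pgrep exits non-zero when no process with that exact name is found
    if pgrep -x ndo2db >/dev/null 2>&1; then
        echo "ndo2db is running; leaving files alone"
        return 0
    fi
    for f in "$var_dir/ndo2db.lock" "$var_dir/ndo.sock"; do
        # -e is true for regular files and sockets alike
        if [ -e "$f" ]; then
            echo "stale: $f"
        fi
    done
}

stale_ndo_files /usr/local/nagios/var
```

If it prints nothing, there is nothing stale to move out of the way.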
Re: Can no longer ACK alerts
Posted: Tue Nov 20, 2018 8:42 am
by bomahony
Same issue.
The lock and sock files didn't exist, as the service shut down cleanly.
Removing retention.dat cleared all the previous ACKs. The file also wasn't recreated when I started the services.
After a reboot it seems to have rebuilt retention.dat (about 35M) with all the previous data? So I restarted the services and deleted retention.dat again.
I tried different browsers, both as nagiosadmin and as my own [admin] user.
However, Mass ACK seems to work? [Which then recreated the retention.dat file again]
Re: Can no longer ACK alerts
Posted: Tue Nov 20, 2018 11:20 am
by bomahony
It also seems that when I click on "Network Outages" I get an error: "Unable to parse XML output".
I am going to compare permissions on the two instances.
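One way that comparison could be done, as a rough sketch: dump mode and ownership for the top of the tree on each instance, then diff the two dumps. The function name `perm_snapshot` and the depth of 2 are my own choices, and it assumes a `stat` that supports `-c` (GNU coreutils or busybox):

```shell
#!/bin/sh
# Hypothetical helper: print "mode owner:group path" for a directory tree,
# two levels deep, sorted by path so two hosts can be diffed.
perm_snapshot() {
    root="${1:-/usr/local/nagios}"
    if [ ! -d "$root" ]; then
        echo "no such directory: $root" >&2
        return 0
    fi
    # %A = mode string, %U:%G = owner and group names, %n = file name
    find "$root" -maxdepth 2 -exec stat -c '%A %U:%G %n' {} + | sort -k 3
}

perm_snapshot /usr/local/nagios
```

Redirect the output to a file on each instance and run `diff` on the two files; any line that differs is a candidate.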
Re: Can no longer ACK alerts
Posted: Tue Nov 20, 2018 1:43 pm
by npolovenko
@bomahony, Could you send me a system profile from the problematic XI instance?
To send us your system profile, log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" Menu
Click the "Download Profile" button
Save the profile.zip file and send it to me in a personal message.
After you send me the profile please post something in this thread to bring it back up in the support queue.
Also, please open the /etc/init.d/nagios script and make sure that the following lines point to the files in the correct locations:
prefix=/usr/local/nagios
NagiosStatusFile=${prefix}/var/status.dat
NagiosRetentionFile=${prefix}/var/retention.dat
NagiosCommandFile=${prefix}/var/rw/nagios.cmd
NagiosRunFile=${prefix}/var/nagios.lock
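A quick sketch for checking those four locations from the shell. The function name `check_nagios_paths` is hypothetical, and the relative paths are simply the ones from the lines above:

```shell
#!/bin/sh
# Hypothetical helper: show each file the init script points at,
# or report it as missing. Prefix defaults to /usr/local/nagios.
check_nagios_paths() {
    prefix="${1:-/usr/local/nagios}"
    for rel in var/status.dat var/retention.dat var/rw/nagios.cmd var/nagios.lock; do
        path="$prefix/$rel"
        if [ -e "$path" ]; then
            # -d keeps ls from descending if a path is a directory
            ls -ld "$path"
        else
            echo "missing: $path"
        fi
    done
}

check_nagios_paths /usr/local/nagios
```

Some of these files may only exist while nagios is running, so run it with the service started.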
Re: Can no longer ACK alerts
Posted: Tue Nov 20, 2018 1:47 pm
by bomahony
OK. I think this may have been an old issue. I moved XI to its own FS at the start of the month. Of course I was a dope and screwed up the root directory ownership. I had root:root instead of apache:nagios.
So where previously there was a few seconds' wait while it was trying to do stuff when I ACK'd, it is now immediate.
But it still doesn't ACK for some reason?
Will send the stuff on tomorrow. Just bailing out the door now!
Re: Can no longer ACK alerts
Posted: Tue Nov 20, 2018 2:06 pm
by npolovenko
@bomahony, That's good to know. Yeah, this is likely a permissions problem then. You can run this script as root:
/usr/local/nagiosxi/scripts/reset_config_perms.sh
Also, when you come back, please send the permissions for all the files I listed earlier and for all their parent directories.
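A small sketch for collecting that in one go: it prints the permissions of a file and then of every parent directory up to /. The function name `show_perm_chain` is hypothetical, and the example path is the command file from the init script lines earlier in the thread:

```shell
#!/bin/sh
# Hypothetical helper: print ls -ld for a path and each of its parents.
show_perm_chain() {
    path="$1"
    while :; do
        ls -ld "$path" 2>/dev/null || echo "missing: $path"
        # stop at the filesystem root (or "." for a relative path)
        if [ "$path" = "/" ] || [ "$path" = "." ]; then
            break
        fi
        path=$(dirname "$path")
    done
}

show_perm_chain /usr/local/nagios/var/rw/nagios.cmd
```

On systems with util-linux installed, `namei -l` on the same path should give a similar listing.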