Re: Nagios XI Services not working
Posted: Thu Jan 26, 2017 3:38 pm
by tgriep
Thanks, I received the PM and will share the files.
The output field in that table should be increased to store the long output from the other command we had you disable.
Log in to the server as root and run the following command to increase the size of that field.
Code: Select all
echo "use nagios;alter table nagios_servicestatus modify output varchar(65535) not null" | mysql -pnagiosxi
I'll see if we can find anything in the log files that can help with the Segfault.
Re: Nagios XI Services not working
Posted: Thu Jan 26, 2017 3:45 pm
by tgriep
None of the log files had the Segfault in them, so can you PM me the last 3 nagios log files from the following folder?
Also, can you PM the archived messages file from the /var/log folder as well?
Thanks
Re: Nagios XI Services not working
Posted: Thu Jan 26, 2017 5:07 pm
by askewdread
i have updated that mysql field
have pm'd
the messages files were too big to send so had to cut it down a bit to the bits you required
*reminds self must setup logrotate on messages*
Re: Nagios XI Services not working
Posted: Thu Jan 26, 2017 5:37 pm
by tgriep
It looks like most of the Segfaults happen at startup, so the next time it Segfaults, leave the retention.dat file alone and run the following for 30 seconds to a minute.
Code: Select all
cd /usr/local/nagios/bin
strace -f -s 256 ./nagios ../etc/nagios.cfg
Then if it crashes, post or PM as much of the screen as possible.
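If the trace scrolls past too quickly to copy from the screen, it can be written to a file instead. This is only a minimal sketch: `trace_to_file` is a hypothetical helper name and `/tmp/nagios.strace` is an assumed output path. The `-f` and `-s 256` options are the same ones used above, and `-o` is the standard strace option for sending the trace to a file.

```shell
# Sketch only: trace_to_file is a hypothetical helper, not part of Nagios or strace.
trace_to_file() {
    out_file=$1   # file that receives the trace (strace -o)
    shift
    # -f follows child processes, -s 256 prints up to 256 bytes of each string,
    # -o writes the trace to out_file instead of the terminal.
    strace -f -s 256 -o "$out_file" "$@"
}

# Example, using the paths from the post above (stop it with Ctrl-C as before):
# cd /usr/local/nagios/bin
# trace_to_file /tmp/nagios.strace ./nagios ../etc/nagios.cfg
```

The resulting file can then be attached to a PM rather than pasted from the screen.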
Re: Nagios XI Services not working
Posted: Thu Jan 26, 2017 5:51 pm
by askewdread
will give that a go, thanks
interestingly i just tried stopping nagios, putting one of the retention.dat files back that broke it and it started perfectly fine... which is a bit odd
Re: Nagios XI Services not working
Posted: Thu Jan 26, 2017 6:01 pm
by dwhitfield
Our day is about to end. Of course we hope it doesn't crash, but if it does, we'll look for that output tomorrow morning!
Re: Nagios XI Services not working
Posted: Fri Jan 27, 2017 4:32 pm
by tmcdonald
In addition to the strace output, we're trying to replicate this issue internally but have so far been unable to do so. In order to troubleshoot the issue further, we are asking all affected users to send us copies of the following files:
Code: Select all
/usr/local/nagios/var/retention.dat
/usr/local/nagios/var/objects.cache
/etc/redhat-release
Please PM if you can, otherwise let me know and we'll find a way to get copies of those files.
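For anyone who wants to send the three files as a single attachment, something like the following could work. It is a rough sketch, not an official procedure: `bundle_files` is a hypothetical helper and the archive name `nagios-debug.tar.gz` is an assumption. It skips any file that is missing or unreadable and packs the rest with tar.

```shell
# Sketch only: bundle_files is a hypothetical helper, not part of Nagios XI.
bundle_files() {
    archive=$1
    shift
    # Walk the original arguments once; readable files are re-appended to the
    # argument list and each original entry is shifted off, so only readable
    # files remain when the loop ends.
    for f in "$@"; do
        if [ -r "$f" ]; then
            set -- "$@" "$f"
        else
            echo "skipping missing or unreadable file: $f" >&2
        fi
        shift
    done
    [ $# -gt 0 ] || return 1
    tar -czf "$archive" "$@"
}

# Example with the files requested above (run as root):
# bundle_files /tmp/nagios-debug.tar.gz \
#     /usr/local/nagios/var/retention.dat \
#     /usr/local/nagios/var/objects.cache \
#     /etc/redhat-release
```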
In addition, if you are able and willing to take a backup of a broken XI 5.4.0 system and send it for us to test/analyze, that would help as well but is not required. The following commands can be run as root to perform an XI backup:
Code: Select all
cd /usr/local/nagiosxi/scripts
./backup_xi.sh
This will create a backup tarball file located under /store/backups/nagiosxi/ and the script will give the exact file name upon completion. You can restore from a previous backup once this is done to resume normal functionality by following the restore procedure outlined in this doc:
https://assets.nagios.com/downloads/nag ... ios-XI.pdf
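The script reports the file name itself, but if that output has scrolled away, the most recent file in the backup directory can be looked up afterwards. A small sketch, assuming the newest file in /store/backups/nagiosxi/ is the backup just taken; `newest_backup` is a hypothetical helper name.

```shell
# Sketch only: newest_backup is a hypothetical helper, not part of Nagios XI.
newest_backup() {
    dir=${1:-/store/backups/nagiosxi}
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$dir" | head -n 1)
    [ -n "$newest" ] || return 1
    printf '%s/%s\n' "$dir" "$newest"
}

# Example:
# newest_backup
```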
Re: Nagios XI Services not working
Posted: Fri Jan 27, 2017 5:38 pm
by askewdread
doesn't seem to want to do it again now... kind of weird... will do these when it does happen again

Re: Nagios XI Services not working
Posted: Sat Jan 28, 2017 4:53 pm
by askewdread
Hey,
have pm'd you a download link to download the files, this happened on my personal NagiosXI I'm currently building up this time, not our company one, but seems to do the same thing....
looking at the strace I can see this error?
"[pid 16478] writev(14, [{"*** Error in `", 14}, {"./nagios", 8}, {"': ", 3}, {"free(): invalid next size (normal)", 34}, {": 0x", 4}, {"00000000017fbb20", 16}, {" ***\n", 5}], 7*** Error in `./nagios': free(): invalid next size (normal): 0x00000000017fbb20 ***"
hopefully these help
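For a long trace saved to a file, lines like the one quoted above can be pulled out with grep. This is just a sketch under assumptions: `find_crash_lines` is a hypothetical helper, the trace is assumed to have been saved to a file such as /tmp/nagios.strace, and the pattern list (the glibc "*** Error in" message plus the SIGSEGV and SIGABRT signal names) is a guess at what is worth looking for, not an exhaustive set.

```shell
# Sketch only: find_crash_lines is a hypothetical helper.
find_crash_lines() {
    trace_file=$1
    # -n prefixes each match with its line number; -E enables the alternation.
    grep -n -E 'SIGSEGV|SIGABRT|\*\*\* Error in|free\(\): invalid' "$trace_file"
}

# Example:
# find_crash_lines /tmp/nagios.strace
```

The line numbers make it easier to copy the surrounding part of the trace into a post or PM.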
Re: Nagios XI Services not working
Posted: Mon Jan 30, 2017 3:07 pm
by tmcdonald
Thank you, I've sent that off to our dev team for review.