NCPA 2.1.6 on AIX is doing a dump on start

Posted: Fri Apr 26, 2019 12:01 pm
by Keystone
AIX 7.1 (7100-05-* to be exact)

After upgrading from 2.0.6-4 to 2.1.6 we are seeing this type of issue:
https://github.com/NagiosEnterprises/ncpa/issues/504

Even after running this:
chown -R nagios /usr/local/ncpa
chgrp -R nagios /usr/local/ncpa
...to correct the ownership issue that occurred during the upgrade (the new default is root:system), we still have the issue where "ncpa_passive" runs but "ncpa_listener" does not.
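
One way to double-check that the chown/chgrp above reached every file is to list anything still owned by someone else. This is only a sketch: `list_wrong_owner` is a hypothetical helper name, not something shipped with NCPA, and it relies only on standard `find` predicates.

```shell
#!/bin/sh
# list_wrong_owner (hypothetical helper): print every path under DIR that is
# NOT owned by the expected user, or NOT in the expected group.
# No output means the ownership fix reached everything.
list_wrong_owner() {
    dir=$1
    user=$2
    group=$3
    find "$dir" \( ! -user "$user" -o ! -group "$group" \) -print
}

# Example, using the path and names from this post:
# list_wrong_owner /usr/local/ncpa nagios nagios
```

If this prints nothing for /usr/local/ncpa, leftover root:system ownership is probably not what keeps the listener from starting.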

This exists: /usr/local/ncpa/var/log/ncpa_passive.log
This file never gets created: /usr/local/ncpa/var/log/ncpa_listener.log

As the nagios user, I've confirmed that there is no permissions issue writing to the "/usr/local/ncpa/var/log/ncpa_listener.log" file by running:
date > /usr/local/ncpa/var/log/ncpa_listener.log
Note: I've already corrected the botched user/group permissions.

When we start it as root:
$ /usr/local/ncpa/ncpa_listener --nodaemon --start
Memory fault(coredump)

When we analyze the core file:
$ /usr/lib/ras/check_core /core |tail -1
ncpa_listener

So currently we are stuck with the limitations of 2.0, desperately need to avoid the bugs from 2.1.1, and have no upgrade path. Please help.

Re: NCPA 2.1.6 on AIX is doing a dump on start

Posted: Fri Apr 26, 2019 2:35 pm
by tgriep
On AIX, you cannot start the NCPA agent like you do on a Linux system.
Doing that could cause issues.

To start NCPA, run this:

Code:

startsrc -e LIBPATH=/usr/local/ncpa -s ncpa_listener
startsrc -e LIBPATH=/usr/local/ncpa -s ncpa_passive

To stop NCPA, run this:

Code:

stopsrc -s ncpa_listener -f
stopsrc -s ncpa_passive -f 

Try starting it using the above example and, if it fails, post the log file so we can view it.

Code:

/usr/local/ncpa/var/log/ncpa_listener.log

Re: NCPA 2.1.6 on AIX is doing a dump on start

Posted: Fri Apr 26, 2019 5:42 pm
by Keystone
Had the AIX engineer run:
stopsrc -s ncpa_listener -f
stopsrc -s ncpa_passive -f
startsrc -e LIBPATH=/usr/local/ncpa -s ncpa_listener
startsrc -e LIBPATH=/usr/local/ncpa -s ncpa_passive


The /usr/local/ncpa/var/log/ncpa_listener.log still has no data in it; nothing is ever written to it. The only reason the file exists at all is that, as the nagios user, I ran "touch /usr/local/ncpa/var/log/ncpa_listener.log".
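
To tell apart "the log was never created", "the log exists but is empty" and "the listener actually wrote something", a small check like the following could be run after each start attempt. It is a sketch only; `log_state` is a hypothetical helper name.

```shell
#!/bin/sh
# log_state (hypothetical helper): report whether a log file is missing,
# present but empty (for example created with touch), or contains data.
log_state() {
    if [ ! -e "$1" ]; then
        echo "missing"
    elif [ ! -s "$1" ]; then
        echo "empty"
    else
        echo "has data"
    fi
}

# Example, using the path from this thread:
# log_state /usr/local/ncpa/var/log/ncpa_listener.log
```

An "empty" result after a start attempt would suggest the listener dies before it gets as far as opening its log.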


After he ran the above commands, a "ps -ef | grep ncpa" only shows one line:
nagios 13369508 3866810 0 15:39:47 - 0:00 /usr/local/ncpa/ncpa_passive -n

This is what the engineer showed me:
$ startsrc -e LIBPATH=/usr/local/ncpa -s ncpa_listener
0513-059 The ncpa_listener Subsystem has been started. Subsystem PID is 15794308.
root@<boxNameHidden>:/
$ startsrc -e LIBPATH=/usr/local/ncpa -s ncpa_passive
0513-059 The ncpa_passive Subsystem has been started. Subsystem PID is 13369508.
root@<boxNameHidden>:/
$
root@<boxNameHidden>/
$ lssrc -a |grep -i ncpa
ncpa_passive 13369508 active
ncpa_listener inoperative
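
For scripting the same check, the lssrc output above can be reduced to one "subsystem status" pair per line. The sketch below assumes the status is always the last whitespace-separated field, as it is in the output shown here; `ncpa_status` and `ncpa_all_active` are hypothetical helper names.

```shell
#!/bin/sh
# ncpa_status (hypothetical helper): read `lssrc -a` style output on stdin
# and print "<subsystem> <status>" for each subsystem whose name starts
# with "ncpa_". Assumes the status is the last field on the line.
ncpa_status() {
    awk '$1 ~ /^ncpa_/ { print $1, $NF }'
}

# ncpa_all_active (hypothetical helper): exit non-zero if any NCPA
# subsystem found on stdin is in a state other than "active".
# Note: it also exits zero when no NCPA subsystem is listed at all.
ncpa_all_active() {
    ncpa_status | awk '$2 != "active" { bad = 1 } END { exit bad + 0 }'
}

# On the AIX host:
# lssrc -a | ncpa_status
# lssrc -a | ncpa_all_active || echo "an NCPA subsystem is not active"
```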

Re: NCPA 2.1.6 on AIX is doing a dump on start

Posted: Mon Apr 29, 2019 10:00 am
by tgriep
Let's see if we can get the listener to output any errors instead of just a core dump.

Change to the /usr/local/ncpa folder and run the agent without any command line options, like this:

Code:

./ncpa_listener
If it outputs any errors or messages, post them here.

Re: NCPA 2.1.6 on AIX is doing a dump on start

Posted: Mon May 20, 2019 12:56 pm
by Keystone
AIX admin stated:
This is all I see when I run the below command.

root@sandboxrk1p:/usr/local/ncpa
$ ./ncpa_listener
Memory fault(coredump)

Re: NCPA 2.1.6 on AIX is doing a dump on start

Posted: Tue May 21, 2019 11:33 am
by tgriep
I suggest removing the agent from the server, deleting the whole directory where the agent was installed, and then installing the newest agent again.
I have a feeling the upgrade did not update all of the files, so the agent is still using old files from that folder, and that is causing the core dump.
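
The clean reinstall could look roughly like the sequence printed below. This is a sketch under stated assumptions, not a confirmed procedure: it assumes the agent was installed as an RPM package named "ncpa", and NEW_RPM is a placeholder for whatever package file was downloaded. `print_reinstall_plan` is a hypothetical helper that only prints the commands; it does not run any of them.

```shell
#!/bin/sh
# print_reinstall_plan (hypothetical helper): PRINT the clean-reinstall steps
# for review. Nothing is executed.
# Assumptions: the agent is an RPM package named "ncpa"; NEW_RPM is a
# placeholder for the downloaded package file.
print_reinstall_plan() {
    install_dir=${1:-/usr/local/ncpa}
    new_rpm=${2:-NEW_RPM}
    cat <<EOF
stopsrc -s ncpa_listener -f
stopsrc -s ncpa_passive -f
rpm -e ncpa
rm -rf $install_dir
rpm -ivh $new_rpm
startsrc -e LIBPATH=$install_dir -s ncpa_listener
startsrc -e LIBPATH=$install_dir -s ncpa_passive
EOF
}

# Example: print_reinstall_plan /usr/local/ncpa NEW_RPM
```

Deleting the install directory also deletes any customized configuration under it, so it may be worth copying that elsewhere before running the steps by hand.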