
NCPA error on AIX 6.1

Posted: Fri May 22, 2020 10:48 am
by occ
Hi,
Today we had an issue with a share mounted on an AIX 6.1 cluster (2 nodes).
The share's source went offline, so we had to urgently migrate it to another host while maintaining the original mount point.

Since then, the NCPA listener and passive services refuse to start with the following message:

Code: Select all

Traceback (most recent call last):
  File "/opt/freeware/lib/python2.7/site-packages/cx_Freeze-4.3.4-py2.7-aix-6.1.egg/cx_Freeze/initscripts/Console.py", line 27, in <module>
  File "ncpa_listener.py", line 5, in <module>
  File "/tmp/test/ncpa/agent/ncpadaemon.py", line 14, in <module>
  File "/tmp/test/ncpa/agent/listener/database.py", line 5, in <module>
  File "/tmp/test/ncpa/agent/listener/server.py", line 11, in <module>
  File "/tmp/test/ncpa/agent/listener/psapi.py", line 245, in <module>
  File "/tmp/test/ncpa/agent/listener/psapi.py", line 216, in get_root_node
  File "/tmp/test/ncpa/agent/listener/psapi.py", line 177, in get_disk_node
AttributeError: 'NoneType' object has no attribute 'split'
>startsrc -e LIBPATH=/usr/local/ncpa -s ncpa_listener

Code: Select all

0513-059 The ncpa_listener Subsystem has been started. Subsystem PID is 37028070.
>lssrc -a | grep ncpa

Code: Select all

ncpa_listener                                  inoperative
ncpa_passive                                   inoperative
>errpt -a |more

Code: Select all

LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3

Date/Time:       Fri May 22 17:21:31 CEST 2020
Sequence Number: 119774
Machine Id:      00FBC6DC4C00
Node Id:         #####
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
       65280
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'383'
FAILING MODULE
ncpa_listener

Unfortunately, we can't reboot either of the servers,
and we haven't found a way to restore the agent.
Uninstalling and reinstalling the agent did not solve the problem.

Is there anything else we can do?

NCPA installed = ncpa-2.1.1.aix6.1

Thanks
Regards

Re: NCPA error on AIX 6.1

Posted: Fri May 22, 2020 12:30 pm
by ssax
What does it say if you try to run it manually in the foreground?

Code: Select all

LD_LIBRARY_PATH=/usr/local/ncpa /usr/local/ncpa/ncpa_listener -n

Re: NCPA error on AIX 6.1

Posted: Fri May 22, 2020 12:41 pm
by occ
Sorry, the result is the same as the first code block I posted.

Anyway, here it is:

Code: Select all

  File "/opt/freeware/lib/python2.7/site-packages/cx_Freeze-4.3.4-py2.7-aix-6.1.egg/cx_Freeze/initscripts/Console.py", line 27, in <module>
  File "ncpa_listener.py", line 5, in <module>
  File "/tmp/test/ncpa/agent/ncpadaemon.py", line 14, in <module>
  File "/tmp/test/ncpa/agent/listener/database.py", line 5, in <module>
  File "/tmp/test/ncpa/agent/listener/server.py", line 11, in <module>
  File "/tmp/test/ncpa/agent/listener/psapi.py", line 245, in <module>
  File "/tmp/test/ncpa/agent/listener/psapi.py", line 216, in get_root_node
  File "/tmp/test/ncpa/agent/listener/psapi.py", line 177, in get_disk_node
AttributeError: 'NoneType' object has no attribute 'split'
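The last frame of this traceback is `get_disk_node` in `psapi.py`, where a value the code expected to be a string was `None`. Since the source of line 177 isn't shown in this thread, the sketch below is only an illustration under a stated assumption: that the disk node builder splits a per-mount string field, and that the migrated share produced a mount entry in which that field is empty. The `Mount` record, its field names, and both helper functions are hypothetical, not NCPA's actual code.

```python
from collections import namedtuple

# Hypothetical stand-in for one mounted filesystem entry; the field names
# are illustrative and are NOT taken from NCPA's psapi.py.
Mount = namedtuple("Mount", ["device", "mountpoint", "opts"])


def option_list_unsafe(mount):
    # Raises "AttributeError: 'NoneType' object has no attribute 'split'"
    # when opts is None, the same exception as the last traceback line.
    return mount.opts.split(",")


def option_list_safe(mount):
    # Treat a missing value as "no options" instead of letting one odd
    # mount abort the whole disk node (and the listener) at startup.
    if mount.opts is None:
        return []
    return mount.opts.split(",")


# Hypothetical mount table: a normal JFS2 filesystem plus a migrated share
# whose options field came back empty.
mounts = [
    Mount("/dev/hd4", "/", "rw,log=/dev/hd8"),
    Mount("newhost:/export/share", "/data/share", None),
]

if __name__ == "__main__":
    for m in mounts:
        try:
            option_list_unsafe(m)
        except AttributeError as exc:
            print("unsafe:", m.mountpoint, "->", exc)
        print("safe:", m.mountpoint, "->", option_list_safe(m))
```

If this assumption holds, a quick way to confirm it on the affected node would be to compare the output of `mount` for the migrated share with that of a working share, looking for a column that is blank or unusual.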

Re: NCPA error on AIX 6.1

Posted: Fri May 22, 2020 2:42 pm
by ssax
Do you have a system you can compile the latest NCPA on? We don't currently have access to AIX 6, and the developer thinks this may have been fixed in a later version.

You can grab the source here:

https://github.com/NagiosEnterprises/ncpa

Other than that, you'd probably need to use another agent, such as NRPE:

https://assets.nagios.com/downloads/nag ... _Agent.pdf

You can also use check_by_ssh:

https://assets.nagios.com/downloads/nag ... ng_SSH.pdf

Re: NCPA error on AIX 6.1

Posted: Mon May 25, 2020 9:26 am
by occ
Hi,
Actually, we don't have an AIX 6.1 test server on which to compile the latest NCPA.

We'd like to avoid switching agents for a single cluster out of many.
NCPA is working fine on all our other AIX 6.1 hosts.

Working through SSH doesn't give us the level of control we have now.

We'd like to find a viable way to restore NCPA functionality.

We'd like to point out that this is a production cluster, so restoring the agent is important to us.

Thanks

Re: NCPA error on AIX 6.1

Posted: Tue May 26, 2020 5:16 pm
by ssax
I've reached out to the developer again to see if he has any ideas; I'll let you know what he says.

Re: NCPA error on AIX 6.1

Posted: Wed May 27, 2020 9:55 am
by ssax
The developer suggests changing the user to root in your ncpa.cfg and then restarting the ncpa_listener and ncpa_passive services:

Code: Select all

uid = root
gid = nagios

Then test again.
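Before restarting, it can help to confirm that both services actually picked up the new user. The sketch below reads the settings with Python's standard `configparser`; it assumes (check against your own file) that the listener and passive settings live in sections named `[listener]` and `[passive]`, and the `run_as` helper and sample text are hypothetical.

```python
import configparser

# Illustrative ncpa.cfg excerpt. The [listener]/[passive] section names are
# an assumption; compare them with the sections in your own file.
SAMPLE_CFG = """
[listener]
uid = root
gid = nagios

[passive]
uid = root
gid = nagios
"""


def run_as(cfg_text, sections=("listener", "passive")):
    """Map each section name to its (uid, gid) pair, or to None if the
    section is missing. A missing uid or gid key comes back as None."""
    # interpolation=None so any '%' characters elsewhere in the file are
    # read literally instead of being treated as substitutions.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    result = {}
    for name in sections:
        if parser.has_section(name):
            result[name] = (
                parser.get(name, "uid", fallback=None),
                parser.get(name, "gid", fallback=None),
            )
        else:
            result[name] = None
    return result


if __name__ == "__main__":
    print(run_as(SAMPLE_CFG))
```

To check a real installation, pass the file's contents instead of the sample, e.g. `run_as(open(path).read())`, where `path` points at your ncpa.cfg (commonly under the NCPA install directory's `etc` folder, but verify the location on your system).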

Re: NCPA error on AIX 6.1

Posted: Sat May 30, 2020 12:14 am
by occ
Nothing changed, same error.

Re: NCPA error on AIX 6.1

Posted: Mon Jun 01, 2020 5:26 pm
by benjaminsmith
Hi @occ,

Thanks for making the change and updating us with the results. I will follow up once more with development for additional feedback.

Re: NCPA error on AIX 6.1

Posted: Thu Jun 04, 2020 3:06 am
by occ
Good morning. The NCPA agent issue is causing us considerable monitoring problems, since the affected machine is a business-critical host for our company. Is there really no news on a possible resolution, or even on the cause of the problem?