NCPA error on AIX 6.1

occ
Posts: 43
Joined: Fri Jan 11, 2019 5:05 am

NCPA error on AIX 6.1

Post by occ »

Hi,
today we had an issue with a share mounted on an AIX 6.1 cluster (2 nodes).
The share source went offline, so we had to urgently migrate it to another host while maintaining the original mount point.

Since then, the NCPA listener and passive services refuse to start, with the following message:

Traceback (most recent call last):
  File "/opt/freeware/lib/python2.7/site-packages/cx_Freeze-4.3.4-py2.7-aix-6.1.egg/cx_Freeze/initscripts/Console.py", line 27, in <module>
  File "ncpa_listener.py", line 5, in <module>
  File "/tmp/test/ncpa/agent/ncpadaemon.py", line 14, in <module>
  File "/tmp/test/ncpa/agent/listener/database.py", line 5, in <module>
  File "/tmp/test/ncpa/agent/listener/server.py", line 11, in <module>
  File "/tmp/test/ncpa/agent/listener/psapi.py", line 245, in <module>
  File "/tmp/test/ncpa/agent/listener/psapi.py", line 216, in get_root_node
  File "/tmp/test/ncpa/agent/listener/psapi.py", line 177, in get_disk_node
AttributeError: 'NoneType' object has no attribute 'split'

>startsrc -e LIBPATH=/usr/local/ncpa -s ncpa_listener
0513-059 The ncpa_listener Subsystem has been started. Subsystem PID is 37028070.

>lssrc -a | grep ncpa
ncpa_listener                                  inoperative
ncpa_passive                                   inoperative
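
The status check above can also be scripted. The following is a minimal sketch, assuming the `lssrc -a` output has already been captured as text and that the status is the last whitespace-separated field on each line, as in the output shown here. The function name `inoperative_subsystems` and the `name_filter` parameter are hypothetical, not part of NCPA or AIX.

```python
def inoperative_subsystems(lssrc_output, name_filter="ncpa"):
    """Return subsystem names whose last column reads 'inoperative'.

    Assumes one subsystem per line, with the subsystem name first and
    the status last, as in the `lssrc -a | grep ncpa` output above.
    """
    names = []
    for line in lssrc_output.splitlines():
        fields = line.split()
        # Skip blank or malformed lines that lack a name and a status.
        if len(fields) < 2:
            continue
        if fields[-1] == "inoperative" and name_filter in fields[0]:
            names.append(fields[0])
    return names


# Sample text copied from the output posted above.
sample = (
    "ncpa_listener                                  inoperative\n"
    "ncpa_passive                                   inoperative\n"
)
print(inoperative_subsystems(sample))
```

On a live system the text would come from running `lssrc -a` (for example through `subprocess`), which is left out here so the sketch stays runnable anywhere.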

>errpt -a |more
LABEL:          SRC_SVKO
IDENTIFIER:     BC3BE5A3

Date/Time:       Fri May 22 17:21:31 CEST 2020
Sequence Number: 119774
Machine Id:      00FBC6DC4C00
Node Id:         #####
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   SRC

Description
SOFTWARE PROGRAM ERROR

Probable Causes
APPLICATION PROGRAM

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        MANUALLY RESTART SUBSYSTEM IF NEEDED

Detail Data
SYMPTOM CODE
       65280
SOFTWARE ERROR CODE
       -9017
ERROR CODE
           0
DETECTING MODULE
'srchevn.c'@line:'383'
FAILING MODULE
ncpa_listener

Unfortunately, we can't reboot any of the servers, and we can't find a way to restore the agent.
Uninstalling and reinstalling the agent did not solve the problem.

Is there anything else we can do?

NCPA installed = ncpa-2.1.1.aix6.1

Thanks
Regards
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: NCPA error on AIX 6.1

Post by ssax »

What does it say if you try to run it manually in the foreground?

LD_LIBRARY_PATH=/usr/local/ncpa /usr/local/ncpa/ncpa_listener -n
occ
Posts: 43
Joined: Fri Jan 11, 2019 5:05 am

Re: NCPA error on AIX 6.1

Post by occ »

Sorry, that output is already in the first code section I posted.

Anyway, here it is:

  File "/opt/freeware/lib/python2.7/site-packages/cx_Freeze-4.3.4-py2.7-aix-6.1.egg/cx_Freeze/initscripts/Console.py", line 27, in <module>
  File "ncpa_listener.py", line 5, in <module>
  File "/tmp/test/ncpa/agent/ncpadaemon.py", line 14, in <module>
  File "/tmp/test/ncpa/agent/listener/database.py", line 5, in <module>
  File "/tmp/test/ncpa/agent/listener/server.py", line 11, in <module>
  File "/tmp/test/ncpa/agent/listener/psapi.py", line 245, in <module>
  File "/tmp/test/ncpa/agent/listener/psapi.py", line 216, in get_root_node
  File "/tmp/test/ncpa/agent/listener/psapi.py", line 177, in get_disk_node
AttributeError: 'NoneType' object has no attribute 'split'
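
For readers unfamiliar with this error: it is raised whenever `.split` is called on a value that turned out to be `None` instead of a string. The thread does not show which value is `None` at line 177 of `psapi.py`, so the following is only a generic sketch of the failure mode and of a defensive guard, written in Python 3 rather than the agent's Python 2.7. The helper `split_csv_option` is hypothetical and is not NCPA code.

```python
def split_csv_option(value, default=""):
    """Split a comma-separated string, tolerating a missing (None) value.

    Hypothetical helper for illustration only; it is not taken from NCPA.
    """
    if value is None:
        value = default
    return [item.strip() for item in value.split(",") if item.strip()]


# Reproduce the same class of error as in the traceback above.
missing = None
try:
    missing.split(",")
except AttributeError as exc:
    error_text = str(exc)

print(error_text)                      # the AttributeError message
print(split_csv_option(None))          # guarded call on a missing value
print(split_csv_option("nfs, cifs"))   # guarded call on a normal value
```

Whether such a guard would be appropriate in `get_disk_node` depends on where the `None` actually originates, which only the NCPA source for this version can confirm.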
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: NCPA error on AIX 6.1

Post by ssax »

Do you have a system you can compile the latest NCPA on? We don't currently have access to AIX 6 and the developer thinks this may have been fixed in later versions.

You can grab the source here:

https://github.com/NagiosEnterprises/ncpa

Other than that you'd probably need to use another agent such as NRPE:

https://assets.nagios.com/downloads/nag ... _Agent.pdf

You can also use check_by_ssh:

https://assets.nagios.com/downloads/nag ... ng_SSH.pdf
occ
Posts: 43
Joined: Fri Jan 11, 2019 5:05 am

Re: NCPA error on AIX 6.1

Post by occ »

Hi,
we don't actually have an AIX 6.1 test server on which to compile the latest NCPA.

We'd like to avoid changing the agent for a single cluster among many others.
On all our other AIX 6.1 hosts, NCPA is working fine.

Working through SSH doesn't give us all the control we have now.

We'd like to find a viable solution to restore NCPA functionality.

We'd like to point out that this is a production cluster so it's important for us to restore the agent.

Thanks
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: NCPA error on AIX 6.1

Post by ssax »

I've reached out to the developer again to see if he has any ideas. I will let you know what he says.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: NCPA error on AIX 6.1

Post by ssax »

The developer said to try changing your user to root in your ncpa.cfg and restart the ncpa_listener and ncpa_passive services:

uid = root
gid = nagios

Then test again.
occ
Posts: 43
Joined: Fri Jan 11, 2019 5:05 am

Re: NCPA error on AIX 6.1

Post by occ »

Nothing changed, same error.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: NCPA error on AIX 6.1

Post by benjaminsmith »

Hi @occ,

Thanks for making the change and updating us with the results. I will follow up once more with development for additional feedback.
occ
Posts: 43
Joined: Fri Jan 11, 2019 5:05 am

Re: NCPA error on AIX 6.1

Post by occ »

Good morning. The problem with the NCPA agent is causing us considerable monitoring problems, since the affected machine is a very important host for the company's business. Is there really no news on a possible resolution, or even on the cause of the problem?