NCPA listener does not start--5693 in use

Posted: Mon Aug 29, 2016 1:24 pm
by corkyman
When I try to start the listener on one of the Linux machines, I get:
[c601018@SHLGNWSAD004 plugins]$ sudo /etc/init.d/ncpa_listener start
Started listener...
[c601018@SHLGNWSAD004 plugins]$

But it does not start--the ncpa_listener.log contains errors:

Code: Select all

2016-08-29 17:56:57,418 28484 INFO started
2016-08-29 17:56:57,418 28484 INFO Using SSL version TLSv1
2016-08-29 17:56:57,421 28484 ERROR [Errno 98] Address already in use: ('0.0.0.0', 5693)
Traceback (most recent call last):
  File "ncpa_posix_listener.py", line 62, in run
  File "/usr/local/lib/python2.7/site-packages/gevent/baseserver.py", line 282, in serve_forever
  File "/usr/local/lib/python2.7/site-packages/gevent/baseserver.py", line 234, in start
  File "/usr/local/lib/python2.7/site-packages/gevent/pywsgi.py", line 639, in init_socket
  File "/usr/local/lib/python2.7/site-packages/gevent/server.py", line 78, in init_socket
  File "/usr/local/lib/python2.7/site-packages/gevent/server.py", line 89, in get_listener
  File "/usr/local/lib/python2.7/site-packages/gevent/server.py", line 153, in _tcp_listener
  File "<string>", line 1, in bind
error: [Errno 98] Address already in use: ('0.0.0.0', 5693)
2016-08-29 17:56:57,422 28484 INFO stopped
I ran netstat -p | grep 5693, and the output shows many connections in CLOSE_WAIT.

Code: Select all

[c601018@SHLGNWSAD004 var]$ cat /tmp/netstat.log
tcp        0      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:53810 SYN_RECV    -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:38342 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46244 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:40944 CLOSE_WAIT  -
tcp       41      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:37637 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:43611 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47530 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44334 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:43327 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:42426 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45350 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:37285 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:40888 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:37582 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:37207 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:48165 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:41300 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44907 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:48543 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:42550 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:38439 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:48476 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47600 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:43767 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47938 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47326 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46825 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44234 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46043 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:43388 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47864 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47258 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44545 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47064 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:43513 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:42133 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:winrm CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46097 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46617 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:40995 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45868 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47395 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45282 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45727 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45215 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47712 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44080 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:38286 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:37981 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44986 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44671 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46955 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45969 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:38350 CLOSE_WAIT  -
tcp       99      0 shlgnwsad004.tvlport.n:5693 VHLGNTVMN015.tvlport.:41798 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:48620 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:42035 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47457 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45913 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:38295 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:41851 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:38450 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:41549 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:48392 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44178 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44602 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:40753 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45045 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46442 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:41943 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:42850 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:48088 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:43131 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:42342 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45388 CLOSE_WAIT  -
tcp        1      0 shlgnwsad004.tvlport.n:5693 VHLGNTVMN016.tvlport.:46506 CLOSE_WAIT  27680/cybAgent.bin
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:42274 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46381 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:41678 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46690 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:38204 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:38182 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:48228 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:41418 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46480 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:41073 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46759 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45450 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47788 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44740 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44391 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:42613 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:43459 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:48010 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46888 CLOSE_WAIT  -
tcp       99      0 shlgnwsad004.tvlport.n:5693 VHLGNTVMN016.tvlport.:34835 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47199 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44780 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:43996 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46173 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45523 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:43882 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:37311 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:43233 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:37065 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47118 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44272 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:47641 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:42512 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:42210 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44839 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46299 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:44464 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:48301 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:43023 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:41136 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:41224 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45654 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:37017 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45113 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:42745 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45592 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:46550 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:45799 CLOSE_WAIT  -
tcp      250      0 shlgnwsad004.tvlport.n:5693 vhlgnngxi001.tvlport.:37409 CLOSE_WAIT  -
The cybAgent.bin process above was started by the NCPA plugin, which was triggered by the automation server VHLGNTVMN016. The listener worked a couple of weeks ago when I tested it. I am not sure what brought the listener down. I uninstalled and reinstalled the NCPA agent, but that did not help.
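
For a listing this long, it can help to collapse the output into counts per remote host and connection state. The following is a minimal sketch, not part of NCPA. The function name `summarize_netstat` is made up for illustration, and it assumes the column layout shown above (protocol, Recv-Q, Send-Q, local address, remote address, state, PID/program).

```python
from collections import Counter


def summarize_netstat(lines, port=5693):
    """Count connections on the given local port, keyed by (remote host, state)."""
    counts = Counter()
    suffix = ":%d" % port
    for line in lines:
        fields = line.split()
        # Skip anything that is not a TCP row with at least six columns.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if not local.endswith(suffix):
            continue
        # Drop the remote port so that rows from the same host group together.
        remote_host = remote.rsplit(":", 1)[0]
        counts[(remote_host, state)] += 1
    return counts
```

Calling it as `summarize_netstat(open("/tmp/netstat.log"))` would show at a glance which peer accounts for most of the CLOSE_WAIT entries.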

So the questions I have are:
1. Why is 5693 not available?
2. Why does starting the listener not report the error, so that I have to look for it in the log?
3. Why does the application (NCPA) not close the sockets, leaving them in CLOSE_WAIT?

Re: NCPA listener does not start--5693 in use

Posted: Mon Aug 29, 2016 1:42 pm
by jomann
Can you verify there are no ncpa processes running already with something like:

Code: Select all

ps ax | grep ncpa_
Answering the questions below:

1. NCPA may already be bound to the port, with the PID somehow not saved for the service. I am not sure why, although reinstalling overwrites the current NCPA files, and old versions of the RPM did not stop the process before overwriting. That will be fixed in 2.0.0 once it comes out.
2. The startup init script does not report an error because it does not wait or check: the web server only starts after the script daemonizes the process. This is something we could give better output for in the future. It should definitely suggest that an NCPA process is currently running if it can find one.
3. I'm not sure about the CLOSE_WAIT part. We use a WSGI server in Python and don't really control how it accepts and manages connections; it just uses the defaults.
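
The kind of pre-start check mentioned in answer 2 could look roughly like the sketch below. This is an illustration only, not what the NCPA init script does. The function name `port_is_free` is hypothetical, and the check simply tries to bind the port and treats `EADDRINUSE` (the Errno 98 from the log) as "something else is already listening".

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR keeps old connections in TIME_WAIT from causing a false alarm;
    # on Linux an active listener on the port still makes bind() fail.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise  # permission problems and the like are not "port in use"
    finally:
        sock.close()
    return True
```

A start script could call `port_is_free(5693)` before daemonizing and print a warning to the terminal when it returns False, instead of leaving the error only in ncpa_listener.log.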

Re: NCPA listener does not start--5693 in use

Posted: Wed Aug 31, 2016 8:11 am
by corkyman
[c601018@SHLGNWSAD004 ~]$ ps ax | grep ncpa_
1227 pts/1 S+ 0:00 grep ncpa_
7851 ? SN 0:03 /usr/local/ncpa/ncpa_posix_passive --start
[c601018@SHLGNWSAD004 ~]$

Also, my Linux guy who has root privileges was able to run netstat with -t and the result is surprising:
[root@SHLGNWSAD004 templates]# netstat -tulpn |grep 5693
tcp 0 0 0.0.0.0:5693 0.0.0.0:* LISTEN 27680/cybAgent.bin

cybAgent is an ESP agent that uses ports 4000/4001. But this is the agent that I restarted through the NCPA agent by running a plugin. I have no idea why it is showing as listening on 5693. Here is what happened: I am working on ESP agent recovery automation and brought the ESP agent (cybAgent.bin) down. My Nagios process monitoring detected it and generated an alert into Omnibus. My Impact automation picked up the alert and ran a recovery script that in turn sent an HTTP query to the NCPA agent to run a plugin script. The plugin ran a script given to me by the Linux guys to start the cybAgent. Everything worked fine, except that cybAgent now seems to be hogging port 5693. I did not see the listener go down at the time, since I had no reason to suspect that it would, and I don't know when it did.
It feels like the web server is not closing the connection, but I am not qualified to state that for sure.

Re: NCPA listener does not start--5693 in use

Posted: Wed Aug 31, 2016 10:19 am
by corkyman
I killed the cybAgent.bin process and the NCPA listener started fine. My current theory is that the ESP recovery script does not complete while starting the ESP agent, and thus the plugin probably never completed. I can't verify that because I do not have the authority to run the script directly, so I have put the question to my Linux guy. I should be able to circumvent this issue by running the recovery script in the background, like this: /root/ASM/ESPD_Restart_ASM.sh &. Nevertheless, if this is what's going on, I don't think the port should have been taken over by the plugin, and it should not cause the listener to go down.
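
The thread does not establish why cybAgent.bin ends up listening on 5693. One mechanism that would match the netstat output is file descriptor inheritance. On POSIX systems a child process keeps every open descriptor that is not closed or marked close-on-exec, and in Python 2.7 `subprocess.Popen` defaults to `close_fds=False`. Whether NCPA launches plugins this way is an assumption, not something confirmed here. The sketch below (Python 3 on a POSIX system, with hypothetical names `spawn_child`, `port_still_held` and `demo`) reproduces the effect with a throwaway listener on a random local port.

```python
import socket
import subprocess
import sys


def spawn_child(listener, inherit):
    """Start a long-lived child; optionally let it inherit the listening socket."""
    listener.set_inheritable(inherit)
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        close_fds=not inherit,  # close_fds=False passes inheritable descriptors on
    )


def port_still_held(port):
    """Return True if binding 127.0.0.1:port fails, i.e. something still holds it."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind(("127.0.0.1", port))
    except OSError:
        return True
    finally:
        probe.close()
    return False


def demo(inherit):
    """Open a listener, spawn a child, close the listener, then probe the port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(5)
    port = listener.getsockname()[1]
    child = spawn_child(listener, inherit)
    listener.close()  # stands in for the listener process going away
    held = port_still_held(port)
    child.kill()
    child.wait()
    return held
```

On a typical Linux system `demo(True)` should report the port as still held after the parent closes its socket, while `demo(False)` should report it as released. If this is what happened here, closing descriptors when the plugin starts a long-running process would keep that process from holding the listener's port.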

Re: NCPA listener does not start--5693 in use

Posted: Wed Aug 31, 2016 11:34 am
by lmiltchev
I am glad you sorted this out. Do you mind if we close this thread?

Re: NCPA listener does not start--5693 in use

Posted: Wed Aug 31, 2016 12:53 pm
by corkyman
Please do not close. I don't think it is working like it should and will update later after testing and talking to my Linux guy.

Re: NCPA listener does not start--5693 in use

Posted: Wed Aug 31, 2016 1:59 pm
by ssax
Sounds good, we'll keep an eye out.

Re: NCPA listener does not start--5693 in use

Posted: Fri Sep 02, 2016 3:01 pm
by corkyman
Please close this incident. I talked to my Linux guy and was able to confirm that the recovery script executed by the NCPA plugin did not complete in a timely manner. When the script was changed, everything worked fine--the port was released and the NCPA listener could start. I am still not sure whether this is normal behavior when the script does not complete. It seems that even in that case the listener should not be affected: it should be possible to restart it, and the port should be available to it.