nsca 2.7.2 can't communicate with nsca-client 2.9

stefanlasiewski
Posts: 9
Joined: Fri Apr 20, 2012 3:24 pm

Post by stefanlasiewski »

I use Nagios 3.x. Many of my servers transmit passive check information using nsca-client. Most of these servers run nsca-client 2.7.2, and work fine. However, I have one FreeBSD 9 host which runs nsca-client 2.9.1, and something is broken.

Does nsca-client 2.9 work with an older version of the NSCA server daemon? Any ideas how I can fix this?

My setup:

- Nagios server: CentOS 5.8, Nagios 3.2.3 and NSCA 2.7.2
- Nagios Clients: 50+ servers (FreeBSD, RHEL, CentOS 5 & 6) running nsca-client 2.7.2. These all run fine.
- On the Nagios server, in nsca.cfg I have 'decryption_method=3' and 'debug=1'.

I have one new FreeBSD 9.0-RELEASE server running nsca-client 2.9.1. While this client can connect to the NSCA daemon, the passive check does not appear in the Nagios log and Nagios never registers this check. The Nagios log (/var/log/nagios/nagios.log) does not show any errors, although I enabled the debug option in nsca.cfg.

I ran the following test check on the remote host. This shows that send_nsca can connect to the Nagios server. I think it also shows that the encryption is working:

# /bin/echo -e "hosta NSCA_CheckDisk       1       Test" | /usr/local/sbin/send_nsca -c /usr/local/etc/nagios/send_nsca.cfg -H nagios.example.org
1 data packet(s) sent to host successfully.

However, on the Nagios server there is no corresponding entry for this check in the Nagios log file.

# grep NSCA_CheckDisk /var/log/nagios/nagios.log  |grep hosta
#

The Nagios server configuration does include a corresponding check for this service. The service showed an 'OK' status yesterday, but now shows an error because it has not received an update from nsca-client in many hours.
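
One thing worth ruling out in a test like the one above: send_nsca expects one check result per line, with the host name, service description, return code and plugin output separated by tab characters (its default delimiter), and `echo -e` does not behave the same way on every system. A minimal sketch that builds the line with `printf` instead; the function name `format_passive_check` is made up for illustration:

```shell
# Build one send_nsca input line: host, service, return code and plugin
# output, separated by literal tab characters and ended with a newline.
# (format_passive_check is a hypothetical helper name.)
format_passive_check() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Example with the values from this thread (not executed here):
# format_passive_check hosta NSCA_CheckDisk 1 Test |
#     /usr/local/sbin/send_nsca -c /usr/local/etc/nagios/send_nsca.cfg -H nagios.example.org
```

This only removes one possible source of confusion on the client side; it does not change what the server does with the packet.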

Running `tcpdump port 5667` on the Nagios Server shows a connection from the remote client:

# tcpdump port 5667 and src host.example.org
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
14:27:44.283095 IP host.example.org.49508 > nagios.example.org.nsca: S 212345652:26412302772(0) win 65535 <mss 1380,nop,wscale 6,sackOK,timestamp 112345651 0>
14:27:44.283531 IP host.example.org.49508 > nagios.example.org.nsca: . ack 3281234444 win 1026 <nop,nop,timestamp 1123456782 2123456376>
...
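
However, running `strace` on the nsca process shows "EBADF (Bad file descriptor)" errors. At first glance this looks like nsca is unable to open the Nagios command file, but the failing descriptor is fd 5, the client socket: nsca reads the 720-byte packet, writes a message to syslog (fd 3), calls close(5), and only then gets EBADF when it touches fd 5 again: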

...
munmap(0x2b1d8933e000, 4096)            = 0
mlock(0x120b5c30, 8)                    = 0
mlock(0x120b3ad0, 6272)                 = 0
mlock(0x120b31d0, 24)                   = 0
sendto(5, "\370\304\217i\373\341>>Zrx\240\201s\221\256\264\366p\232\34\263p\304\310=&(\272\2527\263"..., 132, 0, NULL, 0) = 132
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 2, -1) = 1 ([{fd=5, revents=POLLIN}])
recvfrom(5, "\r\352C\2312\342\351\232$;^\357\265n\322d\5\1\315N\n\376\24\7\22\17\6\336@\r\264\363"..., 720, 0, NULL, NULL) = 720
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1017, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1017, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1017, ...}) = 0
sendto(3, "<27>Jul  5 12:36:29 nsca[9879]: "..., 133, MSG_NOSIGNAL, NULL, 0) = 133
close(5)                                = 0
poll([{fd=4, events=POLLIN}, {fd=5, events=POLLIN}], 2, -1) = 1 ([{fd=5, revents=POLLNVAL}])
recvfrom(5, 0x7fff19011ea0, 720, 0, 0, 0) = -1 EBADF (Bad file descriptor)
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1017, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1017, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1017, ...}) = 0
sendto(3, "<27>Jul  5 12:36:29 nsca[9879]: "..., 52, MSG_NOSIGNAL, NULL, 0) = 52
munlock(0x120b5c30, 8)                  = 0
munlock(0x120b3ad0, 6272)               = 0
munlock(0x120b31d0, 24)                 = 0
close(5)                                = -1 EBADF (Bad file descriptor)
poll([{fd=4, events=POLLIN}], 1, -1 <unfinished ...>

Any idea what is going on?

-= Stefan
Last edited by stefanlasiewski on Tue Jul 10, 2012 12:22 pm, edited 1 time in total.

Re: nsca 2.7.2 not handling information from nsca-client 2.9

Post by stefanlasiewski »

The Nagios log (/var/log/nagios/nagios.log) does not show any errors, although I enabled the debug option in nsca.cfg .
Bah! So, it turns out that NSCA does not use the Nagios log! NSCA uses syslog, and therefore the debug messages are being written to syslog.
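
A minimal sketch of pulling the daemon's messages out of syslog, assuming the syslog tag is `nsca[PID]:` and that the messages land in /var/log/messages as they do on this CentOS server (other systems may route them to a different file). The function name `nsca_log_lines` is made up for illustration:

```shell
# Print only the syslog lines written by the nsca daemon, read from stdin.
# Matches the "nsca[PID]:" tag that syslog puts in front of each message.
# (nsca_log_lines is a hypothetical helper name.)
nsca_log_lines() {
    grep -E 'nsca\[[0-9]+\]:'
}

# Example (not executed here):
# nsca_log_lines < /var/log/messages | tail -n 20
```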

In /var/log/messages , I now see the error:

Jul  6 16:02:11 nastmon nsca[9879]: Handling the connection...
Jul  6 16:02:11 nastmon nsca[9879]: Dropping packet with invalid CRC32 - possibly due to client using wrong password or crypto algorithm?
Jul  6 16:02:11 nastmon nsca[9879]: End of connection...

So now I have an error. However, I am sure that I am using the correct password and crypto algorithm. According to some docs, "Dropping packet with invalid CRC32" is a very generic error, so it looks like I have a problem with my encryption or something similar. This won't be easy to debug...
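
To double-check the "wrong password or crypto algorithm" part of that message, the two config files can be compared directly. A rough sketch, assuming both files have been copied to one machine and use plain `key=value` lines without trailing comments; `encryption_method` (send_nsca.cfg), `decryption_method` (nsca.cfg) and `password` are the real setting names, while the helper names `cfg_value` and `methods_match` are made up for illustration:

```shell
# Print the value of a "key=value" setting from a config file.
# Commented-out lines are skipped; if the key appears twice, the last one wins.
# $1 = key, $2 = file
cfg_value() {
    sed -n "/^[[:space:]]*$1[[:space:]]*=/{s/^[^=]*=[[:space:]]*//;s/[[:space:]]*\$//;p;}" "$2" | tail -n 1
}

# Succeed only if the client's encryption_method is set and equals the
# server's decryption_method.
# $1 = send_nsca.cfg, $2 = nsca.cfg
methods_match() {
    client_method=$(cfg_value encryption_method "$1")
    server_method=$(cfg_value decryption_method "$2")
    [ -n "$client_method" ] && [ "$client_method" = "$server_method" ]
}

# Example (not executed here; paths are placeholders for the copied files):
# methods_match ./send_nsca.cfg ./nsca.cfg && echo "methods match"
```

The same `cfg_value` call with `password` as the key can be used to compare the shared secret. Matching settings only rule out one cause of the CRC32 message; they do not prove the two ends agree on the packet format.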

Re: nsca 2.7.2 not handling information from nsca-client 2.9

Post by stefanlasiewski »

It seems that existing Nagios servers which use nsca 2.7.2 will not work with clients which run nsca-client 2.9:

http://comments.gmane.org/gmane.network ... user/73098
The packet size was increased in 2.9 that caused 2.9 clients to fail
with older servers and 2.9 servers to fail with older clients. A fix was
implemented in the server in 2.9.1 that would allow older clients to
connect to it, but 2.9 and newer clients will not be able to talk to
older servers because they still use the larger packet size.
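
The rule in that quote can be written down as a small check. This is only a sketch of the quoted statement, not something verified against the NSCA source; the function names `version_number` and `nsca_compatible` are made up for illustration, and versions are assumed to look like MAJOR.MINOR or MAJOR.MINOR.PATCH:

```shell
# Turn "MAJOR.MINOR[.PATCH]" into a comparable integer, e.g. 2.9.1 -> 20901.
# (hypothetical helper; assumes purely numeric components)
version_number() {
    local IFS=.
    set -- $1
    echo "$(( ${1:-0} * 10000 + ${2:-0} * 100 + ${3:-0} ))"
}

# Return 0 if a send_nsca client and an nsca server should interoperate
# according to the quoted rule.
# $1 = client (send_nsca) version, $2 = server (nsca) version
nsca_compatible() {
    local client server
    client=$(version_number "$1")
    server=$(version_number "$2")
    if [ "$client" -ge 20900 ]; then
        # 2.9 and newer clients use the larger packet; older servers reject it.
        [ "$server" -ge 20900 ]
    else
        # Older clients work with older servers, and with servers from
        # 2.9.1 on, which accept the old packet size again.
        [ "$server" -lt 20900 ] || [ "$server" -ge 20901 ]
    fi
}

# Example: the combination from this thread.
# nsca_compatible 2.9.1 2.7.2 || echo "client 2.9.1 cannot talk to server 2.7.2"
```

Under this rule the practical options are to upgrade the server to 2.9.1 or later, or to install a 2.7.x client on the new host.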