Service check did not exit properly

Posted: Wed Jun 10, 2015 3:44 pm
by karven
I'm currently running Nagios 4.0.8 on FreeBSD 10.1-RELEASE. Sometimes I receive an alert saying "Service check did not exit properly". From the information I could gather, the maximum listen queue length of the Nagios socket is too low, and Nagios or its check processes are also being killed somehow. I'm looking for a way to fix this, but I have no idea yet. Any help?

Code:

nagios@svrbsd:~ % nagios version

Nagios Core 4.0.8
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-12-2014
License: GPL

Code:

nagios@svrbsd:~ % uname -a
FreeBSD svrbsd 10.1-RELEASE-p10 FreeBSD 10.1-RELEASE-p10 #0: Wed May 13 06:54:13 UTC 2015     [email protected]:/usr/obj/usr/src/sys/GENERIC  amd64

Code:

nagios@svrdbs:~ % netstat -Lan
Current listen queue sizes (qlen/incqlen/maxqlen)
Proto Listen         Local Address         
tcp4  0/0/65535      72.10.166.19.80        
tcp4  0/0/10         *.587                  
tcp6  0/0/10         *.25                   
tcp4  0/0/10         *.25                   
tcp4  0/0/128        72.10.166.19.22        
unix  0/0/3          /var/spool/nagios/rw/nagios.qh
unix  0/0/1024       /tmp/spawnfcgi.sock
unix  0/0/65535      /tmp/phpfpm.sock
unix  0/0/4          /var/run/devd.pipe
unix  0/0/4          /var/run/devd.seqpacket.pipe

Code:

Notification Type: PROBLEM

Service: Current Users
Host: Remotehost
Address: 192.168.10.100
State: CRITICAL

Date/Time: Wed Jun 10 16:01:24 AST 2015

Additional Info:

(Service check did not exit properly)

Code:

root@svrbsd:/usr/pkg/etc/nagios/objects # tail /var/log/messages
Jun  8 14:08:57 svrbsd kernel: sonewconn: pcb 0xfffff8006a9f6c30: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
Jun  9 00:28:02 svrbsd kernel: pid 80355 (nagios), uid 1002: exited on signal 10
Jun  9 04:40:55 svrbsd kernel: pid 25287 (check_ping), uid 1002: exited on signal 11
Jun  9 15:59:29 svrbsd kernel: sonewconn: pcb 0xfffff8001f661960: Listen queue overflow: 5 already in queue awaiting acceptance (6 occurrences)
Jun  9 19:52:16 svrbsd kernel: sonewconn: pcb 0xfffff8001f444870: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
Jun 10 14:18:40 svrbsd kernel: sonewconn: pcb 0xfffff800a48690f0: Listen queue overflow: 5 already in queue awaiting acceptance (6 occurrences)
Jun 10 14:27:53 svrbsd kernel: sonewconn: pcb 0xfffff8001f93c870: Listen queue overflow: 5 already in queue awaiting acceptance (6 occurrences)
Jun 10 14:31:55 svrbsd kernel: sonewconn: pcb 0xfffff8001f46dd20: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
Jun 10 15:34:47 svrbsd kernel: sonewconn: pcb 0xfffff8001f936c30: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
Jun 10 15:39:01 svrbsd kernel: sonewconn: pcb 0xfffff8001f947a50: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
root@svrbsd:/usr/pkg/etc/nagios/objects # cat /var/log/messages|grep signal
May 30 22:19:26 svrbsd kernel: pid 65742 (check_http), uid 1002: exited on signal 11
May 30 22:21:25 svrbsd kernel: pid 66374 (check_http), uid 1002: exited on signal 11
Jun  1 03:21:00 svrbsd kernel: pid 35569 (check_http), uid 1002: exited on signal 11
Jun  1 03:22:59 svrbsd kernel: pid 36403 (check_http), uid 1002: exited on signal 11
Jun  1 03:25:00 svrbsd kernel: pid 37170 (check_http), uid 1002: exited on signal 11
Jun  1 03:25:00 svrbsd kernel: pid 37203 (check_http), uid 1002: exited on signal 11
Jun  1 03:27:00 svrbsd kernel: pid 37894 (check_http), uid 1002: exited on signal 11
Jun  1 03:29:00 svrbsd kernel: pid 38677 (check_http), uid 1002: exited on signal 11
Jun  1 03:31:00 svrbsd kernel: pid 39262 (check_http), uid 1002: exited on signal 11
Jun  1 03:33:00 svrbsd kernel: pid 40055 (check_http), uid 1002: exited on signal 11
Jun  2 19:25:46 svrbsd kernel: pid 31301 (nagios), uid 1002: exited on signal 11
Jun  2 19:27:12 svrbsd kernel: pid 31299 (nagios), uid 1002: exited on signal 11
Jun  3 05:40:58 svrbsd kernel: pid 7876 (nagios), uid 1002: exited on signal 10
Jun  3 06:58:01 svrbsd kernel: pid 7875 (nagios), uid 1002: exited on signal 11
Jun  3 11:24:19 svrbsd kernel: pid 7877 (nagios), uid 1002: exited on signal 11
Jun  3 11:24:21 svrbsd kernel: pid 7879 (nagios), uid 1002: exited on signal 10
Jun  3 11:24:25 svrbsd kernel: pid 7878 (nagios), uid 1002: exited on signal 10
Jun  3 18:13:47 svrbsd syslogd: exiting on signal 15
Jun  4 00:33:31 svrbsd kernel: pid 4157 (nagios), uid 1002: exited on signal 11
Jun  9 00:28:02 svrbsd kernel: pid 80355 (nagios), uid 1002: exited on signal 10
Jun  9 04:40:55 svrbsd kernel: pid 25287 (check_ping), uid 1002: exited on signal 11

Code:

bge1: link state changed to UP
sonewconn: pcb 0xfffff8001f445e10: Listen queue overflow: 5 already in queue awaiting acceptance (1 occurrences)
sonewconn: pcb 0xfffff8001f7ebd20: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
ugen0.3: <CHICONY> at usbus0 (disconnected)
ukbd0: at uhub3, port 4, addr 3 (disconnected)
pid 4157 (nagios), uid 1002: exited on signal 11
sonewconn: pcb 0xfffff800a4869000: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
sonewconn: pcb 0xfffff8001f9434b0: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
sonewconn: pcb 0xfffff8001f943780: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
sonewconn: pcb 0xfffff8001f7af000: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
sonewconn: pcb 0xfffff8001f7eb870: Listen queue overflow: 5 already in queue awaiting acceptance (14 occurrences)
sonewconn: pcb 0xfffff8001f48d870: Listen queue overflow: 5 already in queue awaiting acceptance (13 occurrences)
sonewconn: pcb 0xfffff8001f460000: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
sonewconn: pcb 0xfffff8001f9363c0: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
sonewconn: pcb 0xfffff8001f93c4b0: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
sonewconn: pcb 0xfffff8001f9474b0: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
sonewconn: pcb 0xfffff802a1aa0e10: Listen queue overflow: 5 already in queue awaiting acceptance (6 occurrences)
sonewconn: pcb 0xfffff8006a9f6c30: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
pid 80355 (nagios), uid 1002: exited on signal 10
pid 25287 (check_ping), uid 1002: exited on signal 11
sonewconn: pcb 0xfffff8001f661960: Listen queue overflow: 5 already in queue awaiting acceptance (6 occurrences)
sonewconn: pcb 0xfffff8001f444870: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
sonewconn: pcb 0xfffff800a48690f0: Listen queue overflow: 5 already in queue awaiting acceptance (6 occurrences)
sonewconn: pcb 0xfffff8001f93c870: Listen queue overflow: 5 already in queue awaiting acceptance (6 occurrences)
sonewconn: pcb 0xfffff8001f46dd20: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
sonewconn: pcb 0xfffff8001f936c30: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)
sonewconn: pcb 0xfffff8001f947a50: Listen queue overflow: 5 already in queue awaiting acceptance (7 occurrences)

Re: Service check did not exit properly

Posted: Wed Jun 10, 2015 3:56 pm
by jdalrymple
How heavily loaded is this system? For example, what is the system load? How many hosts and services are you monitoring?

Here is the FreeBSD tuning guide for heavily loaded systems: https://www.freebsd.org/doc/handbook/co ... imits.html

I suggest starting with kern.ipc.somaxconn, but there may be other parameters to look at. If this isn't a heavily loaded system, then something else must be wrong.

Re: Service check did not exit properly

Posted: Thu Jun 11, 2015 9:17 am
by karven
I have tried those already. Do you have anything related to this: https://github.com/NagiosEnterprises/nr ... 33add157f4 ? Please let me know.

/boot/loader.conf

Code:

# More tuning
kern.ipc.msgmax="65536"
kern.ipc.msgmnb="65536"
kern.maxusers="2048"

/etc/sysctl.conf

Code:

# Increase TCP Window size to 64K for increase in network performance
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
net.local.stream.sendspace=65536
net.local.stream.recvspace=65536

# Other
net.inet.icmp.maskrepl=0
net.inet.tcp.path_mtu_discovery=1
net.inet.tcp.sack.enable=1
net.inet.icmp.icmplim=1000
net.inet.tcp.syncookies=1
net.inet.ip.fw.dyn_max=16384
kern.ipc.soacceptqueue=20480
# We might consider enabling this:
net.inet.tcp.fast_finwait2_recycle=1
# This value is in milliseconds
net.inet.tcp.finwait2_timeout=30000

# kernel/memory tuning
kern.ipc.shmmax=68719476736
kern.ipc.shmall=16777216
kern.ipc.shm_use_phys=1
kern.threads.max_threads_per_proc=16384

# More kernel tuning
kern.ipc.shmall=4294967296
kern.ipc.somaxconn=65535
net.inet.ip.intr_queue_maxlen=10240

Re: Service check did not exit properly

Posted: Thu Jun 11, 2015 10:47 am
by ssax
After adding the sysctl.conf entries, did you reboot or run:

Code:

/etc/rc.d/sysctl start
?

Re: Service check did not exit properly

Posted: Thu Jun 11, 2015 1:20 pm
by karven
Yes, I did, and those were set before installing Nagios. Do you have anything like this patch for Nagios: https://github.com/NagiosEnterprises/nr ... 33add157f4 ? I think Nagios is missing the backlog setting on FreeBSD; that change is already in NRPE 2.16.

Re: Service check did not exit properly

Posted: Thu Jun 11, 2015 1:49 pm
by jdalrymple
There is no runtime configuration option for this; the backlog is hard-coded. Here is the code from lib/nsock.c:

Code:

int nsock_unix(const char *path, unsigned int flags)
{
        struct sockaddr_un saun;
        struct sockaddr *sa;
        int sock = 0, mode;
        socklen_t slen;

        if(!path)
                return NSOCK_EINVAL;

        if(flags & NSOCK_TCP)
                mode = SOCK_STREAM;
        else if(flags & NSOCK_UDP)
                mode = SOCK_DGRAM;
        else
                return NSOCK_EINVAL;

        if((sock = socket(AF_UNIX, mode, 0)) < 0) {
                return NSOCK_ESOCKET;
        }

        /* set up the sockaddr_un struct and the socklen_t */
        sa = (struct sockaddr *)&saun;
        memset(&saun, 0, sizeof(saun));
        saun.sun_family = AF_UNIX;
        slen = strlen(path);
        memcpy(&saun.sun_path, path, slen);
        slen += offsetof(struct sockaddr_un, sun_path);

        /* unlink if we're supposed to, but not if we're connecting */
        if(flags & NSOCK_UNLINK && !(flags & NSOCK_CONNECT)) {
                if(unlink(path) < 0 && errno != ENOENT)
                        return NSOCK_EUNLINK;
        }

        if(flags & NSOCK_CONNECT) {
                if(connect(sock, sa, slen) < 0) {
                        close(sock);
                        return NSOCK_ECONNECT;
                }
                return sock;
        } else {
                if(bind(sock, sa, slen) < 0) {
                        close(sock);
                        return NSOCK_EBIND;
                }
        }

        if(!(flags & NSOCK_BLOCK) && fcntl(sock, F_SETFL, O_NONBLOCK) < 0)
                return NSOCK_EFCNTL;

        if(flags & NSOCK_UDP)
                return sock;

        /* backlog is hard-coded to 3; this is the maxqlen shown for
         * nagios.qh in netstat -Lan */
        if(listen(sock, 3) < 0) {
                close(sock);
                return NSOCK_ELISTEN;
        }

        return sock;
}
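Here is the synopsis of listen(2). It does not create the socket; it marks an existing socket as accepting connections, and its backlog argument limits the queue of pending connections: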

Code:

SYNOPSIS
     #include <sys/types.h>
     #include <sys/socket.h>

     int
     listen(int	s, int backlog);
If you replace the line

Code:

        if(listen(sock, 3) < 0) {
with

Code:

        if(listen(sock, 128) < 0) {
and recompile, you should have a larger queue on that socket. Let us know how it goes.

Re: Service check did not exit properly

Posted: Thu Jun 11, 2015 3:37 pm
by karven
I have updated and recompiled. If you could make a patch for FreeBSD in your next version, that would be great. The queue length is very low; I think the default value should be a bit higher. 1K is OK for me.
Thanks, I really appreciate your help.

Re: Service check did not exit properly

Posted: Fri Jun 12, 2015 9:17 am
by tmcdonald
That could possibly be done - would you mind opening a separate issue on GitHub for this? That way the issue will be properly filed and won't fall through the cracks.