
Nagios 4.1.1 on Solaris 11 - Can't connect to query socket

Posted: Mon May 09, 2016 8:16 am
by dheitepriem
Hey guys,

I recently compiled Nagios Core 4.1.1 and Nagios Plugins 2.1.1 on Solaris 11.3 SPARC. The compilation went fine, but when I start Nagios using the Service Management Facility (SMF), it goes from "offline*" to "maintenance" after a while, and this error appears in the log file:

Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused

The file itself is present and readable/writable by the nagios user:

ls -l /opt/nagios/var/rw/
total 2
-rw-r--r-- 1 nagios nagios 0 May 9 15:07 nagios.cmd
srw-rw---- 1 nagios nagios 0 May 9 15:09 nagios.qh
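
For reference: with a Unix-domain socket, connect() failing with "Connection refused" normally means the socket file exists but no process is currently accepting connections on it. The `srw-rw----` entry alone therefore does not show that the query handler is listening. A minimal Python sketch of that situation, using a temporary path rather than the real nagios.qh (`probe_unix_socket` and `make_stale_socket` are hypothetical helpers; the error mapping assumes Linux-like behaviour and may differ on Solaris):

```python
import os
import socket
import tempfile


def probe_unix_socket(path: str) -> str:
    """Try to connect to a Unix-domain stream socket and describe the result."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        return "listening"
    except ConnectionRefusedError:
        # The socket file exists, but no process is accepting connections on it.
        return "refused"
    except FileNotFoundError:
        # There is no file at this path at all.
        return "missing"
    finally:
        client.close()


def make_stale_socket(path: str) -> None:
    """Create a socket file on disk, then close the socket without listening."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)  # bind() creates the 's' entry seen in `ls -l`
    server.close()     # the file stays behind, but nothing owns it any more


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.qh")  # hypothetical demo path
        print(probe_unix_socket(path))        # missing
        make_stale_socket(path)
        print(os.path.exists(path))           # True
        print(probe_unix_socket(path))        # refused
```

The point of the demo is that "file present" and "someone listening" are separate conditions; only a live process that called listen() on the socket makes connect() succeed.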

I'm able to access the web interface, but the output of "ps -ef | grep nagios" shows a lot of defunct processes.

svcs -xv nagios:
State: offline* transitioning to online since May 9, 2016 03:08:00 PM CEST
Reason: Start method is running.
See: http://support.oracle.com/msg/SMF-8000-C4
See: /var/svc/log/application-nagios:default.log
Impact: This service is not running.

Partial output of the log file:
[ May 9 13:09:00 Executing start method ("/opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg"). ]
nerd: Channel hostchecks registered successfully
nerd: Channel servicechecks registered successfully
nerd: Channel opathchecks registered successfully
nerd: Fully initialized and ready to rock!
wproc: Successfully registered manager as @wproc with query handler
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
[...]
wproc: Registry request: name=Core Worker 22091;pid=22091
Error: Could not create external command file '/opt/nagios/var/rw/nagios.cmd' as named pipe: (17) -> File exists. If this file already exists and you are sure that another copy of Nagios is not running, you should delete this file.

ps -ef | grep nagios:
nagios 22164 22134 0 - ? 0:00 <defunct>
nagios 22165 22134 0 15:10:02 ? 0:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh

ps -ef | grep nagios | grep "<defunct>" | wc -l:
28
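
A `<defunct>` (zombie) entry is a child process that has already exited but whose parent has not yet collected its exit status with wait()/waitpid(). The entry disappears as soon as the parent reaps it, so the PPID column identifies the process responsible (22134 in the excerpt above). A minimal, POSIX-only Python sketch of that mechanism in general; it does not model how Nagios handles its workers, and `spawn_and_reap` is a hypothetical helper:

```python
import os
import time


def spawn_and_reap(delay: float = 0.2) -> tuple:
    """Fork a child that exits at once, leave it unreaped for `delay` seconds, then reap it.

    While the parent sleeps, `ps` would list the child as <defunct>.
    Returns (reaped_pid, exit_code).
    """
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately with a recognisable status, skipping cleanup handlers.
        os._exit(7)
    # Parent: the exited child remains a zombie until its status is collected.
    time.sleep(delay)
    reaped_pid, status = os.waitpid(pid, 0)  # reaping removes the zombie entry
    exit_code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return reaped_pid, exit_code


if __name__ == "__main__":
    child, code = spawn_and_reap()
    print(f"reaped child {child} with exit code {code}")  # exit code is 7
```

A parent that never calls waitpid() for its exited children accumulates entries like the 28 counted above.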

Thank you very much,
Daniel

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Mon May 09, 2016 12:26 pm
by tgriep
I found a previous post that describes the issue you are having. Take a look at it and see if it helps.
https://support.nagios.com/forum/viewto ... 34&t=31304

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Tue May 10, 2016 2:41 am
by dheitepriem
Hi,

Unfortunately, it doesn't seem to solve the issue. I attached my nagios.cfg; the nagios.log also looks OK to me. This is the log output of the SMF command:
[ May 10 09:28:10 Enabled. ]
[ May 10 09:28:10 Executing start method ("/opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg"). ]
nerd: Channel hostchecks registered successfully
nerd: Channel servicechecks registered successfully
nerd: Channel opathchecks registered successfully
nerd: Fully initialized and ready to rock!
wproc: Successfully registered manager as @wproc with query handler
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
[...]
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
wproc: Registry request: name=Core Worker 9337;pid=9337
wproc: Registry request: name=Core Worker 9336;pid=9336
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
wproc: Registry request: name=Core Worker 9338;pid=9338
wproc: Registry request: name=Core Worker 9339;pid=9339
wproc: Registry request: name=Core Worker 9340;pid=9340
wproc: Registry request: name=Core Worker 9366;pid=9366
wproc: Registry request: name=Core Worker 9367;pid=9367
wproc: Registry request: name=Core Worker 9368;pid=9368
wproc: Registry request: name=Core Worker 9371;pid=9371
Successfully launched command file worker with pid 9375
[ May 10 09:29:10 Method or service exit timed out. Killing contract 509. ]
[ May 10 09:29:10 Method "start" failed due to signal KILL. ]
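
The last two SMF lines show the start method running for one minute (09:28:10 to 09:29:10) before the restarter killed its contract, which suggests the start command did not return within the timeout. A rough Python analogy of that behaviour, assuming the restarter simply waits a fixed time for the start method to exit (the 60-second figure is read off the timestamps; `run_start_method` is a hypothetical helper, not an SMF interface):

```python
import subprocess


def run_start_method(argv: list, timeout_s: float) -> str:
    """Run a start command and report whether it returned before the timeout.

    Loosely mimics a service manager waiting for a start method:
    a command that stays in the foreground is killed when the timeout expires.
    """
    try:
        completed = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising.
        return "timed out, killed"
    return f"exited with status {completed.returncode}"


if __name__ == "__main__":
    # A command that returns promptly, like a process that detaches and exits:
    print(run_start_method(["true"], timeout_s=2))         # exited with status 0
    # A command that stays in the foreground ("sleep" stands in for a long-running process):
    print(run_start_method(["sleep", "10"], timeout_s=1))  # timed out, killed
```

In this analogy, the second case corresponds to the "Method or service exit timed out" line: the caller gives up waiting and kills what the command started.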

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Tue May 10, 2016 1:53 pm
by lmiltchev
Can you run the following command and show the output?

ls -ld /opt/nagios/var

Also, you have:
-rw-r--r-- 1 nagios nagios 0 May 9 15:07 nagios.cmd
I believe "nagios.cmd" needs to be a named pipe with these permissions:

prw-rw---- 1 nagios nagios 0 May 9 15:07 nagios.cmd
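
The first character of the mode string is the file type: `-` is a regular file, `p` a named pipe (FIFO) and `s` a socket. Nagios tries to create nagios.cmd as a named pipe, so a regular file left at that path produces the "File exists" error from the first post. A minimal Python sketch that checks a path's type and permission bits; the 0660 expectation mirrors the `prw-rw----` listing, and `describe` and `check_command_file` are hypothetical helpers:

```python
import os
import stat
import tempfile


def describe(path: str) -> str:
    """Return 'fifo', 'socket', 'regular' or 'other' for an existing path."""
    mode = os.stat(path).st_mode  # stat() does not open the file, so a FIFO will not block
    if stat.S_ISFIFO(mode):
        return "fifo"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISREG(mode):
        return "regular"
    return "other"


def check_command_file(path: str, expected_perms: int = 0o660) -> list:
    """List problems with a Nagios-style external command file (empty list means OK)."""
    problems = []
    kind = describe(path)
    if kind != "fifo":
        problems.append(f"{path} is a {kind} file, expected a named pipe (FIFO)")
    perms = stat.S_IMODE(os.stat(path).st_mode)
    if perms != expected_perms:
        problems.append(f"permissions {oct(perms)}, expected {oct(expected_perms)}")
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fifo = os.path.join(tmp, "nagios.cmd")  # demo path, not the real command file
        os.mkfifo(fifo)
        os.chmod(fifo, 0o660)  # set explicitly so the umask does not matter
        print(check_command_file(fifo))  # []
```

Run against the `-rw-r--r--` file from the first post, this check would report both a wrong file type and wrong permissions.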

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Tue May 10, 2016 2:11 pm
by dheitepriem
Here are the results:

root@cs-monitor-brem-t:/opt/nagios/var/rw# ls -ld /opt/nagios/var/
drwxrwxr-x   5 nagios   nagios         8 May 10 09:29 /opt/nagios/var/
root@cs-monitor-brem-t:/opt/nagios/var/rw# ls -la
total 7
drwxrwsr-x   2 nagios   nagios         4 May 10 09:28 .
drwxrwxr-x   5 nagios   nagios         8 May 10 09:29 ..
prw-rw----   1 nagios   nagios         0 May 10 09:28 nagios.cmd
srw-rw----   1 nagios   nagios         0 May 10 09:28 nagios.qh
root@cs-monitor-brem-t:/opt/nagios/var/rw# pwd
/opt/nagios/var/rw

I deleted "nagios.cmd" and Nagios recreated it, but unfortunately it still fails with the same output I posted earlier.

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Tue May 10, 2016 2:40 pm
by tgriep
Can you run the following commands as root and post the output?

grep nag /etc/passwd
grep nag /etc/group
ps -ef

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Tue May 10, 2016 4:19 pm
by dheitepriem
grep nag /etc/passwd and grep nag /etc/group didn't produce any output because we manage our users and groups via LDAP, so here is the output of getent passwd and getent group instead.

getent passwd | grep nag:

root@cs-monitor-brem-t:/opt# getent passwd | grep nag
nagios:x:1005:1005:nagios:/opt/nagios:/bin/sh

getent group | grep nag:

root@cs-monitor-brem-t:/opt# getent group | grep nag
nagios::1005:nagios
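
`getent` resolves names through the name-service switch (`/etc/nsswitch.conf`), which is why it finds LDAP accounts that grep on the flat files misses. The C library calls getpwnam() and getgrgid() go through the same switch, so this is also the view a process running as nagios gets. A small Python sketch using the standard `pwd` and `grp` modules plus `os.getgrouplist()` (assumed to be available on the platform; it is on Linux) to show a user's primary and supplementary groups; `user_groups` is a hypothetical helper:

```python
import grp
import os
import pwd


def user_groups(name: str) -> dict:
    """Resolve a user through NSS and list its primary and supplementary groups."""
    pw = pwd.getpwnam(name)  # raises KeyError if no NSS source knows the name
    primary = grp.getgrgid(pw.pw_gid).gr_name
    # getgrouplist() returns the given primary GID plus every group listing the user as a member.
    gids = os.getgrouplist(name, pw.pw_gid)
    names = sorted({grp.getgrgid(g).gr_name for g in gids})
    return {"uid": pw.pw_uid, "gid": pw.pw_gid, "primary": primary, "groups": names}


if __name__ == "__main__":
    # The demo uses root so it runs anywhere; on the system in this thread
    # the call of interest would be user_groups("nagios").
    print(user_groups("root"))
```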

ps -ef:

root@cs-monitor-brem-t:/opt# ps -ef
     UID   PID  PPID   C    STIME TTY         TIME CMD
  netcfg  4240  2375   0 08:20:43 ?           0:01 /lib/inet/netcfgd
    root 12914  2375   0 23:05:48 pts/3       0:00 /usr/bin/login -z global -f root
    root  2375  2375   0 08:19:47 ?           0:00 zsched
    root  4196  2375   0 08:20:24 ?           0:16 /lib/svc/bin/svc.startd
    root  4294  2375   0 08:20:55 ?           0:00 /lib/svc/bin/svc.periodicd
    root  3834  2375   0 08:20:22 ?           0:00 /usr/sbin/init
    root  4199  2375   0 08:20:24 ?           1:30 /lib/svc/bin/svc.configd
  netadm  4289  2375   0 08:20:54 ?           0:01 /lib/inet/ipmgmtd
  daemon  4302  2375   0 08:20:55 ?           0:00 /lib/crypto/kcfd
    root  4452  2375   0 08:21:01 ?           0:00 /usr/lib/rad/rad -sp
    root  4332  2375   0 08:20:56 ?           0:02 /lib/inet/in.mpathd
  daemon  4455  2375   0 08:21:01 ?           0:00 /usr/lib/utmpd
    root  4324  2375   0 08:20:56 ?           0:00 /usr/lib/pfexecd
    root  4456  2375   0 08:21:01 ?           0:00 /usr/lib/rad/rad -sp
    root  4471  2375   0 08:21:01 ?           0:00 /usr/lib/dbus-daemon --system
  netadm  4618  2375   0 08:21:05 ?           0:01 /lib/inet/nwamd
    root  5005  2375   0 08:21:20 ?           0:04 /usr/lib/fm/fmd/fmd
  nagios 13116 13087   0        - ?           0:00 <defunct>
   smmsp  5126  2375   0 08:22:23 ?           0:01 /usr/lib/inet/sendmail -Ac -q15m
    root  5127  2375   0 08:22:23 ?           0:02 /usr/lib/inet/sendmail -bl -q15m
    root  5031  4196   0 08:21:22 console     0:00 /usr/sbin/ttymon -g -d /dev/console -l console -m ldterm,ttcompat -h -p cs-moni
    root  5024  2375   0 08:21:21 ?           0:00 /usr/lib/ssh/sshd
    root  5020  2375   0 08:21:21 ?           0:03 /usr/sbin/nscd
    root  4834  2375   0 08:21:13 ?           0:01 /usr/lib/ldap/ldap_cachemgr
    root  4853  2375   0 08:21:15 ?           0:00 /usr/sbin/cron
    root  4750  2375   0 08:21:09 ?           0:00 /usr/lib/zones/zoneproxy-client -s localhost:1008
    root  4996  2375   0 08:21:20 ?           0:00 /usr/lib/inet/in.ndpd
    root  5004  5002   0 08:21:20 ?           0:00 /usr/lib/autofs/automountd
    root  5035  2375   0 08:21:22 ?           0:00 /usr/sbin/syslogd
  daemon  4968  2375   0 08:21:19 ?           0:00 /usr/sbin/rpcbind
    root  5002  2375   0 08:21:20 ?           0:00 /usr/lib/autofs/automountd
    root  5001  2375   0 08:21:20 ?           0:01 /usr/lib/inet/inetd start
  nagios 13093 13087   0        - ?           0:00 <defunct>
  nagios 13096 13087   0        - ?           0:00 <defunct>
  nagios 13111 13087   0        - ?           0:00 <defunct>
    root  5074  2375   0 08:21:23 ?           0:01 /usr/lib/fm/notify/smtp-notify
  nagios 13105 13087   0        - ?           0:00 <defunct>
  nagios 13103 13087   0        - ?           0:00 <defunct>
  nagios 13099 13087   0        - ?           0:00 <defunct>
  nagios 13108 13087   0        - ?           0:00 <defunct>
  nagios 13112 13087   0        - ?           0:00 <defunct>
  nagios 13092 13087   0 23:10:47 ?           0:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
    root 12915 12914   0 23:05:48 pts/3       0:00 -bash
  nagios 13100 13087   0        - ?           0:00 <defunct>
  nagios 13098 13087   0        - ?           0:00 <defunct>
  nagios 13106 13087   0        - ?           0:00 <defunct>
  nagios 13087  4196   0 23:10:47 ?           0:00 /opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg
  nagios 13114 13087   0        - ?           0:00 <defunct>
  nagios 13115 13087   0        - ?           0:00 <defunct>
  nagios 13090 13087   0 23:10:47 ?           0:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 13113 13087   0        - ?           0:00 <defunct>
  nagios 13094 13087   0        - ?           0:00 <defunct>
  nagios 13095 13087   0        - ?           0:00 <defunct>
  nagios 13117 13087   0        - ?           0:00 <defunct>
  nagios 13104 13087   0        - ?           0:00 <defunct>
  nagios 13102 13087   0        - ?           0:00 <defunct>
  nagios 13118 13087   0 23:10:47 ?           0:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 13119 13087   0 23:10:47 ?           0:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 13120 13087   0 23:10:47 ?           0:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 13121 13087   0        - ?           0:00 <defunct>
  nagios 13122 13087   0 23:10:47 ?           0:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 13123 13087   0        - ?           0:00 <defunct>
  nagios 13126 13087   0 23:10:50 ?           0:00 /opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg
    root 13139 12915   0 23:11:03 pts/3       0:00 ps -ef
  nagios 13089 13087   0 23:10:47 ?           0:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 13097 13087   0        - ?           0:00 <defunct>
  nagios 13109 13087   0        - ?           0:00 <defunct>
  nagios 13107 13087   0        - ?           0:00 <defunct>
  nagios 13091 13087   0 23:10:47 ?           0:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 13110 13087   0        - ?           0:00 <defunct>
  nagios 13101 13087   0        - ?           0:00 <defunct>
  nagios 13088 13087   0 23:10:47 ?           0:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
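
Reading the listing by parent PID: every `<defunct>` entry and every `--worker` process has PPID 13087, the main `nagios /opt/nagios/etc/nagios.cfg` process, which was itself started by svc.startd (PID 4196). A short Python sketch that does that grouping on pasted `ps -ef` text; it assumes PID and PPID are the second and third columns, as in the default `ps -ef` layout, and `group_by_parent` is a hypothetical helper:

```python
from collections import defaultdict


def group_by_parent(ps_output: str) -> dict:
    """Count zombie and worker processes per parent PID in `ps -ef` text.

    Only the PID and PPID columns are parsed; the command is matched on the raw
    line, so STIME values containing spaces do not shift anything that matters.
    """
    counts = defaultdict(lambda: {"defunct": 0, "worker": 0})
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # skip the header and blank lines
        ppid = int(fields[2])
        if line.rstrip().endswith("<defunct>"):
            counts[ppid]["defunct"] += 1
        elif "--worker" in line:
            counts[ppid]["worker"] += 1
    return dict(counts)


if __name__ == "__main__":
    # A few lines copied from the listing above.
    sample = """\
     UID   PID  PPID   C    STIME TTY         TIME CMD
  nagios 13087  4196   0 23:10:47 ?           0:00 /opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg
  nagios 13116 13087   0        - ?           0:00 <defunct>
  nagios 13093 13087   0        - ?           0:00 <defunct>
  nagios 13092 13087   0 23:10:47 ?           0:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
"""
    print(group_by_parent(sample))  # {13087: {'defunct': 2, 'worker': 1}}
```

Feeding it the complete listing would attribute every zombie to 13087, the process that should be reaping them.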

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Tue May 10, 2016 4:55 pm
by ssax
Did you follow a certain guide for installation on Solaris 11? If so, please post a link to the guide.

Depending on how you compiled it, this is how it's generally set up (on Linux, though; I don't have a Solaris system):

[root@localhost ~]# grep nag /etc/group
nagios:x:500:nagios,apache
nagcmd:x:501:nagios,apache

[root@localhost ~]# ls -la /usr/local/nagios/var/rw
total 8
drwxrwsr-x 2 nagios nagcmd 4096 Mar 31 09:24 .
drwxrwxr-x 5 nagios nagios 4096 May 10 16:54 ..
prw-rw---- 1 nagios nagcmd    0 Mar 31 09:24 nagios.cmd
srw-rw---- 1 nagios nagcmd    0 Mar 31 09:23 nagios.qh

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Tue May 10, 2016 4:56 pm
by tgriep
Can you add the apache user to the nagios group and see if that fixes it?
Can you run this command on the Solaris system and post the output?

svcs

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Tue May 10, 2016 11:10 pm
by dheitepriem
I followed Oracle's official guide (http://www.oracle.com/technetwork/artic ... 79071.html). The issue may be due to the "nagcmd" group, so I will try to compile it again with options that ensure only the "nagios" user and group are used.
The output from svcs is below:

STATE          STIME    FMRI
legacy_run      8:21:22 lrc:/etc/rc2_d/S89PRESERVE
online          8:20:29 svc:/system/early-manifest-import:default
online          8:20:29 svc:/system/svc/restarter:default
online          8:20:43 svc:/network/netcfg:default
online          8:20:43 svc:/network/sctp/congestion-control:highspeed
online          8:20:43 svc:/network/sctp/congestion-control:newreno
online          8:20:43 svc:/network/tcp/congestion-control:newreno
online          8:20:43 svc:/network/sctp/congestion-control:vegas
online          8:20:43 svc:/network/tcp/congestion-control:vegas
online          8:20:43 svc:/network/socket-config:default
online          8:20:43 svc:/network/tcp/congestion-control:cubic
online          8:20:43 svc:/network/sctp/congestion-control:cubic
online          8:20:44 svc:/network/tcp/congestion-control:highspeed
online          8:20:44 svc:/system/name-service/upgrade:default
online          8:20:51 svc:/network/datalink-management:default
online          8:20:52 svc:/system/filesystem/root:default
online          8:20:54 svc:/network/tcp/tcpkey:default
online          8:20:54 svc:/network/ip-interface-management:default
online          8:20:54 svc:/system/svc/periodic-restarter:default
online          8:20:54 svc:/system/cryptosvc:default
online          8:20:55 svc:/system/boot-archive:default
online          8:20:55 svc:/network/ipsec/ipsecalgs:default
online          8:20:55 svc:/system/filesystem/usr:default
online          8:20:55 svc:/network/loopback:default
online          8:20:56 svc:/system/pfexec:default
online          8:20:56 svc:/application/man-index:default
online          8:20:56 svc:/system/device/local:default
online          8:20:56 svc:/network/ipmp:default
online          8:20:56 svc:/milestone/devices:default
online          8:20:59 svc:/system/filesystem/minimal:default
online          8:21:00 svc:/system/environment:init
online          8:21:00 svc:/system/logadm-upgrade:default
online          8:21:00 svc:/system/pkgserv:default
online          8:21:00 svc:/system/rmtmpfiles:default
online          8:21:00 svc:/system/filesystem/uvfs-instclean:default
online          8:21:00 svc:/system/utmp:default
online          8:21:00 svc:/network/uucp-lock-cleanup:default
online          8:21:00 svc:/system/dbus:default
online          8:21:01 svc:/system/security/security-extensions:default
online          8:21:01 svc:/system/rad:local
online          8:21:01 svc:/system/rad:local-http
online          8:21:01 svc:/milestone/unconfig:default
online          8:21:01 svc:/milestone/config:default
online          8:21:02 svc:/system/ca-certificates:default
online          8:21:03 svc:/system/timezone:default
online          8:21:03 svc:/network/install:default
online          8:21:03 svc:/system/coreadm:default
online          8:21:03 svc:/network/physical:upgrade
online          8:21:04 svc:/system/config-user:default
online          8:21:04 svc:/network/location:upgrade
online          8:21:04 svc:/application/font/fc-cache:default
online          8:21:06 svc:/network/physical:default
online          8:21:07 svc:/system/identity:node
online          8:21:07 svc:/network/location:default
online          8:21:08 svc:/network/ipsec/policy:default
online          8:21:08 svc:/milestone/network:default
online          8:21:08 svc:/network/iptun:default
online          8:21:08 svc:/application/pkg/zones-proxy-client:default
online          8:21:08 svc:/network/initial:default
online          8:21:09 svc:/milestone/single-user:default
online          8:21:09 svc:/network/nfs/fedfs-client:default
online          8:21:09 svc:/network/netmask:default
online          8:21:10 svc:/network/nis/domain:default
online          8:21:10 svc:/system/name-service/switch:default
online          8:21:10 svc:/system/identity:domain
online          8:21:10 svc:/network/service:default
online          8:21:11 svc:/system/filesystem/local:default
online          8:21:12 svc:/system/filesystem/ufs/quota:default
online          8:21:13 svc:/network/ldap/client:default
online          8:21:13 svc:/network/shares:default
online          8:21:13 svc:/application/pkg/repositories-setup:default
online          8:21:14 svc:/milestone/name-services:default
online          8:21:14 svc:/system/auditset:default
online          8:21:14 svc:/system/cron:default
online          8:21:15 svc:/application/security/compliance:default
online          8:21:18 svc:/network/routing-setup:default
online          8:21:19 svc:/network/rpc/bind:default
online          8:21:19 svc:/network/routing/ndp:default
online          8:21:20 svc:/network/inetd:default
online          8:21:20 svc:/system/filesystem/autofs:default
online          8:21:20 svc:/system/name-service/cache:default
online          8:21:20 svc:/network/ssh:default
online          8:21:21 svc:/system/fmd:default
online          8:21:21 svc:/network/rpc/gss:default
online          8:21:21 svc:/network/rpc/smserver:default
online          8:21:21 svc:/milestone/self-assembly-complete:default
online          8:21:21 svc:/system/system-log:default
online          8:21:21 svc:/system/console-login:default
online          8:21:22 svc:/network/sendmail-client:default
online          8:21:22 svc:/network/smtp:sendmail
online          8:21:22 svc:/milestone/multi-user:default
online          8:21:23 svc:/system/fm/smtp-notify:default
online          8:21:23 svc:/milestone/multi-user-server:default
online         23:09:05 svc:/system/manifest-import:default
maintenance    23:11:46 svc:/application/nagios:default
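
In this listing, svc:/application/nagios:default is the only instance in a state other than online or legacy_run. On a longer list, a filter over the STATE column finds such entries without reading every line. A Python sketch over pasted `svcs` output; which states count as healthy is an assumption, and `unhealthy_services` is a hypothetical helper:

```python
# Assumption: states treated as healthy; anything else (maintenance, offline*, ...) is reported.
HEALTHY_STATES = {"online", "legacy_run"}


def unhealthy_services(svcs_output: str) -> list:
    """Return (state, fmri) pairs from `svcs` text whose state is not healthy."""
    result = []
    for line in svcs_output.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "STATE":
            continue  # skip the header and blank or malformed lines
        # STATE is the first column and FMRI the last, whatever STIME looks like.
        state, fmri = fields[0], fields[-1]
        if state not in HEALTHY_STATES:
            result.append((state, fmri))
    return result


if __name__ == "__main__":
    sample = """\
STATE          STIME    FMRI
legacy_run      8:21:22 lrc:/etc/rc2_d/S89PRESERVE
online         23:09:05 svc:/system/manifest-import:default
maintenance    23:11:46 svc:/application/nagios:default
"""
    print(unhealthy_services(sample))  # [('maintenance', 'svc:/application/nagios:default')]
```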