
Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Thu May 12, 2016 10:42 am
by tgriep
Can you log in to the system as the nagios user and run Nagios manually to see if it gives us any more information?
Run this as the nagios user:

Code: Select all

/opt/nagios/bin/nagios  /opt/nagios/etc/nagios.cfg
I have my Solaris 10 system mostly configured and it seems to be working, but I have not finished setting it up.

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Thu May 12, 2016 12:06 pm
by dheitepriem
Of course I can do it. This is the output:

Code: Select all

nagios@cs-monitor-brem-t:~$ id
uid=1005(nagios) gid=1005(nagios)
nagios@cs-monitor-brem-t:~$ pwd
/opt/nagios
nagios@cs-monitor-brem-t:~$ /opt/nagios/bin/nagios /opt/nagios/etc/nagios.cfg

Nagios Core 4.1.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 08-19-2015
License: GPL

Website: https://www.nagios.org
Nagios 4.1.1 starting... (PID=25110)
Local time is Thu May 12 19:00:52 CEST 2016
nerd: Channel hostchecks registered successfully
nerd: Channel servicechecks registered successfully
nerd: Channel opathchecks registered successfully
nerd: Fully initialized and ready to rock!
wproc: Successfully registered manager as @wproc with query handler
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
wproc: Registry request: name=Core Worker 25111;pid=25111
wproc: Registry request: name=Core Worker 25112;pid=25112
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
wproc: Registry request: name=Core Worker 25113;pid=25113
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
wproc: Registry request: name=Core Worker 25114;pid=25114
wproc: Registry request: name=Core Worker 25115;pid=25115
wproc: Registry request: name=Core Worker 25141;pid=25141
wproc: Registry request: name=Core Worker 25142;pid=25142
wproc: Registry request: name=Core Worker 25143;pid=25143
wproc: Registry request: name=Core Worker 25145;pid=25145
Successfully launched command file worker with pid 25147
Does Nagios listen on any specific port? I really don't know why it reports "Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused", given that the file is present and is readable and writable by Nagios (user and group).

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Thu May 12, 2016 3:06 pm
by tgriep
The nagios.qh file is a Unix-domain socket that was added in Nagios 4.x.x as a mechanism for the Nagios workers to talk to Core. It also lets external processes talk to Core.
Here are some links you can look at:
https://labs.nagios.com/tag/query-handl ... tsnew.html
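
Because nagios.qh is a socket rather than a regular file, its presence and permissions alone do not guarantee that connections succeed. On a Unix-domain socket, "Connection refused" usually means that no process is accepting connections on that path at that moment. A minimal sketch for checking what kind of object sits at the path follows. The function name `qh_path_kind` and the `QH_PATH` variable are hypothetical, and the default path is simply the one from the error message in this thread.

```shell
#!/bin/sh
# Report what kind of object exists at a given path.
# qh_path_kind is a hypothetical helper, not part of Nagios.
qh_path_kind() {
    path=$1
    if [ ! -e "$path" ]; then
        echo "missing"          # nothing at this path
    elif [ -S "$path" ]; then
        echo "socket"           # a Unix-domain socket file
    else
        echo "not-a-socket"     # regular file, directory, pipe, ...
    fi
}

# Assumed default: the query socket path reported in the error message.
qh_path_kind "${QH_PATH:-/opt/nagios/var/rw/nagios.qh}"
```

A result of "socket" only shows that the file has the right type. Whether a running Nagios process is actually listening on it has to be checked separately, for example by confirming that the main daemon is up.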

The only cause I can find for that error is a permission issue.
Your permissions on that file are set up correctly; the only difference is that you are using LDAP and I am not.
Can you use a local account to see if that fixes it for you?

I talked with the developer, and work is planned in Core to improve Solaris support in future releases.

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Thu May 12, 2016 3:38 pm
by dheitepriem
Thank you very much for the links and especially your help. I will try it without LDAP tomorrow (European time)

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Thu May 12, 2016 3:48 pm
by tgriep
Let us know how it works for you.

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Fri May 13, 2016 1:45 am
by dheitepriem
Unfortunately it doesn't work. LDAP is disabled, a local nagios account and group have been created and the apache user has been added to this nagios group.

Code: Select all

root@cs-monitor-brem-t:/opt# svcs -xv nagios
svc:/application/nagios:default (?)
 State: maintenance since May 13, 2016 08:37:44 AM CEST
Reason: Start method failed repeatedly, last died on Killed (9).
   See: http://support.oracle.com/msg/SMF-8000-KS
   See: /var/svc/log/application-nagios:default.log
Impact: This service is not running.
root@cs-monitor-brem-t:/opt# svcs -xv ldap/client
svc:/network/ldap/client:default (LDAP Name Service Client)
 State: disabled since May 13, 2016 08:32:53 AM CEST
Reason: Disabled by an administrator.
   See: http://support.oracle.com/msg/SMF-8000-05
   See: man -M /usr/share/man -s 1M ldap_cachemgr
   See: /var/svc/log/network-ldap-client:default.log
Impact: This service is not running.
root@cs-monitor-brem-t:/opt# date
Friday, May 13, 2016 08:38:08 AM CEST
root@cs-monitor-brem-t:/opt# tail -20 /var/svc/log/application-nagios:default.log
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
wproc: Registry request: name=Core Worker 28197;pid=28197
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
wproc: Registry request: name=Core Worker 28198;pid=28198
wproc: Registry request: name=Core Worker 28199;pid=28199
Failed to connect to query socket '/opt/nagios/var/rw/nagios.qh': connect() failed: Connection refused
wproc: Registry request: name=Core Worker 28200;pid=28200
wproc: Registry request: name=Core Worker 28201;pid=28201
wproc: Registry request: name=Core Worker 28226;pid=28226
wproc: Registry request: name=Core Worker 28227;pid=28227
wproc: Registry request: name=Core Worker 28230;pid=28230
wproc: Registry request: name=Core Worker 28232;pid=28232
Successfully launched command file worker with pid 28234
[ May 13 08:37:44 Method or service exit timed out.  Killing contract 1822. ]
[ May 13 08:37:44 Method "start" failed due to signal KILL. ]
P.S.: Due to a public holiday here in Germany I can't take any actions on our server till Tuesday

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Fri May 13, 2016 11:04 am
by tgriep
Maybe there was a problem compiling the socket support, or a library compatibility issue.
Can you post the following files from the source folder so we can review them?

Code: Select all

config.log
config.status

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Mon May 16, 2016 10:08 am
by dheitepriem
Please find attached both files

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Mon May 16, 2016 10:49 am
by tgriep
Thanks for the log files. Nothing jumps out as an error or an issue.

One thing I found in the instructions for generating a Service Management Facility manifest is that they left out the -d option from the command that starts Nagios.
It is needed to start Nagios in daemon mode.

Recreate and reload the nagios.xml file like the example below and see if that helps.

Code: Select all

svcbundle -o nagios.xml -s service-name=application/nagios -s start-method="/opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg"
Try that and see if it helps.
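
The SMF log earlier in the thread ends with "Method or service exit timed out", which is consistent with a start method that stays in the foreground and never returns. With -d, Nagios forks into the background so the start method can exit. The following is a rough sketch of the recreate-and-reload steps, assuming the /opt/nagios prefix used elsewhere in this thread and the usual site manifest directory /lib/svc/manifest/site. The `run` and `reload_nagios_manifest` helpers are hypothetical, and by default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch: regenerate the SMF manifest with -d and re-import it.
# Set DRY_RUN=0 to execute; by default the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reload_nagios_manifest() {
    # Assumed install prefix, taken from the paths used in this thread.
    prefix=${1:-/opt/nagios}
    run svcbundle -o nagios.xml \
        -s service-name=application/nagios \
        -s start-method="$prefix/bin/nagios -d $prefix/etc/nagios.cfg"
    # Assumed location for site-local manifests on Solaris 11.
    run cp nagios.xml /lib/svc/manifest/site/nagios.xml
    run svcadm restart svc:/system/manifest-import:default
    # A service in maintenance must be cleared before SMF retries it.
    run svcadm clear svc:/application/nagios:default
}

reload_nagios_manifest
```

Afterwards, `svcs -xv nagios` should show whether the service left the maintenance state.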

Re: Nagios 4.1.1 on Solaris 11 - Can't connect to query sock

Posted: Mon May 16, 2016 11:24 am
by dheitepriem
I added the "-d" parameter and now it starts successfully (according to SMF). The "<defunct>" processes are still there but I don't know if they will influence the program in any way. I will monitor the functionality of Nagios over this week and will get back to you if I encounter any issues.

Code: Select all

svcs -xv nagios
svc:/application/nagios:default (?)
 State: online since May 16, 2016 06:11:38 PM CEST
   See: /var/svc/log/application-nagios:default.log
Impact: None.

Code: Select all

ps -ef | grep nagios
  nagios 29525 29492   0 18:11:39 ?           1:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 29508 29492   0        - ?           0:00 <defunct>
  nagios 29512 29492   0        - ?           0:00 <defunct>
  nagios 29502 29492   0        - ?           0:00 <defunct>
    root 29607 29601   0 18:18:30 pts/2       0:00 grep nagios
  nagios 29523 29492   0 18:11:39 ?           0:52 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 29509 29492   0        - ?           0:00 <defunct>
  nagios 29513 29492   0        - ?           0:00 <defunct>
  nagios 29526 29492   0 18:11:39 ?           0:56 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 29511 29492   0        - ?           0:00 <defunct>
  nagios 29529 29492   0 18:11:42 ?           0:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
  nagios 29501 29492   0        - ?           0:00 <defunct>
  nagios 29524 29492   0        - ?           0:00 <defunct>
  nagios 29518 29492   0        - ?           0:00 <defunct>
  nagios 29499 29492   0        - ?           0:00 <defunct>
  nagios 29507 29492   0        - ?           0:00 <defunct>
  nagios 29516 29492   0        - ?           0:00 <defunct>
  nagios 29517 29492   0        - ?           0:00 <defunct>
  nagios 29522 29492   0        - ?           0:00 <defunct>
  nagios 29495 29492   0 18:11:39 ?           1:05 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 29519 29492   0        - ?           0:00 <defunct>
  nagios 29494 29492   0 18:11:39 ?           2:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 29498 29492   0 18:11:39 ?           1:05 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 29504 29492   0        - ?           0:00 <defunct>
  nagios 29520 29492   0        - ?           0:00 <defunct>
  nagios 29514 29492   0        - ?           0:00 <defunct>
  nagios 29505 29492   0        - ?           0:00 <defunct>
  nagios 29528 29492   0        - ?           0:00 <defunct>
  nagios 29521 29492   0        - ?           0:00 <defunct>
  nagios 29497 29492   0        - ?           0:00 <defunct>
  nagios 29493 29492   0 18:11:39 ?           1:26 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 29506 29492   0        - ?           0:00 <defunct>
  nagios 29503 29492   0        - ?           0:00 <defunct>
  nagios 29515 29492   0        - ?           0:00 <defunct>
  nagios 29496 29492   1 18:11:39 ?           2:00 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 29510 29492   0        - ?           0:00 <defunct>
  nagios 29500 29492   0        - ?           0:00 <defunct>
  nagios 29527 29492   0 18:11:39 ?           0:26 /opt/nagios/bin/nagios --worker /opt/nagios/var/rw/nagios.qh
  nagios 29492 25337   0 18:11:39 ?           0:00 /opt/nagios/bin/nagios -d /opt/nagios/etc/nagios.cfg
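
The `<defunct>` entries are zombie processes: children that have exited but have not yet been reaped by their parent (PID 29492 here). They hold a process-table slot but no longer run any code. To watch whether their number keeps growing, a small counter over `ps -ef` output may help. This is a sketch only; `count_defunct` is a hypothetical helper, and it assumes the column layout shown in the listing above (user, PID, parent PID, then the rest).

```shell
#!/bin/sh
# Count <defunct> (zombie) children of one parent PID in `ps -ef` output.
# Reads the listing on stdin, so it also works on a saved copy of the output.
count_defunct() {
    parent=$1
    n=0
    # Columns assumed: UID PID PPID ... CMD (as in the listing above).
    while read -r _user _pid ppid rest; do
        case "$rest" in
            *"<defunct>"*)
                if [ "$ppid" = "$parent" ]; then
                    n=$((n + 1))
                fi
                ;;
        esac
    done
    echo "$n"
}

# Example, using the daemon PID from the listing above:
#   ps -ef | count_defunct 29492
```

Running it a few times over the week would show whether the zombies accumulate or stay at a fixed number.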
Nagios log:

Code: Select all

[1463415098] Nagios 4.1.1 starting... (PID=29492)
[1463415098] Local time is Mon May 16 18:11:38 CEST 2016
[1463415098] LOG VERSION: 2.0
[1463415098] qh: Socket '/opt/nagios/var/rw/nagios.qh' successfully initialized
[1463415098] qh: core query handler registered
[1463415098] nerd: Channel hostchecks registered successfully
[1463415098] nerd: Channel servicechecks registered successfully
[1463415098] nerd: Channel opathchecks registered successfully
[1463415098] nerd: Fully initialized and ready to rock!
[1463415098] wproc: Successfully registered manager as @wproc with query handler
[1463415098] wproc: Registry request: name=Core Worker 29493;pid=29493
[1463415098] wproc: Registry request: name=Core Worker 29494;pid=29494
[1463415098] wproc: Registry request: name=Core Worker 29495;pid=29495
[1463415098] wproc: Registry request: name=Core Worker 29496;pid=29496
[1463415098] wproc: Registry request: name=Core Worker 29498;pid=29498
[1463415098] wproc: Registry request: name=Core Worker 29523;pid=29523
[1463415098] wproc: Registry request: name=Core Worker 29525;pid=29525
[1463415098] wproc: Registry request: name=Core Worker 29526;pid=29526
[1463415098] wproc: Registry request: name=Core Worker 29527;pid=29527
[1463415102] Successfully launched command file worker with pid 29529