System status Core service report down but are online

Posted: Fri Jul 29, 2016 3:48 am
by WillemDH
We seem to be hitting an issue we have had before: the system status page reports the Core Services Elasticsearch Database and Logstash Collector as down. They are in fact online, so could it be that Apache is unable to execute "/etc/init.d/elasticsearch status"?

That made me think it could be related to the sudoers file, which currently contains this:

Code:

# DEFAULTS
Defaults    requiretty
Defaults   !visiblepw
Defaults    always_set_home
Defaults    env_reset
Defaults    env_keep =  "COLORS DISPLAY HOSTNAME HISTSIZE INPUTRC KDEDIR LS_COLORS"
Defaults    env_keep += "MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE"
Defaults    env_keep += "LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES"
Defaults    env_keep += "LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE"
Defaults    env_keep += "LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY"
Defaults    secure_path = /sbin:/bin:/usr/sbin:/usr/bin
root    ALL=(ALL)       ALL

# NAGIOS
Defaults:nagios !requiretty
nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_init_service
nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_lin_service.sh
nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_lin_updates.py

# NAGIOS REACTOR
Defaults:sysreactor !requiretty
sysreactor ALL=NOPASSWD: /usr/bin/yum
sysreactor ALL=NOPASSWD: /sbin/reboot

# RUNDECK
Defaults:rundeck !requiretty
sys_rundeck_local ALL = NOPASSWD:/usr/bin/make
sys_rundeck_local ALL = NOPASSWD:/usr/bin/yum
sys_rundeck_local ALL = NOPASSWD:/sbin/reboot
sys_rundeck_local ALL = NOPASSWD:/sbin/shutdown

# NAGIOS LOG SERVER
User_Alias      NAGIOSLOGSERVER=nagios
User_Alias      NAGIOSLOGSERVERWEB=apache
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/logstash start
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/logstash stop
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/logstash restart
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/logstash reload
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/logstash status
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/elasticsearch start
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/elasticsearch stop
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/elasticsearch restart
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/elasticsearch reload
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/elasticsearch status
NAGIOSLOGSERVER ALL = NOPASSWD:/usr/local/nagioslogserver/scripts/change_timezone.sh
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/logstash start
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/logstash stop
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/logstash restart
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/logstash reload
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/logstash status
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/elasticsearch start
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/elasticsearch stop
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/elasticsearch restart
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/elasticsearch reload
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/elasticsearch status
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/usr/local/nagioslogserver/scripts/get_logstash_ports.sh

Or could it be something else?

Willem

Re: System status Core service report down but are online

Posted: Fri Jul 29, 2016 9:52 am
by rkennedy
Hi Willem!

What version are you using at this point? Version 1.4.2 fixed a few security-related bugs, and those fixes in turn changed how sessions are stored.

Is this a single instance, or a cluster? Are you using a load balancer in front of the machine by chance?

Re: System status Core service report down but are online

Posted: Fri Jul 29, 2016 4:37 pm
by WillemDH
I'm running two NLS nodes at version 1.4.2. No load balancer.

Re: System status Core service report down but are online

Posted: Mon Aug 01, 2016 9:54 am
by jspink
This seems to be the same thing I'm seeing, as posted in this thread: https://support.nagios.com/forum/viewto ... 38&t=39556

We are at 10 hosts, behind a load balancer.

Re: System status Core service report down but are online

Posted: Mon Aug 01, 2016 10:21 am
by rkennedy
Got it. Your sudoers looks fine.

What are the permissions on these files?

Code:

ls -la /var/run/logstash/logstash.pid
ls -la /var/run/elasticsearch/elasticsearch.pid

What do both of these calls return? (Replace the IP address accordingly, and use https if needed.)

Code:

http://192.168.3.128/nagioslogserver/api/system/status?subsystem=elasticsearch
http://192.168.3.128/nagioslogserver/api/system/status?subsystem=logstash
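
For anyone who would rather script this check than paste the URLs into a browser, here is a minimal Python sketch. It assumes the endpoint returns a JSON object with "status" and "message" fields, which matches the replies later in this thread. The names parse_status and fetch_status are made up for illustration. Depending on your setup, the API may also need a logged-in session or a token, which this sketch does not handle.

```python
import json
import urllib.request

# IP address taken from the post above; replace it with your own server.
BASE_URL = "http://192.168.3.128/nagioslogserver/api/system/status"


def parse_status(body):
    """Return (status, message) from a JSON response body.

    Assumes a JSON object with "status" and "message" keys;
    missing keys come back as None.
    """
    data = json.loads(body)
    return data.get("status"), data.get("message")


def fetch_status(subsystem, base_url=BASE_URL, timeout=10):
    """Query the status endpoint for one subsystem and parse the reply."""
    url = f"{base_url}?subsystem={subsystem}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_status(resp.read().decode("utf-8"))


# Example usage (needs network access to the server):
# for name in ("elasticsearch", "logstash"):
#     print(name, fetch_status(name))
```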

Re: System status Core service report down but are online

Posted: Tue Aug 02, 2016 2:24 am
by WillemDH
Here you go:

Code:

ls -la /var/run/logstash/logstash.pid
-rw-r--r-- 1 root nagios 6 Aug  1 11:33 /var/run/logstash/logstash.pid

Code:

ls -la /var/run/elasticsearch/elasticsearch.pid
-rw-r--r-- 1 nagios users 4 Jul 29 09:40 /var/run/elasticsearch/elasticsearch.pid

When I browse to the URL 'http://192.168.3.128/nagioslogserver/ap ... sticsearch' I get:

Code:

{"status":"stopped","message":"Search engine (elasticsearch) is stopped."}

and for the logstash URL:

Code:

{"status":"stopped","message":"Log collector (logstash) is stopped."}
Grtz

Willem
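
A side note on the PID files listed above: both show -rw-r--r--, so they are readable by every user, including apache. If the status check relies on these files (an assumption, since the thread does not confirm how Nagios Log Server decides a service is up), a quick Python sketch like the following can confirm that a PID file is world-readable and that the PID in it belongs to a live process. The function name pid_file_report is hypothetical.

```python
import os
import stat


def pid_file_report(path):
    """Report the PID stored in `path`, whether the file is readable
    by all users, and whether a process with that PID exists."""
    mode = os.stat(path).st_mode
    with open(path) as fh:
        pid = int(fh.read().strip())
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, sends nothing
        alive = True
    except ProcessLookupError:
        alive = False  # no process with that PID
    except PermissionError:
        alive = True  # process exists but belongs to another user
    return {
        "pid": pid,
        "world_readable": bool(mode & stat.S_IROTH),
        "alive": alive,
    }


# Example usage on the server:
# print(pid_file_report("/var/run/elasticsearch/elasticsearch.pid"))
# print(pid_file_report("/var/run/logstash/logstash.pid"))
```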

Re: System status Core service report down but are online

Posted: Tue Aug 02, 2016 9:31 am
by hsmith
Let me see if I can dig around the source code and check a couple things. I have a busy morning scheduled, but I'll have some time this afternoon to look at it.

Re: System status Core service report down but are online

Posted: Fri Aug 05, 2016 2:54 pm
by WillemDH
Hello,

Are there any ideas I could try in the meantime? Am I the only one reporting this? I really have the feeling it has something to do with the sudoers file, even though you say it looks fine.

Grtz

Willem

Re: System status Core service report down but are online

Posted: Sun Aug 07, 2016 4:49 am
by WillemDH
Found the issue. When I remove 'Defaults requiretty' from /etc/sudoers, it works again. So I put 'Defaults requiretty' back and tried adding 'Defaults:apache !requiretty'. With that, the two icons at the top of the page work, but the ones on the system status page do not.
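
To make the interaction between the global and per-user settings easier to see, here is a rough Python sketch that works out whether requiretty would apply to a given user, based on the Defaults lines in a sudoers file. It is a simplification and not how sudo itself parses the file: it only looks at plain 'Defaults' and 'Defaults:username' lines, ignores User_Alias and host, runas, and command defaults, and applies global entries before user-specific ones. The function name requiretty_for is made up for illustration.

```python
def requiretty_for(user, sudoers_text):
    """Return True if requiretty would apply to `user` (simplified model)."""
    global_flags, user_flags = [], []
    for raw in sudoers_text.splitlines():
        line = raw.strip()
        if not line.startswith("Defaults"):
            continue  # skips comments (lines starting with #) and rule lines
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        head, settings = parts
        for item in (s.strip() for s in settings.split(",")):
            if item not in ("requiretty", "!requiretty"):
                continue
            value = not item.startswith("!")
            if head == "Defaults":
                global_flags.append(value)
            elif head == f"Defaults:{user}":
                user_flags.append(value)
    effective = False  # requiretty is off unless the file turns it on
    # Global entries first, then user-specific ones; the last match wins.
    for value in global_flags + user_flags:
        effective = value
    return effective


sample = """
Defaults    requiretty
Defaults:nagios !requiretty
"""
print(requiretty_for("nagios", sample))
print(requiretty_for("apache", sample))
```

With the two Defaults lines from the file posted at the start of the thread, this model exempts nagios from requiretty but not apache, which fits what was found here. It does not explain why the icons on the system status page still fail after adding 'Defaults:apache !requiretty'.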

Re: System status Core service report down but are online

Posted: Mon Aug 08, 2016 9:49 am
by rkennedy
Looking at a stock system, the requiretty line is commented out (###Defaults requiretty), and Defaults:nagios !requiretty is present. Could you test with that configuration?

Please post your entire sudoers file for us to look at.