System status Core service report down but are online

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

System status Core service report down but are online

Post by WillemDH »

We seem to be hitting an issue we had in the past: the System Status page reports the core services Elasticsearch Database and Logstash Collector as down. They are in fact online, so could it be related to apache not being able to execute "/etc/init.d/elasticsearch status"?

That made me think it could be related to the sudoers file, which currently contains this:

Code: Select all

# DEFAULTS
Defaults    requiretty
Defaults   !visiblepw
Defaults    always_set_home
Defaults    env_reset
Defaults    env_keep =  "COLORS DISPLAY HOSTNAME HISTSIZE INPUTRC KDEDIR LS_COLORS"
Defaults    env_keep += "MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE"
Defaults    env_keep += "LC_COLLATE LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES"
Defaults    env_keep += "LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE"
Defaults    env_keep += "LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY"
Defaults    secure_path = /sbin:/bin:/usr/sbin:/usr/bin
root    ALL=(ALL)       ALL

# NAGIOS
Defaults:nagios !requiretty
nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_init_service
nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_lin_service.sh
nagios ALL=NOPASSWD: /usr/local/nagios/libexec/check_lin_updates.py

# NAGIOS REACTOR
Defaults:sysreactor !requiretty
sysreactor ALL=NOPASSWD: /usr/bin/yum
sysreactor ALL=NOPASSWD: /sbin/reboot

# RUNDECK
Defaults:rundeck !requiretty
sys_rundeck_local ALL = NOPASSWD:/usr/bin/make
sys_rundeck_local ALL = NOPASSWD:/usr/bin/yum
sys_rundeck_local ALL = NOPASSWD:/sbin/reboot
sys_rundeck_local ALL = NOPASSWD:/sbin/shutdown

# NAGIOS LOG SERVER
User_Alias      NAGIOSLOGSERVER=nagios
User_Alias      NAGIOSLOGSERVERWEB=apache
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/logstash start
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/logstash stop
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/logstash restart
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/logstash reload
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/logstash status
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/elasticsearch start
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/elasticsearch stop
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/elasticsearch restart
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/elasticsearch reload
NAGIOSLOGSERVER ALL = NOPASSWD:/etc/init.d/elasticsearch status
NAGIOSLOGSERVER ALL = NOPASSWD:/usr/local/nagioslogserver/scripts/change_timezone.sh
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/logstash start
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/logstash stop
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/logstash restart
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/logstash reload
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/logstash status
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/elasticsearch start
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/elasticsearch stop
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/elasticsearch restart
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/elasticsearch reload
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/etc/init.d/elasticsearch status
NAGIOSLOGSERVERWEB ALL = NOPASSWD:/usr/local/nagioslogserver/scripts/get_logstash_ports.sh
Or could it be something else?
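
One thing that can be checked statically in a file like the one above is how the global `requiretty` default interacts with per-user exemptions. The following is a minimal sketch, not an official Nagios tool: the function names are made up for illustration, and it only understands simple one-option `Defaults` lines (no include directories, no aliases inside `Defaults`, no comma-separated option lists).

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of sudo or Nagios).

# has_global_requiretty FILE
# Succeeds if FILE contains an uncommented "Defaults requiretty" line.
has_global_requiretty() {
    grep -Eq '^[[:space:]]*Defaults[[:space:]]+requiretty[[:space:]]*$' "$1"
}

# has_tty_exemption FILE USER
# Succeeds if FILE contains "Defaults:USER !requiretty".
has_tty_exemption() {
    grep -Eq "^[[:space:]]*Defaults:$2[[:space:]]+!requiretty[[:space:]]*\$" "$1"
}

# tty_report FILE USER...
# Prints one line per user describing whether sudo would demand a tty,
# under the simplifying assumptions stated above.
tty_report() {
    file="$1"
    shift
    for user in "$@"; do
        if ! has_global_requiretty "$file"; then
            echo "$user: requiretty not enforced globally"
        elif has_tty_exemption "$file" "$user"; then
            echo "$user: exempt"
        else
            echo "$user: tty required"
        fi
    done
}
```

Something like `tty_report /etc/sudoers nagios apache` (run as root, since the file is normally not world-readable) would then show at a glance which of the two accounts behind the NAGIOSLOGSERVER and NAGIOSLOGSERVERWEB aliases still need a terminal for sudo.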

Willem
Nagios XI 5.8.1
https://outsideit.net
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: System status Core service report down but are online

Post by rkennedy »

Hi Willem!

What version are you using at this point? In 1.4.2, a few security-related bugs were fixed, which in turn changed how sessions are stored.

Is this a single instance, or a cluster? Are you using a load balancer in front of the machine by chance?
Former Nagios Employee
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: System status Core service report down but are online

Post by WillemDH »

I'm running two NLS nodes at version 1.4.2. No load balancer.
Nagios XI 5.8.1
https://outsideit.net
jspink
Posts: 43
Joined: Wed Nov 25, 2015 3:27 pm

Re: System status Core service report down but are online

Post by jspink »

Seems to be what I'm seeing as posted in this thread: https://support.nagios.com/forum/viewto ... 38&t=39556

We are at 10 hosts, behind a load balancer.
Nagios Log Server: 10 Instances - 3,916,302,797 documents last check in 180 shards
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: System status Core service report down but are online

Post by rkennedy »

Got it. Your sudoers looks fine.

What are the permissions on these files?

Code: Select all

ls -la /var/run/logstash/logstash.pid
ls -la /var/run/elasticsearch/elasticsearch.pid
What do both of these calls return? (Replace the IP address accordingly, and use https if needed.)

Code: Select all

http://192.168.3.128/nagioslogserver/api/system/status?subsystem=elasticsearch
http://192.168.3.128/nagioslogserver/api/system/status?subsystem=logstash
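
For anyone who would rather check these from a shell than a browser, here is a rough sketch that builds the two URLs and pulls out the `status` field of the reply. The function names are hypothetical, the `sed` extraction assumes a flat one-line JSON reply, and the endpoint may well require a logged-in session or a token that this sketch does not handle.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative only).

# status_url HOST SUBSYSTEM [SCHEME]
# Prints the status endpoint URL; SCHEME defaults to http.
status_url() {
    printf '%s://%s/nagioslogserver/api/system/status?subsystem=%s\n' \
        "${3:-http}" "$1" "$2"
}

# extract_status
# Reads a JSON reply on stdin and prints the value of its "status" field.
# Assumes a simple, single-line reply; use a real JSON parser otherwise.
extract_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# check_subsystems HOST [SCHEME]
# Queries both subsystems with curl (needs network access to HOST).
check_subsystems() {
    for sub in elasticsearch logstash; do
        reply=$(curl -s "$(status_url "$1" "$sub" "${2:-http}")")
        echo "$sub: $(printf '%s\n' "$reply" | extract_status)"
    done
}
```

With that loaded, `check_subsystems 192.168.3.128` prints one line per subsystem, assuming the API answers without authentication from where the command is run.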
Former Nagios Employee
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: System status Core service report down but are online

Post by WillemDH »

Here you go:

Code: Select all

ls -la /var/run/logstash/logstash.pid
-rw-r--r-- 1 root nagios 6 Aug  1 11:33 /var/run/logstash/logstash.pid

Code: Select all

ls -la /var/run/elasticsearch/elasticsearch.pid
-rw-r--r-- 1 nagios users 4 Jul 29 09:40 /var/run/elasticsearch/elasticsearch.pid
When I browse to the URL 'http://192.168.3.128/nagioslogserver/ap ... sticsearch' I get:

Code: Select all

{"status":"stopped","message":"Search engine (elasticsearch) is stopped."}
and for the logstash subsystem:

Code: Select all

{"status":"stopped","message":"Log collector (logstash) is stopped."}
Grtz

Willem
Nagios XI 5.8.1
https://outsideit.net
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: System status Core service report down but are online

Post by hsmith »

Let me see if I can dig around the source code and check a couple things. I have a busy morning scheduled, but I'll have some time this afternoon to look at it.
Former Nagios Employee.
me.
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: System status Core service report down but are online

Post by WillemDH »

Hello,

Any ideas I could try in the meantime? Am I the only one reporting this? I really have the feeling it has something to do with the sudoers configuration (visudo), but you say it looks fine.

Grtz

Willem
Nagios XI 5.8.1
https://outsideit.net
WillemDH
Posts: 2320
Joined: Wed Mar 20, 2013 5:49 am
Location: Ghent

Re: System status Core service report down but are online

Post by WillemDH »

Found the issue. When I remove 'Defaults requiretty' from /etc/sudoers, it works again. So I put 'Defaults requiretty' back and tried adding 'Defaults:apache !requiretty' instead, after which the two icons at the top of the page work, but not the ones on the System Status page.
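
For reference, the per-user exemption described here could be added in a repeatable way with something like the sketch below. The function name is made up for illustration, and as noted above the exemption only cured part of the symptom on this system, so this shows the edit itself and not a confirmed fix.

```shell
#!/bin/sh
# Hypothetical helper (name is illustrative only).

# ensure_tty_exemption USER FILE
# Appends "Defaults:USER !requiretty" to FILE unless that exact line
# is already present, so running it twice does not duplicate the entry.
ensure_tty_exemption() {
    line="Defaults:$1 !requiretty"
    # -x matches the whole line, -F treats the pattern as a fixed string
    grep -qxF "$line" "$2" 2>/dev/null || printf '%s\n' "$line" >> "$2"
}

# Suggested usage (commented out on purpose; run as root and adapt paths):
#   cp /etc/sudoers /tmp/sudoers.new
#   ensure_tty_exemption apache /tmp/sudoers.new
#   visudo -c -f /tmp/sudoers.new   # syntax check before installing the copy
```

Working on a copy and checking it with `visudo -c -f` before putting it in place avoids locking yourself out of sudo with a broken file.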
Nagios XI 5.8.1
https://outsideit.net
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: System status Core service report down but are online

Post by rkennedy »

On a stock system, the line '###Defaults requiretty' is commented out and 'Defaults:nagios !requiretty' is present. Could you test with that configuration?

Please post your entire sudoers file for us to look at.
Former Nagios Employee