Instance Status

floki · Post by **floki** » Thu Feb 07, 2019 8:47 am

Good Day!

I just noticed this while checking the instance status of our Nagios Log Server cluster. It seems that Elasticsearch and Logstash are down on the other instance of the cluster. There are no entries on that instance in either elasticsearch.log or logstash.log. I can still access the web GUI of both instances. When I read the documentation on adding an instance to a cluster, I saw that both instances should show green checks for both Elasticsearch and Logstash, so I assume this is not expected. What are the steps to troubleshoot this?

Here's the screenshot for reference:
https://drive.google.com/file/d/1lEAKGm ... sp=sharing

Thanks a lot!

scottwilkerson · Post by **scottwilkerson** » Thu Feb 07, 2019 1:30 pm

It is possible that cron isn't running on the server showing the red ! marks. Can you SSH to that server, run the following, and post the results?


chage -l nagios
service crond status
ps -ef|grep java

floki · Post by **floki** » Sat Feb 09, 2019 4:29 am

Good Day! Thanks for the response.

Here are the results of the commands for our two-instance cluster:

Instance 1:
https://drive.google.com/file/d/1YEy0Na ... sp=sharing

Instance 2:
https://drive.google.com/file/d/1L535JP ... sp=sharing

Both seem to have the same output. Hmm, and I keep noticing the PAM ERROR under crond.service.

Thanks a lot.

scottwilkerson · Post by **scottwilkerson** » Mon Feb 11, 2019 7:47 am

The nagios user is expired on both systems. Run the following on each system to disable password expiration:


chage -I -1 -m 0 -M 99999 -E -1 nagios
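
To confirm the change took effect, the `chage -l` output can be checked for the two expiry fields. The following is only a minimal sketch: the function name `expiry_fields_ok` is hypothetical, and it assumes the English-locale labels `Password expires` and `Account expires`.

```shell
#!/bin/sh
# Hypothetical helper: reads `chage -l <user>` output on stdin and succeeds
# only if both "Password expires" and "Account expires" are set to "never".
# Assumes English-locale field labels.
expiry_fields_ok() {
    awk -F: '
        /^Password expires/ { pw = $2 }
        /^Account expires/  { acct = $2 }
        END {
            # Strip padding around the values before comparing.
            gsub(/[ \t]/, "", pw); gsub(/[ \t]/, "", acct)
            exit !(pw == "never" && acct == "never")
        }'
}

# Usage (run as root on each node):
#   chage -l nagios | expiry_fields_ok && echo "nagios: no expiry" || echo "nagios: expiry set"
```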

floki · Post by **floki** » Sat Feb 16, 2019 10:37 pm

Hi!

I ran the command you gave, but the instance status is still the same. I also did some further testing to see how the cluster behaves:


Scenario:
	1. NODE_1 : master node
	2. NODE_2 : slave node
	3. nagios core pointed to NODE_1
	4. indexes are replicated to NODE_2

TEST 1:
	1. NODE_1 : stop elasticsearch & logstash
	2. NODE_2 : standalone node
	3. Only logs coming from load balancers were received
	4. After elasticsearch & logstash were started on NODE_1, NODE_2 is now the master node
	check: curl 'localhost:9200/_cat/master?v'
	5. Logs received by NODE_2 were replicated to NODE_1
	check: curl -s -XGET 'http://localhost:9200/_cat/shards?v'
	6. Cluster status returned from Yellow to Green
	7. Syslog messages received by NODE_2 during the downtime of NODE_1 were copied to NODE_1
	8. NODE_1 can still receive logs even though it shifted to slave
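
The replication check in step 5 can be scripted by counting shards that are not in the `STARTED` state. This is a rough sketch only: the function name `count_unstarted_shards` is hypothetical, and it assumes the default `_cat/shards?v` column order, in which the fourth column is the shard state.

```shell
#!/bin/sh
# Hypothetical helper: reads `_cat/shards?v` output on stdin and prints the
# number of shards whose state (4th column) is not STARTED.
# Assumes the default column order: index shard prirep state ...
# NR > 1 skips the header row that `?v` adds.
count_unstarted_shards() {
    awk 'NR > 1 && NF >= 4 && $4 != "STARTED" { n++ } END { print n + 0 }'
}

# Usage on either node:
#   curl -s 'http://localhost:9200/_cat/shards?v' | count_unstarted_shards
```

A result of 0 would be consistent with the cluster having returned to Green; any other number means some shards are still initializing, relocating, or unassigned.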

Maybe this status is normal?

scottwilkerson · Post by **scottwilkerson** » Mon Feb 18, 2019 9:30 am

Did you run the command on all of the nodes?

floki · Post by **floki** » Tue Feb 19, 2019 1:43 am

Yes, I did, but the instance status is still the same. Is it also normal to have master/data in the cluster?

Thanks a lot

scottwilkerson · Post by **scottwilkerson** » Tue Feb 19, 2019 7:39 am

floki wrote:Yes I did but still the same instance status. Is it also normal to have master/data in the cluster?

Thanks a lot

Either node can be master, that can change at any time.

The command I gave was supposed to fix the errors you were seeing in the screenshots you sent here:
https://support.nagios.com/forum/viewto ... 09#p274749

floki · Post by **floki** » Tue Feb 26, 2019 9:53 pm

The red instance status for the second Log Server is still there. Below is the output after I ran the command you gave. Is there anything else I can try? Note that when I point logs at the second Log Server (the one with the red instance status), it still receives them.


[root@Log_Server elasticsearch]# chage -l nagios
Last password change                                    : Aug 23, 2018
Password expires                                        : never
Password inactive                                       : never
Account expires                                         : never
Minimum number of days between password change          : 0
Maximum number of days between password change          : 99999
Number of days of warning before password expires       : 7
[root@Log_Server elasticsearch]# service crond status
Redirecting to /bin/systemctl status crond.service
● crond.service - Command Scheduler
   Loaded: loaded (/usr/lib/systemd/system/crond.service; enabled; vendor preset: enabled)
   Active: active (running) since Tue 2018-10-30 19:05:58 HKT; 3 months 28 days ago
 Main PID: 725 (crond)
   CGroup: /system.slice/crond.service
           └─725 /usr/sbin/crond -n

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
[root@Log_Server elasticsearch]# ps -ef|grep java
root       6593   5697  0 10:10 pts/0    00:00:00 grep --color=auto java
nagios    56592      1  8 Feb15 ?        23:00:05 /bin/java -Xms3983m -Xmx3983m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Des.cluster.name=8e37c562-7430-4a09-95ec-24e7144f3d25 -Des.node.name=efc78a82-f33a-4f5f-8ffa-13228247b3bb -Des.discovery.zen.ping.unicast.hosts=localhost,10.109.80.8,10.109.80.9 -Des.path.repo=/ -Delasticsearch -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid -Des.path.home=/usr/local/nagioslogserver/elasticsearch -cp :/usr/local/nagioslogserver/elasticsearch/lib/elasticsearch-1.7.6.jar:/usr/local/nagioslogserver/elasticsearch/lib/*:/usr/local/nagioslogserver/elasticsearch/lib/sigar/* -Des.default.path.home=/usr/local/nagioslogserver/elasticsearch -Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/usr/local/nagioslogserver/elasticsearch/data -Des.default.path.work=/usr/local/nagioslogserver/tmp/elasticsearch -Des.default.path.conf=/usr/local/nagioslogserver/elasticsearch/config org.elasticsearch.bootstrap.Elasticsearch
root      72960  72958  1 Feb26 ?        00:17:56 /bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -Xmx500m -Xss2048k -Djffi.boot.library.path=/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Djava.io.tmpdir=/usr/local/nagioslogserver/tmp -XX:HeapDumpPath=/usr/local/nagioslogserver/logstash/heapdump.hprof -Xbootclasspath/a:/usr/local/nagioslogserver/logstash/vendor/jruby/lib/jruby.jar -classpath : -Djruby.home=/usr/local/nagioslogserver/logstash/vendor/jruby -Djruby.lib=/usr/local/nagioslogserver/logstash/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh org.jruby.Main --1.9 /usr/local/nagioslogserver/logstash/lib/bootstrap/environment.rb logstash/runner.rb agent -f /usr/local/nagioslogserver/logstash/etc/conf.d -l /var/log/logstash/logstash.log -w 4
[root@Log_Server elasticsearch]#

scottwilkerson · Post by **scottwilkerson** » Wed Feb 27, 2019 8:08 am

Can you show the output of this command?


curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
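
If only the overall color is needed, the `status` field can be pulled out of that response. A minimal sketch, assuming the usual flat `_cluster/health` JSON; the function name `cluster_status` is hypothetical, and `sed` is used only to avoid depending on a JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: reads the JSON from `_cluster/health` on stdin and
# prints the value of its "status" field (green, yellow, or red).
# Works on both pretty-printed and compact output; not a general JSON parser.
cluster_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Usage:
#   curl -s -XGET 'http://localhost:9200/_cluster/health?pretty=true' | cluster_status
```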
