Nagios Support Forum

Posted: **Wed Mar 04, 2020 8:29 am**

The instances in one of our two-instance clusters no longer appear to be in sync. On each instance, cluster status is YELLOW, and the number of active and unassigned shards is equal; # of instances = 1; # of data instances = 1
Likewise, Instance status on each instance shows data for the local instance, with both logstash and elasticsearch GREEN, but no statistics are displayed for the other instance, and its logstash and elasticsearch health indicators show as RED.
Disk space looks fine for both instances. Log searches yield no hits for a period from 2/24 up to 3/3.
Discovered the trouble about sixteen hours ago. The CLI confirmed services were running on both nodes. Bouncing each node in turn brought the instances back into communication, with shards (apparently) being assigned properly. Eight hours later I'm back to where I was before.

Any guidance on troubleshooting this and restoring normal operations would be appreciated.

Posted: **Wed Mar 04, 2020 11:59 am**

Can you check the /usr/local/nagioslogserver/var/cluster_hosts file on both machines and make sure they look correct, i.e., both servers know about both servers? Because you're able to sync sometimes, this should be OK, but it's worth checking.
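If it helps, here is a minimal sketch of that check in Python, to be run on each machine. It assumes the file lists one host per line (an assumption about the format, so adjust if yours differs), and the addresses in `expected` are hypothetical placeholders for the real IPs of both instances.

```python
from pathlib import Path

# Path taken from the post above; the one-host-per-line layout is an assumption.
CLUSTER_HOSTS = Path("/usr/local/nagioslogserver/var/cluster_hosts")


def read_hosts(path=CLUSTER_HOSTS):
    """Return the non-empty, stripped lines of the file as a list of hosts."""
    text = Path(path).read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]


def missing_hosts(expected, path=CLUSTER_HOSTS):
    """Return the expected hosts that are not listed in the file."""
    present = set(read_hosts(path))
    return [host for host in expected if host not in present]


if __name__ == "__main__":
    # Hypothetical addresses; replace with the real IPs of both instances.
    expected = ["192.0.2.10", "192.0.2.11"]
    if CLUSTER_HOSTS.exists():
        print("missing:", missing_hosts(expected))
    else:
        print("file not found:", CLUSTER_HOSTS)
```

An empty "missing" list on both machines would suggest each server knows about both servers.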

Also, would you be able to post or PM profiles from both systems?

Is it possible that you filled up your disk space at some point?
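For a quick look at current usage, something like the following Python sketch may do. It only reports the state right now, so it will not reveal a disk that filled up in the past and was later expanded. The default path and the 90% threshold are arbitrary illustrations, not Nagios Log Server settings; point `path` at whichever mount holds your data.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def is_nearly_full(path="/", threshold=90.0):
    """Return True if usage is at or above `threshold` percent (arbitrary default)."""
    return disk_usage_percent(path) >= threshold


if __name__ == "__main__":
    pct = disk_usage_percent("/")
    print(f"/ is {pct:.1f}% used; nearly full: {is_nearly_full('/')}")
```

The figure can differ slightly from what `df` reports, since `df` accounts for blocks reserved for root.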

Let's start with these items, and see what we can figure out.

Thanks!

--Jeffrey

Posted: **Tue Mar 10, 2020 5:19 pm**

Sure enough, the /usr/local/nagioslogserver/var/cluster_hosts file no longer contained both servers. Adding their IPs back in and restarting seems to have sorted out the communications issues.

We had filled up the disk several months ago. After we expanded it, the system seemed to recover without further issue, though.
Thanks for your help!!

Posted: **Wed Mar 11, 2020 8:15 am**

jpconsilio wrote: Sure enough, the /usr/local/nagioslogserver/var/cluster_hosts file no longer contained both servers. Adding their IPs back in and restarting seems to have sorted out the communications issues.

We had filled up the disk several months ago. After we expanded it, the system seemed to recover without further issue, though.
Thanks for your help!!

Great!

Locking thread

Comms between cluster instances appears to be broken
