Comms between cluster instances appear to be broken

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
jpconsilio
Posts: 12
Joined: Mon Sep 30, 2019 11:48 am


Post by jpconsilio

The instances in one of our two-instance clusters no longer appear to be in sync. On each instance, cluster status is YELLOW, the numbers of active and unassigned shards are equal, # of instances = 1, and # of data instances = 1.
Likewise, the Instance status on each instance shows data for the local instance, with both Logstash and Elasticsearch GREEN, but no statistics are displayed for the other instance, and its Logstash and Elasticsearch health indicators show as RED.
Disk space looks fine on both instances. Log searches yield no hits for the period from 2/24 up to 3/3.
We discovered the trouble about sixteen hours ago. The CLI confirmed that services were running on both nodes. Bouncing each node in turn brought the instances back into communication, with shards (apparently) being assigned properly. Eight hours later I'm back where I started.

Any guidance on troubleshooting this and restoring normal operations would be appreciated.
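The symptoms above (YELLOW status, one node, as many unassigned shards as active ones) are what the Elasticsearch `_cluster/health` API reports directly; with only one node visible and, assuming one replica per shard, nowhere to place the replicas, YELLOW with equal active and unassigned counts is the expected result. Below is a minimal Python sketch for checking this on each instance. It assumes Elasticsearch on each instance answers on `http://localhost:9200`, which should be confirmed for your install; `fetch_health` and `diagnose` are hypothetical helper names, not part of Nagios Log Server.

```python
import json
from urllib.request import urlopen

# Assumption: Elasticsearch on each Nagios Log Server instance listens on
# localhost:9200. Adjust HEALTH_URL if your install differs.
HEALTH_URL = "http://localhost:9200/_cluster/health"


def fetch_health(url: str = HEALTH_URL) -> dict:
    """Return the parsed JSON body of the Elasticsearch cluster health API."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def diagnose(health: dict, expected_nodes: int) -> list[str]:
    """List the symptoms from this thread that the health response shows."""
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"cluster status is {status}")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        problems.append(f"only {nodes} of {expected_nodes} nodes joined")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} unassigned shards")
    return problems
```

On each instance of this two-instance cluster, `diagnose(fetch_health(), expected_nodes=2)` should return an empty list once both nodes have rejoined and the cluster is GREEN.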
jdunitz
Posts: 235
Joined: Wed Feb 05, 2020 2:50 pm


Post by jdunitz

Can you check the /usr/local/nagioslogserver/var/cluster_hosts file on both machines and make sure they look correct, i.e., both servers know about both servers? Because you're able to sync sometimes, this should be OK, but it's worth checking.
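A quick way to run this check on each machine is sketched below in Python. The path is the one given above; the assumption that the file lists one IP address or hostname per line should be confirmed by looking at the file itself. `parse_cluster_hosts` and `missing_hosts` are hypothetical helper names.

```python
from pathlib import Path

# Path as given in the thread. The one-entry-per-line format is an
# assumption about the file, not something the thread confirms.
CLUSTER_HOSTS = Path("/usr/local/nagioslogserver/var/cluster_hosts")


def parse_cluster_hosts(text: str) -> set[str]:
    """Return the non-blank entries of a cluster_hosts file, whitespace stripped."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def missing_hosts(expected: set[str], path: Path = CLUSTER_HOSTS) -> set[str]:
    """Return every expected cluster member that this instance's file does not list."""
    listed = parse_cluster_hosts(path.read_text())
    return expected - listed
```

For example, `missing_hosts({"10.0.0.11", "10.0.0.12"})` (placeholder addresses) should return an empty set on both instances when each file lists both servers.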

Also, would you be able to post or PM profiles from both systems?

Is it possible that you filled up your disk space at some point?

Let's start with these items, and see what we can figure out.

Thanks!

--Jeffrey
jpconsilio
Posts: 12
Joined: Mon Sep 30, 2019 11:48 am


Post by jpconsilio

Sure enough, the /usr/local/nagioslogserver/var/cluster_hosts file no longer contained both servers. Adding their IPs back in and restarting seems to have sorted out the communication issues.

We had filled up the disk several months ago, though after we expanded it, everything seemed to recover without further issue.
Thanks for your help!!
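The fix described above (adding the missing IPs back to `cluster_hosts`) can be sketched in Python as an idempotent append. This is not an official Nagios tool: it assumes the one-address-per-line format, `restore_cluster_hosts` is a hypothetical name, and it only edits the file. The restart still has to be done separately, and the file should be checked on both instances.

```python
from pathlib import Path

# Path as given in the thread; one address per line is an assumption.
CLUSTER_HOSTS = Path("/usr/local/nagioslogserver/var/cluster_hosts")


def restore_cluster_hosts(expected: list[str], path: Path = CLUSTER_HOSTS) -> list[str]:
    """Append any expected host the file is missing and return the hosts added.

    Existing entries are left untouched and in order, so running this a
    second time adds nothing.
    """
    current = path.read_text() if path.exists() else ""
    listed = {line.strip() for line in current.splitlines() if line.strip()}
    added = []
    for host in expected:
        if host not in listed:
            added.append(host)
            listed.add(host)  # guard against duplicates within `expected`
    if added:
        # Start the new entries on their own line if the file lacks a trailing newline.
        prefix = "" if current == "" or current.endswith("\n") else "\n"
        with path.open("a") as fh:
            fh.write(prefix + "\n".join(added) + "\n")
    return added
```

With placeholder addresses, `restore_cluster_hosts(["10.0.0.11", "10.0.0.12"])` would return `["10.0.0.12"]` on an instance whose file lists only the first server, and `[]` on a second run.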
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises


Post by scottwilkerson

jpconsilio wrote:
Sure enough, the /usr/local/nagioslogserver/var/cluster_hosts file no longer contained both servers. Adding their IPs back in and restarting seems to have sorted out the communication issues.

We had filled up the disk several months ago, though after we expanded it, everything seemed to recover without further issue.
Thanks for your help!!
Great!

Locking thread
Former Nagios employee