Nagios Log Server listening port abruptly halts
-
- Posts: 59
- Joined: Wed Feb 22, 2017 1:30 am
Hi all,
I've had 3 occurrences of a rather weird issue, spread across two Nagios Log Server clusters in two datacentres: NLS stops listening on the designated port we use for Windows hosts (say, port 3500) and then refuses to receive any log traffic on that port. The Windows boxes run the nxlog agent.
I do not see any resource issues; RAM and CPU utilization on the Log Server look normal. However, I am not well-versed in Log Server and do not know under what conditions, if any, the cluster detects that a port is no longer listening and fails over to the secondary node.
Where do I start looking in the logs to try and figure out this problem?
Current NLS versions:
Nagios Log Server: 1.4.4
Elasticsearch: 1.6.0
Logstash: 1.5.1
Kibana: 3.1.1-nagios3
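A quick first check when this happens is whether anything is still bound to the port at all. Below is a minimal sketch, assuming `ss` is available on the Log Server (`netstat -ltn` prints a similar layout). `listening_on` is a hypothetical helper name, and 3500 is just the example port mentioned above.

```shell
# Reads `ss -ltn` output on stdin and succeeds if any local address
# ends in :PORT. The first line of `ss` output is a header, so skip it.
listening_on() {
  awk -v port="$1" 'NR > 1 && $4 ~ (":" port "$") { found = 1 } END { exit !found }'
}

# Usage (run on the Log Server itself):
#   ss -ltn | listening_on 3500 && echo "listening" || echo "NOT listening"
```

If the port is missing from the list while the Logstash process is still running, that points at the input having stopped rather than at the network.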
Re: Nagios Log Server listening port abruptly halts
The Logstash logs are a good place to start. Can you send them over? This command should put them all in the /tmp/43502_1.zip file:
zip -r /tmp/43502_1.zip /var/log/logstash/
If the file is too big to attach to a post, I'll settle for the latest logstash.log file in that same path.
Former Nagios employee
https://www.mcapra.com/
-
- Posts: 59
- Joined: Wed Feb 22, 2017 1:30 am
Re: Nagios Log Server listening port abruptly halts
Hi,
I've added 3 files for you
Our Log Server stopped listening for logs around 3-5pm last Saturday.
Re: Nagios Log Server listening port abruptly halts
This appears to be the initial problem:
{:timestamp=>"2017-04-16T11:08:34.784000+0200", :message=>"Got error to send bulk of actions: None of the configured nodes are available: []", :level=>:error}
Can you also share your Elasticsearch logs from the same day(s)? They're typically found in /var/log/elasticsearch.
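To pull entries like the error quoted above out of a large logstash.log, something along these lines may help. It is only a sketch: `logstash_errors` is a hypothetical helper name, and it assumes the log uses the same `:level=>:error` marker seen in that entry.

```shell
# Print the last N (default 20) error-level entries from a Logstash log.
# $1 = path to the log file, $2 = optional number of entries to show.
logstash_errors() {
  grep -F ':level=>:error' "$1" | tail -n "${2:-20}"
}

# Usage:
#   logstash_errors /var/log/logstash/logstash.log 50
```

Comparing the timestamp of the first such error with the time the port stopped accepting traffic narrows down which Elasticsearch log entries to look at.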
-
- Posts: 59
- Joined: Wed Feb 22, 2017 1:30 am
Re: Nagios Log Server listening port abruptly halts
Elasticsearch logs uploaded.
- tacolover101
- Posts: 432
- Joined: Mon Apr 10, 2017 11:55 am
Re: Nagios Log Server listening port abruptly halts
from what i can tell your nodes are disconnecting at some point, and then a halt:
[2017-04-16 11:05:02,252][DEBUG][action.admin.cluster.health] [791cc6c8-f646-495e-9e58-1ec21a24b61c] no known master node, scheduling a retry
[2017-04-16 11:05:02,337][DEBUG][action.index ] [791cc6c8-f646-495e-9e58-1ec21a24b61c] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
perhaps a network issue somewhere? i believe there are some ports that nodes need to communicate on, 9200/9300 i think. is there a firewall between them that would be blocking that communication?
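One way to see what the cluster thinks of itself is the Elasticsearch `_cluster/health` endpoint. The sketch below assumes Elasticsearch answers HTTP on localhost:9200 on the node you run it from; `health_status` is a hypothetical helper that just pulls the `status` field out of the JSON.

```shell
# Extract the "status" value (green/yellow/red) from _cluster/health
# JSON supplied on stdin. Works with or without ?pretty formatting.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p'
}

# Usage (assumes Elasticsearch is reachable on localhost:9200):
#   curl -s 'http://localhost:9200/_cluster/health' | health_status
```

If no master is known, as in the log lines above, the call may time out or return an error instead of a status, which is itself a useful data point.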
-
- Posts: 59
- Joined: Wed Feb 22, 2017 1:30 am
Re: Nagios Log Server listening port abruptly halts
Hi all,
There are firewalls between the Log Server and some of the windows nodes.
I will advise once I do a traceroute to some of them.
- tacolover101
- Posts: 432
- Joined: Mon Apr 10, 2017 11:55 am
Re: Nagios Log Server listening port abruptly halts
james.liew wrote:
Hi all,
There are firewalls between the Log Server and some of the windows nodes.
I will advise once I do a traceroute to some of them.
the logs i mentioned above pertain to the NLS clusters - you'll want to make sure port 9200/9300 can make it through between the nodes. nmap might prove to be more useful than a traceroute, since a traceroute is just going to measure hops.
the firewalls will definitely affect the setup depending on where and how they're set up to filter.
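If nmap isn't installed, a rough equivalent of the port test can be done with bash alone. This is a sketch under a few assumptions: `bash` and coreutils `timeout` are present, `port_open` is a hypothetical helper name, and `other-node` stands in for the hostname or IP of the other cluster member.

```shell
# Succeeds if a TCP connection to HOST:PORT can be opened within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so bash is invoked explicitly.
port_open() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (run from one NLS node against the other):
#   for p in 9200 9300; do
#     port_open other-node "$p" && echo "$p reachable" || echo "$p blocked or closed"
#   done
```

A failure here does not distinguish between a firewall drop and a service that is simply not listening, so it is worth running the same test locally on the target node as well.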
-
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: Nagios Log Server listening port abruptly halts
Thanks @tacolover101!
What OS are the nodes running? That will help us determine the firewall command you need to use, assuming it turns out to be a firewall issue.