Nagios Log Server listening port abruptly halts

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Nagios Log Server listening port abruptly halts

Post by james.liew »

Hi all,

I've had three occurrences of this rather odd issue across two Nagios Log Server clusters in two datacentres: NLS stops listening on the port we use for Windows hosts (say, port 3500) and then refuses to receive any log traffic on that port. The Windows boxes run the nxlog agent.

I don't see any resource issues; RAM and CPU utilization on the Log Server look fine. However, I'm not well-versed in Log Server, and I don't know under what conditions, if any, the cluster detects that a port is no longer listening and fails over to the secondary node.

Where do I start looking in the logs to try and figure out this problem?

Current NLS versions:
Nagios Log Server: 1.4.4
Elasticsearch: 1.6.0
Logstash: 1.5.1
Kibana: 3.1.1-nagios3
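A quick way to confirm the symptom on the Log Server node itself is to check whether anything is still listening on the Windows input port. This is a minimal sketch, assuming the `ss` utility (from iproute) is installed; port 3500 is just the example port from the post, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: check whether a TCP port appears among the listening sockets.
# Assumes `ss` (iproute) is installed; function names are hypothetical.

# Read `ss -ltn` output on stdin and succeed if PORT appears in the
# local-address column (4th), e.g. "*:3500", "0.0.0.0:3500" or ":::3500".
port_in_listing() {
  awk -v p="$1" '
    NR > 1 {
      n = split($4, a, ":")   # the port is the text after the last colon
      if (a[n] == p) found = 1
    }
    END { exit !found }'
}

# Check the live system.
is_listening() {
  ss -ltn | port_in_listing "$1"
}
```

For example, `if is_listening 3500; then echo ok; else echo "3500 not listening"; fi`. Run periodically with a timestamp, it would narrow down when the listener disappears, which helps line the event up with the logs.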
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Nagios Log Server listening port abruptly halts

Post by mcapra »

The Logstash logs are a good place to start. Can you send them over? This command should put them all in the /tmp/43502_1.zip file:

Code:

zip -r /tmp/43502_1.zip /var/log/logstash/
If the file is too big to attach to a post, I'll settle for the latest logstash.log file in that same path.
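If you want to automate that fallback, here is a minimal sketch that builds the same archive and, if it is larger than an assumed attachment limit, replaces it with just the newest `logstash.log*` file. The 1 MiB limit and the function names are hypothetical; the forum's real attachment limit may differ.

```shell
#!/usr/bin/env bash
# Sketch: bundle /var/log/logstash, falling back to the newest log file
# if the archive is too big to attach. The size limit is a guess.

# Print the most recently modified logstash.log* file in DIR.
latest_logstash_log() {
  # ls -t sorts by modification time, newest first.
  ls -t "$1"/logstash.log* 2>/dev/null | head -n 1
}

collect_logstash_logs() {
  local dir="${1:-/var/log/logstash}"
  local out="${2:-/tmp/43502_1.zip}"
  local max_bytes="${3:-1048576}"   # hypothetical attachment limit (1 MiB)
  local latest

  zip -r -q "$out" "$dir" || return 1
  if [ "$(wc -c < "$out")" -gt "$max_bytes" ]; then
    latest="$(latest_logstash_log "$dir")"
    [ -n "$latest" ] || return 1
    rm -f "$out"
    # -j stores the file without its directory path.
    zip -q -j "$out" "$latest" || return 1
  fi
  echo "$out"
}
```

Running `collect_logstash_logs` with no arguments uses the same path and archive name as the command above and prints the archive it produced.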
Former Nagios employee
https://www.mcapra.com/
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts

Post by james.liew »

Hi,

I've added 3 files for you :)

Our Log Server stopped listening for logs around 3-5pm last Saturday.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Nagios Log Server listening port abruptly halts

Post by mcapra »

This appears to be the initial problem:

Code:

{:timestamp=>"2017-04-16T11:08:34.784000+0200", :message=>"Got error to send bulk of actions: None of the configured nodes are available: []", :level=>:error}
Can you also share your Elasticsearch logs from the same day(s)? They're typically found in /var/log/elasticsearch.
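The timestamp on that first error tells you which day's Elasticsearch logs matter. A minimal sketch, assuming the Logstash log format shown above and assuming that rotated Elasticsearch logs carry a `YYYY-MM-DD` date in their file names (check the actual names in `/var/log/elasticsearch`); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find when Logstash first reported that no Elasticsearch node was
# reachable, then list that day's Elasticsearch logs. Paths and the
# date-in-filename convention are assumptions.

# Print the :timestamp value of the first line in FILE containing PATTERN.
first_error_timestamp() {
  local file="$1" pattern="$2"
  grep -m 1 -F -- "$pattern" "$file" \
    | sed -n 's/.*:timestamp=>"\([^"]*\)".*/\1/p'
}

# List Elasticsearch log files whose names contain the YYYY-MM-DD date DAY.
es_logs_for_day() {
  local day="$1" dir="${2:-/var/log/elasticsearch}"
  ls "$dir"/*"$day"* 2>/dev/null
}
```

For example: `ts=$(first_error_timestamp /var/log/logstash/logstash.log "None of the configured nodes are available")`, then `es_logs_for_day "${ts%%T*}"` to list the files from that date.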
Former Nagios employee
https://www.mcapra.com/
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts

Post by james.liew »

Elasticsearch logs uploaded.
tacolover101
Posts: 432
Joined: Mon Apr 10, 2017 11:55 am

Re: Nagios Log Server listening port abruptly halts

Post by tacolover101 »

From what I can tell, your nodes are disconnecting at some point:

Code:

[2017-04-16 11:05:02,252][DEBUG][action.admin.cluster.health] [791cc6c8-f646-495e-9e58-1ec21a24b61c] no known master node, scheduling a retry
[2017-04-16 11:05:02,337][DEBUG][action.index             ] [791cc6c8-f646-495e-9e58-1ec21a24b61c] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
And then a halt:

Code:

[2017-04-16 11:05:02,252][DEBUG][action.admin.cluster.health] [791cc6c8-f646-495e-9e58-1ec21a24b61c] no known master node, scheduling a retry
[2017-04-16 11:05:02,337][DEBUG][action.index             ] [791cc6c8-f646-495e-9e58-1ec21a24b61c] observer: timeout notification from cluster service. timeout setting [1m], time since start [1m]
Perhaps a network issue somewhere? The nodes need to communicate with each other on certain ports; I think 9200/9300 (Elasticsearch's default HTTP and transport ports). Is there a firewall between them that could be blocking that traffic?
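To see the cluster's own view of this, Elasticsearch's cluster health API (`/_cluster/health`) reports the cluster status and how many nodes it currently sees. This minimal sketch assumes it is run on a Log Server node with Elasticsearch's HTTP port reachable on `localhost:9200`; the helper names are hypothetical, and the `sed` parsing is a crude stand-in for a real JSON parser such as `jq`.

```shell
#!/usr/bin/env bash
# Sketch: query Elasticsearch cluster health and pull single fields out of
# the JSON reply. Host/port defaults and function names are assumptions.

# Fetch cluster health from HOST (default localhost); -m 10 caps the wait.
cluster_health() {
  curl -s -m 10 "http://${1:-localhost}:9200/_cluster/health?pretty"
}

# Read health JSON on stdin and print the value of field NAME, e.g.
# "status" -> green|yellow|red, "number_of_nodes" -> an integer.
# Line-based parsing only; prefer jq if it is installed.
health_field() {
  local name="$1"
  sed -n "s/.*\"${name}\" *: *\"\{0,1\}\([^\",}]*\).*/\1/p"
}
```

Running `cluster_health | health_field number_of_nodes` on each node is a rough check: if a node reports fewer nodes than the cluster should have, or the request times out, the nodes have probably lost sight of each other, which would fit the "no known master node" messages.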
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Nagios Log Server listening port abruptly halts

Post by dwhitfield »

Thanks @tacolover101.

OP, can you run a traceroute to the different nodes?
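For the traceroutes, a small loop that saves one output file per node makes the results easy to attach to a reply. A minimal sketch; the node addresses are placeholders, and `TRACEROUTE` is a hypothetical override (for example `tracepath`, which also accepts `-n`) in case `traceroute` isn't installed.

```shell
#!/usr/bin/env bash
# Sketch: run a traceroute to each node and keep one file per node.
# Node addresses, output directory and TRACEROUTE override are assumptions.

trace_nodes() {
  local outdir="$1"; shift
  local node
  mkdir -p "$outdir" || return 1
  for node in "$@"; do
    # -n prints numeric addresses only, avoiding slow reverse DNS lookups.
    "${TRACEROUTE:-traceroute}" -n "$node" > "$outdir/$node.txt" 2>&1
  done
}
```

For example, `trace_nodes /tmp/traces 10.0.0.1 10.0.0.2` writes `/tmp/traces/10.0.0.1.txt` and `/tmp/traces/10.0.0.2.txt`.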
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts

Post by james.liew »

Hi all,

There are firewalls between the Log Server and some of the Windows nodes.

I will advise once I do a traceroute to some of them.
tacolover101
Posts: 432
Joined: Mon Apr 10, 2017 11:55 am

Re: Nagios Log Server listening port abruptly halts

Post by tacolover101 »

james.liew wrote: Hi all,

There are firewalls between the Log Server and some of the Windows nodes.

I will advise once I do a traceroute to some of them.
The logs I mentioned above pertain to the NLS cluster itself, so you'll want to make sure traffic on ports 9200/9300 can get through between the NLS nodes. nmap might prove more useful than traceroute, since traceroute only shows the hops along the path.

The firewalls will definitely affect the setup, depending on where they sit and how they're set up to filter.
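Following the nmap suggestion, a minimal sketch that scans the cluster ports on the other NLS nodes and reports only the ports that are not open. It assumes nmap is installed and parses nmap's grepable output (`-oG -`); `-Pn` skips host discovery because firewalls often drop ping. The function names and the `PORTS` override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list non-open ports from nmap grepable output.
# Host names, PORTS default and function names are assumptions.

# Read `nmap -oG` output on stdin; print "HOST PORT/STATE" for every
# port whose state is not "open" (e.g. "closed" or "filtered").
non_open_ports() {
  awk -F '\t' '
    $1 ~ /^Host: / {
      split($1, h, " ")          # "Host: 10.0.0.2 ()" -> h[2] = address
      for (i = 2; i <= NF; i++) {
        if ($i !~ /^Ports: /) continue
        entries = substr($i, 8)  # drop the leading "Ports: "
        n = split(entries, e, ", ")
        for (j = 1; j <= n; j++) {
          split(e[j], f, "/")    # port/state/protocol/owner/service/...
          if (f[2] != "open") print h[2], f[1] "/" f[2]
        }
      }
    }'
}

scan_es_ports() {
  nmap -Pn -p "${PORTS:-9200,9300}" -oG - "$@" | non_open_ports
}
```

Run `scan_es_ports other-nls-node` from each NLS node. In nmap's terms, "filtered" usually means something (such as a firewall) is dropping the probes, while "closed" means the host answered but nothing was listening.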
dwhitfield
Former Nagios Staff
Posts: 4583
Joined: Wed Sep 21, 2016 10:29 am
Location: NoLo, Minneapolis, MN

Re: Nagios Log Server listening port abruptly halts

Post by dwhitfield »

Thanks @tacolover101!

What OS are the nodes running? That will help us determine the firewall command you need to use, assuming it turns out to be a firewall issue.
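If it does turn out to be a host firewall on the Log Server nodes, the command depends on the OS. A minimal sketch that only prints (does not run) the commands, so they can be reviewed first; it assumes firewalld on RHEL/CentOS 7 or the iptables service on RHEL/CentOS 6, and the function name and port list are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: print the commands that would open TCP ports on a node.
# Firewall type per OS and the port list are assumptions.

print_fw_cmds() {
  local fw="$1"; shift
  local port
  case "$fw" in
    firewalld)   # e.g. RHEL/CentOS 7
      for port in "$@"; do
        echo "firewall-cmd --permanent --add-port=${port}/tcp"
      done
      echo "firewall-cmd --reload"
      ;;
    iptables)    # e.g. RHEL/CentOS 6 with the iptables service
      for port in "$@"; do
        echo "iptables -I INPUT -p tcp --dport ${port} -j ACCEPT"
      done
      echo "service iptables save"
      ;;
    *)
      echo "unknown firewall: $fw" >&2
      return 1
      ;;
  esac
}
```

For example, `print_fw_cmds firewalld 9200 9300`. This only covers firewalls on the hosts themselves; network firewalls between the datacentres would need the equivalent rules on those devices.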