
Re: Nagios Log Server listening port abruptly halts v2

Posted: Wed May 31, 2017 3:32 pm
by avandemore
It would most likely be a better approach to solve the problem rather than mask it. Just working around it could introduce worse issues as well.

The only way to do that effectively is, as @mcapra already suggested, with ALL the logs from the relevant time window.

You can try things like turning off certain inputs to attempt to isolate the problem, but that is more of a shot-in-the-dark approach. Logs are much more definitive.

Re: Nagios Log Server listening port abruptly halts v2

Posted: Wed May 31, 2017 7:45 pm
by james.liew
I'll grab them the next time we have an issue, likely Friday evening/Sat morning.

Re: Nagios Log Server listening port abruptly halts v2

Posted: Thu Jun 01, 2017 10:06 am
by avandemore
Sounds good, our support hours are listed here:

https://www.nagios.com/contact/

However you can post or PM your logs at any point.

Re: Nagios Log Server listening port abruptly halts v2

Posted: Sun Jun 04, 2017 10:35 pm
by james.liew
Cron job seems to have done its job; no alerts for Friday, Saturday, or even Sunday. Will monitor further for now.

EDIT: I will remove the cron job sometime this week. We have a support contract set up and I'm waiting for my account access to the Customer section of the forum.

This thread might need to be moved later in the week!

Re: Nagios Log Server listening port abruptly halts v2

Posted: Mon Jun 05, 2017 12:00 pm
by avandemore
Sure, we'll keep it open for now.

Re: Nagios Log Server listening port abruptly halts v2

Posted: Tue Jun 06, 2017 2:43 am
by james.liew
Looks like the service died just now.

Can this thread be moved to the customer support section? I've just received access to it.

I will upload all the necessary log files once I pull them down via WinSCP, and then restart the services.

Re: Nagios Log Server listening port abruptly halts v2

Posted: Tue Jun 06, 2017 2:57 am
by james.liew
Hi all,

New logs attached. As previously requested, I've uploaded ALL the logs from both Elasticsearch and Logstash.

Re: Nagios Log Server listening port abruptly halts v2

Posted: Tue Jun 06, 2017 8:10 am
by mcapra
The start of the chatter in the latest Logstash log is here:

Code: Select all

## timestamp=>"2017-06-06T09:31:32.317000+0200"
"org/jruby/RubyIO.java:2996:in `sysread'", 
"/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.5/lib/logstash/inputs/tcp.rb:164:in `read'", 
"/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.5/lib/logstash/inputs/tcp.rb:112:in `handle_socket'", 
"/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.5/lib/logstash/inputs/tcp.rb:147:in `client_thread'", 
"/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.5/lib/logstash/inputs/tcp.rb:145:in `client_thread'"
Which I think means Logstash was unable to handle all the socket connections it was trying to maintain. This could be, as mentioned previously, a side-effect of the tcp input plugin not responsibly terminating connections.
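If it helps, one quick way to check whether stale connections are piling up on the input port is to summarise their TCP states; a growing CLOSE_WAIT count would support the unterminated-connections theory. A sketch, where port 5544 is an assumption -- substitute whatever port your tcp input actually listens on:

```shell
# Count connections to the Logstash tcp input by state; many CLOSE-WAIT
# entries suggest sockets that clients have abandoned but the plugin
# never closed. Port 5544 is an assumption -- use your actual input port.
ss -ant 'sport = :5544' | awk 'NR > 1 { states[$1]++ } END { for (s in states) print states[s], s }'
```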

However, I noticed this activity occurring around the same time in the Elasticsearch logs:

Code: Select all

[2017-06-06 09:31:31,223][WARN ][index.engine             ] [791cc6c8-f646-495e-9e58-1ec21a24b61c] [logstash-2017.06.06][2] failed engine [out of memory]
java.lang.OutOfMemoryError: unable to create new native thread
Which leads me to believe that, rather than Logstash misbehaving internally, the available memory on this machine is being exhausted. Do you have performance data for this machine around those times? I apologize if memory has already been examined as a root cause, but I believe the correlation here is a strong one.
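One caveat worth flagging: "unable to create new native thread" usually points at process/thread limits being hit rather than heap exhaustion, so alongside memory graphs it may be worth checking what limits the Elasticsearch user runs under. A rough sketch (the process name match is an assumption):

```shell
# Thread count across elasticsearch-related processes (prints 0 if none
# are running; the bracketed pattern keeps awk from matching itself).
ps -eLf | awk '/[e]lasticsearch/' | wc -l
# Per-user process/thread ceiling for the current shell.
ulimit -u
# System-wide thread ceiling.
cat /proc/sys/kernel/threads-max
```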

Looking back to the May 30th occurrence, I noticed this:

Code: Select all

{
	:timestamp => "2017-05-30T02:16:58.454000+0200",
	:message => "Failed to install template: None of the configured nodes are available: []",
	:level => :error
}
Unfortunately, the earliest our Elasticsearch logs go back to is 17:44:00 that day, and everything looks like it was OK by then:

Code: Select all

[2017-05-30 17:44:01,588][INFO ][KnapsackExportAction     ] start of export: {"mode":"export","started":"2017-05-30T15:44:01.586Z","path":"file:///store/backups/nagioslogserver/1496159041/nagioslogserver.tar.gz","node_name":"791cc6c8-f646-495e-9e58-1ec21a24b61c"}
Though if we had Elasticsearch logs to match up against our Logstash logs, my hunch is that we would see similar memory-related exceptions.

Re: Nagios Log Server listening port abruptly halts v2

Posted: Tue Jun 06, 2017 1:58 pm
by avandemore
I would concur with @mcapra's assessment at this point, as ES issues tend to bubble up to LS. @james.liew, can you confirm this is the issue/resolution?

Re: Nagios Log Server listening port abruptly halts v2

Posted: Thu Jun 08, 2017 3:31 am
by james.liew
Based on previous feedback, I've already allocated an additional 8GB of RAM; I'm now at 16GB on LOG-01.

Each "dip" indicates when the LS/ES service was restarted.
2017-06-08_16-30-11.png
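On a related note, I'll also double-check that the JVM heap was raised along with the RAM, since Elasticsearch only uses what its heap setting allows (ES_HEAP_SIZE on the 1.x/2.x-style installs; the exact config location on our box is an assumption). A sketch of the usual sizing guideline, half of physical RAM capped at 31GB:

```shell
# Suggest an ES heap: half of physical RAM, capped at 31g so compressed
# object pointers stay enabled. A guideline only, not NLS-specific advice.
total_kb=$(awk '/MemTotal/ {print $2}' /proc/meminfo)
heap_gb=$(( total_kb / 1024 / 1024 / 2 ))
if [ "$heap_gb" -gt 31 ]; then heap_gb=31; fi
echo "Suggested ES_HEAP_SIZE=${heap_gb}g"
```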