Nagios Log Server listening port abruptly halts v2

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Nagios Log Server listening port abruptly halts v2

Post by avandemore »

It would most likely be a better approach to solve the problem rather than mask it; just working around it could introduce worse issues.

The only way to do that effectively is, as @mcapra already suggested, with ALL the logs from the same time window.

You can try things like turning off certain inputs to isolate the problem, but that is more of a shot-in-the-dark approach. Logs are much more definitive.
Previous Nagios employee
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts v2

Post by james.liew »

I'll grab them the next time we have an issue, likely Friday evening/Sat morning.
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Nagios Log Server listening port abruptly halts v2

Post by avandemore »

Sounds good, our support hours are listed here:

https://www.nagios.com/contact/

However, you can post or PM your logs at any point.
Previous Nagios employee
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts v2

Post by james.liew »

The cron job seems to have done its job; no alerts for Friday, Saturday, or even Sunday. Will monitor further for now.
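The actual cron job isn't shown in this thread, but a watchdog of the kind described usually just checks whether the listener port is still open and restarts Logstash if not. A purely illustrative sketch (port 5544 and the restart step are assumptions; adjust for your input config):

```shell
# Check whether the Logstash tcp input is still listening; a cron watchdog
# would restart the service when this check fails.
PORT=5544   # assumed tcp input port -- substitute your own
if ss -ltn 2>/dev/null | grep -q ":${PORT} "; then
  echo "listener up"
else
  echo "listener down - a cron watchdog would restart Logstash here"
fi
```

Run from cron (e.g. every 5 minutes), the `else` branch would call the service restart instead of echoing.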

EDIT: I will remove the cron job sometime this week. We have a support contract set up, and I'm waiting for my account access to the Customer section of the forum.

This thread might need to be moved later in the week!
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Nagios Log Server listening port abruptly halts v2

Post by avandemore »

Sure we'll keep it open for now.
Previous Nagios employee
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts v2

Post by james.liew »

Looks like the service died again just now.

Can this thread be moved to the customer support section? I've just received access to it.

I will upload all the necessary log files once I pull them down with WinSCP, and then restart the services.
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts v2

Post by james.liew »

Hi all,

New logs attached. As previously requested, I've uploaded ALL the logs from both Elasticsearch and Logstash.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Nagios Log Server listening port abruptly halts v2

Post by mcapra »

The start of the chatter in the latest Logstash log is here:

Code: Select all

## timestamp=>"2017-06-06T09:31:32.317000+0200"
"org/jruby/RubyIO.java:2996:in `sysread'", 
"/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.5/lib/logstash/inputs/tcp.rb:164:in `read'", 
"/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.5/lib/logstash/inputs/tcp.rb:112:in `handle_socket'", 
"/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.5/lib/logstash/inputs/tcp.rb:147:in `client_thread'", 
"/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.5/lib/logstash/inputs/tcp.rb:145:in `client_thread'"
Which I think means Logstash was unable to handle all the socket connections it was trying to maintain. This could be, as mentioned previously, a side-effect of the tcp input plugin not responsibly terminating connections.
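If it is the tcp input holding sockets open, a rough way to gauge that is to count established connections on the input's port while the problem builds. A sketch (port 5544 is an assumption; use whatever port your tcp input binds):

```shell
# Count established client connections held on the Logstash tcp input port.
PORT=5544
CONN_COUNT=$(ss -tan state established "( sport = :${PORT} )" | tail -n +2 | wc -l)
echo "Established connections on port ${PORT}: ${CONN_COUNT}"
```

Watching that number over time (or alongside the alert windows) would show whether connections accumulate without being closed.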

However, I noticed this activity occurring around the same time in the Elasticsearch logs:

Code: Select all

[2017-06-06 09:31:31,223][WARN ][index.engine             ] [791cc6c8-f646-495e-9e58-1ec21a24b61c] [logstash-2017.06.06][2] failed engine [out of memory]
java.lang.OutOfMemoryError: unable to create new native thread
Which leads me to believe that, rather than Logstash misbehaving internally, the available memory on this machine is being exhausted. Do you have performance data for this machine around those times? I apologize if memory as a root cause has already been examined, but I believe the correlation here is a strong one.
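Worth noting: "unable to create new native thread" can come either from exhausted memory or from the per-user process/thread limit. A few quick checks to capture around the failure times (the `pgrep` pattern is an assumption for a stock Elasticsearch install):

```shell
# Checks around java.lang.OutOfMemoryError: unable to create new native thread
free -m            # total / used / free memory and swap
ulimit -u          # max user processes (each JVM thread counts against this)
ES_PID=$(pgrep -f org.elasticsearch 2>/dev/null | head -n 1)
if [ -n "$ES_PID" ]; then
  # How many threads the Elasticsearch JVM currently holds
  echo "Elasticsearch threads: $(ls /proc/"$ES_PID"/task | wc -l)"
fi
```

If the thread count is near `ulimit -u` rather than memory being full, the fix is a limits change rather than more RAM.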

Looking back to the May 30th occurrence, I noticed this:

Code: Select all

{
	:timestamp => "2017-05-30T02:16:58.454000+0200",
	:message => "Failed to install template: None of the configured nodes are available: []",
	:level => :error
}
Unfortunately, the earliest our Elasticsearch logs go back to is 17:44:00 that day, and everything looks like it was OK by then:

Code: Select all

[2017-05-30 17:44:01,588][INFO ][KnapsackExportAction     ] start of export: {"mode":"export","started":"2017-05-30T15:44:01.586Z","path":"file:///store/backups/nagioslogserver/1496159041/nagioslogserver.tar.gz","node_name":"791cc6c8-f646-495e-9e58-1ec21a24b61c"}
Though if we had Elasticsearch logs to match up against our Logstash logs, my hunch is that we would see similar memory-related exceptions.
Former Nagios employee
https://www.mcapra.com/
avandemore
Posts: 1597
Joined: Tue Sep 27, 2016 4:57 pm

Re: Nagios Log Server listening port abruptly halts v2

Post by avandemore »

I would concur with @mcapra's assessment at this point, as ES issues tend to bubble up to LS. @james.liew, can you confirm this is the issue/resolution?
Previous Nagios employee
james.liew
Posts: 59
Joined: Wed Feb 22, 2017 1:30 am

Re: Nagios Log Server listening port abruptly halts v2

Post by james.liew »

Based on previous feedback, I've already allocated an additional 8 GB of RAM; I'm now at 16 GB on LOG-01.
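One thing to double-check after adding RAM: the Elasticsearch heap usually has to be raised to match, or the JVM keeps running with its old allocation. The common guidance is roughly half of physical RAM, kept under ~32 GB. A sketch for picking a value (the `ES_HEAP_SIZE` variable and its `/etc/sysconfig/elasticsearch` location are assumptions; verify against your install before applying):

```shell
# Suggest an Elasticsearch heap size: half of physical RAM, capped under 32 GB.
TOTAL_MB=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
HEAP_MB=$(( TOTAL_MB / 2 ))
[ "$HEAP_MB" -gt 31744 ] && HEAP_MB=31744   # stay under the compressed-oops ceiling
echo "Suggested setting: ES_HEAP_SIZE=${HEAP_MB}m"
```

On a 16 GB box this would suggest an 8 GB heap, leaving the rest for Logstash and the OS filesystem cache.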

Each "dip" indicates when the LS/ES service was restarted.
2017-06-08_16-30-11.png
Locked