Nagios Log Server listening port abruptly halts v2
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: Nagios Log Server listening port abruptly halts v2
It would most likely be a better approach to solve the problem rather than mask it. Just working around it could introduce worse issues as well.
The only way to do that effectively is as @mcapra already suggested with ALL the logs from the contemporary time slot.
You can try things like perhaps turning off certain inputs to attempt to isolate the problem, but that is more a shot in the dark approach. Logs are much more definitive.
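For anyone gathering those logs for a support post, a sketch along these lines bundles both sets in one shot. The log directory paths here are assumptions, not documented Nagios Log Server locations; substitute the ones from your own install:

```shell
# Assumed log locations -- adjust to your install:
LS_LOGS=${LS_LOGS:-/var/log/logstash}
ES_LOGS=${ES_LOGS:-/var/log/elasticsearch}
OUT=nls-logs-$(date +%F).tar.gz

# Archive only the directories that actually exist, so tar never aborts:
DIRS=""
for d in "$LS_LOGS" "$ES_LOGS"; do
    [ -d "$d" ] && DIRS="$DIRS $d"
done
if [ -n "$DIRS" ]; then
    tar czf "$OUT" $DIRS
else
    tar czf "$OUT" -T /dev/null   # nothing found: emit an empty archive
fi
echo "wrote $OUT"
```

The point is to capture everything from the same time window from both services, rather than cherry-picking individual files.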
Previous Nagios employee
-
james.liew
- Posts: 59
- Joined: Wed Feb 22, 2017 1:30 am
Re: Nagios Log Server listening port abruptly halts v2
I'll grab them the next time we have an issue, likely Friday evening/Sat morning.
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: Nagios Log Server listening port abruptly halts v2
Sounds good, our support hours are listed here:
https://www.nagios.com/contact/
However you can post or PM your logs at any point.
Previous Nagios employee
-
james.liew
- Posts: 59
- Joined: Wed Feb 22, 2017 1:30 am
Re: Nagios Log Server listening port abruptly halts v2
The cron job seems to have done its job: no alerts on Friday, Saturday, or even Sunday. Will monitor further for now.
EDIT: I will remove the cron job sometime this week. We have a support contract set up and I'm waiting for my account access to the Customer section of the forum.
This thread might need to be moved later in the week!
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: Nagios Log Server listening port abruptly halts v2
Sure we'll keep it open for now.
Previous Nagios employee
-
james.liew
- Posts: 59
- Joined: Wed Feb 22, 2017 1:30 am
Re: Nagios Log Server listening port abruptly halts v2
Looks like the service just died again a moment ago.
Can this thread be moved to the customer support section? I've just received access to it.
I will upload all the necessary log files once I pull them down with WinSCP, and then restart the services.
-
james.liew
- Posts: 59
- Joined: Wed Feb 22, 2017 1:30 am
Re: Nagios Log Server listening port abruptly halts v2
Hi all,
New logs attached. As previously requested, I've uploaded ALL the logs from both Elasticsearch and Logstash.
You do not have the required permissions to view the files attached to this post.
Re: Nagios Log Server listening port abruptly halts v2
The start of the chatter in the latest Logstash log is here:
Code: Select all
## timestamp=>"2017-06-06T09:31:32.317000+0200"
"org/jruby/RubyIO.java:2996:in `sysread'",
"/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.5/lib/logstash/inputs/tcp.rb:164:in `read'",
"/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.5/lib/logstash/inputs/tcp.rb:112:in `handle_socket'",
"/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.5/lib/logstash/inputs/tcp.rb:147:in `client_thread'",
"/usr/local/nagioslogserver/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-tcp-0.1.5/lib/logstash/inputs/tcp.rb:145:in `client_thread'"
Which I think means Logstash was unable to handle all the socket connections it was trying to maintain. This could be, as mentioned previously, a side effect of the tcp input plugin not responsibly terminating connections.
However, I noticed this activity occurring around the same time in the Elasticsearch logs:
Code: Select all
[2017-06-06 09:31:31,223][WARN ][index.engine ] [791cc6c8-f646-495e-9e58-1ec21a24b61c] [logstash-2017.06.06][2] failed engine [out of memory]
java.lang.OutOfMemoryError: unable to create new native thread
Which leads me to believe that, rather than Logstash misbehaving internally, the available memory on this machine is being exhausted. Do you have performance data available for this machine around those times? I apologize if memory as a root cause has already been examined, but I believe the correlation is a strong one in this case.
Looking back to the May 30th occurrence, I noticed this:
Code: Select all
{
    :timestamp => "2017-05-30T02:16:58.454000+0200",
      :message => "Failed to install template: None of the configured nodes are available: []",
        :level => :error
}
Unfortunately, the earliest our Elasticsearch logs go back to is 17:44:00 on that day, and everything looks like it was OK by then:
Code: Select all
[2017-05-30 17:44:01,588][INFO ][KnapsackExportAction ] start of export: {"mode":"export","started":"2017-05-30T15:44:01.586Z","path":"file:///store/backups/nagioslogserver/1496159041/nagioslogserver.tar.gz","node_name":"791cc6c8-f646-495e-9e58-1ec21a24b61c"}
Though if we had Elasticsearch logs that we could match up to our Logstash logs, my hunch is that we would see similar memory-related exceptions.
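As a side note, "unable to create new native thread" usually means the JVM hit a thread or per-user process limit (or ran out of memory for thread stacks) rather than exhausting its heap. A quick on-box check on Linux looks something like this; the pgrep pattern is an assumption, so match it to however Elasticsearch is started on your machine:

```shell
# Find the Elasticsearch JVM (pattern is a guess; fall back to this shell's
# PID so the commands below still demonstrate the idea):
ES_PID=$(pgrep -o -f elasticsearch || echo $$)

# How many native threads the process currently holds:
grep '^Threads' "/proc/$ES_PID/status"

# The per-user process/thread ceiling that the OOM error runs into:
ulimit -u
```

If the thread count is climbing toward the ulimit between restarts, that would line up with the tcp input not releasing its client threads.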
Former Nagios employee
https://www.mcapra.com/
-
avandemore
- Posts: 1597
- Joined: Tue Sep 27, 2016 4:57 pm
Re: Nagios Log Server listening port abruptly halts v2
I would concur with @mcapra's assessment at this point, as ES issues tend to bubble up to LS. @james.liew, can you confirm this is the issue/resolution?
Previous Nagios employee
-
james.liew
- Posts: 59
- Joined: Wed Feb 22, 2017 1:30 am
Re: Nagios Log Server listening port abruptly halts v2
Based on previous feedback, I've already allocated an additional 8GB of RAM; I'm now at 16GB on LOG-01.
Each "dip" indicates when the LS/ES service was restarted.
You do not have the required permissions to view the files attached to this post.
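To correlate any future dips with actual memory pressure, a crude sampler like the one below can be run from cron every few minutes. The log path is arbitrary (my choice, not anything Nagios-specific), and it reads /proc/meminfo directly so it works on any Linux box:

```shell
# Append a timestamped memory snapshot to a rolling sample log
# (path is arbitrary -- pick somewhere with room):
SAMPLE_LOG=/tmp/mem-samples.log
{ date; grep -E '^(MemTotal|MemFree|MemAvailable|SwapFree)' /proc/meminfo; } >> "$SAMPLE_LOG"

# Show the latest sample:
tail -n 5 "$SAMPLE_LOG"
```

Matching the sample timestamps against the next outage would confirm (or rule out) memory exhaustion as the trigger.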