Trying to figure out why logstash changed to active (exited)

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Trying to figure out why logstash changed to active (exi

Post by cdienger »

How many CPUs are on the machine? Researching the garbage collection options and upping the number of CPUs can speed this process up.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Trying to figure out why logstash changed to active (exi

Post by rferebee »

So, while looking into this I found some discrepancies between the nodes. I don't know how much they matter:

LSCC2

root@nagioslscc2:/root> lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                6
On-line CPU(s) list:   0-5
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             6
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Stepping:              0
CPU MHz:               2199.998
BogoMIPS:              4399.99
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              56320K

LSCC1 and LSCC3

root@nagioslscc1:/root> lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                6
On-line CPU(s) list:   0-5
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             3
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Stepping:              0
CPU MHz:               2199.998
BogoMIPS:              4399.99
Hypervisor vendor:     VMware
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              56320K

If the available CPU MHz matches on each node, do the socket and core differences matter?
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Trying to figure out why logstash changed to active (exi

Post by mbellerue »

It shouldn't make a difference. Though I, personally, would recommend having the VMs in the same layout. The only thing this would affect would be the number of NUMA nodes. That's more of a performance thing than anything else.

Edit:
Also, Craig mentioned a couple of things. First he was wondering if this is still happening regularly. And if so, does it also correlate to when Java is doing garbage collection?

The second thing was that originally apparently the VMs had more CPUs to work with. Would it make sense to try to add a couple more cores to the VMs to see if that helps?

Be sure to check out our Knowledgebase for helpful articles and solutions!
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Trying to figure out why logstash changed to active (exi

Post by rferebee »

"Also, Craig mentioned a couple of things. First he was wondering if this is still happening regularly. And if so, does it also correlate to when Java is doing garbage collection?"

It occurs semi-regularly. I PM'd Craig the log file he requested on November 22nd; you folks would need to tell me whether or not it correlates. I honestly have no clue.

"The second thing was that originally apparently the VMs had more CPUs to work with. Would it make sense to try to add a couple more cores to the VMs to see if that helps?"

These servers have always had 36 cores each, at least since I moved over to this group over a year ago. Do you think they need more than 36 CPU cores?
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Trying to figure out why logstash changed to active (exi

Post by mbellerue »

I can't imagine that they would need more than 36 cores. But right now they definitely do not have 36 cores.

LSCC2

Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             6

LSCC1 and LSCC3

Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             3

Looks like each VM has 6 cores to play with: threads per core × cores per socket × sockets works out to 1 × 1 × 6 on LSCC2 and 1 × 2 × 3 on LSCC1 and LSCC3. The important piece here is that the more cores a VM has, the more threads Java will spin up. If there is a correlation between garbage collection and logstash crashing, then having more cores could help speed up garbage collection, which could shorten the window in which logstash crashes.
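
For anyone reading the lscpu output the same way, the arithmetic can be sketched in shell. This is only an illustration: the function name logical_cpus is made up here, and it assumes lscpu-style "Field: value" text on standard input, with the sample values copied from the output posted earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: multiply the three topology fields from lscpu-style
# text on stdin to get the number of logical CPUs the VM sees.
logical_cpus() {
    awk -F: '
        /^Thread\(s\) per core/  { threads = $2 + 0 }
        /^Core\(s\) per socket/  { cores   = $2 + 0 }
        /^Socket\(s\)/           { sockets = $2 + 0 }
        END { print threads * cores * sockets }
    '
}

# Values copied from the LSCC2 output in this thread.
lscc2='Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             6'

# Values copied from the LSCC1 and LSCC3 output in this thread.
lscc1='Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             3'

printf 'LSCC2: %s\n' "$(printf '%s\n' "$lscc2" | logical_cpus)"
printf 'LSCC1/LSCC3: %s\n' "$(printf '%s\n' "$lscc1" | logical_cpus)"
```

Both layouts come out to 6, which matches the "CPU(s): 6" line in each output. On a live node the same function could be fed directly with lscpu | logical_cpus, and the result should normally agree with nproc.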
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Trying to figure out why logstash changed to active (exi

Post by rferebee »

Oh ok, I was interpreting that output completely differently.

Let me speak with my bosses and figure out if upping the core count is an option for us at this time.

Thank you.
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Trying to figure out why logstash changed to active (exi

Post by mbellerue »

Okay, excellent. We will keep this open and wait to hear back.
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Trying to figure out why logstash changed to active (exi

Post by rferebee »

Good morning, we experienced a crash this morning, but it was totally user related and not an issue with Log Server.

My question, though: when someone is in the console running a query and they attempt to queue up more than 7 days' worth of logs, we experience extreme system slowness.

Is there a way to remove the 30 day search option? Or, even better, how can we provide more resources to the environment so if someone runs a 14 day query it doesn't bog down as much? Do those get queued up in memory or is it taxing the CPU when users run a large query like that?

Thank you!
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Re: Trying to figure out why logstash changed to active (exi

Post by rferebee »

Also, I'm still having the issue (on just one of my nodes) where it won't let me restart the elasticsearch service. I made the proposed changes to the memory config for elasticsearch and logstash, but when I attempted to restart the elasticsearch service, it failed to stop and I had to manually run systemctl stop elasticsearch to ensure it was stopped.

I don't know if that's a memory issue or what, but since each node is identical I doubt that.
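
One way to take the guesswork out of that restart is to stop the service, poll until systemd reports it inactive, and only then start it again. A rough sketch, assuming the unit is named elasticsearch as in the command above; the helper names (wait_until, service_stopped, restart_checked) and the 60-second limit are arbitrary choices for illustration, not Nagios Log Server settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check command once per second until it succeeds
# or the timeout (in seconds) runs out. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout=$1
    shift
    local waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Succeeds once systemd no longer reports the unit as active.
service_stopped() {
    ! systemctl is-active --quiet "$1"
}

# Stop, confirm the stop, then start again. Defaults to the unit name
# "elasticsearch" (an assumption); needs to be run as root.
restart_checked() {
    local unit=${1:-elasticsearch}
    systemctl stop "$unit"
    if wait_until 60 service_stopped "$unit"; then
        systemctl start "$unit"
    else
        echo "$unit still active after 60 seconds" >&2
        return 1
    fi
}
```

Nothing runs until restart_checked elasticsearch is called. If it reports that the unit is still active, that at least narrows the problem to the stop step on that one node rather than to the restart as a whole.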
mbellerue
Posts: 1403
Joined: Fri Jul 12, 2019 11:10 am

Re: Trying to figure out why logstash changed to active (exi

Post by mbellerue »

If you go to Admin -> Snapshots & Maintenance -> Maintenance and Repository Settings, what do you have set for your Maintenance Settings? I'm wondering if you have indexes closing after 7 days. I think that's the default. But let's take a look at all of the options.

Other than that, searching is going to rely on 2 things:
CPU power
and how fast you can get the data to the CPU

You've got 6 CPU cores to work with, so let's make sure we're making the most of them. As root, run:

ulimit -a

Let's see what that outputs. There shouldn't be any real restrictions on what root can do, but let's just check to be sure. Assuming root can spawn several thousand processes, we should be good there.
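
The lines of most interest in that output are the process and open-file limits. A small sketch for pulling out just those two, assuming bash (the -u flag is not available in every shell). The 4096 threshold is an arbitrary stand-in for "several thousand", and limit_ok and report_limit are made-up helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a ulimit value is "unlimited" or at least
# the given minimum.
limit_ok() {
    local value=$1 minimum=$2
    [ "$value" = "unlimited" ] || [ "$value" -ge "$minimum" ]
}

# Print one limit with a verdict. The 4096 threshold is an illustrative
# assumption, not a documented requirement.
report_limit() {
    local label=$1 value=$2
    if limit_ok "$value" 4096; then
        echo "$label: $value (ok)"
    else
        echo "$label: $value (below 4096)"
    fi
}

report_limit "max user processes" "$(ulimit -u)"
report_limit "open files" "$(ulimit -n)"
```

These are the limits of whichever user runs the script, so it is worth running it as the same user whose ulimit -a output is being compared.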

The other thing is, if I recall correctly, your log data is actually on network attached storage. Is that on something like a 10 gigabit connection or better? Or was that just for backups and snapshots?