Trying to figure out why logstash changed to active (exited)
Re: Trying to figure out why logstash changed to active (exited)
How many CPUs are on the machine? Researching the garbage collection options and upping the number of CPUs can speed this process up.
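For reference, the knobs for this live in Logstash's jvm.options file. A minimal sketch of what GC-related settings can look like there (the path and values below are illustrative assumptions for a package install, not tuned recommendations for these nodes):

```text
# /etc/logstash/jvm.options -- path may differ by install method
-Xms2g                             # initial heap
-Xmx2g                             # max heap (keep equal to -Xms)
-XX:+UseConcMarkSweepGC            # CMS collector
-Xloggc:/var/log/logstash/gc.log   # log GC pauses to correlate with the exits
```

Whether any particular flags help depends on heap size and workload; the point is just that GC behavior is tunable per node, and a GC log gives you something to line up against the crash times.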
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Trying to figure out why logstash changed to active (exited)
So, while looking into this I found some discrepancies between the nodes; I don't know how much they matter:
If the available CPU MHz matches on each node, do the socket and core differences matter?
LSCC2
Code: Select all
root@nagioslscc2:/root> lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 6
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Stepping: 0
CPU MHz: 2199.998
BogoMIPS: 4399.99
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 56320K
LSCC1 and LSCC3
Code: Select all
root@nagioslscc1:/root> lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 6
On-line CPU(s) list: 0-5
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 3
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Stepping: 0
CPU MHz: 2199.998
BogoMIPS: 4399.99
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 56320K
Re: Trying to figure out why logstash changed to active (exited)
It shouldn't make a difference, though I personally would recommend giving the VMs the same layout. The only thing this would affect is the number of NUMA nodes, and that's more of a performance concern than anything else.
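As a quick sanity check, the logical CPU count is just sockets × cores per socket × threads per core, so the two layouts above multiply out to the same six CPUs:

```shell
# LSCC2:           6 sockets x 1 core/socket x 1 thread/core
echo $((6 * 1 * 1))
# LSCC1 and LSCC3: 3 sockets x 2 cores/socket x 1 thread/core
echo $((3 * 2 * 1))
```

Both print 6, which is why the available CPU capacity matches even though the topology differs.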
Edit:
Also, Craig mentioned a couple of things. First he was wondering if this is still happening regularly. And if so, does it also correlate to when Java is doing garbage collection?
The second thing was that originally apparently the VMs had more CPUs to work with. Would it make sense to try to add a couple more cores to the VMs to see if that helps?
Be sure to check out our Knowledgebase for helpful articles and solutions!
Re: Trying to figure out why logstash changed to active (exited)
Also, Craig mentioned a couple of things. First he was wondering if this is still happening regularly. And if so, does it also correlate to when Java is doing garbage collection?

It occurs semi-regularly. I PM'd Craig the log file he requested on November 22nd; you folks would need to tell me whether or not it correlates. I honestly have no clue.

The second thing was that originally apparently the VMs had more CPUs to work with. Would it make sense to try to add a couple more cores to the VMs to see if that helps?

These servers have always had 36 cores each, at least since I moved over to this group over a year ago. Do you think they need more than 36 CPU cores?
Re: Trying to figure out why logstash changed to active (exited)
I can't imagine that they would need more than 36 cores. But right now they definitely do not have 36 cores.
Looks like each VM has 6 cores to play with. The important piece here is that the more cores a VM has, the more threads Java will spin up. If there is a correlation between garbage collection and logstash crashing, then having more cores could help speed up garbage collection, which could shorten the window in which logstash crashes.
LSCC2
Code: Select all
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 6
LSCC1 and LSCC3
Code: Select all
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 3
Re: Trying to figure out why logstash changed to active (exited)
Oh ok, I was interpreting that output completely differently.
Let me speak with my bosses and figure out if upping the core count is an option for us at this time.
Thank you.
Re: Trying to figure out why logstash changed to active (exited)
Okay, excellent. We will keep this open and wait to hear back.
Re: Trying to figure out why logstash changed to active (exited)
Good morning, we experienced a crash this morning, but it was entirely user-related and not an issue with Log Server.
My question though, when someone is in the console running a query and they attempt to queue up more than 7 days' worth of logs, we experience extreme system slowness.
Is there a way to remove the 30 day search option? Or, even better, how can we provide more resources to the environment so if someone runs a 14 day query it doesn't bog down as much? Do those get queued up in memory or is it taxing the CPU when users run a large query like that?
Thank you!
Re: Trying to figure out why logstash changed to active (exited)
Also, I'm still having the issue (on just one of my nodes) where it won't let me restart the elasticsearch service. I made the proposed changes to the memory config for elasticsearch and logstash, but when I attempted to restart the elasticsearch service it failed to stop, and I had to manually run systemctl stop elasticsearch to ensure it was stopped.
I don't know if that's a memory issue or what, but since each node is identical I doubt that.
Re: Trying to figure out why logstash changed to active (exited)
If you go to Admin -> Snapshots & Maintenance -> Maintenance and Repository Settings, what do you have set for your Maintenance Settings? I'm wondering if you have indexes closing after 7 days. I think that's default. But let's take a look at all of the options.
Other than that, searching is going to rely on 2 things:
CPU power
and how fast you can get the data to the CPU
You've got 6 CPU cores to work with, so let's make sure we're making the most of them. As root, run,
Code: Select all
ulimit -a
Let's see what that outputs. There shouldn't be any real restrictions on what root can do, but let's just check to be sure. Assuming root can spawn several thousand processes, we should be good there.
The other thing is, if I recall correctly, your log data is actually on network attached storage. Is that on something like a 10 gigabit connection or better? Or was that just for backups and snapshots?
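Two lines in that ulimit -a output are the usual suspects for Java services; you can also pull just those two with the same shell builtin:

```shell
# max user processes: every Java/logstash thread counts against this limit
ulimit -u
# max open files: elasticsearch holds many descriptors open for index segments
ulimit -n
```

If either comes back as a small number rather than a large value or "unlimited", that would be worth raising.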