Nagios user java command using over 200% CPU
Re: Nagios user java command using over 200% CPU
Good morning...
I'm having a major issue. One of the nodes in my primary cluster has crashed every day for the last 3 days. I don't know what's going on.
This all started because we had a VMware issue and lost communication with our SAN storage while the system was running. I think that all the indexes might be corrupt on the secondary node, but I have no clue how to get the system stable again.
I have attached an updated system profile. Can you please help me?
Thank you.
-
npolovenko
- Support Tech
- Posts: 3457
- Joined: Mon May 15, 2017 5:00 pm
Re: Nagios user java command using over 200% CPU
@rferebee, The indices don't appear to be corrupt. However, the drive is getting full and hitting the low watermark.
Please run the following command in the console and show us the output:
Code: Select all
curl -XGET http://localhost:9200/_cluster/settings
Thank you.
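For anyone following along, here is a rough sketch of how you might compare a filesystem's usage against the low watermark from the shell. It assumes the Elasticsearch default low watermark of 85% has not been overridden in the cluster settings, and the function names and the DATA_PATH variable are made up for illustration; point DATA_PATH at wherever your Elasticsearch data actually lives.

```shell
#!/bin/sh
# Assumption: the default low watermark (85% disk used) is still in effect.
LOW_WATERMARK_PCT=85

# Print the used-space percentage (integer, no % sign) of the
# filesystem that holds the given path, using POSIX df output.
disk_used_pct() {
    df -P "$1" | awk 'NR==2 { gsub(/%/, "", $5); print $5 }'
}

# Succeed (exit status 0) when a percentage is at or above a threshold.
exceeds_watermark() {
    [ "$1" -ge "$2" ]
}

# Hypothetical variable: defaults to / so the script runs anywhere.
DATA_PATH="${DATA_PATH:-/}"
pct=$(disk_used_pct "$DATA_PATH")
if [ -z "$pct" ]; then
    echo "could not read disk usage for $DATA_PATH"
elif exceeds_watermark "$pct" "$LOW_WATERMARK_PCT"; then
    echo "$DATA_PATH is ${pct}% full: at or above the ${LOW_WATERMARK_PCT}% low watermark"
else
    echo "$DATA_PATH is ${pct}% full: below the ${LOW_WATERMARK_PCT}% low watermark"
fi
```

Once a node is above the low watermark, Elasticsearch stops allocating new shards to it, so this is only a quick local check and not a replacement for the cluster settings output requested above.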
Re: Nagios user java command using over 200% CPU
Please see screenshots from both nodes in the cluster. Thank you.
Re: Nagios user java command using over 200% CPU
I just ran 'top -H' on one of my nodes and I was wondering why there are so many separate nagios java processes running? See attached screenshot.
Is this typical behavior for a Nagios Log Server system?
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Nagios user java command using over 200% CPU
The -H argument to top shows individual threads.
This is normal because it is a multi-threaded application; you have a different thread for each connection to both elasticsearch and logstash, which are both java applications.
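As a small illustration of the process-versus-thread distinction, the sketch below counts the threads that belong to a single process by listing its task directory under /proc. It is Linux-only, and thread_count is a made-up helper name, not part of Nagios Log Server.

```shell
#!/bin/sh
# Count the threads of one process. On Linux, every thread of a
# process appears as an entry under /proc/<pid>/task.
thread_count() {
    ls "/proc/$1/task" 2>/dev/null | wc -l | tr -d ' '
}

# Demo on the current shell. For Log Server you would pass the PID of
# one of the java processes instead (for example one reported by
# `pgrep -u nagios java`, assuming they run as the nagios user).
echo "threads in this shell: $(thread_count "$$")"
```

A java process showing dozens of entries here is a single process with many threads, which is what each row in 'top -H' represents.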
Re: Nagios user java command using over 200% CPU
Are there any maintenance tasks that you would recommend to ensure we're keeping our Log Server as "junk free" as possible?
For example, are there any log files we can purge or errant files/directories we can remove?
We plan on expanding the drive this afternoon, but we would like to ensure we free up as much space as possible beforehand.
Thank you.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Nagios user java command using over 200% CPU
The only cleanup of logs would be in the following directories, but these should already be taken care of by logrotate on the system:
Code: Select all
/var/log/logstash/*
/var/log/elasticsearch/*
Re: Nagios user java command using over 200% CPU
Good morning, we're trying to make a decision internally and would like your assistance.
Currently, we have a single two-node cluster, and we were considering adding an additional two-node cluster and breaking off our WAN monitoring devices onto that cluster. You're aware of the performance issues we've been facing and the fact that we've been throwing resources at this thing to no avail.
Would it be better to add the two additional nodes we have to our existing cluster or have the two separate 2 node clusters like we were considering?
If we do decide to go with a 4 node cluster, how does the data get spread across the nodes? I have limited knowledge of ELK, but from what I've read it seems like it stripes the data across the nodes. Is the only difference that no more than 1 node can go offline at a time?
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Nagios user java command using over 200% CPU
rferebee wrote:Would it be better to add the two additional nodes we have to our existing cluster or have the two separate 2 node clusters like we were considering?
I personally would add them to the existing cluster; the cluster is more efficient the larger it is.
Each index is split into 5 shards, and the default behavior is to have one primary and one replica of each shard (the replica is stored on a different server than the primary). With a larger cluster, this data is spread across all 4 nodes, still keeping just one replica. If one of the instances in your cluster goes down, the cluster automatically relocates the shards to make sure you still have one primary and one replica.
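To put rough numbers on that, here is a back-of-the-envelope sketch of how many shard copies each node would hold per index with 5 primaries and 1 replica. It assumes the copies are balanced evenly across nodes, which Elasticsearch aims for but does not guarantee for every index; the function names are made up for illustration.

```shell
#!/bin/sh
# Total shard copies for one index: each primary plus its replicas.
# Usage: shard_copies PRIMARIES REPLICAS
shard_copies() {
    echo $(( $1 * (1 + $2) ))
}

# Largest number of copies any node holds when the copies are spread
# as evenly as possible (ceiling of total / nodes).
# Usage: copies_per_node PRIMARIES REPLICAS NODES
copies_per_node() {
    total=$(shard_copies "$1" "$2")
    echo $(( (total + $3 - 1) / $3 ))
}

echo "copies per index: $(shard_copies 5 1)"
echo "2 nodes: up to $(copies_per_node 5 1 2) copies per node"
echo "4 nodes: up to $(copies_per_node 5 1 4) copies per node"
```

With these inputs, each index has 10 shard copies, so a 2 node cluster carries 5 per node while a 4 node cluster carries at most 3 per node, which is where the reduced per-node load comes from.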
Re: Nagios user java command using over 200% CPU
Are you aware of or can you provide any technical documentation that describes the performance benefits we might see by expanding our existing cluster? I'm having a heck of a time finding anything online myself which says something to the effect of, "3+ nodes provide a more stable and efficient cluster for Nagios Log Server/ELK".
The issue being, I've spent a considerable amount of time trying to get this new cluster online and now we may switch directions. I'd love to have some concrete data as to why it's the best course of action.
Full disclosure, I'm all for one cluster as I'd rather manage one instead of two.