I am starting to have an issue with the Nagios Log Server instances running in our environment. I have 2 servers with 8 CPUs and 20 GB of RAM each. I have a repository built for archived data, and I am only retaining 14 days of it. My shards/indexes run roughly 128 GB to 256 GB per day, and the two clustered servers hold roughly 3 TB of data between them. Every day I have to sign in and restart three services (elasticsearch, logstash, and httpd) before I can get into the interface, and oftentimes the interface will crash just running simple queries. If I run a complex query that spans more than one day, the system hangs and crashes.
I work in a secured industry, and this data needs to be available and contiguous, with no gaps. According to the interface dashboard, I receive 4M to 7.5M logs every 15 minutes. Do I need more resources? Should I be adding another server to the cluster? Right now all the logs are sent to one server; I am working with the network engineers to split this across the cluster, so that the server logs go in through the first log server and the network logs go in through the second.
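For what it's worth, that kind of split is usually done on the sending side. Here is a minimal sketch, assuming rsyslog clients and hypothetical hostnames (5544 is the default Nagios Log Server syslog port, but verify for your install):

```shell
# /etc/rsyslog.d/nagioslogserver.conf on server-type hosts
# (hostname is hypothetical): forward everything over TCP to node 1
*.* @@logserver1.example.com:5544

# Network devices would instead be pointed at node 2, e.g. on many
# devices something like: logging host logserver2.example.com
# (the exact syntax is vendor-specific)
```

This keeps the two intake streams separated without any changes on the Log Server side.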
Is anyone else having large data input problems that are crashing services? What, if any, were your resolutions?
Thanks in advance,
Joe Hahn
Logstash / elasticsearch services crashing
Re: Logstash / elasticsearch services crashing
Hi @jhahn and welcome!
It sounds like you're probably running into a memory issue. The Elasticsearch backend will only take half of the total system memory, so even though each system has 20 GB, the database responsible for holding all of the data being queried is limited to 10 GB. Having two nodes effectively doubles that, but it can still be tight given the amount of data coming in. I would start by upping the memory on the systems (64 GB if possible, but no more; going beyond that will have a negative impact on performance) and increasing the amount of memory allocated to the logstash process (responsible for the intake and parsing of logs) and the PHP process, per these:
https://support.nagios.com/kb/article.php?id=132
https://support.nagios.com/kb/article/n ... g-576.html
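For reference, here is a minimal sketch of the kind of settings those KB articles cover. The file paths and variable names below are assumptions for a typical sysconfig-based install; follow the articles for the exact locations in your version:

```shell
# Elasticsearch heap (assumed location: /etc/sysconfig/elasticsearch).
# Common guidance: about half of system RAM, but keep the heap at or
# below ~30-32 GB so the JVM can use compressed object pointers.
ES_HEAP_SIZE=16g

# Logstash heap (assumed location: /etc/sysconfig/logstash).
# Raise this from the default if log intake/parsing is falling behind.
LS_HEAP_SIZE=2g
```

Restart the respective services after changing either value so the new heap sizes take effect.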
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: Logstash / elasticsearch services crashing
OK, I have brought both servers in the cluster up to 32 GB of RAM. I am not sure my systems engineer will let me go much higher than that, so I will run with that and see what my performance looks like. Thanks.
Re: Logstash / elasticsearch services crashing
Keep us posted.
I would also point out that you can keep an eye on the memory heap usage from the command line with:

Code: Select all
curl 'localhost:9200/_cat/nodes?v'

and also via XI if you have it:
https://support.nagios.com/kb/article/n ... i-857.html
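If you only want the heap figures, the _cat APIs also accept an h= parameter to select specific columns. This needs a live Elasticsearch node to answer, so treat it as a sketch:

```shell
# Show just the node name and heap usage; v adds the header row
curl 'localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max'
```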