Cluster Stability issues after upgrading to 1.4.4

Jklre
Posts: 163
Joined: Wed May 28, 2014 1:56 pm

Cluster Stability issues after upgrading to 1.4.4

Post by Jklre »

I have started to see incidents where our main Nagios Log Server cluster becomes unresponsive after upgrading to 1.4.4. The Elasticsearch service simply stops responding and goes OOM when I attempt to restart it. CPU and memory utilization remain stable during this process. I have attached the Java log from the service crash.

I also had a bizarre incident with the Logstash service spitting out this cryptic, unreadable log file until I restarted the service.

Here's a sample; it's 25 MB of this:
222c 2022 4c6f 6753 7461 7368 3a3a 4f75
7470 7574 733a 3a45 6c61 7374 6963 5365
6172 6368 2e66 6c75 7368 282f 7573 722f
6c6f 6361 6c2f 6e61 6769 6f73 6c6f 6773
6572 7665 722f 6c6f 6773 7461 7368 2f76
656e 646f 722f 6275 6e64 6c65 2f6a 7275
6279 2f31 2e39 2f67 656d 732f 6c6f 6773
7461 7368 2d6f 7574 7075 742d 656c 6173
7469 6373 6561 7263 682d 302e 322e 382d
6a61 7661 2f6c 6962 2f6c 6f67 7374 6173
682f 6f75 7470 7574 732f 656c 6173 7469
6373 6561 7263 682e 7262 3a34 3839 2922
2c20 2253 7475 643a 3a42 7566 6665 722e
6275 6666 6572 5f66 6c75 7368 282f 7573
722f 6c6f 6361 6c2f 6e61 6769 6f73 6c6f
6773 6572 7665 722f 6c6f 6773 7461 7368
2f76 656e 646f 722f 6275 6e64 6c65 2f6a
7275 6279 2f31 2e39 2f67 656d 732f 7374
7564 2d30 2e30 2e31 392f 6c69 622f 7374
7564 2f62 7566 6665 722e 7262 3a32 3139
2922 2c20 2253 7475 643a 3a42 7566 6665
722e 6275 6666 6572 5f66 6c75 7368 282f
7573 722f 6c6f 6361 6c2f 6e61 6769 6f73
6c6f 6773 6572 7665 722f 6c6f 6773 7461
7368 2f76 656e 646f 722f 6275 6e64 6c65
2f6a 7275 6279 2f31 2e39 2f67 656d 732f
7374 7564 2d30 2e30 2e31 392f 6c69 622f
7374 7564 2f62 7566 6665 722e 7262 3a32
3139 2922 2c20 226f 7267 2e6a 7275 6279
2e52 7562 7948 6173 682e 6561 6368 286f
7267 2f6a 7275 6279 2f52 7562 7948 6173
682e 6a61 7661 3a31 3334 3129 222c 2022
5374 7564 3a3a 4275 6666 6572 2e62 7566
6665 725f 666c 7573 6828 2f75 7372 2f6c
6f63 616c 2f6e 6167 696f 736c 6f67 7365
7276 6572 2f6c 6f67 7374 6173 682f 7665
6e64 6f72 2f62 756e 646c 652f 6a72 7562
792f 312e 392f 6765 6d73 2f73 7475 642d
302e 302e 3139 2f6c 6962 2f73 7475 642f
6275 6666 6572 2e72 623a 3231 3629 222c
2022 5374 7564 3a3a 4275 6666 6572 2e62
7566 6665 725f 666c 7573 6828 2f75 7372
2f6c 6f63 616c 2f6e 6167 696f 736c 6f67
7365 7276 6572 2f6c 6f67 7374 6173 682f
7665 6e64 6f72 2f62 756e 646c 652f 6a72
7562 792f 312e 392f 6765 6d73 2f73 7475
642d 302e 302e 3139 2f6c 6962 2f73 7475
642f 6275 6666 6572 2e72 623a 3231 3629
222c 2022 5374 7564 3a3a 4275 6666 6572
2e62 7566 6665 725f 666c 7573 6828 2f75
7372 2f6c 6f63 616c 2f6e 6167 696f 736c
6f67 7365 7276 6572 2f6c 6f67 7374 6173
682f 7665 6e64 6f72 2f62 756e 646c 652f
6a72 7562 792f 312e 392f 6765 6d73 2f73
7475 642d 302e 302e 3139 2f6c 6962 2f73
7475 642f 6275 6666 6572 2e72 623a 3139
3329 222c 2022 5374 7564 3a3a 4275 6666
6572 2e62 7566 6665 725f 666c 7573 6828
2f75 7372 2f6c 6f63 616c 2f6e 6167 696f
736c 6f67 7365 7276 6572 2f6c 6f67 7374
6173 682f 7665 6e64 6f72 2f62 756e 646c
652f 6a72 7562 792f 312e 392f 6765 6d73
2f73 7475 642d 302e 302e 3139 2f6c 6962
2f73 7475 642f 6275 6666 6572 2e72 623a
3139 3329 222c 2022 5255 4259 2e62 7566
Thank you.
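
Editor's note: the sample above looks like a plain hex dump (groups of four hex digits, no offsets) of ordinary text rather than binary data. A minimal sketch for turning such a dump back into readable text follows, assuming the groups are whitespace-separated and the bytes are ASCII/UTF-8. The function name `decode_hex_dump` is hypothetical, and only the first three lines of the sample are used here.

```python
def decode_hex_dump(dump: str) -> str:
    """Join whitespace-separated hex groups and decode the bytes as text."""
    hex_digits = "".join(dump.split())
    # A truncated sample may end on half a byte; drop the dangling nibble.
    if len(hex_digits) % 2:
        hex_digits = hex_digits[:-1]
    # errors="replace" keeps going if a byte is not valid UTF-8.
    return bytes.fromhex(hex_digits).decode("utf-8", errors="replace")


# First three lines of the sample posted above.
sample = """
222c 2022 4c6f 6753 7461 7368 3a3a 4f75
7470 7574 733a 3a45 6c61 7374 6963 5365
6172 6368 2e66 6c75 7368 282f 7573 722f
"""

decoded = decode_hex_dump(sample)
print(decoded)
```

Decoded this way, the sample reads as a fragment of a Logstash stack trace from the Elasticsearch output plugin, which suggests the "unreadable" file is a repeating error message that was displayed in hex.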
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Cluster Stability issues after upgrading to 1.4.4

Post by cdienger »

The recommended memory for a production system is 8 GB, though it's likely you'll need to add more. Can you increase the memory to 8 or 16 GB as a test? While you may not need the max of 64 GB, it's not unusual for us to see systems configured with 64 GB total. 64 GB offers the ability to load and search as much data as possible.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Jklre
Posts: 163
Joined: Wed May 28, 2014 1:56 pm

Re: Cluster Stability issues after upgrading to 1.4.4

Post by Jklre »

cdienger wrote:The recommended memory for a production system is 8 GB, though it's likely you'll need to add more. Can you increase the memory to 8 or 16 GB as a test? While you may not need the max of 64 GB, it's not unusual for us to see systems configured with 64 GB total. 64 GB offers the ability to load and search as much data as possible.
Right now we have 6 GB per node. I can have it increased to 8 GB. Or can we adjust the JVM settings, since memory utilization is only at a constant 70% on this cluster?
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: Cluster Stability issues after upgrading to 1.4.4

Post by cdienger »

It's usually best to leave the JVM settings alone; Elasticsearch will take care of allocating heap memory for itself. The 70% mark reminds me of another case I've seen recently where memory problems came up with the merge option. You can disable this option under Administration > System > Backup & Maintenance > Optimize Indexes older than. Setting it to 0 will disable it and shouldn't have much, if any, impact on performance.
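
Editor's note: for context on the memory figures discussed here, a rough sketch of how heap size tends to scale with node RAM. It assumes the general Elasticsearch guidance of giving the heap about half of physical RAM while staying below roughly 32 GB; whether Nagios Log Server applies exactly this rule is not confirmed in this thread. The function name `suggested_heap_mb` and the 31 GB cap are illustrative assumptions.

```python
def suggested_heap_mb(total_ram_mb: int, cap_mb: int = 31 * 1024) -> int:
    """Half of physical RAM, capped just under the ~32 GB guideline (assumed)."""
    return min(total_ram_mb // 2, cap_mb)


# Node sizes mentioned in the thread: 6 GB today, 8 or 16 GB as a test, 64 GB max.
for ram_gb in (6, 8, 16, 64):
    heap_mb = suggested_heap_mb(ram_gb * 1024)
    print(f"{ram_gb:>2} GB RAM -> about {heap_mb} MB heap")
```

Under these assumptions a 6 GB node would leave Elasticsearch only about 3 GB of heap, so the JVM could run out of heap even while overall system memory utilization looks steady.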

If it crashes again after increasing the memory, the first place I would look is the most recent Elasticsearch log in /var/log/elasticsearch, as well as the Java log again.
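
Editor's note: a minimal sketch of that log check, assuming the files in /var/log/elasticsearch end in `.log` and that an out-of-memory crash leaves the standard Java `OutOfMemoryError` string in the log. The helper names `latest_log` and `find_matching_lines` are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def latest_log(log_dir: str) -> Optional[Path]:
    """Return the most recently modified *.log file in log_dir, or None."""
    logs = [p for p in Path(log_dir).glob("*.log") if p.is_file()]
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)


def find_matching_lines(path: Path, needle: str = "OutOfMemoryError") -> List[str]:
    """Return every line in the file that contains the needle."""
    with open(path, errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if needle in line]
```

Typical use would be `newest = latest_log("/var/log/elasticsearch")` followed by `find_matching_lines(newest)` when a file is found; reading those logs may require root privileges.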