Nagios user java command using over 200% CPU

scottwilkerson · Post by **scottwilkerson** » Tue Apr 02, 2019 7:05 am

I don't have a document, but I'll say this: each member of a cluster shares the load of indexing messages as they are received and of finding documents when you search your logs. If you have two separate 2-instance clusters, each of those operations is limited to the resources of just the 2 instances in that cluster instead of the resources of all 4.

Additionally, log messages tend to be bursty: at times, certain segments of your log traffic will send far more than the average amount of logs, and having the resources of all 4 instances available to process those messages will be beneficial. The bursts are also likely to happen at different times for each of the 2 segments you are proposing to split these into.

This is above and beyond the higher availability you will get in case of an outage.

Full disclosure: I am one of the authors of Nagios Log Server. There is a demonstration of what happens to the cluster in the event of a server outage at about the 26-minute point of my Log Server product launch video:
https://www.youtube.com/watch?v=S_B5AJ-xvWs

rferebee · Post by **rferebee** » Tue Apr 02, 2019 9:55 am

I noticed this morning that for some reason I have 42 indices open when I shouldn't have more than 31. It's causing the drive space on my servers to fill up completely. Is there a way I can force Log Server to close the indices that shouldn't be open based on my Snapshot & Maintenance settings?

scottwilkerson · Post by **scottwilkerson** » Tue Apr 02, 2019 10:18 am

You can go to Admin -> Command Subsystem and run the snapshots_maintenance job

One thing to note: make sure you don't have indexes with future dates, because these would not be removed. This can happen if a machine with an incorrect clock sends logs dated in the future.
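As a rough way to check for future-dated indexes, something along these lines might help. It is only a sketch: `future_indices` is a hypothetical helper name, it assumes the index names end in a `YYYY.MM.DD` date (matching the `%Y.%m.%d` timestring curator uses), and the usage line assumes Elasticsearch is answering on `localhost:9200`.

```shell
# Print index names whose trailing date is later than a reference day.
# Assumption: names end in YYYY.MM.DD, so a plain string comparison
# orders them correctly. The reference day defaults to today.
future_indices() {
    today="${1:-$(date +%Y.%m.%d)}"
    awk -v today="$today" '
        match($1, /[0-9][0-9][0-9][0-9]\.[0-9][0-9]\.[0-9][0-9]$/) {
            d = substr($1, RSTART, RLENGTH)
            # Appending "" forces a string (not numeric) comparison.
            if ((d "") > (today "")) print $1
        }'
}

# Usage (assumes Elasticsearch is reachable on localhost:9200):
# curl -s 'localhost:9200/_cat/indices?h=index' | future_indices
```

Any name it prints is dated after the reference day and is worth checking against the sending host's clock.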

rferebee · Post by **rferebee** » Tue Apr 02, 2019 10:21 am

So, it says there is a snapshots_maintenance job currently running. Could that be what has the older indexes open? They're going all the way back to the last week of February.

scottwilkerson · Post by **scottwilkerson** » Tue Apr 02, 2019 10:39 am

What is listed under "Last Run Time"?

Also, can you run the following from the CLI and show the output?

ps -ef|grep curator

rferebee · Post by **rferebee** » Tue Apr 02, 2019 10:48 am

Last run time is listed as 3/31/2019.

ps -ef|grep curator shows this:

[root@nagioslscc1 ~]# ps -ef|grep curator
root 13757 13741 0 08:47 pts/0 00:00:00 grep curator
nagios 21154 21145 0 Apr01 ? 00:00:00 /bin/sh /usr/local/nagioslogserver/scripts/curator.sh optimize indices --older-than 20 --time-unit days --timestring %Y.%m.%d
nagios 21161 21154 0 Apr01 ? 00:00:00 /usr/bin/python /usr/bin/curator optimize indices --older-than 20 --time-unit days --timestring %Y.%m.%d
[root@nagioslscc1 ~]#

[root@nagioslscc2 ~]# ps -ef|grep curator
root 4496 4482 0 08:47 pts/0 00:00:00 grep curator
[root@nagioslscc2 ~]#

scottwilkerson · Post by **scottwilkerson** » Tue Apr 02, 2019 11:07 am

Something isn't right; it shouldn't run this long.

Let's kill those processes:

kill 21154
kill 21161

then go to Admin -> Command Subsystem
click "Reset All Jobs"
then click run next to snapshots_maintenance
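If this comes up again, a small filter like the one below could help spot curator processes that have been running for a long time before you kill them by hand. Treat it as a sketch under assumptions: `stale_curator` is a hypothetical helper name, the 6-hour default is an arbitrary threshold rather than a documented limit, and the `etimes` column is only available in newer procps-ng builds of `ps`.

```shell
# Read "PID ELAPSED_SECONDS COMMAND..." lines on stdin and print the PIDs
# of lines that mention curator and have run longer than max_seconds.
# Assumption: 21600 seconds (6 hours) is an arbitrary default threshold.
stale_curator() {
    max_seconds="${1:-21600}"
    awk -v max="$max_seconds" '($2 + 0) > (max + 0) && /curator/ { print $1 }'
}

# Usage (etimes needs a procps-ng ps; older ps builds may not have it):
# ps -eo pid=,etimes=,args= | stale_curator
```

Because it filters on elapsed time, the short-lived `grep` or `awk` process from the pipeline itself will not show up in the result.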

rferebee · Post by **rferebee** » Tue Apr 02, 2019 11:16 am

When I closed the Optimization tasks it closed the old indices and began running the Snapshot on its own. I think it's working as designed now.

I really wish the optimization worked without hanging. I tried setting it to 0 a couple months ago and our storage started filling up really quickly. I'll have to set it back to 14, I think that was the sweet spot.

scottwilkerson · Post by **scottwilkerson** » Tue Apr 02, 2019 1:29 pm

rferebee wrote: I'll have to set it back to 14, I think that was the sweet spot.

That sounds good.

rferebee · Post by **rferebee** » Wed Apr 03, 2019 10:44 am

Good morning. What is the best practice for restarting Log Server? I can never get it to come back up when I attempt to restart the services; perhaps I'm doing it incorrectly.

I am completely locked out of my instance, and it doesn't appear that Log Server is doing anything; the CPU is only at about 40% right now. I can't get the GUI to come up. I'd rather not reboot the servers and have to load the indices again...

Nagios Support Forum
