Nagios user java command using over 200% CPU

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
Contact:

Re: Nagios user java command using over 200% CPU

Post by scottwilkerson »

I don't have a document on this, but here is the reasoning: each member of a cluster shares the load of indexing messages as they are received and of finding documents when you search the logs. If you have two separate 2-instance clusters, each of those operations is limited to the resources of the 2 instances in that cluster instead of the resources of all 4.

Additionally, log traffic tends to be bursty: at times, certain segments of your log traffic will send far more than their average volume, and having the resources of all 4 instances available to process those messages will be beneficial. The bursts are also likely to happen at different times for each of the 2 segments you are proposing to split out.

This is above and beyond the higher availability you gain in case of an outage.

Full disclosure: I am one of the authors of Nagios Log Server. There is a demonstration of what happens to the cluster during a server outage at about the 26-minute mark of my Log Server product launch video:
https://www.youtube.com/watch?v=S_B5AJ-xvWs
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
rferebee
Posts: 733
Joined: Wed Jul 11, 2018 11:37 am

Post by rferebee »

I noticed this morning that for some reason I have 42 indices open when I shouldn't have more than 31. It's causing the drive space on my servers to fill up completely. Is there a way I can force Log Server to close the indices that shouldn't be open based on my Snapshot & Maintenance settings?

Post by scottwilkerson »

You can go to Admin -> Command Subsystem and run the snapshots_maintenance job.

One thing to note: make sure you don't have indexes with future dates, because these would not be removed. This can happen if a machine with an incorrect clock sends logs dated in the future.
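
One way to spot future-dated indexes is to compare the date suffix of each index name with today's date. The following is a minimal sketch, assuming daily index names that end in -YYYY.MM.DD (for example logstash-2019.04.02, which is an assumption about the naming on your system). The function name future_indices is made up for illustration.

```shell
#!/bin/sh
# Read index names on stdin (one per line) and print those whose date
# suffix is later than the day given as $1 (format YYYY.MM.DD).
# Assumption: index names end in -YYYY.MM.DD; other names are ignored.
future_indices() {
    # Turn 2019.04.02 into 20190402 so it can be compared as an integer
    today_num=$(printf '%s' "$1" | tr -d '.')
    while IFS= read -r name; do
        suffix="${name##*-}"    # text after the last '-'
        case "$suffix" in
            [0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9])
                index_num=$(printf '%s' "$suffix" | tr -d '.')
                if [ "$index_num" -gt "$today_num" ]; then
                    printf '%s\n' "$name"
                fi
                ;;
        esac
    done
}

# Possible usage, assuming Elasticsearch answers on localhost:9200:
#   curl -s 'localhost:9200/_cat/indices?h=index' | future_indices "$(date +%Y.%m.%d)"
```

Any name this prints is dated after today and is worth tracing back to the host that sent it.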

Post by rferebee »

So, it says there is a snapshots_maintenance job currently running. Could that be what has the older indexes open? They're going all the way back to the last week of February.

Post by scottwilkerson »

What is listed under "Last Run Time"?

Also, can you run the following from the CLI and show the output?

ps -ef|grep curator

Post by rferebee »

Last run time is listed as 3/31/2019.

ps -ef|grep curator shows this:

[root@nagioslscc1 ~]# ps -ef|grep curator
root 13757 13741 0 08:47 pts/0 00:00:00 grep curator
nagios 21154 21145 0 Apr01 ? 00:00:00 /bin/sh /usr/local/nagioslogserver/scripts/curator.sh optimize indices --older-than 20 --time-unit days --timestring %Y.%m.%d
nagios 21161 21154 0 Apr01 ? 00:00:00 /usr/bin/python /usr/bin/curator optimize indices --older-than 20 --time-unit days --timestring %Y.%m.%d
[root@nagioslscc1 ~]#

[root@nagioslscc2 ~]# ps -ef|grep curator
root 4496 4482 0 08:47 pts/0 00:00:00 grep curator
[root@nagioslscc2 ~]#
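
The ps output above always includes the grep process itself, which is easy to misread. A small sketch that filters `ps -ef` output down to the PID and start time of the real curator processes is shown here. The function name curator_procs is hypothetical, and the column positions assume the standard `ps -ef` layout seen above.

```shell
#!/bin/sh
# Read `ps -ef` output on stdin and print "PID STIME" for every line
# that mentions curator, skipping the grep command itself.
# Column 2 is the PID and column 5 is the start time in `ps -ef` output.
curator_procs() {
    awk '/curator/ && !/grep curator/ { print $2, $5 }'
}

# Possible usage:
#   ps -ef | curator_procs
```

A start time that shows a date rather than a clock time, such as Apr01 above, means the process was started on an earlier day.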

Post by scottwilkerson »

Something isn't right; it shouldn't run this long.

Let's kill those processes:

kill 21154
kill 21161

Then go to Admin -> Command Subsystem, click "Reset All Jobs", and then click Run next to snapshots_maintenance.
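
A plain `kill` sends SIGTERM and returns immediately, so it can be worth confirming that the process has actually exited before resetting the jobs. Below is a minimal sketch of that check. The function name stop_process and the 10-second default timeout are assumptions for illustration, not part of Log Server.

```shell
#!/bin/sh
# Send SIGTERM to a PID and wait until it is gone.
# $1 = PID, $2 = seconds to wait (hypothetical default: 10).
# Returns 0 once the process has exited, 1 if the signal could not be
# sent or the process is still running after the timeout.
stop_process() {
    target="$1"
    timeout="${2:-10}"
    kill "$target" || return 1          # plain kill sends SIGTERM
    waited=0
    # kill -0 sends no signal; it only checks that the PID still exists
    while kill -0 "$target" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "PID $target still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible usage with the PIDs from this thread:
#   stop_process 21154 && stop_process 21161
```

If the function returns 1 because of the timeout, the process ignored SIGTERM and needs a closer look before anything stronger is tried.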

Post by rferebee »

When I closed the Optimization tasks it closed the old indices and began running the Snapshot on its own. I think it's working as designed now.

I really wish the optimization worked without hanging. I tried setting it to 0 a couple months ago and our storage started filling up really quickly. I'll have to set it back to 14, I think that was the sweet spot.

Post by scottwilkerson »

rferebee wrote: I'll have to set it back to 14, I think that was the sweet spot.

That sounds good.

Post by rferebee »

Good morning. What is the best practice for restarting Log Server? I can never get it to come back up when I attempt to restart the services; perhaps I'm doing it incorrectly.

I am completely locked out of my instance, and it doesn't appear that Log Server is doing anything; the CPU is only at about 40% right now. I can't get the GUI to come up. I'd rather not reboot the servers and have to load the indices again...