Page 1 of 3

LogServer's memory is exhausted

Posted: Tue May 10, 2016 5:51 am
by warrennspectrum
Hi,

I have installed NLS 1.4.0 in a virtual machine with the following hardware setup:

OS: CentOS6.7
CPU: 8core
Memory: 64GB
Disk: 500GB

When I apply the Logstash config on the NLS, the filters work normally and the log data volume is 16~17GB per day. However, after about 3~4 days the memory is exhausted and Logstash shuts down. The memory stays occupied and is never returned to the OS! Where is my memory???

Here is my filter config:

Code: Select all

if [type] == 'asa' {
  grok {
    match => [
      "message","%{GREEDYDATA}<%{POSINT:syslog_pri}>%{CISCOTIMESTAMP:LogTime}: \%%{USERNAME:ASA_id}: %{GREEDYDATA:ASA_msg}"
    ]
  }
  syslog_pri { }
grok {
  match => [
    "ASA_msg","%{GREEDYDATA:event} from %{DATA:ftp_src_name}:%{IPV4:ftp_src_ip}/%{INT:ftp_src_port} to %{DATA:ftp_dist_name}:%{IPV4:ftp_dist_ip}/%{INT:ftp_dist_port}, user %{USERNAME:ftp_username} %{GREEDYDATA:ftp_action} %{HOSTNAME:ftp_filename}",
    "ASA_msg","%{DATA:protocol} %{GREEDYDATA:event} from %{IPV4:src_ip} to %{DATA:dist_name}:%{IPV4:dist_ip}",
    "ASA_msg","%{DATA:protocol} %{GREEDYDATA:event} from %{IPV4:src_ip}/%{INT:src_port} to %{DATA:dist_name}:%{IPV4:dist_ip}/%{USERNAME:dist_port}",
#313005
    "ASA_msg","%{GREEDYDATA:event}: icmp src %{DATA:icmp_src_name}:%{IPV4:interface_icmp_src_ip} dst %{DATA:dist_name}:%{IPV4:interface_icmp_dist_ip} \(type %{INT:interface_icmp_type}, code %{INT:interface_icmp_code}\) on %{GREEDYDATA:interface_name} interface.\s+Original IP payload: %{GREEDYDATA:no_matching_msg}",
    "ASA_msg","%{GREEDYDATA:event} %{INT:conn_num} %{DATA:src_name}:%{IPV4:src_ip}/%{INT:src_port} to %{DATA:dist_name}:%{IPV4:dist_ip}/%{INT:dist_port} duration\s+%{INT:duration_hour}:%{INT:duration_minutes}:%{INT:duration_second} bytes %{INT:teardown_bytes} %{GREEDYDATA:teardown_msg}",
    "ASA_msg","%{GREEDYDATA:event} %{INT:conn_num} %{DATA:src_name}:%{IPV4:src_ip}/%{INT:src_port} to %{DATA:dist_name}:%{IPV4:dist_ip}/%{INT:dist_port} duration\s+%{INT:duration_hour}:%{INT:duration_minutes}:%{INT:duration_second} bytes %{INT:teardown_bytes}",
#106023
    "ASA_msg","%{USERNAME:action}\s%{USERNAME:deny_protocol}\ssrc\s%{USERNAME:src_name}\S%{IPV4:src_ip}\S%{INT:src_port}\sdst\s%{USERNAME:dist_name}\S%{IPV4:dist_ip}\S%{INT:dist_port}\sby access\Sgroup\s%{GREEDYDATA:acl_name}\s\S%{BASE16NUM:acl_type}\S\s%{BASE16NUM:acl_code}\S",
    "ASA_msg","%{GREEDYDATA:event} %{INT:conn_num} for %{USERNAME:src_name}:%{IPV4:src_ip}/%{INT:src_port} \(%{IPV4:src_ip_sec}/%{INT:src_port_sec}\) to %{USERNAME:dist_name}:%{IPV4:dist_ip}/%{INT:dist_port} \(%{IPV4:dist_ip_sec}/%{INT:dist_port_sec}\)\n",
    "ASA_msg","%{GREEDYDATA:event} type=%{INT:ICMP_type}, code=%{INT:ICMP_code} from %{IPV4:src_ip} on %{GREEDYDATA:src_name}",
    "ASA_msg","%{GREEDYDATA:event} %{INT:conn_num} from %{USERNAME:src_name}:%{IPV4:src_ip}/%{INT:src_port} \(%{IPV4:src_ip_sec}/%{INT:src_port_sec}\) to %{USERNAME:dist_name}:%{IPV4:dist_ip}/%{INT:dist_port} \(%{IPV4:dist_ip_sec} \/%{INT:dist_port_sec}\)",
    "ASA_msg","%{GREEDYDATA:event} %{USERNAME:dist_name}:%{IPV4:dist_ip}",
    "ASA_msg","%{GREEDYDATA:event} for faddr %{IPV4:faddr_ip}/%{INT:faddr_port} gaddr %{IPV4:gaddr_ip}/%{INT:gaddr_port} laddr %{IPV4:laddr_ip}/%{INT:laddr_port}",
    "ASA_msg","\[%{GREEDYDATA:object_of_drop_rate}\] drop %{USERNAME:rate_ID} exceeded. Current burst rate is %{INT:current_burst_rate} per second, max configured rate is %{INT:max_burst_configured_rate}; Current average rate is %{INT:current_average_rate} per second, max configured rate is %{INT:max_average_configured_rate}; Cumulative total count is %{INT:cumulative_total_count}",
    "ASA_msg","%{GREEDYDATA:event} from %{IPV4:src_ip}/%{INT:src_port} to %{IPV4:dist_ip}/%{INT:dist_port} flags %{DATA:tcp_flags}  on interface %{USERNAME:interface_name}",
    "ASA_msg","%{DATA:protocol} %{GREEDYDATA:event} from %{IPV4:src_ip}/%{INT:src_port} to %{DATA:dist_name}:%{IPV4:dist_ip}/%{DATA:service}",
#419003
    "ASA_msg","%{GREEDYDATA:event} from %{USERNAME:src_name}:%{IPV4:src_ip}/%{INT:src_port} to %{USERNAME:dist_name}:%{IPV4:sdist_ip}/%{INT:dist_port}",
    "no_matching_msg","icmp src %{IPV4:Original_icmp_src_ip} dst %{IPV4:Original_icmp_dist_ip} \(type %{INT:Original_icmp_type}, code %{INT:Original_icmp_code}\).",
    "no_matching_msg","tcp src %{IPV4:Original_tcp_src_ip}/%{INT:Original_tcp_src_port} dst\s%{IPV4:Original_tcp_dist_ip}/%{INT:Original_tcp_dist_port}"
    ]
    add_field => { "analyze" => "ANXX" }
  }
}

Code: Select all

    if [type] == 'cisco' {
    	grok {
    		match => [ "message","<%{POSINT:syslog_pri}>%{INT:ID}:(\ )*(.)?%{CISCOTIMESTAMP:LogTime}: \%%{USERNAME:log_type}: %{GREEDYDATA:Cisco_msg}"]
    		}
    	syslog_pri { }
    	
    	grok {
    		match => ["Cisco_msg","%{DATA:protocol}( protocol on )*Interface %{DATA:Interface}, changed state to %{GREEDYDATA:state}",
    				"Cisco_msg","%{GREEDYDATA:Result} on %{WORD:port_type} port, message type: %{WORD:msg_type}, MAC sa: %{CISCOMAC:MAC}",
    				"Cisco_msg","%{GREEDYDATA:Result} because %{GREEDYDATA:Result_reason}, message type: %{WORD:msg_type}, chaddr: %{URIHOST:chaddr}, MAC sa: %{CISCOMAC:MAC}",
    				"Cisco_msg","%{GREEDYDATA:Result} %{URIHOST:src_ip} on %{WORD:lan}, sourced by %{CISCOMAC:MAC}",
    				"Cisco_msg","Stack Port %{DATA:stack_port} Switch %{DATA:stack_switch} has changed to state %{GREEDYDATA:state}",
    				"Cisco_msg","%{GREEDYDATA:Result} from %{WORD:protocol} by %{USER:user} on %{DATA:lan} \(%{IP:src_ip}\)",
    				"Cisco_msg","User:%{USER:user}  logged %{GREEDYDATA:Result}:%{GREEDYDATA:Result_content}",
    				"Cisco_msg","%{DATA:Result} %{DATA:state} \[user: %{USER:user}\] \[Source: %{URIHOST:src_ip}\] \[localport: %{INT:port}\] at %{TIME:time} %{WORD:zone} %{WORD:week} %{WORD:month} %{INT:day} %{INT:year}",
                                 "Cisco_msg","%{GREEDYDATA:event} %{IPV4:dist_ip} port %{INT:dist_port} %{GREEDYDATA:login_state}",
                                 "Cisco_msg","%{GREEDYDATA:event} %{IPV4:src_ip}\(%{INT:src_port}\) \S\S %{IPV4:dist_ip}\(%{INT:dist_port}\), %{INT:packets} packets",
                                 "Cisco_msg","%{GREEDYDATA:event} from \(%{URI:src_uri}\) %{GREEDYDATA:state}",
                                 "Cisco_msg","Login %{DATA:login_status}\s\Suser: %{USERNAME:user}\S\s\SSource: %{IPV4:source_ip}\S\s\Slocalport: %{INT:localport}\S\s\SReason: %{GREEDYDATA}\S\sat %{TIME:login_time} %{TZ:login_tz} %{DAY:login_day} %{MONTH:loging_month} %{MONTHDAY:login_monthday} %{YEAR:login_year}"
    				]				
    		add_field => {
    			"analyze" => "Y"
                }	
    	  }
    	   if[Interface] {
    		mutate {
    			add_field => {
    				"Result" => "Interface changed state"
    				}
    		}
    	  }
    	  if[stack_port] {
    		mutate {
    			add_field => {
    				"Result" => "Stack changed state"
    				}
    		}
    	  }
    }

Code: Select all

    if [type] == 'f5' {
    	grok {
    		match => [ "message","<%{POSINT:syslog_pri}>%{CISCOTIMESTAMP:LogTime} %{DATA:host_name} %{DATA:log_type}: \[%{DATA:Result_content}\]\[%{INT:day}/%{WORD:month}/%{INT:year}:%{TIME:time} %{DATA:zone}\] %{GREEDYDATA:F5_msg} %{INT:msg_ID0}",
    		"message","<%{POSINT:syslog_pri}>%{CISCOTIMESTAMP:LogTime} %{DATA:host_name} %{DATA:log_type}: \[%{DATA:Result}\] %{URIHOST:ip} - %{USER:user} \[%{INT:day}/%{WORD:month}/%{INT:year}:%{TIME:time} %{DATA:zone}\] %{GREEDYDATA:F5_msg} %{INT:msg_ID0} %{INT:msg_ID1}",
    					"message","<%{POSINT:syslog_pri}>%{CISCOTIMESTAMP:LogTime} %{USERNAME:msg_type} %{DATA:log_type}: %{GREEDYDATA:F5_msg1}"
    			]
    		}
    	syslog_pri { }
    	grok {
    		match => [
    					"F5_msg","%{URIHOST:ip} %{DATA:msg_type} %{DATA:Result} \"%{GREEDYDATA:Connect_type}\"",
    					"F5_msg","%{DATA:msg_ID0}:%{DATA:msg_ID1}:%{DATA:Result} instance %{DATA:Result_content} %{URIHOST:mac} %{DATA:state_ini} --\> %{DATA:state_to} from %{URIHOST:from_ip} \(state: %{DATA:state}\)",
    				  "F5_msg","\"%{GREEDYDATA:Result_content}\"",
    					"F5_msg1","%{INT:year}-%{WORD:month}-%{INT:day} %{TIME:time} %{URIHOST:uri} from %{URIHOST:src_ip}#%{DATA:src_port}: view %{DATA:view}: query: %{DATA:query_uri} IN %{GREEDYDATA:query_in} \(%{URIHOST:ip}\%%{INT}\)",
    				  "F5_msg1","%{INT:year}-%{WORD:month}-%{INT:day} %{TIME:time} %{URIHOST:uri} to %{URIHOST:dist_ip}#%{DATA:dist_port}: \[%{DATA:Result_content} %{DATA:Result_content1}\] response: (%{DATA:response}|%{URIHOST:response_uri} %{INT:response_port} IN %{DATA:response_in} %{URIHOST:response_uri1};)",
    				  "F5_msg1","%{DATA:Result_content} %{DATA:Result} while %{GREEDYDATA:Result_reason}; fd='%{DATA:fd}'",
    				  "F5_msg1","%{GREEDYDATA:Result}; time_reopen='%{DATA:time_reopen}'",
    				  "F5_msg1","Rule %{DATA:Rule}: %{GREEDYDATA:Result} %{IPV4:client_ip}:%{INT:client_port} %{DATA:msg_type} %{DATA:Result_content} to %{IPV4:src_ip}:%{INT:src_port} Sent to %{IPV4:dist_ip}:%{INT:dist_port}",
    				  "F5_msg1","%{DATA:msg_ID0}:%{DATA:msg_ID1}: HA vlan_fs %{DATA:HA_vlan_fs} is( now)? %{DATA:Result}.",
    				  "F5_msg1","%{DATA:msg_ID0}:%{DATA:msg_ID1}: HA vlan_fs %{DATA:HA_vlan_fs} %{DATA:Result_reason} action is %{DATA:Result}.",	
    				  "F5_msg1","%{DATA:msg_ID0}:%{DATA:msg_ID1}: Failover condition reported by %{DATA:reported_by} for traffic group %{DATA:traffic_group}, %{GREEDYDATA:Result_content}.",
    				  "F5_msg1","\(%{USERNAME:user}\) %{DATA:Result} \(mail%{DATA:mail_Result} %{INT:bytes} bytes of output but got status %{DATA:state}\)",				  
    				  "F5_msg1","\(%{USERNAME:user}\) %{DATA:Result} \(%{GREEDYDATA:Result_content}\)",
    				  "F5_msg1","%{DATA:Result_content}\(%{DATA:Result_content1}\):%{GREEDYDATA:Result} for user %{USERNAME:user}( by \(uid=%{DATA:uid}\))*",
    				  "F5_msg1","USER %{USERNAME:user} pid %{DATA:pid} %{DATA:Result} %{GREEDYDATA:Result_content}",
    				  "F5_msg1","%{GREEDYDATA:Result} '' %{GREEDYDATA:Result_reason}",
    				  "F5_msg1","pendsect: %{DATA:pendsect_disk} %{GREEDYDATA:Result}",
    				  "F5_msg1","%{GREEDYDATA:Result} request received, %{GREEDYDATA:Result_content};",
    				  "F5_msg1","%{USER:user} daemon %{GREEDYDATA:Result}",
    				  "F5_msg1","%{DATA:Result} to %{URIHOST:dist_ip}, stratum %{INT:stratum}",
    				  "F5_msg1","%{GREEDYDATA:Result}:%{INT:Result_ID}"				  
    				]				
    		add_field => {
    			"F5_msg" => "%{F5_msg1}"
    			"analyze" => "Y"
                }
    		remove_field => ["F5_msg1"]
    	  }
    	  if[query_uri]{
    		mutate {
    			add_field => {
    				"Result" => "Uri from IP by Query"
    				}
    		}
    	  }
    	  if[response] or [response_uri] {
    		mutate {
    			add_field => {
    				"Result" => "Uri to ip and the response"
    				}
    		}
    	  }	  
    	   if[Result_content]{
    		mutate {
    			add_field => {
    				"Result" => "Failover condition reported"
    				}
    		}
    	  }	  
    	  if[mail_Result] {
    		mutate{
    			add_field => {
    				"Result_content" => "Mailed output but got status"
    				}
    			remove_field => ["mail_Result"]
    		}
    	  }	  
    	  if[response] {
    		mutate{
    			add_field => {
    				"Result" => "Response"
    				}
    		}
    	  }
    	   if[reset_s] {
    		mutate{
    			add_field => {
    				"Result" => "Time reset"
    				}
    		}
    	  }
    }

Re: LogServer's memory is exhausted

Posted: Tue May 10, 2016 7:19 am
by eloyd
64GB is a lot of memory, even for an NLS installation. However, you're in luck. By default, NLS allocates 50% of available system RAM to elasticsearch. You can verify this in /etc/sysconfig/elasticsearch. It MAY be worthwhile to increase ES_HEAP_SIZE to something bigger than 50% of RAM. You can test this out by changing the last line in this code:

Code: Select all

# Heap Size (defaults to 256m min, 1g max)
# Nagios Log Server Default to 0.5 physical Memory
ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m
to

Code: Select all

ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2 * 2}') / 3 )m
That would allocate 2/3 of your RAM (instead of 1/2) to the JVM heap after you restart elasticsearch.
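As a standalone sanity check of the arithmetic (nothing here touches the server; the 65536 value simply stands in for the live free -m total on a 64GB host):

```shell
#!/bin/sh
# Standalone arithmetic sketch: what each heap formula yields on a 64GB
# (65536 MB) host. total_mb stands in for: free -m | awk '/^Mem:/{print $2}'
total_mb=65536
half=$(expr "$total_mb" / 2)            # the default: 1/2 of RAM
two_thirds=$(expr "$total_mb" \* 2 / 3) # the proposed: 2/3 of RAM
echo "ES_HEAP_SIZE (1/2 RAM): ${half}m"       # prints 32768m
echo "ES_HEAP_SIZE (2/3 RAM): ${two_thirds}m" # prints 43690m
```

Note that 2/3 of 64GB already lands well above 32GB, which matters for the caveat in the next paragraph.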

HOWEVER, with more than 32GB of RAM allocated to the JVM heap, you may fall victim to a problem outlined in this article, where allocating more than 32GB to the heap can actually cause problems. You may want to experiment to see what gives you the best overall results.

Re: LogServer's memory is exhausted

Posted: Tue May 10, 2016 9:41 am
by hsmith
Also, what does the output of a free -m command look like?

Re: LogServer's memory is exhausted

Posted: Tue May 10, 2016 9:42 am
by rkennedy
Thanks @eloyd!

Logstash will cache your logs to memory, which is how it's able to search through them so quickly. Take a look at your Backup & Maintenance page and post a screenshot of it for us to look at.

You'll most likely want to adjust the 'close indexes older than' parameter, as that will take them out of memory, but still leave them on the disk to be reopened.
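For background, closing an index by hand uses the Elasticsearch open/close index API; the sketch below (with a hypothetical index name) is a dry run that only prints the curl commands rather than sending them:

```shell
#!/bin/sh
# Dry-run sketch, hypothetical index name: a closed index frees its heap
# usage but stays on disk, and can be reopened to search it again.
ES="http://localhost:9200"
INDEX="logstash-2016.04.01"
echo "curl -XPOST '${ES}/${INDEX}/_close'"  # take the index out of memory
echo "curl -XPOST '${ES}/${INDEX}/_open'"   # bring it back when needed
```

The 'close indexes older than' setting automates the _close call on a schedule so old indexes stop consuming heap.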

Re: LogServer's memory is exhausted

Posted: Wed May 11, 2016 3:59 am
by warrennspectrum
Thanks @eloyd, @hsmith, and @rkennedy

I suspect a situation that may have led to my memory problem. I originally installed NLS in a virtual machine on my own computer, configured with 1 CPU core and 2GB of memory. I then exported that virtual machine to a .ova file, imported it into another vSphere host, and changed the configuration to 8 cores and 64GB. Is it possible that this caused the problem? If so, how do I fix it?

To @eloyd,
In fact, there is 256GB of memory we could use, and originally we used all of it. The exhausted-memory problem caused Logstash to shut down. I found this https://www.elastic.co/guide/en/elastic ... izing.html and thought it was related to the memory problem, so I set the memory to 64GB and gave ES_HEAP_SIZE half of it, but the problem still exists.

To @hsmith,
Here is the free -h command output:
Image
After I shut down Logstash myself, the output is shown below:
Image
I thought it was all Logstash's fault, but that seems incorrect. It looks like all the memory has moved to the cache. Should I run
echo 1 > /proc/sys/vm/drop_caches
to release the cached memory?

To @rkennedy
Here is a screenshot of the Backup & Maintenance page:
Image
So how do I adjust this page's parameters to solve the memory problem? I really hope Logstash can run steadily and never stop by itself. :cry:

Thank you all again! :)

Re: LogServer's memory is exhausted

Posted: Wed May 11, 2016 9:57 am
by hsmith
Change your settings away from 2, 0, 0. Try setting them to 2, 15, 30 and see if that helps anything. Next, let's give this command a shot:

Code: Select all

sed -i s/'#LS_HEAP_SIZE="256m"'/'LS_HEAP_SIZE="2048m"'/g /etc/sysconfig/logstash

service logstash restart
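If you want to preview what that sed does before touching the real file, here's a harmless dry run on a scratch copy (it assumes the line is still in its shipped, commented-out form):

```shell
#!/bin/sh
# Dry run: apply the same substitution to a scratch file instead of
# /etc/sysconfig/logstash, then show the result.
tmp=$(mktemp)
echo '#LS_HEAP_SIZE="256m"' > "$tmp"   # the line as shipped
sed -i s/'#LS_HEAP_SIZE="256m"'/'LS_HEAP_SIZE="2048m"'/g "$tmp"
result=$(cat "$tmp")
echo "$result"   # prints LS_HEAP_SIZE="2048m"
rm -f "$tmp"
```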
Give this a shot and let me know what happens.

Thanks!

Re: LogServer's memory is exhausted

Posted: Wed May 11, 2016 10:09 am
by eloyd
Sorry, I originally read this as elasticsearch stopping, not logstash. Yes, you will want to increase LS_HEAP_SIZE to something more reasonable, given your utilization. And you will definitely want to do what @hsmith said about 2 15 30 for the index management.

Also, for fun, can you confirm that you're running the proper version of logstash:

Code: Select all

/usr/local/nagioslogserver/logstash/bin/logstash --version
(Note, this can take a while, since it needs to fire up a full JVM to process.)

I'm expecting 1.5.1

Re: LogServer's memory is exhausted

Posted: Wed May 11, 2016 4:14 pm
by hsmith
Thanks Eric :) (Again)

Re: LogServer's memory is exhausted

Posted: Wed May 11, 2016 10:34 pm
by warrennspectrum
Thanks @hsmith and @eloyd,

I have already changed my Backup & Maintenance page settings to 2, 15, 30, set LS_HEAP_SIZE to 2048m, and restarted Logstash.

In fact, I updated my NLS from 1.4.0 to 1.4.1 and rebooted my server yesterday, and the memory cache was about 1GB then. Now, when I log in to the server, the output of the free -h command is
Image

The cached memory has grown to 15GB. I'm afraid Logstash will be shut down again tomorrow because of insufficient memory :?

To @eloyd, here is my logstash version:
Image
It's the 1.5.1 you were expecting. :D
I also found that the NLS Administration page shows the version information. Here is the screenshot:
Image

By the way, the data volume is about 18GB per day.

Re: LogServer's memory is exhausted

Posted: Thu May 12, 2016 9:01 am
by warrennspectrum
About 6 hours later, the free -h command output is shown below. I guess Logstash will shut itself down again tomorrow :?