Page 1 of 3

LogServer's memory is exhausted

Posted: Tue May 10, 2016 5:51 am
by warrennspectrum
Hi,

I have installed NLS 1.4.0 in a virtual machine with the following hardware setup:

OS: CentOS6.7
CPU: 8core
Memory: 64GB
Disk: 500GB

When I apply the Logstash config on the NLS, the filters work normally and the log data volume is 16~17GB per day. However, after about 3~4 days the memory is exhausted and Logstash shuts down. The memory stays occupied and is never returned to the OS! Where is my memory???

Here is my filter config:

Code: Select all

if [type] == 'asa' {
  grok {
    match => [
      "message","%{GREEDYDATA}<%{POSINT:syslog_pri}>%{CISCOTIMESTAMP:LogTime}: \%%{USERNAME:ASA_id}: %{GREEDYDATA:ASA_msg}"
    ]
  }
  syslog_pri { }
grok {
  match => [
    "ASA_msg","%{GREEDYDATA:event} from %{DATA:ftp_src_name}:%{IPV4:ftp_src_ip}/%{INT:ftp_src_port} to %{DATA:ftp_dist_name}:%{IPV4:ftp_dist_ip}/%{INT:ftp_dist_port}, user %{USERNAME:ftp_username} %{GREEDYDATA:ftp_action} %{HOSTNAME:ftp_filename}",
    "ASA_msg","%{DATA:protocol} %{GREEDYDATA:event} from %{IPV4:src_ip} to %{DATA:dist_name}:%{IPV4:dist_ip}",
    "ASA_msg","%{DATA:protocol} %{GREEDYDATA:event} from %{IPV4:src_ip}/%{INT:src_port} to %{DATA:dist_name}:%{IPV4:dist_ip}/%{USERNAME:dist_port}",
#313005
    "ASA_msg","%{GREEDYDATA:event}: icmp src %{DATA:icmp_src_name}:%{IPV4:interface_icmp_src_ip} dst %{DATA:dist_name}:%{IPV4:interface_icmp_dist_ip} \(type %{INT:interface_icmp_type}, code %{INT:interface_icmp_code}\) on %{GREEDYDATA:interface_name} interface.\s+Original IP payload: %{GREEDYDATA:no_matching_msg}",
    "ASA_msg","%{GREEDYDATA:event} %{INT:conn_num} %{DATA:src_name}:%{IPV4:src_ip}/%{INT:src_port} to %{DATA:dist_name}:%{IPV4:dist_ip}/%{INT:dist_port} duration\s+%{INT:duration_hour}:%{INT:duration_minutes}:%{INT:duration_second} bytes %{INT:teardown_bytes} %{GREEDYDATA:teardown_msg}",
    "ASA_msg","%{GREEDYDATA:event} %{INT:conn_num} %{DATA:src_name}:%{IPV4:src_ip}/%{INT:src_port} to %{DATA:dist_name}:%{IPV4:dist_ip}/%{INT:dist_port} duration\s+%{INT:duration_hour}:%{INT:duration_minutes}:%{INT:duration_second} bytes %{INT:teardown_bytes}",
#106023
    "ASA_msg","%{USERNAME:action}\s%{USERNAME:deny_protocol}\ssrc\s%{USERNAME:src_name}\S%{IPV4:src_ip}\S%{INT:src_port}\sdst\s%{USERNAME:dist_name}\S%{IPV4:dist_ip}\S%{INT:dist_port}\sby access\Sgroup\s%{GREEDYDATA:acl_name}\s\S%{BASE16NUM:acl_type}\S\s%{BASE16NUM:acl_code}\S",
    "ASA_msg","%{GREEDYDATA:event} %{INT:conn_num} for %{USERNAME:src_name}:%{IPV4:src_ip}/%{INT:src_port} \(%{IPV4:src_ip_sec}/%{INT:src_port_sec}\) to %{USERNAME:dist_name}:%{IPV4:dist_ip}/%{INT:dist_port} \(%{IPV4:dist_ip_sec}/%{INT:dist_port_sec}\)\n",
    "ASA_msg","%{GREEDYDATA:event} type=%{INT:ICMP_type}, code=%{INT:ICMP_code} from %{IPV4:src_ip} on %{GREEDYDATA:src_name}",
    "ASA_msg","%{GREEDYDATA:event} %{INT:conn_num} from %{USERNAME:src_name}:%{IPV4:src_ip}/%{INT:src_port} \(%{IPV4:src_ip_sec}/%{INT:src_port_sec}\) to %{USERNAME:dist_name}:%{IPV4:dist_ip}/%{INT:dist_port} \(%{IPV4:dist_ip_sec} \/%{INT:dist_port_sec}\)",
    "ASA_msg","%{GREEDYDATA:event} %{USERNAME:dist_name}:%{IPV4:dist_ip}",
    "ASA_msg","%{GREEDYDATA:event} for faddr %{IPV4:faddr_ip}/%{INT:faddr_port} gaddr %{IPV4:gaddr_ip}/%{INT:gaddr_port} laddr %{IPV4:laddr_ip}/%{INT:laddr_port}",
    "ASA_msg","\[%{GREEDYDATA:object_of_drop_rate}\] drop %{USERNAME:rate_ID} exceeded. Current burst rate is %{INT:current_burst_rate} per second, max configured rate is %{INT:max_burst_configured_rate}; Current average rate is %{INT:current_average_rate} per second, max configured rate is %{INT:max_average_configured_rate}; Cumulative total count is %{INT:cumulative_total_count}",
    "ASA_msg","%{GREEDYDATA:event} from %{IPV4:src_ip}/%{INT:src_port} to %{IPV4:dist_ip}/%{INT:dist_port} flags %{DATA:tcp_flags}  on interface %{USERNAME:interface_name}",
    "ASA_msg","%{DATA:protocol} %{GREEDYDATA:event} from %{IPV4:src_ip}/%{INT:src_port} to %{DATA:dist_name}:%{IPV4:dist_ip}/%{DATA:service}",
#419003
    "ASA_msg","%{GREEDYDATA:event} from %{USERNAME:src_name}:%{IPV4:src_ip}/%{INT:src_port} to %{USERNAME:dist_name}:%{IPV4:sdist_ip}/%{INT:dist_port}",
    "no_matching_msg","icmp src %{IPV4:Original_icmp_src_ip} dst %{IPV4:Original_icmp_dist_ip} \(type %{INT:Original_icmp_type}, code %{INT:Original_icmp_code}\).",
    "no_matching_msg","tcp src %{IPV4:Original_tcp_src_ip}/%{INT:Original_tcp_src_port} dst\s%{IPV4:Original_tcp_dist_ip}/%{INT:Original_tcp_dist_port}"
    ]
    add_field => { "analyze" => "ANXX" }
  }
}

Code: Select all

    if [type] == 'cisco' {
    	grok {
    		match => [ "message","<%{POSINT:syslog_pri}>%{INT:ID}:(\ )*(.)?%{CISCOTIMESTAMP:LogTime}: \%%{USERNAME:log_type}: %{GREEDYDATA:Cisco_msg}"]
    		}
    	syslog_pri { }
    	
    	grok {
    		match => ["Cisco_msg","%{DATA:protocol}( protocol on )*Interface %{DATA:Interface}, changed state to %{GREEDYDATA:state}",
    				"Cisco_msg","%{GREEDYDATA:Result} on %{WORD:port_type} port, message type: %{WORD:msg_type}, MAC sa: %{CISCOMAC:MAC}",
    				"Cisco_msg","%{GREEDYDATA:Result} because %{GREEDYDATA:Result_reason}, message type: %{WORD:msg_type}, chaddr: %{URIHOST:chaddr}, MAC sa: %{CISCOMAC:MAC}",
    				"Cisco_msg","%{GREEDYDATA:Result} %{URIHOST:src_ip} on %{WORD:lan}, sourced by %{CISCOMAC:MAC}",
    				"Cisco_msg","Stack Port %{DATA:stack_port} Switch %{DATA:stack_switch} has changed to state %{GREEDYDATA:state}",
    				"Cisco_msg","%{GREEDYDATA:Result} from %{WORD:protocol} by %{USER:user} on %{DATA:lan} \(%{IP:src_ip}\)",
    				"Cisco_msg","User:%{USER:user}  logged %{GREEDYDATA:Result}:%{GREEDYDATA:Result_content}",
    				"Cisco_msg","%{DATA:Result} %{DATA:state} \[user: %{USER:user}\] \[Source: %{URIHOST:src_ip}\] \[localport: %{INT:port}\] at %{TIME:time} %{WORD:zone} %{WORD:week} %{WORD:month} %{INT:day} %{INT:year}",
                                 "Cisco_msg","%{GREEDYDATA:event} %{IPV4:dist_ip} port %{INT:dist_port} %{GREEDYDATA:login_state}",
                                 "Cisco_msg","%{GREEDYDATA:event} %{IPV4:src_ip}\(%{INT:src_port}\) \S\S %{IPV4:dist_ip}\(%{INT:dist_port}\), %{INT:packets} packets",
                                 "Cisco_msg","%{GREEDYDATA:event} from \(%{URI:src_uri}\) %{GREEDYDATA:state}",
                                 "Cisco_msg","Login %{DATA:login_status}\s\Suser: %{USERNAME:user}\S\s\SSource: %{IPV4:source_ip}\S\s\Slocalport: %{INT:localport}\S\s\SReason: %{GREEDYDATA}\S\sat %{TIME:login_time} %{TZ:login_tz} %{DAY:login_day} %{MONTH:loging_month} %{MONTHDAY:login_monthday} %{YEAR:login_year}"
    				]				
    		add_field => {
    			"analyze" => "Y"
                }	
    	  }
    	   if[Interface] {
    		mutate {
    			add_field => {
    				"Result" => "Interface changed state"
    				}
    		}
    	  }
    	  if[stack_port] {
    		mutate {
    			add_field => {
    				"Result" => "Stack changed state"
    				}
    		}
    	  }
    }

Code: Select all

    if [type] == 'f5' {
    	grok {
    		match => [ "message","<%{POSINT:syslog_pri}>%{CISCOTIMESTAMP:LogTime} %{DATA:host_name} %{DATA:log_type}: \[%{DATA:Result_content}\]\[%{INT:day}/%{WORD:month}/%{INT:year}:%{TIME:time} %{DATA:zone}\] %{GREEDYDATA:F5_msg} %{INT:msg_ID0}",
    		"message","<%{POSINT:syslog_pri}>%{CISCOTIMESTAMP:LogTime} %{DATA:host_name} %{DATA:log_type}: \[%{DATA:Result}\] %{URIHOST:ip} - %{USER:user} \[%{INT:day}/%{WORD:month}/%{INT:year}:%{TIME:time} %{DATA:zone}\] %{GREEDYDATA:F5_msg} %{INT:msg_ID0} %{INT:msg_ID1}",
    					"message","<%{POSINT:syslog_pri}>%{CISCOTIMESTAMP:LogTime} %{USERNAME:msg_type} %{DATA:log_type}: %{GREEDYDATA:F5_msg1}"
    			]
    		}
    	syslog_pri { }
    	grok {
    		match => [
    					"F5_msg","%{URIHOST:ip} %{DATA:msg_type} %{DATA:Result} \"%{GREEDYDATA:Connect_type}\"",
    					"F5_msg","%{DATA:msg_ID0}:%{DATA:msg_ID1}:%{DATA:Result} instance %{DATA:Result_content} %{URIHOST:mac} %{DATA:state_ini} --\> %{DATA:state_to} from %{URIHOST:from_ip} \(state: %{DATA:state}\)",
    				  "F5_msg","\"%{GREEDYDATA:Result_content}\"",
    					"F5_msg1","%{INT:year}-%{WORD:month}-%{INT:day} %{TIME:time} %{URIHOST:uri} from %{URIHOST:src_ip}#%{DATA:src_port}: view %{DATA:view}: query: %{DATA:query_uri} IN %{GREEDYDATA:query_in} \(%{URIHOST:ip}\%%{INT}\)",
    				  "F5_msg1","%{INT:year}-%{WORD:month}-%{INT:day} %{TIME:time} %{URIHOST:uri} to %{URIHOST:dist_ip}#%{DATA:dist_port}: \[%{DATA:Result_content} %{DATA:Result_content1}\] response: (%{DATA:response}|%{URIHOST:response_uri} %{INT:response_port} IN %{DATA:response_in} %{URIHOST:response_uri1};)",
    				  "F5_msg1","%{DATA:Result_content} %{DATA:Result} while %{GREEDYDATA:Result_reason}; fd='%{DATA:fd}'",
    				  "F5_msg1","%{GREEDYDATA:Result}; time_reopen='%{DATA:time_reopen}'",
    				  "F5_msg1","Rule %{DATA:Rule}: %{GREEDYDATA:Result} %{IPV4:client_ip}:%{INT:client_port} %{DATA:msg_type} %{DATA:Result_content} to %{IPV4:src_ip}:%{INT:src_port} Sent to %{IPV4:dist_ip}:%{INT:dist_port}",
    				  "F5_msg1","%{DATA:msg_ID0}:%{DATA:msg_ID1}: HA vlan_fs %{DATA:HA_vlan_fs} is( now)? %{DATA:Result}.",
    				  "F5_msg1","%{DATA:msg_ID0}:%{DATA:msg_ID1}: HA vlan_fs %{DATA:HA_vlan_fs} %{DATA:Result_reason} action is %{DATA:Result}.",	
    				  "F5_msg1","%{DATA:msg_ID0}:%{DATA:msg_ID1}: Failover condition reported by %{DATA:reported_by} for traffic group %{DATA:traffic_group}, %{GREEDYDATA:Result_content}.",
    				  "F5_msg1","\(%{USERNAME:user}\) %{DATA:Result} \(mail%{DATA:mail_Result} %{INT:bytes} bytes of output but got status %{DATA:state}\)",				  
    				  "F5_msg1","\(%{USERNAME:user}\) %{DATA:Result} \(%{GREEDYDATA:Result_content}\)",
    				  "F5_msg1","%{DATA:Result_content}\(%{DATA:Result_content1}\):%{GREEDYDATA:Result} for user %{USERNAME:user}( by \(uid=%{DATA:uid}\))*",
    				  "F5_msg1","USER %{USERNAME:user} pid %{DATA:pid} %{DATA:Result} %{GREEDYDATA:Result_content}",
    				  "F5_msg1","%{GREEDYDATA:Result} '' %{GREEDYDATA:Result_reason}",
    				  "F5_msg1","pendsect: %{DATA:pendsect_disk} %{GREEDYDATA:Result}",
    				  "F5_msg1","%{GREEDYDATA:Result} request received, %{GREEDYDATA:Result_content};",
    				  "F5_msg1","%{USER:user} daemon %{GREEDYDATA:Result}",
    				  "F5_msg1","%{DATA:Result} to %{URIHOST:dist_ip}, stratum %{INT:stratum}",
    				  "F5_msg1","%{GREEDYDATA:Result}:%{INT:Result_ID}"				  
    				]				
    		add_field => {
    			"F5_msg" => "%{F5_msg1}"
    			"analyze" => "Y"
                }
    		remove_field => ["F5_msg1"]
    	  }
    	  if[query_uri]{
    		mutate {
    			add_field => {
    				"Result" => "Uri from IP by Query"
    				}
    		}
    	  }
    	  if[response] or [response_uri] {
    		mutate {
    			add_field => {
    				"Result" => "Uri to ip and the response"
    				}
    		}
    	  }	  
    	   if[Result_content]{
    		mutate {
    			add_field => {
    				"Result" => "Failover condition reported"
    				}
    		}
    	  }	  
    	  if[mail_Result] {
    		mutate{
    			add_field => {
    				"Result_content" => "Mailed output but got status"
    				}
    			remove_field => ["mail_Result"]
    		}
    	  }	  
    	  if[response] {
    		mutate{
    			add_field => {
    				"Result" => "Response"
    				}
    		}
    	  }
    	   if[reset_s] {
    		mutate{
    			add_field => {
    				"Result" => "Time reset"
    				}
    		}
    	  }
    }

Re: LogServer's memory is exhausted

Posted: Tue May 10, 2016 7:19 am
by eloyd
64GB is a lot of memory, even for an NLS installation. However, you're in luck. By default, NLS allocates 50% of available system RAM to elasticsearch. You can verify this in /etc/sysconfig/elasticsearch. It MAY be worthwhile to increase ES_HEAP_SIZE to something bigger than 50% of RAM. You can test this out by changing the last line in this code:

Code: Select all

# Heap Size (defaults to 256m min, 1g max)
# Nagios Log Server Default to 0.5 physical Memory
ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )m
to

Code: Select all

ES_HEAP_SIZE=$(expr $(free -m|awk '/^Mem:/{print $2 * 2}') / 3 )m
That would allocate 2/3 of your RAM (instead of 1/2) to the JVM heap after you restart elasticsearch.
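As a standalone sanity check of the arithmetic (nothing here touches the server; the 65536 value simply stands in for the live free -m total on a 64GB host):

```shell
#!/bin/sh
# Standalone arithmetic sketch: what each heap formula yields on a 64GB
# (65536 MB) host. total_mb stands in for: free -m | awk '/^Mem:/{print $2}'
total_mb=65536
half=$(expr "$total_mb" / 2)            # the default: 1/2 of RAM
two_thirds=$(expr "$total_mb" \* 2 / 3) # the proposed: 2/3 of RAM
echo "ES_HEAP_SIZE (1/2 RAM): ${half}m"       # prints 32768m
echo "ES_HEAP_SIZE (2/3 RAM): ${two_thirds}m" # prints 43690m
```

Note that 2/3 of 64GB already lands well above 32GB, which matters for the caveat in the next paragraph.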

HOWEVER, with more than 32GB of RAM allocated to the JVM heap, you may fall victim to a problem outlined in this article, where allocating more than 32GB to the heap can actually cause problems. You may want to experiment to see what gives you the best overall results.

Re: LogServer's memory is exhausted

Posted: Tue May 10, 2016 9:41 am
by hsmith
Also, what does the output of a free -m command look like?

Re: LogServer's memory is exhausted

Posted: Tue May 10, 2016 9:42 am
by rkennedy
Thanks @eloyd!

Logstash will cache your logs to memory, which is how it's able to search through them so quickly. Take a look at your Backup & Maintenance page and post a screenshot of it for us to look at.

You'll most likely want to adjust the 'close indexes older than' parameter, as that will take them out of memory, but still leave them on the disk to be reopened.
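For background, closing an index by hand uses the Elasticsearch open/close index API; the sketch below (with a hypothetical index name) is a dry run that only prints the curl commands rather than sending them:

```shell
#!/bin/sh
# Dry-run sketch, hypothetical index name: a closed index frees its heap
# usage but stays on disk, and can be reopened to search it again.
ES="http://localhost:9200"
INDEX="logstash-2016.04.01"
echo "curl -XPOST '${ES}/${INDEX}/_close'"  # take the index out of memory
echo "curl -XPOST '${ES}/${INDEX}/_open'"   # bring it back when needed
```

The 'close indexes older than' setting automates the _close call on a schedule so old indexes stop consuming heap.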

Re: LogServer's memory is exhausted

Posted: Wed May 11, 2016 3:59 am
by warrennspectrum
Thanks @eloyd, @hsmith, and @rkennedy

I suspect a situation that may have led to my memory problem. I originally installed NLS in a virtual machine on my own computer, configured with 1 CPU core and 2GB of memory. I then exported that virtual machine to a .ova file, imported it into another vSphere host, and changed the configuration to 8 cores and 64GB. Is it possible that this caused the problem? If so, how do I fix it?

To @eloyd,
In fact, there is 256GB of memory we could use, and originally we used all of it. The exhausted-memory problem caused Logstash to shut down. I found this https://www.elastic.co/guide/en/elastic ... izing.html and thought it was related to the memory problem, so I set the memory to 64GB and gave ES_HEAP_SIZE half of it, but the problem still exists.

To @hsmith,
Here is the free -h command output:
Image
After I shut down Logstash myself, the output is shown below:
Image
I thought it was all Logstash's fault, but that seems incorrect. It looks like all the memory has moved to the cache. Should I run
echo 1 > /proc/sys/vm/drop_caches
to release the cached memory?

To @rkennedy
Here is a screenshot of the Backup & Maintenance page:
Image
So how do I adjust this page's parameters to solve the memory problem? I really hope Logstash can run steadily and never stop by itself. :cry:

Thank you all again! :)

Re: LogServer's memory is exhausted

Posted: Wed May 11, 2016 9:57 am
by hsmith
Change your settings away from 2, 0, 0. Try setting them to 2, 15, 30 and see if that helps anything. Next, let's give this command a shot:

Code: Select all

sed -i s/'#LS_HEAP_SIZE="256m"'/'LS_HEAP_SIZE="2048m"'/g /etc/sysconfig/logstash

service logstash restart
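If you want to preview what that sed does before touching the real file, here's a harmless dry run on a scratch copy (it assumes the line is still in its shipped, commented-out form):

```shell
#!/bin/sh
# Dry run: apply the same substitution to a scratch file instead of
# /etc/sysconfig/logstash, then show the result.
tmp=$(mktemp)
echo '#LS_HEAP_SIZE="256m"' > "$tmp"   # the line as shipped
sed -i s/'#LS_HEAP_SIZE="256m"'/'LS_HEAP_SIZE="2048m"'/g "$tmp"
result=$(cat "$tmp")
echo "$result"   # prints LS_HEAP_SIZE="2048m"
rm -f "$tmp"
```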
Give this a shot and let me know what happens.

Thanks!

Re: LogServer's memory is exhausted

Posted: Wed May 11, 2016 10:09 am
by eloyd
Sorry, I originally read this as elasticsearch stopping, not logstash. Yes, you will want to increase LS_HEAP_SIZE to something more reasonable, given your utilization. And you will definitely want to do what @hsmith said about 2 15 30 for the index management.

Also, for fun, can you confirm that you're running the proper version of logstash:

Code: Select all

/usr/local/nagioslogserver/logstash/bin/logstash --version
(Note, this can take a while, since it needs to fire up a full JVM to process.)

I'm expecting 1.5.1

Re: LogServer's memory is exhausted

Posted: Wed May 11, 2016 4:14 pm
by hsmith
Thanks Eric :) (Again)

Re: LogServer's memory is exhausted

Posted: Wed May 11, 2016 10:34 pm
by warrennspectrum
Thanks @hsmith and @eloyd,

I have already changed my Backup & Maintenance page settings to 2, 15, 30, set LS_HEAP_SIZE to 2048m, and restarted Logstash.

In fact, I updated my NLS from 1.4.0 to 1.4.1 and rebooted my server yesterday, and the memory cache was about 1GB then. Now, when I log in to the server, the output of the free -h command is
Image

The cached memory has grown to 15GB. I'm afraid Logstash will be shut down again tomorrow because of insufficient memory :?

To @eloyd, here is my logstash version:
Image
It's the 1.5.1 you were expecting. :D
I also found that the NLS Administration page shows the version information. Here is the screenshot:
Image

By the way, the data volume is about 18GB per day.

Re: LogServer's memory is exhausted

Posted: Thu May 12, 2016 9:01 am
by warrennspectrum
About 6 hours later, the free -h command output is shown below. I guess Logstash will shut itself down again tomorrow :?