Re: Slow web interface

Posted: Wed Oct 11, 2017 1:49 pm
by CameronWP
Sorry, the logstash log is very large, and even zipped your site won't accept it. I am still working on getting it to you.

EDIT - I have attached a truncated log to this message.

Re: Slow web interface

Posted: Wed Oct 11, 2017 2:05 pm
by mcapra
Oof, here's a long post.

Two things I see:

First, there is some junk data trying to make its way into an index. Specifically, some record(s) with the TargetFileName field:

Code:

[2017-10-11 03:24:49,511][DEBUG][action.bulk              ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [logstash-2017.10.11][1] failed to execute bulk item (index) index {[logstash-2017.10.11][mcafee][AV8KUlbhOgRVuukGN1iA], source[{"message":"<?xml version=\"1.0\" encoding=\"UTF-8\"?><EPOEvent><MachineInfo><MachineName>D47661JC4</MachineName><AgentGUID>{de1c5a4c-7c0e-11e6-2d91-3cd92b5c8d2f}</AgentGUID><IPAddress>10.1.6.10</IPAddress><OSName>Windows 7</OSName><UserName>SYSTEM</UserName><TimeZoneBias>240</TimeZoneBias><RawMACAddress>3cd92b5c8d2f</RawMACAddress></MachineInfo><SoftwareInfo ProductName=\"ePO Deep Command\" ProductVersion=\"2.4.1.465\" ProductFamily=\"Secure\"><Event><EventID>34362</EventID><Severity>2</Severity><GMTTime>2017-10-11T07:22:05</GMTTime><CommonFields><Analyzer>AMTMGMT_1000</Analyzer><AnalyzerDetectionMethod>1</AnalyzerDetectionMethod><AnalyzerName>ePO Deep Command</AnalyzerName><AnalyzerVersion>2.4.1.465</AnalyzerVersion><TargetFileName>System is not capable of HBC, ignoring the enforcement and sending an event</TargetFileName><ThreatCategory>ops</ThreatCategory><ThreatName>Configure failure</ThreatName><ThreatType>AMT</ThreatType></CommonFields></Event></SoftwareInfo></EPOEvent>\r","@version":"1","@timestamp":"2017-10-11T07:22:38.018Z","host":"10.10.41.47","type":"mcafee","parsed":{"MachineInfo":[{"MachineName":["D47661JC4"],"AgentGUID":["{de1c5a4c-7c0e-11e6-2d91-3cd92b5c8d2f}"],"IPAddress":["10.1.6.10"],"OSName":["Windows 7"],"UserName":["SYSTEM"],"TimeZoneBias":["240"],"RawMACAddress":["3cd92b5c8d2f"]}],"SoftwareInfo":[{"ProductName":"ePO Deep Command","ProductVersion":"2.4.1.465","ProductFamily":"Secure","Event":[{"EventID":["34362"],"Severity":["2"],"GMTTime":["2017-10-11T07:22:05"],"CommonFields":[{"Analyzer":["AMTMGMT_1000"],"AnalyzerDetectionMethod":["1"],"AnalyzerName":["ePO Deep Command"],"AnalyzerVersion":["2.4.1.465"],"TargetFileName":["System is not capable of HBC, ignoring the enforcement and sending an 
event"],"ThreatCategory":["ops"],"ThreatName":["Configure failure"],"ThreatType":["AMT"]}]}]}]}}]}
org.elasticsearch.index.mapper.MapperParsingException: object mapping [TargetFileName] trying to serialize a value with no field associated with it, current value [System is not capable of HBC, ignoring the enforcement and sending an event]
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:702)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeNonDynamicArray(ObjectMapper.java:695)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:489)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:554)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeNonDynamicArray(ObjectMapper.java:685)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:489)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:554)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeNonDynamicArray(ObjectMapper.java:685)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:489)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:554)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeNonDynamicArray(ObjectMapper.java:685)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:489)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:554)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:487)
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
	at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:466)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:418)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
	at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
	at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
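The MapperParsingException above boils down to a type conflict: the index's dynamic mapping has already typed TargetFileName as an object, so a later document carrying a plain string in that field gets rejected. Here's a toy sketch of how first-seen-wins dynamic typing produces exactly that failure (the helper names are mine, not Elasticsearch code):

```python
# Toy illustration of an Elasticsearch-style dynamic mapping conflict.
# The first document to use a field fixes its type; later documents
# with a different shape for that field are rejected.

def infer_type(value):
    """Infer a mapping type the way dynamic mapping would: dicts become
    'object'; everything else here is treated as a plain 'string'."""
    return "object" if isinstance(value, dict) else "string"

def index_doc(mapping, doc):
    """'Index' a document, raising on a type conflict
    (the analogue of the MapperParsingException above)."""
    for field, value in doc.items():
        inferred = infer_type(value)
        known = mapping.setdefault(field, inferred)
        if known != inferred:
            raise ValueError(
                f"field [{field}] is mapped as [{known}], got [{inferred}]")

mapping = {}
# Some earlier record gave TargetFileName nested content -> typed 'object'
index_doc(mapping, {"TargetFileName": {"some": "nested value"}})
# The record from the log above carries a plain string -> conflict
try:
    index_doc(mapping, {"TargetFileName": "System is not capable of HBC, ..."})
except ValueError as e:
    print(e)  # field [TargetFileName] is mapped as [object], got [string]
```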
Second, there's some very aggressive garbage collection happening within ElasticSearch (might be a direct result of the junk data):

Code:

[2017-10-11 03:29:09,929][WARN ][monitor.jvm              ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [gc][old][395927][16495] duration [1m], collections [1]/[1m], total [1m]/[1.3d], memory [31.7gb]->[30.3gb]/[31.8gb], all_pools {[young] [1.4gb]->[258.4mb]/[1.4gb]}{[survivor] [162.2mb]->[0b]/[191.3mb]}{[old] [30.1gb]->[30.1gb]/[30.1gb]}
[2017-10-11 03:30:12,295][WARN ][monitor.jvm              ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [gc][old][395935][16496] duration [54.9s], collections [1]/[55.3s], total [54.9s]/[1.3d], memory [31.7gb]->[30.4gb]/[31.8gb], all_pools {[young] [1.4gb]->[339.8mb]/[1.4gb]}{[survivor] [140.3mb]->[0b]/[191.3mb]}{[old] [30.1gb]->[30.1gb]/[30.1gb]}
[2017-10-11 03:31:07,238][WARN ][monitor.jvm              ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [gc][old][395938][16497] duration [52.1s], collections [1]/[52.2s], total [52.1s]/[1.3d], memory [31.7gb]->[30.4gb]/[31.8gb], all_pools {[young] [1.4gb]->[320.2mb]/[1.4gb]}{[survivor] [174.8mb]->[0b]/[191.3mb]}{[old] [30.1gb]->[30.1gb]/[30.1gb]}
[2017-10-11 03:32:05,556][WARN ][monitor.jvm              ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [gc][old][395941][16498] duration [55.7s], collections [1]/[56.2s], total [55.7s]/[1.3d], memory [31.7gb]->[30.4gb]/[31.8gb], all_pools {[young] [1.4gb]->[351.9mb]/[1.4gb]}{[survivor] [96.5mb]->[0b]/[191.3mb]}{[old] [30.1gb]->[30.1gb]/[30.1gb]}
[2017-10-11 03:33:08,117][WARN ][monitor.jvm              ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [gc][old][395944][16499] duration [59.8s], collections [1]/[1m], total [59.8s]/[1.3d], memory [31.7gb]->[30.4gb]/[31.8gb], all_pools {[young] [1.4gb]->[322.1mb]/[1.4gb]}{[survivor] [114mb]->[0b]/[191.3mb]}{[old] [30.1gb]->[30.1gb]/[30.1gb]}
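Those GC lines tell the story: the old generation sits pinned at 30.1gb before and after every collection, so those minute-long pauses are reclaiming essentially nothing. A quick sketch that pulls the numbers out of one such line (the abbreviated sample string below is from the first warning above):

```python
import re

# Parse one of the monitor.jvm warnings above: a minute-long old-gen GC
# where the old pool goes from 30.1gb to 30.1gb, i.e. nothing is freed.
line = ("[2017-10-11 03:29:09,929][WARN ][monitor.jvm] [node] "
        "[gc][old][395927][16495] duration [1m], collections [1]/[1m], "
        "total [1m]/[1.3d], memory [31.7gb]->[30.3gb]/[31.8gb], "
        "all_pools {[young] [1.4gb]->[258.4mb]/[1.4gb]}"
        "{[survivor] [162.2mb]->[0b]/[191.3mb]}"
        "{[old] [30.1gb]->[30.1gb]/[30.1gb]}")

m = re.search(r"\[old\] \[([\d.]+)gb\]->\[([\d.]+)gb\]/\[([\d.]+)gb\]", line)
before, after, limit = map(float, m.groups())
print(f"old gen: {before}gb -> {after}gb of {limit}gb, "
      f"freed {before - after:.1f}gb")  # freed 0.0gb: the old gen is full
```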
It seems as though your McAfee logs are sending some data that ElasticSearch isn't able to gracefully handle. Here's one of the records that has upset ElasticSearch:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<EPOEvent>
    <MachineInfo>
        <MachineName>D476606CD</MachineName>
        <AgentGUID>{6F1B7A0F-4794-4E3B-883F-342FC4E73ECF}</AgentGUID>
        <IPAddress>10.1.6.11</IPAddress>
        <OSName>Windows 7</OSName>
        <UserName>tthompson</UserName>
        <TimeZoneBias>240</TimeZoneBias>
        <RawMACAddress>6c3be528c344</RawMACAddress>
    </MachineInfo>
    <SoftwareInfo ProductName="ePO Deep Command" ProductVersion="2.4.1.465" ProductFamily="Secure">
        <Event>
            <EventID>34362</EventID>
            <Severity>2</Severity>
            <GMTTime>2017-10-11T07:31:16</GMTTime>
            <CommonFields>
                <Analyzer>AMTMGMT_1000</Analyzer>
                <AnalyzerDetectionMethod>1</AnalyzerDetectionMethod>
                <AnalyzerName>ePO Deep Command</AnalyzerName>
                <AnalyzerVersion>2.4.1.465</AnalyzerVersion>
                <TargetFileName>System is not capable of HBC, ignoring the enforcement and sending an event</TargetFileName>
                <ThreatCategory>ops</ThreatCategory>
                <ThreatName>Configure failure</ThreatName>
                <ThreatType>AMT</ThreatType>
            </CommonFields>
        </Event>
    </SoftwareInfo>
</EPOEvent>
ThreatName also seems to cause problems later on in the log. I don't see any obvious patterns when cross-referencing the two fields.

A few questions:

Is there a particular reason for the mutate step in this filter? Just curious; I don't think it's related to your problem at all:

Code:

if [type] == 'mcafee' {
    mutate {
        gsub => [
            'message', '^<.*]\s', ''
        ]
    }
    xml {
        source => 'message'
        target => 'parsed'
    }
}
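For what it's worth, that gsub reads like an attempt to strip a syslog header (everything from the leading < through a ] plus whitespace) off the front of the message before the xml filter runs. A rough Python equivalent of the same substitution, run against a made-up sample header:

```python
import re

# Rough Python equivalent of the Logstash gsub above. The syslog-style
# header in this sample message is made up for illustration.
message = ("<29>Oct 11 03:24:49 epo-server [McAfee ePO] "
           "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
           "<EPOEvent>...</EPOEvent>")

# Strip from the leading '<' through the last ']' followed by whitespace.
# Note the greedy '.*': this would over-strip if the XML body itself
# ever contained '] ', but it works for payloads like the ones above.
cleaned = re.sub(r'^<.*]\s', '', message)
print(cleaned)  # starts with <?xml ...
```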
Can you also share the output of these commands, executed from the CLI of one of your Nagios Log Server machines:

Code:

curl -XGET 'http://localhost:9200/logstash-2017.10.11/_mapping/mcafee'
curl -XGET 'http://localhost:9200/logstash-2017.10.11/_mapping'
I suspect there are issues with how one or more of those fields are mapped.

Re: Slow web interface

Posted: Wed Oct 11, 2017 2:18 pm
by CameronWP
Thanks mcapra! The mutate in that filter was my attempt at getting something usable from what was being ingested from the EPO server. I am pretty new to these technologies, so it makes sense if I used the wrong syntax.

Re: Slow web interface

Posted: Wed Oct 11, 2017 2:30 pm
by mcapra
The following fields are typed as object:
  • AnalyzerDATVersion
  • AnalyzerEngineVersion
  • TargetFileName
  • ThreatName
I believe this is the root cause of the mess: ElasticSearch is expecting an object and being handed a string. I'm not 100% sure what the behavior *should* be in this situation, and I won't have access to my lab environment for a few hours.

Everything that is typed as object seems to be a bottom-level value in the XML structure, so I have no earthly idea why ElasticSearch would dynamically type those fields as object. If there are raw McAfee logs available, they might hold some clues.
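One way those leaf fields could still end up typed as object: if even a single earlier event arrived with child elements under TargetFileName, the xml filter's output for that field would be a nested structure rather than a plain string, and dynamic mapping would lock in object for everything after it. A sketch of the idea using Python's stdlib XML parser (the element-to-value helper is a hypothetical stand-in, not the actual xml filter logic):

```python
import xml.etree.ElementTree as ET

def element_value(elem):
    """Hypothetical sketch of how an XML filter might render an element:
    a leaf with only text becomes a string; anything with child elements
    becomes a dict, which dynamic mapping would type as 'object'."""
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: element_value(child) for child in children}

# The usual McAfee event: TargetFileName is a plain text leaf -> string
leaf = ET.fromstring("<TargetFileName>some message text</TargetFileName>")
# A hypothetical nested variant -> dict, i.e. 'object' in the mapping
nested = ET.fromstring(
    "<TargetFileName><Path>C:\\some\\file</Path></TargetFileName>")

print(type(element_value(leaf)).__name__)    # str
print(type(element_value(nested)).__name__)  # dict
```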

Re: Slow web interface

Posted: Wed Oct 11, 2017 2:38 pm
by CameronWP
I might turn off McAfee for now and try it again another time. Thanks!

Re: Slow web interface

Posted: Wed Oct 11, 2017 2:59 pm
by kyang
Thanks @mcapra!

@CameronWP Let us know if you see any improvements once McAfee is turned off.

Re: Slow web interface

Posted: Wed Nov 01, 2017 12:55 pm
by CameronWP
Hello:

I have let it run for a few days, and now it seems to be behaving much worse. I can't seem to get this product to run for more than a few days at a time before it hits some issue that causes it to stop working in some fashion.

I have attached new logs from logstash and elasticsearch.

Re: Slow web interface

Posted: Wed Nov 01, 2017 3:02 pm
by npolovenko
@CameronWP, I looked through your ps.txt log, and it looks like you're running Nagios XI and Nagios Log Server on one server? That is definitely causing issues. We do not support installations of multiple Nagios products on one server.
Is there a possibility to migrate Nagios XI onto another server?

Re: Slow web interface

Posted: Thu Nov 02, 2017 9:24 am
by mcapra
Something is depriving ElasticSearch of memory, which is causing all sorts of problems upstream. The root cause is likely related to @npolovenko's findings.

Re: Slow web interface

Posted: Thu Nov 02, 2017 9:32 am
by CameronWP
Ok, thanks for that. I am working on moving Nagios Monitoring off to its own server.

I appreciate the replies!