Slow web interface

This support forum board is for support questions relating to Nagios Log Server, our solution for managing and monitoring critical log data.
CameronWP
Posts: 134
Joined: Fri Apr 17, 2015 2:17 pm

Re: Slow web interface

Post by CameronWP »

Sorry, the logstash log is very large, and even zipped the forum won't accept it. I'm working on getting it to you.

EDIT - I have attached a truncated log to this message.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Slow web interface

Post by mcapra »

Oof, here's a long post.

Two things I see:

First, there is some junk data trying to make its way into an index. Specifically, some record(s) with the TargetFileName field:

Code: Select all

[2017-10-11 03:24:49,511][DEBUG][action.bulk              ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [logstash-2017.10.11][1] failed to execute bulk item (index) index {[logstash-2017.10.11][mcafee][AV8KUlbhOgRVuukGN1iA], source[{"message":"<?xml version=\"1.0\" encoding=\"UTF-8\"?><EPOEvent><MachineInfo><MachineName>D47661JC4</MachineName><AgentGUID>{de1c5a4c-7c0e-11e6-2d91-3cd92b5c8d2f}</AgentGUID><IPAddress>10.1.6.10</IPAddress><OSName>Windows 7</OSName><UserName>SYSTEM</UserName><TimeZoneBias>240</TimeZoneBias><RawMACAddress>3cd92b5c8d2f</RawMACAddress></MachineInfo><SoftwareInfo ProductName=\"ePO Deep Command\" ProductVersion=\"2.4.1.465\" ProductFamily=\"Secure\"><Event><EventID>34362</EventID><Severity>2</Severity><GMTTime>2017-10-11T07:22:05</GMTTime><CommonFields><Analyzer>AMTMGMT_1000</Analyzer><AnalyzerDetectionMethod>1</AnalyzerDetectionMethod><AnalyzerName>ePO Deep Command</AnalyzerName><AnalyzerVersion>2.4.1.465</AnalyzerVersion><TargetFileName>System is not capable of HBC, ignoring the enforcement and sending an event</TargetFileName><ThreatCategory>ops</ThreatCategory><ThreatName>Configure failure</ThreatName><ThreatType>AMT</ThreatType></CommonFields></Event></SoftwareInfo></EPOEvent>\r","@version":"1","@timestamp":"2017-10-11T07:22:38.018Z","host":"10.10.41.47","type":"mcafee","parsed":{"MachineInfo":[{"MachineName":["D47661JC4"],"AgentGUID":["{de1c5a4c-7c0e-11e6-2d91-3cd92b5c8d2f}"],"IPAddress":["10.1.6.10"],"OSName":["Windows 7"],"UserName":["SYSTEM"],"TimeZoneBias":["240"],"RawMACAddress":["3cd92b5c8d2f"]}],"SoftwareInfo":[{"ProductName":"ePO Deep Command","ProductVersion":"2.4.1.465","ProductFamily":"Secure","Event":[{"EventID":["34362"],"Severity":["2"],"GMTTime":["2017-10-11T07:22:05"],"CommonFields":[{"Analyzer":["AMTMGMT_1000"],"AnalyzerDetectionMethod":["1"],"AnalyzerName":["ePO Deep Command"],"AnalyzerVersion":["2.4.1.465"],"TargetFileName":["System is not capable of HBC, ignoring the enforcement and sending an 
event"],"ThreatCategory":["ops"],"ThreatName":["Configure failure"],"ThreatType":["AMT"]}]}]}]}}]}
org.elasticsearch.index.mapper.MapperParsingException: object mapping [TargetFileName] trying to serialize a value with no field associated with it, current value [System is not capable of HBC, ignoring the enforcement and sending an event]
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:702)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeNonDynamicArray(ObjectMapper.java:695)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:489)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:554)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeNonDynamicArray(ObjectMapper.java:685)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:489)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:554)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeNonDynamicArray(ObjectMapper.java:685)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:489)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:554)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeNonDynamicArray(ObjectMapper.java:685)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeArray(ObjectMapper.java:604)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:489)
	at org.elasticsearch.index.mapper.object.ObjectMapper.serializeObject(ObjectMapper.java:554)
	at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:487)
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
	at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
	at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:466)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:418)
	at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:148)
	at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
	at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Second, there's some very aggressive garbage collection happening within ElasticSearch (might be a direct result of the junk data):

Code: Select all

[2017-10-11 03:29:09,929][WARN ][monitor.jvm              ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [gc][old][395927][16495] duration [1m], collections [1]/[1m], total [1m]/[1.3d], memory [31.7gb]->[30.3gb]/[31.8gb], all_pools {[young] [1.4gb]->[258.4mb]/[1.4gb]}{[survivor] [162.2mb]->[0b]/[191.3mb]}{[old] [30.1gb]->[30.1gb]/[30.1gb]}
[2017-10-11 03:30:12,295][WARN ][monitor.jvm              ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [gc][old][395935][16496] duration [54.9s], collections [1]/[55.3s], total [54.9s]/[1.3d], memory [31.7gb]->[30.4gb]/[31.8gb], all_pools {[young] [1.4gb]->[339.8mb]/[1.4gb]}{[survivor] [140.3mb]->[0b]/[191.3mb]}{[old] [30.1gb]->[30.1gb]/[30.1gb]}
[2017-10-11 03:31:07,238][WARN ][monitor.jvm              ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [gc][old][395938][16497] duration [52.1s], collections [1]/[52.2s], total [52.1s]/[1.3d], memory [31.7gb]->[30.4gb]/[31.8gb], all_pools {[young] [1.4gb]->[320.2mb]/[1.4gb]}{[survivor] [174.8mb]->[0b]/[191.3mb]}{[old] [30.1gb]->[30.1gb]/[30.1gb]}
[2017-10-11 03:32:05,556][WARN ][monitor.jvm              ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [gc][old][395941][16498] duration [55.7s], collections [1]/[56.2s], total [55.7s]/[1.3d], memory [31.7gb]->[30.4gb]/[31.8gb], all_pools {[young] [1.4gb]->[351.9mb]/[1.4gb]}{[survivor] [96.5mb]->[0b]/[191.3mb]}{[old] [30.1gb]->[30.1gb]/[30.1gb]}
[2017-10-11 03:33:08,117][WARN ][monitor.jvm              ] [330efcd2-34fc-4f7f-9cba-df89a1374eee] [gc][old][395944][16499] duration [59.8s], collections [1]/[1m], total [59.8s]/[1.3d], memory [31.7gb]->[30.4gb]/[31.8gb], all_pools {[young] [1.4gb]->[322.1mb]/[1.4gb]}{[survivor] [114mb]->[0b]/[191.3mb]}{[old] [30.1gb]->[30.1gb]/[30.1gb]}
It seems as though your McAfee logs are sending some data that ElasticSearch isn't able to gracefully handle. Here's one of the records that has upset ElasticSearch:

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<EPOEvent>
    <MachineInfo>
        <MachineName>D476606CD</MachineName>
        <AgentGUID>{6F1B7A0F-4794-4E3B-883F-342FC4E73ECF}</AgentGUID>
        <IPAddress>10.1.6.11</IPAddress>
        <OSName>Windows 7</OSName>
        <UserName>tthompson</UserName>
        <TimeZoneBias>240</TimeZoneBias>
        <RawMACAddress>6c3be528c344</RawMACAddress>
    </MachineInfo>
    <SoftwareInfo ProductName="ePO Deep Command" ProductVersion="2.4.1.465" ProductFamily="Secure">
        <Event>
            <EventID>34362</EventID>
            <Severity>2</Severity>
            <GMTTime>2017-10-11T07:31:16</GMTTime>
            <CommonFields>
                <Analyzer>AMTMGMT_1000</Analyzer>
                <AnalyzerDetectionMethod>1</AnalyzerDetectionMethod>
                <AnalyzerName>ePO Deep Command</AnalyzerName>
                <AnalyzerVersion>2.4.1.465</AnalyzerVersion>
                <TargetFileName>System is not capable of HBC, ignoring the enforcement and sending an event</TargetFileName>
                <ThreatCategory>ops</ThreatCategory>
                <ThreatName>Configure failure</ThreatName>
                <ThreatType>AMT</ThreatType>
            </CommonFields>
        </Event>
    </SoftwareInfo>
</EPOEvent>
ThreatName also seems to cause problems later on in the log. I don't see any obvious patterns when cross-referencing the two fields.

A few questions:

Is there a particular reason for the mutate step in this filter? Just curious; I don't think it's related to your problem at all:

Code: Select all

if [type] == 'mcafee' {
    mutate {
        gsub => [
            'message', '^<.*]\s', ''
        ]
    }
    xml {
        source => 'message'
        target => 'parsed'
    }
}
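If the immediate goal is just to keep these status-message records out of the index while the mapping gets sorted out, one low-tech option is to drop them before they're ever parsed, e.g. with a conditional at the top of the mcafee block. This is only a sketch; the matched text is taken from the error above, and you'd want to confirm it covers all the bad records:

```
if [type] == 'mcafee' {
    # Sketch: discard the known-bad status events before the xml filter
    # runs so they never reach ElasticSearch. Blunt, but effective until
    # the mapping is fixed; adjust the string if other variants appear.
    if "System is not capable of HBC" in [message] {
        drop { }
    }
    # ... existing mutate/xml filters continue here ...
}
```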
Can you also share the output of these commands executed from the CLI of one of your Nagios Log Server machines:

Code: Select all

curl -XGET 'http://localhost:9200/logstash-2017.10.11/_mapping/mcafee'
curl -XGET 'http://localhost:9200/logstash-2017.10.11/_mapping'
I suspect there are issues with how one or more of those fields are mapped.
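For reference, a field that dynamic mapping has typed as an object shows up in the _mapping output with a nested properties block instead of a plain type. The shape below is illustrative only (not your actual mapping, and the real path will be deeper under parsed):

```
{
  "logstash-2017.10.11": {
    "mappings": {
      "mcafee": {
        "properties": {
          "parsed": {
            "properties": {
              "TargetFileName": {
                "properties": { "...": "..." }
              }
            }
          }
        }
      }
    }
  }
}
```

A correctly mapped string field would instead show `"type": "string"` at that spot.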
Former Nagios employee
https://www.mcapra.com/
CameronWP
Posts: 134
Joined: Fri Apr 17, 2015 2:17 pm

Re: Slow web interface

Post by CameronWP »

Thanks mcapra! The mutate in that filter was my attempt at getting something usable from what was being ingested from the EPO server. I am pretty new to these technologies, so it makes sense if I used the wrong syntax.
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Slow web interface

Post by mcapra »

The following fields are typed as object:
  • AnalyzerDATVersion
  • AnalyzerEngineVersion
  • TargetFileName
  • ThreatName
I believe that's the root cause of this mess: ElasticSearch is looking for an object and being provided with a string. I'm not 100% sure what the behavior *should* be in this situation, and I won't have access to my lab environment for a few hours.

Everything typed as object seems to be a bottom-level value in the XML structure, so I have no earthly idea why ElasticSearch would dynamically type those fields as object. If raw McAfee logs are available, they might hold some clues.
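To illustrate the mechanic (this is a toy sketch, not Elasticsearch code): dynamic mapping locks in a field's type from the first document that carries it, so a later document whose field has a different shape gets rejected, which is what the MapperParsingException earlier in the thread amounts to.

```python
def infer_type(value):
    """Crudely classify a value the way dynamic mapping would."""
    return "object" if isinstance(value, dict) else "string"

def index_doc(mapping, doc):
    for field, value in doc.items():
        # The first sighting of a field fixes its type in the mapping.
        seen = mapping.setdefault(field, infer_type(value))
        if seen != infer_type(value):
            raise ValueError(
                f"object mapping [{field}] cannot hold a {infer_type(value)} value"
            )

mapping = {}
# First record: the xml filter produced a nested structure, so the
# field gets typed as "object"...
index_doc(mapping, {"TargetFileName": {"0": "nested value"}})
# ...and a later record carrying a plain string is rejected.
try:
    index_doc(mapping, {"TargetFileName": "System is not capable of HBC"})
except ValueError as err:
    print(err)  # object mapping [TargetFileName] cannot hold a string value
```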
CameronWP
Posts: 134
Joined: Fri Apr 17, 2015 2:17 pm

Re: Slow web interface

Post by CameronWP »

I might turn off McAfee for now and try again with it another time. Thanks!
kyang

Re: Slow web interface

Post by kyang »

Thanks @mcapra!

@CameronWP Let us know if you see any improvements once McAfee is turned off.
CameronWP
Posts: 134
Joined: Fri Apr 17, 2015 2:17 pm

Re: Slow web interface

Post by CameronWP »

Hello:

I let it run a few days, and now it seems to be behaving much worse. I can't seem to get this product to run for more than a few days at a time before it hits some issue that stops it working in some fashion.

I have attached new logs from logstash and elasticsearch.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: Slow web interface

Post by npolovenko »

@CameronWP, I looked through your ps.txt log, and it looks like you're running Nagios XI and Nagios Log Server on the same server. That is definitely causing issues; we do not support installing multiple Nagios products on one server.
Is there any possibility of migrating Nagios XI onto another server?
mcapra
Posts: 3739
Joined: Thu May 05, 2016 3:54 pm

Re: Slow web interface

Post by mcapra »

Something is depriving ElasticSearch of memory which is causing all sorts of problems upstream. The root cause is likely related to @npolovenko's findings.
Last edited by mcapra on Thu Nov 02, 2017 9:47 am, edited 1 time in total.
CameronWP
Posts: 134
Joined: Fri Apr 17, 2015 2:17 pm

Re: Slow web interface

Post by CameronWP »

Ok, thanks for that. I am working on moving Nagios Monitoring off to its own server.

I appreciate the replies!
Locked