After upgrading from 2.0.2 to 2.0.8, we started seeing the entry below in the main log file under /var/elasticsearch/.
It seems to be related to the alerts running; something about the check_interval is not parsing correctly. Any idea what could be causing this?
[2019-09-11 03:31:21,675][DEBUG][action.index ] [881dac7b-e349-4fa5-aaef-41294c3b66e0] [nagioslogserver_history][0], node[5KuNYs_iTe2XiiQ3j_egjg], [P], s[STARTED]: Failed to execute [index {[nagioslogserver_history][alert][AW0fO-PCsKxNbujnOADK], source[{"alert_id":"AWi-5NRdRoAfU4O0MhUT","name":"EMAIL-DHCP-SNOOPING-DROP-OFFER","check_interval":"5m","lookback_period":"5m","warning":"1","critical":"1","start":1568186781,"end":1568187081,"query":"{\"query\":{\"filtered\":{\"query\":{\"bool\":{\"should\":[{\"query_string\":{\"query\":\"\\\"Drop offer\\\"\"}}]}},\"filter\":{\"bool\":{\"must\":[{\"range\":{\"@timestamp\":{\"from\":1568186781000,\"to\":1568187081000}}},{\"fquery\":{\"query\":{\"query_string\":{\"query\":\"host:(\\\"10.0.0.10\\\")\"}},\"_cache\":true}}]}}}}}","indexes":"logstash-2019.09.11","ran":1568187081,"status":0,"output":"OK: 0 matching entries found |logs=0;1;1"}]}]
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [check_interval]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:465)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:201)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: For input string: "5m"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.longValue(AbstractXContentParser.java:145)
at org.elasticsearch.index.mapper.core.LongFieldMapper.innerParseCreateField(LongFieldMapper.java:288)
at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:239)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
... 12 more
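The root cause is visible at the bottom of the trace: the check_interval field in the nagioslogserver_history index is mapped as a numeric (long) field, so Elasticsearch ultimately hands the value to Long.parseLong, which rejects duration strings like "5m". A minimal sketch reproducing the failure (plain Java, no Elasticsearch required):

```java
public class CheckIntervalParse {
    public static void main(String[] args) {
        // A bare number parses fine, which is what a long-mapped field expects...
        System.out.println(Long.parseLong("300")); // prints 300

        // ...but a duration string with a unit suffix does not.
        try {
            Long.parseLong("5m");
        } catch (NumberFormatException e) {
            // Same exception seen in the log:
            // java.lang.NumberFormatException: For input string: "5m"
            System.out.println(e.getMessage());
        }
    }
}
```

This is why the error repeats for every alert regardless of its settings: any value with a unit suffix will fail the same way.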
Beyond that, CPU load on the cluster has spiked to 6-8 on average, where before we were sitting around 1.5-2. Can you give me some advice on what I should be looking for to trace a CPU spike after an upgrade?
If you're getting that error regularly in your logs, it may be worth looking at that particular alert. Maybe it didn't get imported properly. It might be worth building it again from the ground up, if it's not too much work.
Beyond that, I would say run top -cn2, copy/paste the output of that, and let's see what the top couple of processes are.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
mbellerue wrote:If you're getting that error regularly in your logs, it may be worth looking at that particular alert. Maybe it didn't get imported properly. It might be worth building it again from the ground up, if it's not too much work.
OK, I'll give that a shot. It seems to be happening with all the alerts, but we only have 16, so it's not too bad to rebuild them.
mbellerue wrote:Beyond that, I would say run top -cn2, copy/paste the output of that, and let's see what the top couple of processes are.
So I'm still seeing those Elasticsearch logs even after rebuilding the alerts. I tried changing the check interval from 1m to 60s as a test, and it's still failing to parse the "check_interval".
[2019-09-13 13:46:21,956][DEBUG][action.index ] [ef70bad7-9d2f-44e1-af50-bec5f1c21973] [nagioslogserver_history][4], node[Ab0aTXO3T7iYADGjZKuxXQ], [P], s[STARTED]: Failed to execute [index {[nagioslogserver_history][alert][AW0ru6mD7SIN8iYUYnp9], source[{"alert_id":"AW0rpOUd7SIN8iYUX73j","name":"Fortigate - HQ WAN Recovery","check_interval":"60s","lookback_period":"60s","warning":"1","critical":"1","start":1568396721,"end":1568396781,"query":"{\"query\":{\"filtered\":{\"query\":{\"bool\":{\"should\":[{\"query_string\":{\"query\":\"\\\"Link Monitor changes state from failed to ok\\\" NOT interface:\\\"coco\\\"\"}}]}},\"filter\":{\"bool\":{\"must\":[{\"range\":{\"@timestamp\":{\"from\":1568396721000,\"to\":1568396781000}}}]}}}}}","indexes":"logstash-2019.09.13","ran":1568396781,"status":0,"output":"OK: 0 matching entries found |logs=0;1;1"}]}]
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [check_interval]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:465)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:201)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: For input string: "60s"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.longValue(AbstractXContentParser.java:145)
at org.elasticsearch.index.mapper.core.LongFieldMapper.innerParseCreateField(LongFieldMapper.java:288)
at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:239)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
... 12 more
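Changing the value from "1m" to "60s" won't help, because any unit suffix fails Long.parseLong; a long-mapped field would only accept a bare number. Purely as an illustration of what the indexing side would need to do (this is not Nagios Log Server's actual code, and the helper name is hypothetical), a writer could normalize the duration string to seconds before indexing:

```java
public class DurationNormalizer {
    // Hypothetical helper: convert "60s" / "5m" / "2h" style duration
    // strings to a plain number of seconds that a long-mapped field accepts.
    public static long toSeconds(String interval) {
        String s = interval.trim().toLowerCase();
        char unit = s.charAt(s.length() - 1);
        if (Character.isDigit(unit)) {
            return Long.parseLong(s); // already a bare number
        }
        long value = Long.parseLong(s.substring(0, s.length() - 1));
        switch (unit) {
            case 's': return value;
            case 'm': return value * 60;
            case 'h': return value * 3600;
            case 'd': return value * 86400;
            default:
                throw new IllegalArgumentException("Unknown unit: " + unit);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSeconds("60s")); // 60
        System.out.println(toSeconds("5m"));  // 300
    }
}
```

Since the values being indexed come from the product itself rather than user-editable mappings, this looks like a version-mismatch or mapping issue in the upgrade rather than anything in the alert definitions.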
What are the stats on the system in question? Also, is it a physical machine, or a virtual machine?
Would it be possible to destroy the alerts one by one to see if the load comes down? My thinking is that if we can narrow the performance issue down to a specific alert, we'll know what to look for; and if the load stays high even with no alerts, we'll know the alerts are just a red herring.
Also, if you could PM me a system profile, I'll take a look at it on this side.
Sorry for the delay. The alerts are working, and the CPU load seems to vary with the time of day (i.e., the volume of logs). We're planning to upgrade the cluster's resources in the next few weeks and will see if that has an impact. You can close this out; if we continue to have issues, I'll open another ticket. Thanks!