Upgrade to 2.0.8 - Elasticsearch errors and high CPU
Posted: Wed Sep 11, 2019 4:08 pm
by Envera IT
After upgrading from 2.0.2 to 2.0.8, we started seeing the following error in the main log file within /var/elasticsearch/.
It seems to be related to the alerts running, with the check_interval value not parsing correctly. Any idea what could be causing this?
[2019-09-11 03:31:21,675][DEBUG][action.index ] [881dac7b-e349-4fa5-aaef-41294c3b66e0] [nagioslogserver_history][0], node[5KuNYs_iTe2XiiQ3j_egjg], [P], s[STARTED]: Failed to execute [index {[nagioslogserver_history][alert][AW0fO-PCsKxNbujnOADK], source[{"alert_id":"AWi-5NRdRoAfU4O0MhUT","name":"EMAIL-DHCP-SNOOPING-DROP-OFFER","check_interval":"5m","lookback_period":"5m","warning":"1","critical":"1","start":1568186781,"end":1568187081,"query":"{\"query\":{\"filtered\":{\"query\":{\"bool\":{\"should\":[{\"query_string\":{\"query\":\"\\\"Drop offer\\\"\"}}]}},\"filter\":{\"bool\":{\"must\":[{\"range\":{\"@timestamp\":{\"from\":1568186781000,\"to\":1568187081000}}},{\"fquery\":{\"query\":{\"query_string\":{\"query\":\"host:(\\\"10.0.0.10\\\")\"}},\"_cache\":true}}]}}}}}","indexes":"logstash-2019.09.11","ran":1568187081,"status":0,"output":"OK: 0 matching entries found |logs=0;1;1"}]}]
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [check_interval]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:465)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:201)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: For input string: "5m"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.longValue(AbstractXContentParser.java:145)
at org.elasticsearch.index.mapper.core.LongFieldMapper.innerParseCreateField(LongFieldMapper.java:288)
at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:239)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
... 12 more
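[Editor's note: the root cause line above reduces to a numeric mapping rejecting a unit-suffixed duration. The index maps check_interval as a long, so a value like "5m" fails Long.parseLong. A minimal sketch of the kind of unit-to-seconds conversion that would let such a value be stored numerically (hypothetical helper, not part of Log Server):]

```shell
#!/bin/sh
# Hypothetical helper: convert a suffixed duration ("5m", "60s", "2h")
# to plain seconds so it can be parsed as a number.
to_seconds() {
  case "$1" in
    *s) echo "${1%s}" ;;               # seconds: strip the suffix
    *m) echo "$(( ${1%m} * 60 ))" ;;   # minutes -> seconds
    *h) echo "$(( ${1%h} * 3600 ))" ;; # hours -> seconds
    *)  echo "$1" ;;                   # already a bare number
  esac
}

to_seconds 5m    # prints 300
to_seconds 60s   # prints 60
```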
Beyond that, CPU load on the cluster has spiked to an average of 6-8, where before we were sitting around 1.5-2. Can you give some advice on what I should be looking for to trace the CPU spike after an upgrade?
Re: Upgrade to 2.0.8 - Elasticsearch errors and high CPU
Posted: Thu Sep 12, 2019 10:30 am
by mbellerue
If you're getting that error regularly in your logs, it may be worth looking at that particular alert. Maybe it didn't get imported properly. It might be worth building it again from the ground up, if it's not too much work.
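[Editor's note: one way to confirm whether this is a mapping problem rather than the alert definition itself is to look at how check_interval is mapped in the history index. A rough check along these lines, assuming Elasticsearch is listening on the default localhost:9200 (adjust host/port as needed):]

```shell
#!/bin/sh
# Dump the field mappings for the alert history index; a "long" type on
# check_interval would explain the NumberFormatException for "5m".
# Prints a fallback message if Elasticsearch is not reachable.
curl -s --max-time 5 'http://localhost:9200/nagioslogserver_history/_mapping?pretty' \
  || echo "Elasticsearch not reachable on localhost:9200"
```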
Beyond that, I would say run top -cn2, copy/paste the output of that, and let's see what the top couple of processes are.
Re: Upgrade to 2.0.8 - Elasticsearch errors and high CPU
Posted: Thu Sep 12, 2019 10:40 am
by Envera IT
mbellerue wrote:If you're getting that error regularly in your logs, it may be worth looking at that particular alert. Maybe it didn't get imported properly. It might be worth building it again from the ground up, if it's not too much work.
OK, I'll give that a shot. It seems to be happening with all the alerts, but we only have 16, so it's not too bad to rebuild them.
mbellerue wrote:Beyond that, I would say run top -cn2, copy/paste the output of that, and let's see what the top couple of processes are.
Code:
[root@srq-nagios-ls3 ~]# top -cn2
top - 11:38:32 up 13:50, 1 user, load average: 5.34, 5.33, 5.14
Tasks: 112 total, 2 running, 110 sleeping, 0 stopped, 0 zombie
Cpu(s): 56.3%us, 7.3%sy, 29.2%ni, 6.0%id, 0.0%wa, 0.2%hi, 1.0%si, 0.0%st
Mem: 33015156k total, 32786652k used, 228504k free, 149120k buffers
Swap: 262136k total, 0k used, 262136k free, 13852788k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1106 nagios 20 0 51.2g 16g 145m S 121.9 52.3 490:23.17 /usr/bin/java -Xms16120m -Xmx1612
1197 root 39 19 3693m 745m 14m S 60.1 2.3 226:07.85 /usr/bin/java -XX:+UseParNewGC -XX
31616 apache 20 0 237m 11m 3320 S 2.3 0.0 0:28.88 /usr/sbin/httpd
31627 apache 20 0 237m 11m 3336 S 2.3 0.0 0:28.81 /usr/sbin/httpd
10258 root 20 0 98.2m 4888 3836 S 0.3 0.0 0:00.09 sshd: root@pts/0
10276 root 20 0 15024 1340 1008 R 0.3 0.0 0:00.01 top -cn2
1 root 20 0 19360 1392 1100 S 0.0 0.0 0:01.67 /sbin/init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kthreadd]
3 root RT 0 0 0 0 S 0.0 0.0 0:00.30 [migration/0]
4 root 20 0 0 0 0 S 0.0 0.0 0:02.02 [ksoftirqd/0]
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 [migration/0]
6 root RT 0 0 0 0 S 0.0 0.0 0:00.18 [watchdog/0]
7 root RT 0 0 0 0 S 0.0 0.0 0:00.57 [migration/1]
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 [migration/1]
9 root 20 0 0 0 0 S 0.0 0.0 0:01.52 [ksoftirqd/1]
10 root RT 0 0 0 0 S 0.0 0.0 0:00.15 [watchdog/1]
11 root 20 0 0 0 0 S 0.0 0.0 0:03.50 [events/0]
12 root 20 0 0 0 0 S 0.0 0.0 0:25.29 [events/1]
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [cgroup]
14 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [khelper]
15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [netns]
16 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [async/mgr]
17 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [pm]
18 root 20 0 0 0 0 S 0.0 0.0 0:00.24 [sync_supers]
19 root 20 0 0 0 0 S 0.0 0.0 0:00.26 [bdi-default]
20 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kintegrityd/0]
21 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kintegrityd/1]
22 root 20 0 0 0 0 S 0.0 0.0 0:01.96 [kblockd/0]
23 root 20 0 0 0 0 S 0.0 0.0 0:02.17 [kblockd/1]
24 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kacpid]
25 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kacpi_notify]
26 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kacpi_hotplug]
27 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [ata/0]
28 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [ata/1]
29 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [ata_aux]
30 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [ksuspend_usbd]
31 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [khubd]
32 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kseriod]
33 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [md/0]
34 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [md/1]
35 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [md_misc/0]
36 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [md_misc/1]
37 root 20 0 0 0 0 S 0.0 0.0 0:00.03 [khungtaskd]
38 root 20 0 0 0 0 S 0.0 0.0 0:04.94 [kswapd0]
39 root 25 5 0 0 0 S 0.0 0.0 0:00.00 [ksmd]
40 root 39 19 0 0 0 S 0.0 0.0 0:00.38 [khugepaged]
41 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [aio/0]
42 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [aio/1]
43 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [crypto/0]
44 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [crypto/1]
49 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kthrotld/0]
50 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kthrotld/1]
51 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [pciehpd]
53 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kpsmoused]
54 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [usbhid_resumer]
206 root 20 0 0 0 0 S 0.0 0.0 0:00.01 [scsi_eh_0]
216 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [scsi_eh_1]
228 root 20 0 0 0 0 S 0.0 0.0 0:01.40 [mpt_poll_0]
229 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [mpt/0]
230 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [scsi_eh_2]
317 root 20 0 0 0 0 S 0.0 0.0 0:24.10 [jbd2/sda1-8]
318 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [ext4-dio-unwrit]
319 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [ext4-dio-unwrit]
397 root 16 -4 11028 1104 340 S 0.0 0.0 0:00.04 /sbin/udevd -d
483 root 20 0 0 0 0 S 0.0 0.0 0:30.68 [flush-8:0]
546 root 20 0 0 0 0 S 0.0 0.0 0:01.00 [vmmemctl]
709 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kstriped]
756 root 20 0 0 0 0 S 0.0 0.0 0:01.06 [kauditd]
970 root 16 -4 29768 848 596 S 0.0 0.0 0:02.29 auditd
992 root 20 0 473m 5160 1180 S 0.0 0.0 0:02.18 /sbin/rsyslogd -i /var/run/syslogd
1008 dbus 20 0 21436 812 568 S 0.0 0.0 0:00.00 dbus-daemon --system
1047 root 20 0 196m 7424 2004 S 0.0 0.0 3:09.35 /usr/sbin/snmpd -LS0-6d -Lf /dev/n
1064 root 20 0 66292 1236 500 S 0.0 0.0 0:00.01 /usr/sbin/sshd
1075 ntp 20 0 26520 1868 1284 S 0.0 0.0 0:00.10 ntpd -u ntp:ntp -p /var/run/ntpd.p
1127 root 20 0 84860 2408 696 S 0.0 0.0 0:01.31 sendmail: accepting connections
1136 smmsp 20 0 80448 2128 652 S 0.0 0.0 0:00.00 sendmail: Queue runner@01:00:00 fo
1153 root 20 0 233m 8320 4124 S 0.0 0.0 0:02.42 /usr/sbin/httpd
1175 root 20 0 114m 1236 624 S 0.0 0.0 0:00.68 crond
1191 root 39 19 127m 1424 1128 S 0.0 0.0 0:00.00 runuser -s /bin/sh -c exec /usr/lo
1250 root 20 0 4064 588 500 S 0.0 0.0 0:00.00 /sbin/mingetty /dev/tty1
1252 root 20 0 4064 580 500 S 0.0 0.0 0:00.00 /sbin/mingetty /dev/tty2
1254 root 20 0 4064 580 500 S 0.0 0.0 0:00.00 /sbin/mingetty /dev/tty3
1256 root 20 0 4064 588 500 S 0.0 0.0 0:00.00 /sbin/mingetty /dev/tty4
1258 root 18 -2 11024 1084 324 S 0.0 0.0 0:00.00 /sbin/udevd -d
1259 root 18 -2 11024 1072 312 S 0.0 0.0 0:00.00 /sbin/udevd -d
1260 root 20 0 4064 588 500 S 0.0 0.0 0:00.00 /sbin/mingetty /dev/tty5
1262 root 20 0 4064 588 500 S 0.0 0.0 0:00.00 /sbin/mingetty /dev/tty6
10198 root 20 0 132m 1704 992 S 0.0 0.0 0:00.00 CROND
10199 root 20 0 132m 1704 992 S 0.0 0.0 0:00.00 CROND
10200 nagios 20 0 103m 1292 1124 S 0.0 0.0 0:00.00 /bin/sh -c /usr/bin/php -q /var/ww
10201 nagios 20 0 103m 1296 1124 S 0.0 0.0 0:00.00 /bin/sh -c /usr/bin/php -q /var/ww
10202 nagios 20 0 216m 12m 5776 S 0.0 0.0 0:00.07 /usr/bin/php -q /var/www/html/nagi
10203 nagios 20 0 215m 13m 5828 S 0.0 0.0 0:00.13 /usr/bin/php -q /var/www/html/nagi
10260 root 20 0 105m 1896 1520 S 0.0 0.0 0:00.00 -bash
21374 apache 20 0 239m 13m 3372 S 0.0 0.0 2:17.61 /usr/sbin/httpd
31005 apache 20 0 238m 12m 3320 S 0.0 0.0 0:29.55 /usr/sbin/httpd
31010 apache 20 0 238m 12m 3352 S 0.0 0.0 0:29.45 /usr/sbin/httpd
31012 apache 20 0 238m 12m 3336 S 0.0 0.0 0:30.27 /usr/sbin/httpd
31017 apache 20 0 237m 11m 3320 S 0.0 0.0 0:29.66 /usr/sbin/httpd
31022 apache 20 0 237m 11m 3320 S 0.0 0.0 0:30.02 /usr/sbin/httpd
31110 apache 20 0 238m 12m 3320 S 0.0 0.0 0:29.10 /usr/sbin/httpd
Re: Upgrade to 2.0.8 - Elasticsearch errors and high CPU
Posted: Thu Sep 12, 2019 2:12 pm
by mbellerue
Okay. Your top output looks good, save for the load. So I'll wait to hear back on the alerts.
Re: Upgrade to 2.0.8 - Elasticsearch errors and high CPU
Posted: Fri Sep 13, 2019 12:51 pm
by Envera IT
So I'm still seeing those Elasticsearch errors even after rebuilding the alerts. I tried changing the check interval from 1m to 60s as a test, and it's still failing to parse the check_interval.
Any ideas on what to check next?
Code:
[2019-09-13 13:46:21,956][DEBUG][action.index ] [ef70bad7-9d2f-44e1-af50-bec5f1c21973] [nagioslogserver_history][4], node[Ab0aTXO3T7iYADGjZKuxXQ], [P], s[STARTED]: Failed to execute [index {[nagioslogserver_history][alert][AW0ru6mD7SIN8iYUYnp9], source[{"alert_id":"AW0rpOUd7SIN8iYUX73j","name":"Fortigate - HQ WAN Recovery","check_interval":"60s","lookback_period":"60s","warning":"1","critical":"1","start":1568396721,"end":1568396781,"query":"{\"query\":{\"filtered\":{\"query\":{\"bool\":{\"should\":[{\"query_string\":{\"query\":\"\\\"Link Monitor changes state from failed to ok\\\" NOT interface:\\\"coco\\\"\"}}]}},\"filter\":{\"bool\":{\"must\":[{\"range\":{\"@timestamp\":{\"from\":1568396721000,\"to\":1568396781000}}}]}}}}}","indexes":"logstash-2019.09.13","ran":1568396781,"status":0,"output":"OK: 0 matching entries found |logs=0;1;1"}]}]
org.elasticsearch.index.mapper.MapperParsingException: failed to parse [check_interval]
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:411)
at org.elasticsearch.index.mapper.object.ObjectMapper.serializeValue(ObjectMapper.java:706)
at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:497)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:544)
at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:493)
at org.elasticsearch.index.shard.IndexShard.prepareCreate(IndexShard.java:465)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:201)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.performOnPrimary(TransportShardReplicationOperationAction.java:574)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase$1.doRun(TransportShardReplicationOperationAction.java:440)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NumberFormatException: For input string: "60s"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.elasticsearch.common.xcontent.support.AbstractXContentParser.longValue(AbstractXContentParser.java:145)
at org.elasticsearch.index.mapper.core.LongFieldMapper.innerParseCreateField(LongFieldMapper.java:288)
at org.elasticsearch.index.mapper.core.NumberFieldMapper.parseCreateField(NumberFieldMapper.java:239)
at org.elasticsearch.index.mapper.core.AbstractFieldMapper.parse(AbstractFieldMapper.java:401)
... 12 more
Re: Upgrade to 2.0.8 - Elasticsearch errors and high CPU
Posted: Fri Sep 13, 2019 3:51 pm
by mbellerue
What are the stats on the system in question? Also, is it a physical machine, or a virtual machine?
Would it be possible to destroy the alerts one by one to see if the load comes down? I'm thinking that if we can narrow the performance issue down to a specific alert, we'll know what to look for. Or, if the load stays high even with no alerts, we'll know the alerts are just a red herring.
Also, if you could PM me a system profile, I'll take a look at it on this side.
Re: Upgrade to 2.0.8 - Elasticsearch errors and high CPU
Posted: Sat Sep 28, 2019 12:13 pm
by Envera IT
Sorry for the delay. The alerts are working now, and the CPU load seems to vary with the time of day (i.e., the volume of logs). We're planning to upgrade the cluster's resources in the next few weeks and will see whether that has an impact. You can close this out; if we continue to have issues I'll open another ticket. Thanks
Re: Upgrade to 2.0.8 - Elasticsearch errors and high CPU
Posted: Mon Sep 30, 2019 6:45 am
by scottwilkerson
Envera IT wrote:Sorry for the delay. The alerts are working now, and the CPU load seems to vary with the time of day (i.e., the volume of logs). We're planning to upgrade the cluster's resources in the next few weeks and will see whether that has an impact. You can close this out; if we continue to have issues I'll open another ticket. Thanks
Great. Locking