Re: upgrade 2.1.4 => 2.1.6 performance issues
Posted: Fri May 22, 2020 9:48 am
OK, I followed the steps as requested. I saved all the output into a logfile, but it's a bit lengthy.
In short, the result is as follows: no change, except for the following.
- The messages in the elasticsearch logfile now complain about the 2000 search queue limit being reached.
- The load on the nodes is even higher (which kind of makes sense).
Due to the high load, I've ruled out the following:
- CPU is high, but not 100% (approx. 75%), and half of it is I/O wait.
- The disks are reading at > 250MB/s (streaming?), but not at 100% utilization. I'm a bit confused, since I'm querying data from the past hour. Shouldn't that be in memory?
- Network traffic is just a couple of MB/s, so that's not it.
- Memory... I've read that NLS should assign 50% of RAM to the Java process, as defined in /etc/sysconfig/elasticsearch. We have 128GB in each node, and the calculation in the startup file gives 64GB as an answer:
Code: Select all
colog3:root:/etc/sysconfig> $(expr $(free -m|awk '/^Mem:/{print $2}') / 2 )
-bash: 64339: command not found

I know (by now) that 64GB is more than the advised 32GB (learning on the fly). However, when I look at the Java process with nmon (or ps aux), it shows that it takes up the full 128GB.
I don't know a lot about Java, but I do know that heavy Java garbage collection is something you do not want. It would explain why the load is high, why the disks are very busy, and why the system even struggles with something as simple as generating the home page (it takes minutes and also fills up the search queue).

Code: Select all
+nmon-14g---------------------Hostname=colog3-------Refresh= 1secs ---15:06.54----------------------------------------------------------------------------+
| CPU Utilisation ------------------------------------------------------------------------------------------------------------------------------------- |
|---------------------------+-------------------------------------------------+ |
|CPU User% Sys% Wait% Idle|0 |25 |50 |75 100| |
| 1 83.2 0.0 0.0 16.8|UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU > |
| 2 4.0 1.0 0.0 95.0|U > | |
| 3 35.3 0.0 0.0 64.7|UUUUUUUUUUUUUUUUU > |
| 4 4.0 1.0 0.0 95.0|U > |
| 5 1.0 0.0 0.0 99.0| > | |
| 6 5.9 1.0 0.0 93.1|UU > |
| 7 1.0 0.0 0.0 99.0| > | |
| 8 0.0 0.0 0.0 100.0| > |
| 9 0.0 0.0 0.0 100.0| > | |
| 10 1.0 0.0 0.0 99.0| > | |
| 11 1.0 0.0 0.0 99.0| > | |
| 12 3.0 0.0 0.0 97.0|U > | |
| 13 1.0 0.0 0.0 99.0| > | |
| 14 0.0 0.0 0.0 100.0| > |
| 15 1.0 0.0 0.0 99.0| > | |
| 16 5.0 0.0 0.0 95.0|UU > |
|---------------------------+-------------------------------------------------+ |
|Avg 9.2 0.2 0.0 90.6|UUUU > | |
|---------------------------+-------------------------------------------------+ |
| Top Processes Procs=324 mode=4 (1=Basic, 3=Perf 4=Size 5=I/O)-------------------------------------------------------------------------------------------|
| PID %CPU ResSize Command |
| Used KB |
| 20831 115.5 124512128 /bin/java -Xms64339m -Xmx64339m -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFrac|
|b 1634 39.5 778176 /bin/java -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSIniti|
|/ 666 0.0 131776 /usr/lib/systemd/systemd-journald cyFraction|
|n 1506 0.0 75612 /usr/sbin/rsyslogd -n ome=/usr/l|
|/ 19114 0.0 21776 /usr/bin/python -tt /usr/sbin/yum-cron /etc/yum/yum-cron-hourly.conf r/logstash|
| 20368 0.0 14780 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs |
| 20367 0.0 14776 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php poller |
| 20369 0.0 14616 /usr/bin/php -q /var/www/html/nagioslogserver/www/index.php jobs/apache |
| 8385 0.0 13624 /usr/sbin/httpd -DFOREGROUND |
| 4315 0.0 13528 /usr/sbin/httpd -DFOREGROUND |
| 7344 0.0 13512 /usr/sbin/httpd -DFOREGROUND |
| 8400 0.0 13388 /usr/sbin/httpd -DFOREGROUND |
| 7906 0.0 13284 /usr/sbin/httpd -DFOREGROUND |

I would like to restart the nodes with the heap fixed at either 32GB, as advised, or 64GB, as "has always worked", to see if my theory makes sense. Soooooo... can you provide me with the correct syntax?
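From my reading so far, I think the knob is ES_HEAP_SIZE in /etc/sysconfig/elasticsearch (that would match the -Xms64339m/-Xmx64339m the process was started with), but that is just my assumption; please confirm. Something like:

```shell
# /etc/sysconfig/elasticsearch (my guess; please correct me if NLS
# expects this somewhere else)
# Fix the heap explicitly instead of letting the startup calculation
# take 50% of RAM:
ES_HEAP_SIZE=32g
```

followed by a restart of elasticsearch on each node.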
As for the output of the steps above: if you want it, let me know. I just need a little extra time to make it more readable.
Greetings... Hans
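PS: I figured out why my one-liner above printed "command not found": the outer $( ... ) makes bash take the result of expr (64339) and try to execute it as a command. Without the outer $( ), it simply prints the number the startup file computes:

```shell
# Half of total RAM in MB, as the startup calculation does.
# (No outer "$(...)" this time, so the result is printed instead of executed.)
expr $(free -m | awk '/^Mem:/{print $2}') / 2
```

On these 128GB nodes that prints 64339.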