Hi,
After the recent Intel hardware security issues, we applied the required patches. However, as expected, we ran into severe performance problems (load spiked to 30%-40%). Have you already come across such issues in the last two days, and do you have any suggestions?
Thanks in advance.
Performance issue after kernel upgrade
-
kyang
Re: Performance issue after kernel upgrade
I have seen the news about the Intel Security updates. I'm not entirely sure, but it's certainly possible Intel will be looking into this if everyone is having performance issues.
In the meantime, are you still seeing these performance spikes? Are you running a VM on a Windows host or a standalone server?
Where is the load coming from? Is it a specific process? What's the output of top? Are there any notable logs, or is this mainly an "Intel update" thing?
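To put a number on the load before digging through top, one option is to compare the load average with the number of cores. Below is a minimal Python sketch, assuming a Linux host where /proc/loadavg is readable. The function names are made up for illustration and are not part of Nagios.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def load_per_core(load, cores):
    """Normalise a load average by the number of logical CPUs."""
    return load / max(cores, 1)


if __name__ == "__main__":
    path = "/proc/loadavg"  # Linux-specific; assumed to exist on the Nagios host
    if os.path.exists(path):
        with open(path) as handle:
            one, five, fifteen = parse_loadavg(handle.read())
        cores = os.cpu_count() or 1
        print(f"load averages: {one} {five} {fifteen} on {cores} logical CPUs")
        print(f"1-minute load per core: {load_per_core(one, cores):.2f}")
```

A per-core value that stays well above 1.0 suggests more runnable work than the CPUs can absorb. Running this before and after the kernel update on the same machine gives a rough comparison.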
Re: Performance issue after kernel upgrade
Hi,
From what I know, Linux is implementing KPTI (Kernel Page Table Isolation) to mitigate the Meltdown variant of the exploit.
In general this doesn't cause significant performance degradation if the processes on the system mostly stay in userspace.
Because Nagios spawns a lot of processes, scripts and runtimes to execute checks, I suspect there are a lot of syscalls happening.
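One rough way to see how expensive kernel entries are on a given box is to time a cheap syscall against a call that stays in userspace. This is only a sketch: `time_calls` and `noop` are illustrative names, and `os.getppid` is used on the assumption that it enters the kernel on every call.

```python
import os
import time


def time_calls(fn, n=200_000):
    """Return the average wall-clock seconds per call of fn over n calls."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n


def noop():
    """Userspace-only baseline with no kernel entry."""
    return None


if __name__ == "__main__":
    syscall_cost = time_calls(os.getppid)  # assumed to enter the kernel each call
    baseline_cost = time_calls(noop)       # stays in userspace
    print(f"getppid: {syscall_cost * 1e9:.0f} ns per call")
    print(f"noop:    {baseline_cost * 1e9:.0f} ns per call")
```

The absolute numbers include Python interpreter overhead, so they are only meaningful when comparing the same machine before and after the patch.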
I will quote Wikipedia here:
KPTI fixes these leaks by separating user-space and kernel-space page tables entirely. On processors that support the process-context identifiers (PCID), a translation lookaside buffer (TLB) flush can be avoided,[4] but even then it comes at a significant performance cost, particularly in syscall-heavy and interrupt-heavy workloads
This is the reason I haven't upgraded the OS in our environment. We can't afford to slow down our Nagios system.
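For anyone who wants to check their own box, patched Linux kernels report the Meltdown mitigation state under /sys/devices/system/cpu/vulnerabilities/meltdown, and PCID support shows up as the `pcid` flag in /proc/cpuinfo. Here is a minimal Python sketch, assuming an x86 Linux host. The helper names are my own, and the sysfs file is simply absent on kernels that predate the fix.

```python
from pathlib import Path

MELTDOWN_PATH = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def has_cpu_flag(cpuinfo_text, flag):
    """True if flag appears in the first 'flags' line of /proc/cpuinfo content."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, value = line.partition(":")
            return flag in value.split()
    return False


def meltdown_status(path=MELTDOWN_PATH):
    """Return the kernel's Meltdown report, or None if the file cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    try:
        cpuinfo = Path("/proc/cpuinfo").read_text()
    except OSError:
        cpuinfo = ""  # not a Linux host, nothing to inspect
    print("PCID supported:", has_cpu_flag(cpuinfo, "pcid"))
    print("Meltdown status:", meltdown_status())
```

On a kernel with KPTI active the status line reads "Mitigation: PTI". A CPU without the `pcid` flag is likely to see the larger slowdown described in the quote.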
Kind regards
Re: Performance issue after kernel upgrade
reincarne, Can you tell us how many Hosts and Services you are monitoring? And perhaps some hardware info regarding your Nagios server?
I'm just a fellow admin. I patched a 5.4.11 XI Linux server today that runs a very light load of about 200 hosts and services combined (CentOS 7 VM, 8 Intel Xeon cores, 8 GB RAM). This particular server hasn't shown any hint of slowness. It would be helpful to hear at what load and on what hardware specs the slowdown shows up.
Thanks!
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
Re: Performance issue after kernel upgrade
Obviously we're duct-taping a hole in the boat, but taking a look at https://assets.nagios.com/downloads/nag ... ios-XI.pdf is better than nothing until Intel gets their chip together.
Re: Performance issue after kernel upgrade
Hi,
For those who are using AWS: we solved it a day later by creating an HVM machine, and we have been stable since then.
About number of hosts - 1700
Services - 30000
-
dwhitfield
Re: Performance issue after kernel upgrade
@reincarne, as OP, do you think this is ready to lock up? If not, what other questions do you have?