100% CPU Usage

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
jameyw
Posts: 54
Joined: Fri Mar 17, 2017 10:06 am

100% CPU Usage

Post by jameyw »

Earlier today I installed updates on all of my Windows servers. For whatever reason, several of the updates took a long time to complete. During the updates, some of the servers hit high CPU usage, and as a result the WMI checks being performed by Nagios XI timed out. Apparently, this snowballed into a huge backlog of checks and Nagios XI became completely unresponsive. On the console, I had a number of messages that said "Backlog Limit Exceeded". I finally got enough response on the console to reboot the system, but things were still not working. After letting it sit for several hours, the web interface was finally responsive enough for me to disable some of my service checks so the system could catch up. I still have several hundred checks disabled, but everything that is enabled is showing OK status. After an hour of showing OK, my CPU is still pinned at 100%. On the console, top shows nagios as the top CPU user, with 4 instances each showing about 45% CPU usage. I am leery of re-enabling all of my checks for fear of making it completely unresponsive again, but some of the items being checked are mission-critical.

Should I just let it sit? Is there something I need to do to make it happy again?

The basic config is 134 hosts with 1,610 services checked across those hosts.

Thanks
bolson

Re: 100% CPU Usage

Post by bolson »

Hello jameyw,

This document has several tips on maximizing Nagios performance.

https://assets.nagios.com/downloads/nag ... p#boosting

WMI checks can be very CPU intensive. Are all or most of your checks WMI checks?

Is this a virtual machine? Are you able to add CPU resources? Also, database issues can consume CPU.

Run top and look at mysqld, php, nagios, and httpd. In addition to CPU, what does your memory utilization look like?
jameyw

Re: 100% CPU Usage

Post by jameyw »

Not all are WMI but I have a fair number. I also have a lot of SNMP checks.

It is a virtual machine. I can add resources if needed. It was installed from the pre-configured VM image available for download.

VMware is reporting memory usage of around 35-40%. In top, MySQL has the highest memory usage, but it is only at 2.4% and is using 0.3% of the processor.

See the attached screenshot.
bolson

Re: 100% CPU Usage

Post by bolson »

Please send me a copy of your system profile. You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button in the top right corner.

Also, can you tell me how many CPU cores you're assigning to the VM?

I think what might be going on is that you have many check_commands running simultaneously.
jameyw

Re: 100% CPU Usage

Post by jameyw »

It had been running fine until this morning. I'll PM the file to you.

2 CPUs, with 1 core per socket.
bolson

Re: 100% CPU Usage

Post by bolson »

Just a hunch based on the "Backlog Limit Exceeded" message: the Windows updates slowed down your Windows machines and caused check commands that would not ordinarily overlap to start overlapping.

Once this "backlog" is cleared up, things may improve. Also, given the number of hosts and services you're running, adding more cores may help if you have them to spare. And if your VMware host is already heavily taxed, you may not actually be getting the compute power you're assigning to this VM.
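To illustrate why slow checks snowball, here is a rough back-of-the-envelope sketch using Little's law (average number in flight = arrival rate × average time each item spends in the system). It uses the 1,610 services from this thread, but the 5-minute check interval and the check durations are assumptions for illustration only, not values taken from this installation, and the model ignores scheduling details such as retries and concurrency limits.

```python
# Back-of-the-envelope model: Little's law (L = lambda * W) applied to
# service checks. The 5-minute interval and the check durations below are
# ASSUMPTIONS for illustration, not values from this Nagios XI install.

def concurrent_checks(services: int, interval_s: float, avg_duration_s: float) -> float:
    """Average number of checks running at once.

    Arrival rate = services / interval_s (checks per second), assuming each
    service is checked once per interval. Little's law then gives the mean
    number of checks in flight as arrival_rate * avg_duration_s.
    """
    arrival_rate = services / interval_s
    return arrival_rate * avg_duration_s


SERVICES = 1610        # service count reported in the thread
INTERVAL_S = 5 * 60    # ASSUMED 5-minute check interval

for duration in (2.0, 10.0, 60.0):  # ASSUMED average check durations, seconds
    in_flight = concurrent_checks(SERVICES, INTERVAL_S, duration)
    print(f"avg check {duration:>5.1f}s -> ~{in_flight:.0f} checks running at once")
```

The point is the linear relationship: if checks that normally finish in a couple of seconds start running until they time out, the number of checks running at the same time grows by the same factor, even though nothing about the configuration changed.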
jameyw
Posts: 54
Joined: Fri Mar 17, 2017 10:06 am

Re: 100% CPU Usage

Post by jameyw »

I added 2 cores and 1 GB of memory, and everything is closer to normal. CPU is running at about 40-50% now.

Thanks for the help
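The figures in this thread are roughly consistent with the result, assuming the load stayed about the same. In top's default (Irix) mode, a process's %CPU is relative to a single CPU, so four nagios processes at about 45% each add up to about 180% of one CPU. The sketch below divides that by the vCPU count. Treating the 45% snapshot as steady load, and treating "2 CPU" plus "2 cores" as 4 vCPUs total, are both assumptions. The drop could also partly reflect the backlog clearing.

```python
# Rough sketch: convert per-process %CPU values from top (default Irix mode,
# where 100% means one full CPU) into a share of the whole VM.
# ASSUMPTION: the four nagios processes stay at ~45% each, based on one
# snapshot reported in the thread.

def vm_utilization(per_process_pct: list[float], vcpus: int) -> float:
    """Percent of total VM CPU capacity consumed by the given processes."""
    return sum(per_process_pct) / vcpus


nagios_procs = [45.0, 45.0, 45.0, 45.0]  # four nagios processes at ~45% each

print(f"2 vCPUs: {vm_utilization(nagios_procs, 2):.0f}% of the VM")
print(f"4 vCPUs: {vm_utilization(nagios_procs, 4):.0f}% of the VM")
```

With 2 vCPUs this works out to roughly 90% of the VM, which is effectively pinned once other processes are included. With 4 vCPUs the same load is roughly 45%, in line with the 40-50% reported above.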
bolson

Re: 100% CPU Usage

Post by bolson »

May we close this topic?