Re: NagiosXI consuming a large amount of CPU
Posted: Wed Jul 22, 2015 11:04 pm
by Box293
jdalrymple wrote:Now - onto what my coworker Box293 would recommend - switch to his check.
https://exchange.nagios.org/directory/P ... re/details
He works on it a lot and it is configured in such a fashion that the checks are performed on a vMA (vSphere Management Assistant) instead of directly on your XI box.
Frédéric GRANAT wrote:You said :
At most 30 hosts and about 1000 VMs. That said your environment is likely much larger so therein lies the additional load
My answer: No, I said we have 291 hosts (16 ESX servers) and 725 services.
Before I added the 5 new ESX servers, the CPU consumption was fine.
Offloading the ESX checks from Nagios XI will make a difference. This was the reason why I developed the box293_check_vmware plugin.
However, try the following as I am pretty sure it'll make a difference now.
What I am going to suggest is that we modify the command parameter for these check_esx3.pl checks so the plugin runs at a lower priority. The plugin will still operate normally, but it won't hog the CPU when other processes are requesting it.
Configure > Core Configuration Manager
Commands > Commands
Search for esx3
Click the check_esx3_guest
Change it to:
nice -n19 $USER1$/check_esx3.pl -H "$HOSTADDRESS$" -f "$ARG1$" -N "$ARG2$" -l "$ARG3$"$ARG4$
Click Save
Click the check_esx3_host
Change it to:
nice -n19 $USER1$/check_esx3.pl -H "$HOSTADDRESS$" -f "$ARG1$" -l "$ARG2$"$ARG3$
Click Save
Click Apply Configuration.
All I have done is prepend "nice -n19" to the command.
I would observe the system for a while to see if this makes any difference to the performance. You can look at the Nagios XI historic performance graphs by looking at the localhost service "Current Load".
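As a quick sanity check of what the nice prefix does, here is a minimal sketch that can be run on the XI box. The function names are made up for illustration; it assumes GNU coreutils (where nice with no command prints the current niceness) and a Linux /proc filesystem. On a typical system the first call should report 19, the lowest scheduling priority.

```shell
#!/bin/sh
# Print the niceness a command gets when started under "nice -n 19".
# "nice" with no command argument prints the niceness it is running at.
niceness_under_nice() {
    nice -n 19 nice
}

# Print the 1-, 5- and 15-minute load averages, the same figures
# that sit behind the "Current Load" service.
load_averages() {
    cut -d ' ' -f 1-3 /proc/loadavg
}

niceness_under_nice
load_averages
```

Running load_averages before and after applying the configuration gives a rough comparison while the historic graphs fill in.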
Re: NagiosXI consuming a large amount of CPU
Posted: Thu Jul 23, 2015 1:43 am
by Frédéric GRANAT
Hi,
Here it is, but that doesn't help :
The script monitors services, whereas the CPU overload comes from monitoring particular hosts (the 5 new ESX servers).
Re: NagiosXI consuming a large amount of CPU
Posted: Thu Jul 23, 2015 9:15 am
by jdalrymple
So if you disable those hosts does the CPU usage return to normal?
These hosts have no services whatsoever? If that's the case, what is the check_command defined for these hosts? I've never seen check-host-alive(check_icmp) run away with a CPU.
Also, you do appear to have a bunch of Windows checks running up against a 60 second timeout. I wouldn't be surprised if those were in some way related to your CPU consumption, but we'll tackle your 5 new ESX hosts first.
Re: NagiosXI consuming a large amount of CPU
Posted: Wed Jul 29, 2015 7:55 am
by Frédéric GRANAT
Hi
I disabled some hosts and gained back 20% of CPU by removing the checks on the 5 Windows servers hosted by the ESX servers.
No gain by removing the 5 ESX servers
No gain by removing the ILO cards linked to the ESX servers
I don't understand any of this.
How does the CPU consumption of Nagios work? Does it go up in steps (above a particular number of hosts/services the consumption increases)?
ps aux --sort -%cpu gives (I already gave you that result) :
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
nagios 6980 33.4 0.0 3076 784 ? R 08:56 119:18 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6976 31.9 0.0 3076 792 ? S 08:56 113:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6977 31.6 0.0 3076 784 ? R 08:56 112:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6981 31.3 0.0 3076 784 ? S 08:56 111:37 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6975 30.4 0.0 3076 784 ? R 08:56 108:28 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6979 29.5 0.0 3076 792 ? R 08:56 105:06 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
Why are there that many processes?
Which checks are behind those processes?
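To put one number on the ps output above, the worker rows can be totalled. This is a minimal sketch: the function name is made up for illustration, and it assumes the ps aux column layout shown in this post (%CPU in field 3, the command starting at field 11). Note that ps reports %CPU as CPU time divided by the process lifetime, so it is an average since 08:56 rather than an instantaneous reading.

```shell
#!/bin/sh
# Sum the %CPU column of "ps aux" rows that are Nagios worker processes.
# Matching on fields 11 and 12 avoids counting this awk process itself.
sum_worker_cpu() {
    awk '$11 ~ /\/nagios$/ && $12 == "--worker" { total += $3; n++ }
         END { printf "%d workers, %.1f%% CPU in total\n", n, total }'
}

# The six rows posted in this thread, fed in as sample input.
sum_worker_cpu <<'EOF'
nagios 6980 33.4 0.0 3076 784 ? R 08:56 119:18 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6976 31.9 0.0 3076 792 ? S 08:56 113:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6977 31.6 0.0 3076 784 ? R 08:56 112:53 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6981 31.3 0.0 3076 784 ? S 08:56 111:37 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6975 30.4 0.0 3076 784 ? R 08:56 108:28 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
nagios 6979 29.5 0.0 3076 792 ? R 08:56 105:06 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagios.qh
EOF
```

For the posted rows this comes to 188.1% across six workers. On a live system the same function can be fed with "ps aux | sum_worker_cpu".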
Re: NagiosXI consuming a large amount of CPU
Posted: Wed Jul 29, 2015 1:56 pm
by jdalrymple
Frédéric GRANAT wrote:No gain by removing the 5 ESX servers
I'm thankful for that. If this did actually reduce your load I would have been quite surprised. Obviously we still need to try to explain your load. Again, I look back to that long-running check list for clues. How are you checking those Windows hosts? If I had to guess, I'd say WMI?
Frédéric GRANAT wrote:Why are there that many processes?
Nagios Core 4 hands checks to a pool of worker processes, so seeing several of them is expected. Contrary to what you may think, this is actually GOOD for performance.
Frédéric GRANAT wrote:Which checks are behind those processes?
Unfortunately this data isn't easily available; it requires enabling debug logging.
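Short of debug logging, a rough snapshot may be possible by listing the child processes of each worker. This is only a sketch: it assumes each worker runs the plugin it is currently executing as a direct child process (an assumption about the worker model, not something confirmed in this thread), and it relies on the --ppid option of the procps version of ps. The function name is made up for illustration.

```shell
#!/bin/sh
# For each parent PID given, list its direct children with their
# elapsed run time and full command line.
children_of() {
    for pid in "$@"; do
        ps -o pid=,etime=,args= --ppid "$pid"
    done
}

# Usage with the worker PIDs from the ps output in this thread:
#   children_of 6975 6976 6977 6979 6980 6981
```

Because checks come and go quickly, a single run only shows what happens to be executing at that instant; repeating it a few times gives a better picture of which plugins keep showing up with long elapsed times.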
Also this is still puzzling:
Frédéric GRANAT wrote:Before I added the 5 new ESX servers, the CPU consumption was fine.
Absolutely nothing else changed? Are you sure you guys didn't have some other new services staged that just didn't take effect until you did an apply config?
Re: NagiosXI consuming a large amount of CPU
Posted: Fri Jul 31, 2015 2:25 am
by Frédéric GRANAT
Hi,
How are you checking those Windows hosts? If I had to guess, I'd say WMI?
We use a template with: $USER1$/check_icmp -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
Absolutely nothing else changed? Are you sure you guys didn't have some other new services staged that just didn't take effect until you did an apply config?
=> The day the CPU usage increased, I added 5 ESX servers, 5 ILO cards, 5 Windows servers and services. I'm the only administrator.
Re: NagiosXI consuming a large amount of CPU
Posted: Fri Jul 31, 2015 7:49 am
by jdalrymple
Frédéric GRANAT wrote:we use a template with : $USER1$/check_icmp -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 5
I mean the following service checks:
Host: WS-FS14-SD.cg.ahp Service: CPU Usage Check Time: 61.103
Host: WS-FS14-SD.cg.ahp Service: Services Check Time: 60.791
Host: WS-FS11-SD.cg.ahp Service: Drives : Disk Usage Check Time: 60.645
Host: W2K-WEB09.cg.ahp Service: CPU Usage Check Time: 60.542
Host: WS-FS11-SD.cg.ahp Service: CPU Usage Check Time: 60.486
Host: WS-FS11-SD.cg.ahp Service: Services Check Time: 60.463
Host: W2K-FS05-SD.cg.ahp Service: Services Check Time: 60.458
Host: W2K-FS05-SD.cg.ahp Service: Memory Usage Check Time: 60.422
Host: W2K-FS05-SD.cg.ahp Service: Drives : Disk Usage Check Time: 60.370
Host: W2K-WEB09.cg.ahp Service: Drives : Disk Usage Check Time: 60.338
Host: W2K-WEB09.cg.ahp Service: Uptime Check Time: 60.322
Host: W2K-APPLI09-SD.cg.ahp Service: CPU Usage Check Time: 60.281
Host: WS-FS14-SD.cg.ahp Service: Uptime Check Time: 60.272
Host: W2K-APPLI09-SD.cg.ahp Service: Memory Usage Check Time: 60.260
Host: WS-FS14-SD.cg.ahp Service: Memory Usage Check Time: 60.209
Host: W2K-APPLI09-SD.cg.ahp Service: Services Check Time: 60.181
Host: W2K-WEB09.cg.ahp Service: Services Check Time: 60.164
Host: WS-FS11-SD.cg.ahp Service: Memory Usage Check Time: 60.164
Host: WS-FS14-SD.cg.ahp Service: Drives : Disk Usage Check Time: 60.163
Host: W2K-FS05-SD.cg.ahp Service: Uptime Check Time: 60.161
Host: W2K-APPLI09-SD.cg.ahp Service: Uptime Check Time: 60.160
Host: WS-FS11-SD.cg.ahp Service: Uptime Check Time: 60.160
Host: W2K-WEB09.cg.ahp Service: Memory Usage Check Time: 60.160
Host: W2K-FS05-SD.cg.ahp Service: CPU Usage Check Time: 60.159
Host: W2K-APPLI09-SD.cg.ahp Service: Drives : Disk Usage Check Time: 60.158
To be clear, I do not believe at this point that host checks are in any way related to your problems.
Frédéric GRANAT wrote:5 Windows servers and services.
I don't suppose any of the above listed hosts are the ones you added?
To simplify the above list for you:
W2K-APPLI09-SD.cg.ahp
W2K-FS05-SD.cg.ahp
W2K-WEB09.cg.ahp
WS-FS11-SD.cg.ahp
WS-FS14-SD.cg.ahp
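For reference, a list like this can be derived from the long-running check output with a short pipeline. This is a minimal sketch with a made-up function name; it assumes each input line has the "Host: <name> Service: ..." shape shown above, and the sample input is just three of the posted lines.

```shell
#!/bin/sh
# Reduce "Host: <name> Service: ..." lines to a sorted list of unique hosts.
unique_hosts() {
    awk '$1 == "Host:" { print $2 }' | LC_ALL=C sort -u
}

# Three of the lines from the list above as sample input.
unique_hosts <<'EOF'
Host: WS-FS14-SD.cg.ahp Service: CPU Usage Check Time: 61.103
Host: W2K-WEB09.cg.ahp Service: CPU Usage Check Time: 60.542
Host: WS-FS14-SD.cg.ahp Service: Services Check Time: 60.791
EOF
```

Swapping "sort -u" for "sort | uniq -c" would also show how many slow checks each host contributes.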
Re: NagiosXI consuming a large amount of CPU
Posted: Wed Oct 07, 2015 2:45 am
by Frédéric GRANAT
Hi,
No more problems now; I don't understand why.
You can close the post.
Frederic