high cpu load on nagios server

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
smoren
Posts: 62
Joined: Tue Sep 29, 2015 7:24 am

high cpu load on nagios server

Post by smoren »

Hello all,

For no obvious reason, our Nagios server occasionally runs at high CPU load for about 30-60 minutes. It has happened several times within the last 2 months.

My observation:
- during 'high cpu load' time these events are generated:
--<14>Apr 5 08:38:42 nagios nagios: wproc: Core Worker 7201: job 171289 (pid=28919): Dormant child reaped
--<14>Apr 5 08:39:10 nagios nagios: wproc: CHECK job 171361 from worker Core Worker 7201 timed out after 60.41s
- you may find several relevant performance graphs in attachments
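
To see how many of these two event types pile up per minute, a small filter like the one below could help. It is only a sketch: the function name `count_wproc_events` is made up for illustration, and it assumes syslog-style lines laid out like the two samples above (an optional `<14>`-style prefix followed by month, day and time).

```shell
#!/bin/sh
# Count wproc "timed out" and "Dormant child reaped" messages per minute.
# Reads syslog-style lines on stdin. The timestamp layout is an assumption
# taken from the two sample lines in this post.
count_wproc_events() {
    grep 'wproc:' |
    sed 's/^[-<>0-9]*//' |   # drop a leading "--<14>" style prefix, if any
    awk '
        /timed out/            { kind = "timeout" }
        /Dormant child reaped/ { kind = "reaped" }
        kind != "" {
            # $1 = month, $2 = day, $3 = HH:MM:SS; keep HH:MM only
            minute = $1 " " $2 " " substr($3, 1, 5)
            count[minute " " kind]++
            kind = ""
        }
        END { for (k in count) print k, count[k] }
    ' | sort
}
```

It could be fed from whichever file receives the Nagios syslog messages, for example `count_wproc_events < /var/log/messages` (that path is an assumption and depends on the syslog configuration).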

FYI - you can see a drop in the number of service checks. That is because I have created a service dependency for most services, which stops their checks while the Nagios server has high CPU load.

So in general, Nagios seems to have enough resources, but sometimes we encounter this high load. This has an impact on the availability of monitored services.
Do you have any idea what could be the cause?

Thanks.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: high cpu load on nagios server

Post by npolovenko »

Hello, @smoren. I would like to take a look at your system profile to get a general understanding of what your system looks like.
To send us the profile:
Log in to the Nagios XI GUI using a web browser.
Click the "Admin" > "System Profile" Menu
Click the "Download Profile" button
Save the profile.zip file, upload it to a cloud storage of your choice, and share a link with me in a personal message.
After you send it please post something in this thread to bring it back up in the support queue.

I'm also attaching the profiler script that you can run on the Nagios server. It will tell you the execution time of each service check on your system so you can see if anything looks out of the ordinary.

System profile was received and shared with the support team.
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
smoren
Posts: 62
Joined: Tue Sep 29, 2015 7:24 am

Re: high cpu load on nagios server

Post by smoren »

Hello,

I sent you a private message.

Thanks for the profiler script, but I couldn't find anything suspicious. I have already created a script to show me an overview of execution times of services, based on check commands (command name, number of services, min, max and average of execution time). But even using this script all values were as expected. I suspect that longer execution time (and bigger load) is just an effect, not the cause.
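
For readers who want a similar overview, here is a rough sketch of that kind of summary (not the script mentioned above, which was not posted). It assumes the standard Nagios Core status file layout, where each `servicestatus {` block carries `check_command=` and `check_execution_time=` lines, and the function name `summarize_exec_times` is made up for illustration.

```shell
#!/bin/sh
# Per check command: number of services, then min, max and average
# execution time in seconds. Reads a Nagios status file given as an
# argument, or stdin when no argument is passed.
summarize_exec_times() {
    awk '
        /^servicestatus [{]/ { in_svc = 1; cmd = ""; t = 0; next }
        in_svc && /^[ \t]*check_command=/ {
            sub(/^[ \t]*check_command=/, "")
            split($0, parts, "!")        # drop the !arguments
            cmd = parts[1]
            next
        }
        in_svc && /^[ \t]*check_execution_time=/ {
            sub(/^[ \t]*check_execution_time=/, "")
            t = $0 + 0
            next
        }
        in_svc && /^[ \t]*[}]/ {         # end of one servicestatus block
            if (cmd != "") {
                if (!(cmd in n) || t < min[cmd]) min[cmd] = t
                if (!(cmd in n) || t > max[cmd]) max[cmd] = t
                n[cmd]++
                sum[cmd] += t
            }
            in_svc = 0
        }
        END {
            for (c in n)
                printf "%s %d %.3f %.3f %.3f\n", c, n[c], min[c], max[c], sum[c] / n[c]
        }
    ' "$@" | sort
}
```

A typical call would be `summarize_exec_times /usr/local/nagios/var/status.dat`. That path is an assumption based on a default install, so adjust it to the `status_file` setting in nagios.cfg.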

Thanks.
npolovenko
Support Tech
Posts: 3457
Joined: Mon May 15, 2017 5:00 pm

Re: high cpu load on nagios server

Post by npolovenko »

@smoren, I took a look at your profile and nothing seems out of the ordinary. Could it be that a lot of service checks came back critical at that particular time? Do you have any maintenance jobs in a cron on the server? Are you using any third-party antivirus applications? Sometimes they can leave multiple ghost httpd processes running, causing an increase in CPU load.
smoren
Posts: 62
Joined: Tue Sep 29, 2015 7:24 am

Re: high cpu load on nagios server

Post by smoren »

Hello,
the infrastructure had no problems at that time. It happened again a few hours later, and again this morning. Maybe we're 'lucky' that it has begun to appear more often than before...
I have no maintenance jobs, only one reporting script that has run every day for more than a year without causing high load. Apart from that, I have a few scheduled reports (availability, SLA, ...), but I don't think they could cause the issue, especially since these reports run every day and the high CPU load occurred only a few times.
There is no antivirus application on the server.

Did you notice that MySQL open connections increased a few minutes before the other parameters did? Could you explain this behavior? Could you also explain the 2 types of events I sent in my 1st post? These events appear in large numbers during high load.
Or do you have any idea what checks I could create that would help us troubleshoot this should the issue appear again?
Thanks.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: high cpu load on nagios server

Post by lmiltchev »

Most probably the mysql open connections have nothing to do with the issue. What is the hardware like on this server - CPU, RAM, etc.? You have a somewhat large environment. When you have numerous failing checks (@7:00 on your graph) that need to be rechecked every minute, this could cause load spikes.

Can you PM us a new profile? We may be able to find some clues in the new logs. Is opening a ticket in our ticketing system an option for you?
Be sure to check out our Knowledgebase for helpful articles and solutions!
smoren
Posts: 62
Joined: Tue Sep 29, 2015 7:24 am

Re: high cpu load on nagios server

Post by smoren »

Hello,

you've got a new PM :-). It is a virtual server - 4 [email protected], 8 GB RAM, RHEL 6.8. I have info from our operations team that there was no real outage in the infrastructure during that time. Btw. it happened again yesterday at about 12:15 AM. Check the CPU Load graph for the last month (in the attachment).

I'm not sure if it is an option for me :) I have never used it, but if you guide me through how to use it, we can use it :-).

Thanks.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: high cpu load on nagios server

Post by scottwilkerson »

One thing we see in many virtual environments is that another VM on the same host can cause severe load, specifically on the disk in the shared environment. Are you monitoring any other VMs where you could check whether they also had a load spike at the same time?
Former Nagios employee
Creator:
Human Design Website
Get Your Human Design Chart
smoren
Posts: 62
Joined: Tue Sep 29, 2015 7:24 am

Re: high cpu load on nagios server

Post by smoren »

Hello,
unfortunately, we were not able to find any correlation between the high CPU load on Nagios and the load on any other server. In general, our other load-intensive servers are on another VMware cluster. I also checked the state history of all services (just before the high load started), but again, no luck. I did this for several of the times the issue occurred.
I do monitor disk usage on the Nagios server, but there were no spikes during high load...
Do you have an idea what command I should run if I'm lucky enough to be able to log into the server during such high load? Something that might help us troubleshoot it further... Or an additional check for localhost...
Thanks.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: high cpu load on nagios server

Post by scottwilkerson »

Run a simple top command and look at the %wa value (for example 0.0%wa) on the Cpu(s) line. If this number is high, the disk is causing performance problems.

Another command that is good to run is

Code: Select all

ps -ef
This will show all running processes and, in the C column, how much CPU each one is using
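
Since the spikes are hard to catch live, the two commands above could be wrapped in a small script that saves a snapshot to a file, run by hand or from cron. The following is only a sketch: the function names, the `/tmp` default and the file naming are made up for illustration, and `ps --sort` assumes the Linux procps version of ps.

```shell
#!/bin/sh
# Print the wa (I/O wait) figure from a top "Cpu(s)" summary line on stdin.
# Handles both the older "0.3%wa" and the newer "0.3 wa" layouts.
extract_iowait() {
    sed -n '/Cpu(s)/{s/.*[ ,:]\([0-9.][0-9.]*\)[ %]*wa.*/\1/p;q;}'
}

# Save uptime, one batch-mode top iteration and the busiest processes
# to a timestamped file, then print the file name.
capture_load_snapshot() {
    out_dir=${1:-/tmp}                     # hypothetical default location
    stamp=$(date +%Y%m%d-%H%M%S)
    file="$out_dir/load-snapshot-$stamp.txt"
    {
        echo "== uptime =="
        uptime
        echo "== top (batch mode, one iteration) =="
        top -b -n 1 | head -n 30
        echo "== top 15 processes by CPU =="
        ps -eo pid,ppid,user,pcpu,pmem,stat,etime,args --sort=-pcpu | head -n 16
    } > "$file" 2>&1
    echo "$file"
}
```

For example, `top -b -n 1 | extract_iowait` prints just the I/O wait figure, and `capture_load_snapshot /var/tmp` writes a snapshot that can be compared against a quiet period later.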