Core Worker timed out and failed to reap child

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: Core Worker timed out and failed to reap child

Post by hbouma »

If we do have a situation where the server is just overloaded, what types of symptoms would we be seeing?
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Core Worker timed out and failed to reap child

Post by benjaminsmith »

Hi Henry,

You would see high CPU load and high check latency, because Nagios is unable to schedule the checks on time. This is what I'm seeing in your system profile.

Top Command Shows High CPU Load

top - 07:54:43 up 5 days, 23:21,  1 user,  load average: 320.75, 313.22, 406.45
Tasks: 592 total, 177 running, 411 sleeping,   0 stopped,   4 zombie
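As a rough illustration of why this counts as overloaded, a common rule of thumb is to divide the load average by the number of CPUs: sustained values well above 1.0 mean more tasks are waiting to run than there are cores to run them. The Python sketch below assumes the `top` header format shown above and the 8 CPUs mentioned later in this post; the function name is hypothetical. Linux load averages also count tasks in uninterruptible I/O wait, so treat the result as an approximation.

```python
from __future__ import annotations

import re

# Matches the three load-average figures in a `top` header line.
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def load_per_cpu(top_header: str, cpu_count: int) -> list[float]:
    """Return the 1-, 5- and 15-minute load averages divided by cpu_count."""
    match = LOAD_RE.search(top_header)
    if match is None:
        raise ValueError("no 'load average' field found in the header")
    return [float(value) / cpu_count for value in match.groups()]


header = "top - 07:54:43 up 5 days, 23:21,  1 user,  load average: 320.75, 313.22, 406.45"
for label, per_cpu in zip(("1 min", "5 min", "15 min"), load_per_cpu(header, cpu_count=8)):
    # Anything persistently above 1.0 per CPU means tasks are queuing for a core.
    status = "overloaded" if per_cpu > 1.0 else "ok"
    print(f"{label}: {per_cpu:.1f} per CPU ({status})")
```

With the numbers above this works out to roughly 40 runnable tasks per CPU over the last minute and about 51 over the last 15 minutes.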
Kernel Message Queues (slow to write results to the database)

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    
0xd7000002 17         nagios     600        12948480     12645
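The `ipcs -q` output above shows a single queue owned by `nagios` holding 12,645 messages (12,948,480 bytes, which is exactly 1,024 bytes per message). A backlog like this can be watched for periodically; the Python sketch below parses output in the six-column layout shown here and flags queues above a threshold. The `BACKLOG_THRESHOLD` value, the class and the function names are hypothetical, and the parser assumes this exact column format rather than any other `ipcs` output mode.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MessageQueue:
    key: str
    msqid: int
    owner: str
    perms: str
    used_bytes: int
    messages: int


def parse_ipcs_queues(output: str) -> list[MessageQueue]:
    """Parse the data rows of `ipcs -q` output into MessageQueue records."""
    queues = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have six columns and start with a hex key such as 0xd7000002;
        # the banner, the column header and blank lines are skipped.
        if len(fields) != 6 or not fields[0].startswith("0x"):
            continue
        key, msqid, owner, perms, used_bytes, messages = fields
        queues.append(MessageQueue(key, int(msqid), owner, perms, int(used_bytes), int(messages)))
    return queues


# Hypothetical alert threshold; pick a value that fits your own baseline.
BACKLOG_THRESHOLD = 1000

sample = """
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xd7000002 17         nagios     600        12948480     12645
"""

for q in parse_ipcs_queues(sample):
    if q.messages > BACKLOG_THRESHOLD:
        avg = q.used_bytes / q.messages
        print(f"queue {q.msqid} ({q.owner}): {q.messages} messages backlogged, "
              f"{q.used_bytes} bytes, {avg:.0f} bytes/message")
```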
I noticed that you have quite a few NCPA checks set up to run every 3 minutes. If you can increase that interval to 5 minutes, it would help substantially.
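To see why lengthening the interval helps, consider the average rate at which checks have to be executed: each check that runs every `interval` minutes contributes 1/(60 × interval) executions per second, so moving from 3 to 5 minutes cuts the rate by 1 − 3/5 = 40%, regardless of how many checks there are. The sketch below uses a hypothetical count of 900 NCPA checks purely for illustration; the thread does not give the real number, and the function name is likewise made up.

```python
def checks_per_second(num_checks: int, interval_minutes: float) -> float:
    """Average execution rate if each of num_checks runs once per interval."""
    return num_checks / (interval_minutes * 60)


NUM_CHECKS = 900  # hypothetical; not taken from the thread

before = checks_per_second(NUM_CHECKS, 3)
after = checks_per_second(NUM_CHECKS, 5)
# The ratio after/before is 3/5 for any NUM_CHECKS, so the reduction is always 40%.
reduction = 1 - after / before
print(f"{before:.2f} checks/s -> {after:.2f} checks/s ({reduction:.0%} fewer executions)")
```

In Nagios Core object configuration the corresponding setting is the service's `check_interval`, measured in units of `interval_length` (60 seconds by default).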

The system has 8 single-core CPUs. If you're able to add more, that would help, since scheduling active host and service checks is CPU-intensive.

Other options to reduce load would be to set up passive checks or to integrate Mod-Gearman.

Using NCPA For Passive Check
Integrating Mod-Gearman With Nagios XI
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
hbouma
Posts: 483
Joined: Tue Feb 27, 2018 9:31 am

Re: Core Worker timed out and failed to reap child

Post by hbouma »

I have reached out to our server team. They will be adding CPUs on Friday next week and again the following Friday.
benjaminsmith
Posts: 5324
Joined: Wed Aug 22, 2018 4:39 pm
Location: saint paul

Re: Core Worker timed out and failed to reap child

Post by benjaminsmith »

Hi Henry,
Great. Let us know the results.