nagios: wproc: 'Core Work XXXX' seems to be choked

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
tonyleatwork
Posts: 91
Joined: Mon Jul 07, 2014 8:55 am

nagios: wproc: 'Core Work XXXX' seems to be choked

Post by tonyleatwork »

Hi -

Looks like this error is coming back again (nagios: wproc: 'Core Work XXXX' seems to be choked).

Edit: http://support.nagios.com/forum/viewtop ... 12#p125712

The difference is that this time my CPU utilization is low (around 25% average, 40% max in top; previously it was running full tilt). The system has a 6-core CPU. I added 'check_workers=25' to nagios.cfg and that seemed to help.

Is this the right fix? I *think* it started after I updated to the latest version (R2.6). Was it because I added more checks?
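
For anyone wanting to confirm which check_workers value is actually set, here is a minimal Python sketch that scans nagios.cfg-style text for a directive. The helper name read_directive and the sample excerpt are hypothetical, not part of Nagios. The sketch keeps the last assignment it finds; how Nagios itself treats duplicate directives is not something it verifies.

```python
def read_directive(cfg_text, name):
    """Return the last value assigned to `name` in nagios.cfg-style text, or None."""
    value = None
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' comments
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            value = val.strip()
    return value


# Hypothetical excerpt; a real nagios.cfg holds many more directives
sample = """\
# worker processes used to run checks
check_workers=25
"""

print(read_directive(sample, "check_workers"))  # prints 25
```

To check a real file, pass the contents of your own nagios.cfg (its location depends on the install) instead of the sample string.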
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Re: nagios: wproc: 'Core Work XXXX' seems to be choked

Post by abrist »

tonyleatwork wrote: Was it because I added more checks?
Possibly. How many checks (per 5 minutes) are currently being executed?
Former Nagios employee
tonyleatwork
Posts: 91
Joined: Mon Jul 07, 2014 8:55 am

Re: nagios: wproc: 'Core Work XXXX' seems to be choked

Post by tonyleatwork »

Upping the workers to 25 increased CPU load (it's spiking to 100% from time to time, user + system, but still manageable) but got rid of the WPROC issues.

Under Monitoring Engine Status:

Check type                1-min   5-min   15-min
Active Host Checks           71     357      484
Passive Host Checks           0       0        0
Active Service Checks       470   2,530    3,680
Passive Service Checks        0       0        0
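
To put those figures in per-second terms, here is a rough back-of-the-envelope sketch in Python. It only averages the 5-minute active check counts above over 300 seconds and divides by the 25 workers; the function name check_rate is made up for illustration, and the result says nothing about how long each individual check takes to run.

```python
def check_rate(host_checks, service_checks, window_seconds, workers):
    """Return (total checks, checks per second, checks per second per worker)."""
    total = host_checks + service_checks
    per_second = total / window_seconds
    return total, per_second, per_second / workers


# 5-minute active check counts from the figures above; 25 is the check_workers value
total, per_second, per_worker = check_rate(357, 2530, 300, 25)

print(total)                 # 2887
print(round(per_second, 2))  # 9.62
print(round(per_worker, 2))  # 0.38
```

This is an average only; checks that bunch up inside the interval would load the workers more heavily than the number suggests.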
cmerchant
Posts: 546
Joined: Wed Sep 24, 2014 11:19 am

Re: nagios: wproc: 'Core Work XXXX' seems to be choked

Post by cmerchant »

Yes, increasing check_workers to 25 was the right idea. You should see a smoother line on the event queue graph (your checks are spread out evenly across the interval).

Look at the Monitoring Engine Status page (Admin --> System Information --> Monitoring Engine Status) and pay attention to the host and service check latency to see if your checks are waiting too long to run.
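
If you would rather watch latency from the command line, the same numbers can be pulled with the nagiostats utility in MRTG mode. Below is a minimal Python sketch of that idea. The install path is an assumption (adjust it for your system), the helper names are hypothetical, and the sample values are made up purely to show the parsing. In MRTG mode nagiostats prints one value per line, with latency reported in milliseconds.

```python
import subprocess

NAGIOSTATS = "/usr/local/nagios/bin/nagiostats"  # assumed path; adjust for your install
VARS = ["AVGACTHSTLAT", "AVGACTSVCLAT"]          # avg active host / service check latency


def parse_latencies(output, names=VARS):
    """Map whitespace-separated MRTG-style output values onto variable names."""
    values = [float(token) for token in output.split()]
    return dict(zip(names, values))


def read_latencies():
    """Run nagiostats in MRTG mode and return the parsed latency values."""
    result = subprocess.run(
        [NAGIOSTATS, "--mrtg", "--data=" + ",".join(VARS)],
        capture_output=True, text=True, check=True,
    )
    return parse_latencies(result.stdout)


# Made-up sample output, used here only to demonstrate the parsing step
print(parse_latencies("12\n340\n"))
```

On the Nagios server itself, calling read_latencies() should return the live values; comparing them before and after changing check_workers shows whether checks are still waiting to run.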