Hi -
After a reboot I started getting a lot of:
Jan 21 15:27:19 nwd2ng01 nagios: wproc: 'Core Worker 1835' seems to be choked. ret = -1; bufsize = 180: errno = 11 (Resource temporarily unavailable)
Jan 21 15:27:19 nwd2ng01 nagios: Unable to run check for service 'Page File Usage' on host 'hostname.corp.com'
The issue persists for a while, goes away, and then comes back. I'm not sure what the trigger is. /var/log/messages has grown to 32MB today alone.
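For context on the log line: errno 11 on Linux is EAGAIN ("Resource temporarily unavailable"). A non-blocking write() returns -1 with that errno when the receiving buffer is full. My understanding, which I have not verified against the Nagios Core source, is that "seems to be choked" is logged when the core's write to a worker process fails this way, meaning the worker is not draining its input fast enough. The minimal Python sketch reproduces the same errno, using a local pipe as a stand-in for the worker channel. `fill_nonblocking_pipe` is a hypothetical helper, and the 180-byte chunk only mirrors the `bufsize = 180` value in the log.

```python
import errno
import os


def fill_nonblocking_pipe(chunk=b"x" * 180):
    """Write to a non-blocking pipe that nobody reads until the kernel refuses.

    Returns (bytes_written_before_refusal, errno_of_the_refusal).
    """
    read_fd, write_fd = os.pipe()
    try:
        # Stand-in for the core -> worker channel; O_NONBLOCK makes a full
        # buffer fail immediately instead of blocking the writer.
        os.set_blocking(write_fd, False)
        written = 0
        while True:
            try:
                written += os.write(write_fd, chunk)
            except BlockingIOError as exc:
                # write() returned -1 and set errno (EAGAIN, i.e. 11 on Linux)
                return written, exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    written, err = fill_nonblocking_pipe()
    print(f"refused after {written} bytes: errno = {err} ({os.strerror(err)})")
    print("is EAGAIN:", err == errno.EAGAIN)
```

Because each 180-byte write is smaller than PIPE_BUF, POSIX guarantees it either goes through whole or fails with EAGAIN. The reader never empties the buffer, so the loop always ends with the same error the log shows.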
Profile is below:
Nagios XI Installation Profile
System:
Nagios XI Version : 2014R1.4
nwd2ng01.corp.analog.com 2.6.32-358.2.1.el6.x86_64 x86_64
CentOS release 6.5 (Final)
Gnome is not installed
Apache Information
PHP Version: 5.3.3
Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0
Server Name: nwd2ng01.corp.analog.com
Server Address: 10.64.52.120
Server Port: 80
Date/Time
PHP Timezone: America/New_York
PHP Time: Wed, 21 Jan 2015 15:40:32 -0500
System Time: Wed, 21 Jan 2015 15:40:32 -0500
Nagios XI Data
License ends in: MSTNQS
nagios (pid 1827) is running...
NPCD running (pid 1776).
ndo2db (pid 1851) is running...
CPU Load 15: 17.55
Total Hosts: 444
Total Services: 126
Function 'get_base_uri' returns: http://nwd2ng01.corp.analog.com/nagiosxi/
Function 'get_base_url' returns: http://nwd2ng01.corp.analog.com/nagiosxi/
Function 'get_backend_url(internal_call=false)' returns: http://nwd2ng01.corp.analog.com/nagiosx ... rofile.php
Function 'get_backend_url(internal_call=true)' returns: http://localhost/nagiosxi/backend/
Ping Test localhost
Running:
/bin/ping -c 3 localhost 2>&1
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.015 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.014 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.017 ms
--- localhost ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.014/0.015/0.017/0.003 ms
Test wget To localhost
WGET From URL: http://localhost/nagiosxi/includes/components/ccm/
Running:
/usr/bin/wget http://localhost/nagiosxi/includes/components/ccm/
--2015-01-21 15:40:34-- http://localhost/nagiosxi/includes/components/ccm/
Resolving localhost... ::1, 127.0.0.1
Connecting to localhost|::1|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: "/usr/local/nagiosxi/tmp/ccm_index.tmp"
0K ......... 16.7M=0.001s
2015-01-21 15:40:34 (16.7 MB/s) - "/usr/local/nagiosxi/tmp/ccm_index.tmp" saved [9666]
Continuous nagios: wproc: 'Core Work XXXX' seems to be choke
-
tonyleatwork
- Posts: 91
- Joined: Mon Jul 07, 2014 8:55 am
Re: Continuous nagios: wproc: 'Core Work XXXX' seems to be c
I realized that the messages refer to actual processes. I hadn't checked for zombie processes until now, but there are 17 of them right after boot.
Most of the zombie processes are check_wmi_plus. Are they perhaps waiting for a response?
Questions: Will this affect my monitoring? Will Nagios try again? Should this be a concern, and how do we address it? Thanks in advance.
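A zombie (state Z) has already exited. It holds no network connection and is not waiting for a response; it is waiting for its parent to reap it with wait(). A quick way to see who owns the zombies is to read /proc/<pid>/stat and group the state-Z entries by command and parent PID. This is a Linux-only Python sketch, and `parse_stat` and `count_zombies` are hypothetical helper names. If the parent PIDs turn out to match the Core Worker PIDs in the log, that would suggest busy workers that are slow to reap their children. That is an inference, not something confirmed in this thread.

```python
import os
from collections import Counter


def parse_stat(line):
    """Return (pid, comm, state, ppid) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the LAST ')'.
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar])
    comm = line[lpar + 1:rpar]
    state, ppid = line[rpar + 1:].split()[:2]
    return pid, comm, state, int(ppid)


def count_zombies(proc_root="/proc"):
    """Count zombie (state 'Z') processes, keyed by (command, parent PID)."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                _, comm, state, ppid = parse_stat(fh.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if state == "Z":
            counts[(comm, ppid)] += 1
    return counts


if __name__ == "__main__" and os.path.isdir("/proc"):
    for (comm, ppid), n in count_zombies().most_common():
        print(f"{n:4d}  {comm}  (parent PID {ppid})")
```

The kernel truncates the command name in /proc/<pid>/stat to 15 characters, so a long plugin name may show up shortened.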
Re: Continuous nagios: wproc: 'Core Work XXXX' seems to be c
tonyleatwork wrote: Will Nagios try again?
It should indeed be rescheduled. How often do you see these errors/warnings?
Former Nagios employee
"It is turtles. All. The. Way. Down. . . .and maybe an elephant or two."
VI VI VI - The editor of the Beast!
Come to the Dark Side.
-
tonyleatwork
- Posts: 91
- Joined: Mon Jul 07, 2014 8:55 am
Re: Continuous nagios: wproc: 'Core Work XXXX' seems to be c
It's happening every 10 minutes or so, i.e. about every 2 check intervals (most of my checks run at 5-minute intervals). There are about 10-15 messages, referencing 4-5 unique PIDs.
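To put firmer numbers on "every 10 minutes or so", the syslog lines can be bucketed by minute and by worker name. This minimal Python sketch assumes the exact line format quoted at the top of the thread. The regex and `summarize_choked` are illustrative and are not part of Nagios XI.

```python
import re
from collections import Counter

# Matches the syslog format quoted in this thread, e.g.
# "Jan 21 15:27:19 nwd2ng01 nagios: wproc: 'Core Worker 1835' seems to be choked. ..."
# Group "ts" keeps the timestamp down to the minute; "worker" is the quoted name.
CHOKED = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}):\d{2} \S+ nagios: "
    r"wproc: '(?P<worker>[^']+)' seems to be choked"
)


def summarize_choked(lines):
    """Count 'seems to be choked' messages per minute and per worker."""
    per_minute = Counter()
    per_worker = Counter()
    for line in lines:
        m = CHOKED.match(line)
        if m:
            per_minute[m.group("ts")] += 1
            per_worker[m.group("worker")] += 1
    return per_minute, per_worker
```

On the server, the input would be the open /var/log/messages file (for example inside a `with open(...)` block), which usually requires root. The gaps between the busy minutes in `per_minute` would show whether the bursts really line up with the 5-minute check schedule.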
My hunch was that it is performance-related. Fortunately this is a VM, so I was able to simply add more processors to it. That fixed the problem (or did it just mask it further?).
My concern is that we "only" have 2,700 checks; should the system already be this pegged? We gave it 4 processors at 3+ GHz each and 8 GB of RAM.
How can I determine my sizing requirements and confirm this isn't just a 'gremlin' in the system?
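One rough first check for the sizing question is to compare the 15-minute load average with the CPU count. On Linux the load average counts runnable tasks plus tasks in uninterruptible I/O wait. As a rule of thumb, a sustained value well above 1.0 per CPU means work is queueing. The profile above shows "CPU Load 15: 17.55", but the thread does not say how many CPUs the VM had when it was captured, so the 4-CPU figure in this Python sketch is an assumption. `load_per_cpu` is a hypothetical helper.

```python
import os


def load_per_cpu(load_avg=None, cpus=None):
    """15-minute load average divided by CPU count (defaults: this machine)."""
    if load_avg is None:
        load_avg = os.getloadavg()[2]  # (1-min, 5-min, 15-min); Unix only
    if cpus is None:
        cpus = os.cpu_count() or 1     # cpu_count() may return None
    return load_avg / cpus


if __name__ == "__main__":
    print(f"15-min load per CPU on this host: {load_per_cpu():.2f}")
    # Worked example with the profile's figure, ASSUMING the VM had 4 CPUs
    # when the profile was captured (the thread does not say).
    print(f"17.55 over 4 CPUs: {load_per_cpu(17.55, 4):.2f}")
```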
Re: Continuous nagios: wproc: 'Core Work XXXX' seems to be c
I would suggest looking at the Monitoring Engine Status page: Admin --> System Information --> Monitoring Engine Status. In particular, look at:
- Event Queue (how many concurrent checks)
- Check Statistics (quantity and rate of checks)
- Performance (average time to complete checks)
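The check rate and the average check time can be combined using Little's law: the average number of checks in flight equals the rate at which checks start multiplied by their average execution time. The 2,700 checks and the 5-minute interval in this minimal Python sketch come from this thread. The 5-second average execution time is a hypothetical placeholder; replace it with the real value from the Performance section.

```python
def checks_per_second(num_checks, interval_s):
    """Average rate at which checks must start to keep every check on schedule."""
    return num_checks / interval_s


def required_concurrency(num_checks, interval_s, avg_exec_s):
    """Little's law: average checks in flight = start rate x average duration."""
    return checks_per_second(num_checks, interval_s) * avg_exec_s


if __name__ == "__main__":
    # 2,700 checks every 5 minutes (from the thread) -> 9 checks/s.
    # avg_exec_s = 5.0 is a HYPOTHETICAL placeholder, not a measured value.
    print(required_concurrency(2700, 5 * 60, 5.0))  # 45.0
```

Concurrency grows linearly with execution time. Slow plugins, such as WMI queries that run until they time out, can therefore occupy far more workers than the raw check count suggests.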
-
tonyleatwork
- Posts: 91
- Joined: Mon Jul 07, 2014 8:55 am
Re: Continuous nagios: wproc: 'Core Work XXXX' seems to be c
Please close this. The workaround was to increase the processing capacity of the system. I have a separate request regarding system performance.