I've just implemented Nagios 4.1.1 and am having a similar issue. Extracts from the Nagios log are below:
[1445329430] Nagios 4.1.1 starting... (PID=29152)
[1445329430] Local time is Tue Oct 20 09:23:50 BST 2015
[1445329430] LOG VERSION: 2.0
[1445329430] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1445329430] qh: core query handler registered
[1445329430] nerd: Channel hostchecks registered successfully
[1445329430] nerd: Channel servicechecks registered successfully
[1445329430] nerd: Channel opathchecks registered successfully
[1445329430] nerd: Fully initialized and ready to rock!
[1445329430] wproc: Successfully registered manager as @wproc with query handler
[1445329430] wproc: Registry request: name=Core Worker 29153;pid=29153
[1445329430] wproc: Registry request: name=Core Worker 29154;pid=29154
[1445329430] wproc: Registry request: name=Core Worker 29155;pid=29155
[1445329430] wproc: Registry request: name=Core Worker 29156;pid=29156
[1445329430] wproc: Registry request: name=Core Worker 29157;pid=29157
[1445329430] wproc: Registry request: name=Core Worker 29195;pid=29195
[1445329430] wproc: Registry request: name=Core Worker 29196;pid=29196
[1445329430] wproc: Registry request: name=Core Worker 29197;pid=29197
[1445329430] wproc: Registry request: name=Core Worker 29200;pid=29200
[1445329432] WARNING: Extinfo objects are deprecated and will be removed in future versions
[1445329432] WARNING: Extinfo objects are deprecated and will be removed in future versions
[1445329432] WARNING: Extinfo objects are deprecated and will be removed in future versions
==== repeated Extinfo messages removed
[1445329434] Successfully launched command file worker with pid 29202
[1445329469] wproc: Core Worker 29153: job 1 (pid=29220) timed out. Killing it
[1445329469] wproc: Core Worker 29153: job 1 with pid 29220 reaped at timeout. timeouts=1; started=8
[1445329470] wproc: Core Worker 29155: job 1 (pid=29223) timed out. Killing it
[1445329470] wproc: Core Worker 29155: job 1 with pid 29223 reaped at timeout. timeouts=1; started=8
[1445329471] wproc: Core Worker 29195: job 1 (pid=29228) timed out. Killing it
[1445329471] wproc: Core Worker 29195: job 1 with pid 29228 reaped at timeout. timeouts=1; started=8
[1445329472] wproc: Core Worker 29197: job 1 (pid=29231) timed out. Killing it
[1445329472] wproc: Core Worker 29197: job 1 with pid 29231 reaped at timeout. timeouts=1; started=8
[1445329473] wproc: Core Worker 29200: job 1 (pid=29233) timed out. Killing it
[1445329473] wproc: Core Worker 29200: job 1 with pid 29233 reaped at timeout. timeouts=1; started=8
[1445329474] wproc: Core Worker 29153: job 2 (pid=29236) timed out. Killing it
[1445329474] wproc: Core Worker 29153: job 2 with pid 29236 reaped at timeout. timeouts=2; started=9
[1445329475] wproc: Core Worker 29154: job 2 (pid=29238) timed out. Killing it
==== many thousands of lines similar to the above ....
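With this many lines it helps to summarise them first. The following is only a rough sketch: the function name timeouts_per_worker is made up for illustration, and it assumes the log lines keep exactly the "wproc: Core Worker <pid>: job N (pid=...) timed out. Killing it" layout shown above. It counts timed-out jobs per worker so it is easier to see whether every worker is affected or only a few.

```shell
# Count "timed out" jobs per Core Worker in a Nagios log.
# Usage: timeouts_per_worker /path/to/nagios.log
# (hypothetical helper, assumes the log format quoted above)
timeouts_per_worker() {
    awk '/wproc: Core Worker .* timed out\. Killing it/ {
             w = $5              # fifth field is the worker PID, e.g. "29153:"
             sub(/:$/, "", w)    # drop the trailing colon
             n[w]++
         }
         END { for (w in n) print w, n[w] }' "$1" | sort
}
```

On Solaris the default /usr/bin/awk may be too old for sub(); replacing awk with nawk or /usr/xpg4/bin/awk should work there.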
There are also a large number of <defunct> processes on the server.
Any suggestions?
Ian
----------
While trying to diagnose what is generating the timeout messages, I have been looking at the status.dat file. I noticed that the execution time for the checks is generally just over 60 seconds for services and just over 30 seconds for hosts. Both values appear to match the timeout settings in the Nagios configuration (presumably service_check_timeout and host_check_timeout).
$ egrep "check_execution_time=60|check_execution_time=30" status.dat | wc -l
1662
$ egrep "check_execution_time" status.dat | wc -l
1703
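The egrep patterns above would also match values such as 300 or 600, so a per-second breakdown gives a more exact picture. This is a minimal sketch, assuming status.dat stores one "check_execution_time=<seconds>" entry per line; the function name exec_time_histogram is my own and not part of Nagios.

```shell
# Bucket check_execution_time values from status.dat by whole seconds.
# Usage: exec_time_histogram /path/to/status.dat
# (hypothetical helper, assumes "check_execution_time=<seconds>" lines)
exec_time_histogram() {
    awk -F= '/check_execution_time=/ { counts[int($2)]++ }
             END { for (s in counts) print s, counts[s] }' "$1" | sort -n
}
```

Each output line is a whole-second value followed by the number of checks that took that long, so a pile-up at 30 and 60 would stand out immediately.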
Interestingly, the web interface is reporting the correct information. I'm just worried that something else is going on which will break at some point in the future.
If I run check_ping manually it usually responds within a couple of seconds, so I'm confused about why there are all these "timed out" / "Killing it" messages.
Ian
----------
Does this relate to the issue described at https://tracker.nagios.org/view.php?id=498 ?
Later in that thread there are the same messages that I am seeing, along with reports of a large number of defunct processes.
Any help would be appreciated.
Ian
----------
The apparent solution documented in the bug tracker (configuring iobroker to use epoll) is not available to me:
./configure[6515]: test: 1: unknown operator
"epoll" is not available as an iobroker method.
Please use one of the other options.
I should have said that I'm running Nagios on Solaris 11.2 (SPARC). epoll is a Linux-specific interface, which would explain why it is not offered here.
Ian