Nagios 4.2.0 logging "read() returned error 11" message

DrydenK · Post by **DrydenK** » Thu Aug 18, 2016 9:35 am

Hi,

Yesterday I upgraded my Nagios 4.1.1 install to 4.2.0, following the instructions at https://assets.nagios.com/downloads/nag ... ading.html.

Everything is working fine, but Nagios is now writing messages like this to my system log:
<date> <server> nagios: job <job number> (pid=<pid>): read() returned error 11

This happens about once a minute. Does anybody know what's going on?

Thank you,

Roberto

PS: install environment:
VM running under XenServer 6.5, fully patched
Linux Debian 7.0, fully updated
Nagios installed from tar-ball
Server with 2GB RAM
100GB disk
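
A note on the number in that message, assuming it is a raw errno value from the read() system call (nothing in this thread confirms that): on x86-64 Linux, errno 11 is EAGAIN, which a non-blocking read() reports when no data is available yet. The Python sketch below reproduces that condition on an empty pipe. It only illustrates the error code and is not taken from the Nagios source; the function name is made up for the example.

```python
import errno
import os

def read_empty_nonblocking_pipe():
    """Return the errno raised by reading an empty, non-blocking pipe."""
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking mode: read() fails immediately instead of waiting for data.
        os.set_blocking(read_fd, False)
        try:
            os.read(read_fd, 1)
        except BlockingIOError as exc:
            return exc.errno
        return None  # not expected: the pipe is empty and the writer is still open
    finally:
        os.close(read_fd)
        os.close(write_fd)

code = read_empty_nonblocking_pipe()
# Print the numeric code, its symbolic name, and the system's description.
print(code, errno.errorcode[code], os.strerror(code))
```

If the assumption holds, the message by itself would mean "nothing to read right now" rather than an I/O failure, but the reason Nagios logs it still has to come from the debug output or the developers.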

rkennedy · Post by **rkennedy** » Thu Aug 18, 2016 4:05 pm

Please turn on debugging in nagios.cfg. Set debug_level to -1 and debug_verbosity to 2. Then, restart nagios.

Now, when you see the message appear, please run ps uw -C nagios and see which PID is using the most resources. Run gdb -p pid (install gdb if needed, and replace pid with that process ID). Then run bt and post the resulting backtrace here. This should help the developers see exactly what is going on.

An example is below -

# ps uw -C nagios
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nagios    8828  0.0  0.0  18636  2860 ?        Ss   10:55   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    8829  0.0  0.0  12932  2624 ?        S    10:55   0:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagi
nagios    8831 25.3  0.0  12932  2624 ?        S    10:55   0:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagi
nagios    8835  0.0  0.0  18120  1624 ?        S    10:55   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
# gdb -p 8831
     . . . a couple of pages of gdb stuff . . .
(gdb) bt
#0  0x00007f2f5ae63643 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x0000000000472b80 in iobroker_poll (iobs=0x108c040, timeout=timeout@entry=30004)
    at iobroker.c:337
#2  0x000000000047595e in enter_worker (sd=sd@entry=3, cb=<optimized out>)
    at worker.c:830
#3  0x0000000000416166 in nagios_core_worker (
    path=0x7ffc391d2ed0 "/usr/local/nagios/var/rw/nagios.qh") at nagios.c:182
#4  main (argc=<optimized out>, argv=0x7ffc391d2c48) at nagios.c:318
(gdb) detach
Detaching from program: /usr/local/nagios/bin/nagios, process 8829
(gdb) q
#
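
To pick out the busiest process without reading the table by eye, the ps output above can be parsed. This is a small Python sketch; the helper name busiest_pid is hypothetical, and it assumes the column layout shown in the example, with PID and %CPU as whitespace-separated columns.

```python
def busiest_pid(ps_output):
    """Return (pid, cpu_percent) for the row with the highest %CPU."""
    lines = [line for line in ps_output.splitlines() if line.strip()]
    header = lines[0].split()
    pid_col = header.index("PID")
    cpu_col = header.index("%CPU")
    best = None
    for line in lines[1:]:
        fields = line.split()
        row = (int(fields[pid_col]), float(fields[cpu_col]))
        if best is None or row[1] > best[1]:
            best = row
    return best

# Rows copied from the example output above (command column shortened).
sample = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nagios    8828  0.0  0.0  18636  2860 ?        Ss   10:55   0:00 /usr/local/nagios/bin/nagios -d
nagios    8829  0.0  0.0  12932  2624 ?        S    10:55   0:00 /usr/local/nagios/bin/nagios --worker
nagios    8831 25.3  0.0  12932  2624 ?        S    10:55   0:00 /usr/local/nagios/bin/nagios --worker
nagios    8835  0.0  0.0  18120  1624 ?        S    10:55   0:00 /usr/local/nagios/bin/nagios -d
"""

print(busiest_pid(sample))  # (8831, 25.3)
```

The PID it returns is the one to pass to gdb -p.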

It might not be the same problem, but it's most likely related to this thread somehow. https://support.nagios.com/forum/viewto ... 67#p169505

DrydenK · Post by **DrydenK** » Fri Aug 19, 2016 7:35 am

OK, I followed the procedure; these were the results.

The full output in syslog was the following:
Aug 19 09:14:42 <server> nagios: wproc: Core Worker 11582: job 87 (pid=13292) timed out. Killing it
Aug 19 09:14:42 <server> nagios: job 87 (pid=13292): read() returned error 11
Aug 19 09:14:42 <server> nagios: wproc: CHECK job 87 from worker Core Worker 11582 timed out after 30.00s
Aug 19 09:14:42 <server> nagios: wproc: host=voip-univesp; service=(null);
Aug 19 09:14:42 <server> nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Aug 19 09:14:42 <server> nagios: Warning: Check of host 'voip-univesp' timed out after 30.00 seconds
Aug 19 09:14:42 <server> nagios: wproc: Core Worker 11582: job 87 (pid=13292): Dormant child reaped
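
In this excerpt the read() message is logged in the same second as a check timeout for the same job. To see whether that pattern holds over a longer log, the lines can be grouped by job number. The following is a Python sketch under that assumption only; the function name and event labels are invented for illustration, and the sample lines are the ones shown above.

```python
import re

JOB_RE = re.compile(r"\bjob (\d+)\b")

def summarize_jobs(lines):
    """Map each job number to the kinds of events logged for it."""
    jobs = {}
    for line in lines:
        match = JOB_RE.search(line)
        if not match:
            continue  # lines without a job number are ignored
        events = jobs.setdefault(int(match.group(1)), set())
        if "timed out" in line:
            events.add("timeout")
        if "read() returned error" in line:
            events.add("read_error")
        if "Dormant child reaped" in line:
            events.add("reaped")
    return jobs

syslog = [
    "Aug 19 09:14:42 <server> nagios: wproc: Core Worker 11582: job 87 (pid=13292) timed out. Killing it",
    "Aug 19 09:14:42 <server> nagios: job 87 (pid=13292): read() returned error 11",
    "Aug 19 09:14:42 <server> nagios: wproc: CHECK job 87 from worker Core Worker 11582 timed out after 30.00s",
    "Aug 19 09:14:42 <server> nagios: wproc: host=voip-univesp; service=(null);",
    "Aug 19 09:14:42 <server> nagios: wproc: Core Worker 11582: job 87 (pid=13292): Dormant child reaped",
]

for job, events in summarize_jobs(syslog).items():
    print(job, sorted(events))  # 87 ['read_error', 'reaped', 'timeout']
```

A job that shows read_error without timeout would suggest the message has more than one trigger.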

Then gdb gave me the following:
root@<server>:/var/log# gdb -p 11576
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 11576
Reading symbols from /usr/local/nagios/bin/nagios...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libm.so.6
Reading symbols from /usr/lib/x86_64-linux-gnu/libltdl.so.7...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libltdl.so.7
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/x86_64-linux-gnu/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libnss_compat.so.2
Reading symbols from /lib/x86_64-linux-gnu/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libnsl.so.1
Reading symbols from /lib/x86_64-linux-gnu/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libnss_nis.so.2
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
0x00007f100b91a5f3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007f100b91a5f3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000472201 in iobroker_poll ()
#2 0x0000000000433413 in event_execution_loop ()
#3 0x0000000000415c35 in main ()

Roberto

Post by **tgriep** » Fri Aug 19, 2016 2:21 pm

When debugging is enabled, the information is saved in this file:

/usr/local/nagios/var/nagios.debug

Can you search that file for those errors and post what you find?

DrydenK · Post by **DrydenK** » Fri Aug 19, 2016 2:43 pm

I looked at the file, but it quickly becomes very large. I searched it for "error", but that string does not appear in it. Searching for a check of the host that apparently caused the message didn't help much either. What should I search for so I can paste the relevant portion of the debug output? The file is too large to post here (>10,000 lines...).

Roberto

Post by **tgriep** » Mon Aug 22, 2016 10:46 am

Try searching the debug file for timeout, wproc, or Core Worker messages.
Alternatively, you can trim the file down and upload a few minutes' worth. Since the error is happening every minute, there should be something in it to see.
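
For a file that is too large to read by hand, the search can be scripted. This is a minimal Python sketch using the three search terms suggested above; the function name is hypothetical, and the demo lines are taken from the syslog excerpt earlier in the thread because the debug file's own line format is not shown here.

```python
def matching_lines(lines, keywords=("timeout", "wproc", "Core Worker")):
    """Yield (line_number, line) for lines containing any keyword, ignoring case."""
    lowered = [keyword.lower() for keyword in keywords]
    for number, line in enumerate(lines, start=1):
        text = line.lower()
        if any(keyword in text for keyword in lowered):
            yield number, line.rstrip("\n")

demo = [
    "nagios: wproc: host=voip-univesp; service=(null);\n",
    "nagios: Warning: Check of host 'voip-univesp' timed out after 30.00 seconds\n",
]
print(list(matching_lines(demo)))

# Against the real file (path as given earlier in the thread):
# with open("/usr/local/nagios/var/nagios.debug", errors="replace") as fh:
#     for number, line in matching_lines(fh):
#         print(f"{number}: {line}")
```

Note that the second demo line is not matched, because "timed out" does not contain "timeout"; passing extra terms such as keywords=("timed out",) widens the search.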

avisscher · Post by **avisscher** » Tue Sep 06, 2016 6:55 am

Hi there,

I have found the same messages in message.log, but I can't find anything strange in nagios.debug. The message is:
nagios[5711]: job 0 (pid=5735): read() returned error 11

This message only appears when a notification has to go out.
It looks like it has something to do with worker.c, which is different from the previous version.
Still, the notifications are sent out.

Best regards, Albert Visscher

Post by **tgriep** » Tue Sep 06, 2016 12:45 pm

Make sure the notification command is correct and didn't get corrupted during the upgrade.
It could also be a permission problem: notifications are sent using the nagios user account.
You may want to check that as well.

avisscher · Post by **avisscher** » Wed Sep 07, 2016 6:49 am

hi,

I tried to reproduce the problem with a fresh install from a newly downloaded tarball, with no configuration changes.
So it's not likely that the problem comes from wrong notification syntax or permissions.

best regards, Albert Visscher

Post by **tgriep** » Wed Sep 07, 2016 11:19 am

You may want to post on the Nagios Core GitHub page with the server OS, configuration, and error messages so the developers can troubleshoot what is causing the problem.
https://github.com/NagiosEnterprises/nagioscore/issues

Nagios Support Forum
