Nagios 4.2.0 logging "read() returned error 11" message

DrydenK
Posts: 3
Joined: Thu Aug 18, 2016 9:27 am

Post by DrydenK »

Hi,

Yesterday I upgraded my Nagios 4.1.1 install to 4.2.0, following the instructions at https://assets.nagios.com/downloads/nag ... ading.html.

Everything is working fine, but Nagios is writing messages like this to my system log:
<date> <server> nagios: job <job number> (pid=<pid>): read() returned error 11

This happens about once a minute. Does anybody know what's going on?

Thank you,

Roberto

PS: install environment:
VM running under XenServer 6.5, fully patched
Linux Debian 7.0, fully updated
Nagios installed from tar-ball
Server with 2GB RAM
100GB disk
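
For readers hitting the same log line: the number in "read() returned error 11" looks like a C errno value (an assumption; nothing in this thread confirms it). On Linux, errno 11 is EAGAIN, which read() returns on a non-blocking descriptor when no data is available yet. Below is a small sketch for looking up the text of an errno value. The helper name errno_message is hypothetical, and it assumes perl or python3 is installed.

```shell
# Hypothetical helper: print the C library's message for an errno value.
# Tries perl first, then python3; fails if neither is available.
errno_message() {
    if command -v perl >/dev/null 2>&1; then
        # Assigning a number to $! sets errno; printing $! gives its message.
        perl -e '$! = shift; print "$!\n"' "$1"
    elif command -v python3 >/dev/null 2>&1; then
        python3 -c 'import os, sys; print(os.strerror(int(sys.argv[1])))' "$1"
    else
        echo "need perl or python3 to look up errno $1" >&2
        return 1
    fi
}

errno_message 11
```

Errno numbers differ between operating systems, so the lookup should be run on the same kind of system that produced the log.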
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: Nagios 4.2.0 logging "read() returned error 11" message

Post by rkennedy »

Please turn on debugging in nagios.cfg. Set debug_level to -1 and debug_verbosity to 2. Then, restart nagios.
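
A minimal sketch of applying those two settings from the shell. The helper name set_directive is hypothetical, and the config path in the usage comment is assumed to be the default for a source install (it matches the path in the ps output in this thread). The helper keeps a .bak copy of the file when it rewrites an existing line.

```shell
# Hypothetical helper: set key=value in a Nagios-style config file.
# Replaces an existing "key=..." line, or appends one if the key is absent.
set_directive() {
    cfg="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$cfg"; then
        # Rewrite the existing line in place, leaving a backup at "$cfg.bak".
        sed -i.bak "s|^${key}=.*|${key}=${value}|" "$cfg"
    else
        printf '%s=%s\n' "$key" "$value" >> "$cfg"
    fi
}

# Usage (run as a user allowed to edit the file, then restart Nagios):
#   set_directive /usr/local/nagios/etc/nagios.cfg debug_level -1
#   set_directive /usr/local/nagios/etc/nagios.cfg debug_verbosity 2
```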

Now, when you see the message appear, please run ps uw -C nagios and see which PID is using the most resources. Run gdb -p pid (install gdb if needed, and replace pid with the process ID). Then run bt and post the resulting backtrace here. This should help the developers see exactly what is going on.

An example is below -

# ps uw -C nagios
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
nagios    8828  0.0  0.0  18636  2860 ?        Ss   10:55   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    8829  0.0  0.0  12932  2624 ?        S    10:55   0:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagi
nagios    8831 25.3  0.0  12932  2624 ?        S    10:55   0:00 /usr/local/nagios/bin/nagios --worker /usr/local/nagios/var/rw/nagi
nagios    8835  0.0  0.0  18120  1624 ?        S    10:55   0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
# gdb -p 8831
     . . . a couple of pages of gdb stuff . . .
(gdb) bt
#0  0x00007f2f5ae63643 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x0000000000472b80 in iobroker_poll (iobs=0x108c040, timeout=timeout@entry=30004)
    at iobroker.c:337
#2  0x000000000047595e in enter_worker (sd=sd@entry=3, cb=<optimized out>)
    at worker.c:830
#3  0x0000000000416166 in nagios_core_worker (
    path=0x7ffc391d2ed0 "/usr/local/nagios/var/rw/nagios.qh") at nagios.c:182
#4  main (argc=<optimized out>, argv=0x7ffc391d2c48) at nagios.c:318
(gdb) detach
Detaching from program: /usr/local/nagios/bin/nagios, process 8829
(gdb) q
#
It might not be the same problem, but it's most likely related to this thread somehow. https://support.nagios.com/forum/viewto ... 67#p169505
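
The ps and gdb steps in the example above can also be scripted. This is a sketch under assumptions: the helper name busiest_pid is hypothetical, and it expects ps output in the "ps uw" column layout shown above (PID in column 2, %CPU in column 3). The gdb options -batch and -ex run the given command and exit without an interactive prompt.

```shell
# Hypothetical helper: read "ps uw" output on stdin and print the PID
# of the row with the highest %CPU value (skipping the header line).
busiest_pid() {
    awk 'NR > 1 && (pid == "" || $3 + 0 > max) { max = $3 + 0; pid = $2 }
         END { if (pid != "") print pid }'
}

# Usage (as root, on the Nagios server):
#   pid=$(ps uw -C nagios | busiest_pid)
#   gdb -p "$pid" -batch -ex bt
```

If several processes show the same CPU usage, the first one listed wins, so it is worth checking the ps output by eye before posting a backtrace.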
Former Nagios Employee
DrydenK
Posts: 3
Joined: Thu Aug 18, 2016 9:27 am

Re: Nagios 4.2.0 logging "read() returned error 11" message

Post by DrydenK »

OK, I followed the procedure; these were the results.

The full output in syslog was the following:
Aug 19 09:14:42 <server> nagios: wproc: Core Worker 11582: job 87 (pid=13292) timed out. Killing it
Aug 19 09:14:42 <server> nagios: job 87 (pid=13292): read() returned error 11
Aug 19 09:14:42 <server> nagios: wproc: CHECK job 87 from worker Core Worker 11582 timed out after 30.00s
Aug 19 09:14:42 <server> nagios: wproc: host=voip-univesp; service=(null);
Aug 19 09:14:42 <server> nagios: wproc: early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Aug 19 09:14:42 <server> nagios: Warning: Check of host 'voip-univesp' timed out after 30.00 seconds
Aug 19 09:14:42 <server> nagios: wproc: Core Worker 11582: job 87 (pid=13292): Dormant child reaped
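
In the excerpt above, the read() message sits between the lines reporting that a host check timed out and was killed. To see which hosts are involved over a longer period, a sketch like the following could rank hosts by timeout count. The helper name timed_out_hosts is hypothetical; it only matches the host-check warning format shown above, and the log path in the usage comment is an assumption.

```shell
# Hypothetical helper: read syslog lines on stdin and print each host
# whose check timed out, most frequent first, prefixed by its count.
timed_out_hosts() {
    sed -n "s/.*Warning: Check of host '\([^']*\)' timed out.*/\1/p" |
        sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption; adjust to your syslog setup):
#   timed_out_hosts < /var/log/syslog
```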

Then gdb gave me the following:
root@<server>:/var/log# gdb -p 11576
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Attaching to process 11576
Reading symbols from /usr/local/nagios/bin/nagios...(no debugging symbols found)...done.
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libm.so.6
Reading symbols from /usr/lib/x86_64-linux-gnu/libltdl.so.7...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libltdl.so.7
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libdl.so.2
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/x86_64-linux-gnu/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libnss_compat.so.2
Reading symbols from /lib/x86_64-linux-gnu/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libnsl.so.1
Reading symbols from /lib/x86_64-linux-gnu/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libnss_nis.so.2
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
0x00007f100b91a5f3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007f100b91a5f3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000000000472201 in iobroker_poll ()
#2 0x0000000000433413 in event_execution_loop ()
#3 0x0000000000415c35 in main ()

Roberto
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios 4.2.0 logging "read() returned error 11" message

Post by tgriep »

When debugging is enabled, the output is saved in this file:

/usr/local/nagios/var/nagios.debug

Can you search that file for those errors and post what you find?
Be sure to check out our Knowledgebase for helpful articles and solutions!
DrydenK
Posts: 3
Joined: Thu Aug 18, 2016 9:27 am

Re: Nagios 4.2.0 logging "read() returned error 11" message

Post by DrydenK »

I've looked at the file, but it quickly becomes very large. I searched it for "error", but the string is not in there. Searching for a check of the host that apparently caused the message didn't help much either. What should I search for so that I can paste the relevant portion of the debug output? The file has become too large to post here (>10,000 lines...).

Roberto
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios 4.2.0 logging "read() returned error 11" message

Post by tgriep »

Try searching the debug file for timeout, wproc or Core Worker messages.
Alternatively, you can trim the file down and upload a few minutes' worth. As the error is happening every minute, there should be something in it to see.
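
A minimal sketch of that search. The helper name filter_debug is hypothetical; the three keywords are the ones suggested above, matched case-insensitively, and the path in the usage comment is the debug file location given earlier in this thread.

```shell
# Hypothetical helper: print lines of a debug file (with line numbers)
# that mention any of the suggested keywords, ignoring case.
filter_debug() {
    grep -inE 'timeout|wproc|Core Worker' "$1"
}

# Usage (keep only the most recent matches to get something small enough to post):
#   filter_debug /usr/local/nagios/var/nagios.debug | tail -n 200
```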
avisscher
Posts: 2
Joined: Fri Jun 26, 2015 8:01 am

Re: Nagios 4.2.0 logging "read() returned error 11" message

Post by avisscher »

Hi there,

I have found the same messages in message.log, but I can’t find anything strange in nagios.debug.
nagios[5711]: job 0 (pid=5735): read() returned error 11

This message only appears when a notification has to go out.
It looks like it has something to do with worker.c, which is different from the previous version.
Still, the notifications are sent out.

Best regards, Albert Visscher
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios 4.2.0 logging "read() returned error 11" message

Post by tgriep »

Make sure the notification command is correct and didn't get corrupted during the upgrade.
It could also be a permission problem. Notifications are sent using the nagios user account.
You may want to check that as well.
avisscher
Posts: 2
Joined: Fri Jun 26, 2015 8:01 am

Re: Nagios 4.2.0 logging "read() returned error 11" message

Post by avisscher »

Hi,

I tried to reproduce the problem with a clean install from a freshly downloaded tarball and no configuration changes.
So it's unlikely that the problem comes from wrong notification syntax or permissions.

best regards, Albert Visscher
tgriep
Madmin
Posts: 9177
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios 4.2.0 logging "read() returned error 11" message

Post by tgriep »

You may want to post on the Nagios Core GitHub page with the server OS, configuration and error messages so the developers can troubleshoot what is causing the problem.
https://github.com/NagiosEnterprises/nagioscore/issues