
Getting "Error: Could not read host and service status".

Posted: Tue Oct 27, 2015 10:56 pm
by gavinh
Hi, I am often getting the "Error: Could not read host and service status information!" message on refreshes. If I hit the refresh button a few times, it goes away. But this is happening frequently enough that people are beginning to talk of finding a more stable monitoring application.

Any ideas on how to resolve this?

Gavin

Re: Getting "Error: Could not read host and service status".

Posted: Wed Oct 28, 2015 1:30 am
by Box293
Can you please provide a screenshot of this problem?

How many hosts / services are you monitoring?

Re: Getting "Error: Could not read host and service status".

Posted: Wed Oct 28, 2015 11:11 am
by gavinh
I can give a screenshot when it happens, but it's the standard error message that's provided by Nagios.

I am currently monitoring a farm of roughly 750 servers, with more being added in phases. We have an additional 50 coming shortly.

Here is an error from my logfile:

[Mon Oct 19 08:48:02.605278 2015] [cgi:warn] [pid 52159] [client 10.2.166.227:58106] AH01220: Timeout waiting for output from CGI script /usr/lib/cgi-bin/nagios3/cmd.cgi, referer: http://mobile-mon-02/cgi-bin/nagios3/cm ... orce_check
[Mon Oct 19 08:48:02.605365 2015] [cgi:error] [pid 52159] [client 10.2.166.227:58106] Script timed out before returning headers: cmd.cgi, referer: http://mobile-mon-02/cgi-bin/nagios3/cm ... orce_check
[Mon Oct 19 08:53:02.705436 2015] [cgi:warn] [pid 52159] [client 10.2.166.227:58106] AH01220: Timeout waiting for output from CGI script /usr/lib/cgi-bin/nagios3/cmd.cgi, referer: http://mobile-mon-02/cgi-bin/nagios3/cm ... orce_check
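
To see how often these timeouts occur and which CGI scripts they hit, the error log can be summarized per script. This is only a minimal sketch, assuming the log lines follow the format shown above; the helper name `count_cgi_timeouts` is hypothetical and not part of Apache or Nagios.

```shell
# Count "Script timed out" errors per CGI script in an Apache error log.
# count_cgi_timeouts is a hypothetical helper name used for illustration.
count_cgi_timeouts() {
    # $1: path to the Apache error log
    grep 'Script timed out before returning headers' "$1" \
        | sed 's/.*returning headers: \([^, ]*\).*/\1/' \
        | sort | uniq -c | sort -rn
}
```

It could be run as `count_cgi_timeouts /var/log/apache2/error.log`, where the log path is an assumption (the usual Debian/Ubuntu location) and may differ on this server. Each output line is a count followed by the script name, most frequent first.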

Re: Getting "Error: Could not read host and service status".

Posted: Wed Oct 28, 2015 2:14 pm
by gavinh
Here is a screenshot.

Image

Re: Getting "Error: Could not read host and service status".

Posted: Wed Oct 28, 2015 5:34 pm
by Box293
I have some ideas as to what is happening, but first let me get some more information.

What is the output of:

Code:

top -n 1
df -h

Re: Getting "Error: Could not read host and service status".

Posted: Thu Oct 29, 2015 12:11 pm
by gavinh

Code:

Tasks: 487 total,   1 running, 486 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.2 sy,  0.2 ni, 99.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  98967472 total, 22276932 used, 76690544 free,   320460 buffers
KiB Swap: 10062643+total,        0 used, 10062643+free.  5436064 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
56539 gavinh    20   0   25240   1920   1084 R  11.8  0.0   0:00.03 top
    1 root      20   0   33636   2912   1472 S   0.0  0.0   2:04.45 init
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.04 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   3:33.60 ksoftirqd/0
    4 root      20   0       0      0      0 S   0.0  0.0  16:28.13 kworker/0:0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    6 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/u128:0
    7 root      20   0       0      0      0 S   0.0  0.0  11:21.77 kworker/u129:0
    8 root      20   0       0      0      0 S   0.0  0.0 116:23.18 rcu_sched
    9 root      20   0       0      0      0 S   0.0  0.0  12:57.67 rcuos/0
   10 root      20   0       0      0      0 S   0.0  0.0  16:42.86 rcuos/1
   11 root      20   0       0      0      0 S   0.0  0.0  16:20.89 rcuos/2
   12 root      20   0       0      0      0 S   0.0  0.0  13:16.49 rcuos/3
   13 root      20   0       0      0      0 S   0.0  0.0  12:42.85 rcuos/4
   14 root      20   0       0      0      0 S   0.0  0.0  17:42.23 rcuos/5
   15 root      20   0       0      0      0 S   0.0  0.0  18:51.60 rcuos/6
   16 root      20   0       0      0      0 S   0.0  0.0  25:04.29 rcuos/7
   17 root      20   0       0      0      0 S   0.0  0.0  16:33.80 rcuos/8
   18 root      20   0       0      0      0 S   0.0  0.0  13:29.25 rcuos/9
   19 root      20   0       0      0      0 S   0.0  0.0  10:17.25 rcuos/10
   20 root      20   0       0      0      0 S   0.0  0.0   8:19.09 rcuos/11
   21 root      20   0       0      0      0 S   0.0  0.0  14:35.73 rcuos/12
   22 root      20   0       0      0      0 S   0.0  0.0  12:20.56 rcuos/13
   23 root      20   0       0      0      0 S   0.0  0.0  12:12.51 rcuos/14
   24 root      20   0       0      0      0 S   0.0  0.0  12:00.18 rcuos/15
   25 root      20   0       0      0      0 S   0.0  0.0   8:21.59 rcuos/16
   26 root      20   0       0      0      0 S   0.0  0.0   7:52.82 rcuos/17
   27 root      20   0       0      0      0 S   0.0  0.0   7:38.31 rcuos/18
   28 root      20   0       0      0      0 S   0.0  0.0   7:08.39 rcuos/19
   29 root      20   0       0      0      0 S   0.0  0.0   5:31.72 rcuos/20
   30 root      20   0       0      0      0 S   0.0  0.0  15:38.09 rcuos/21
   31 root      20   0       0      0      0 S   0.0  0.0  10:39.78 rcuos/22
   32 root      20   0       0      0      0 S   0.0  0.0  10:13.09 rcuos/23
   33 root      20   0       0      0      0 S   0.0  0.0   9:01.40 rcuos/24
   34 root      20   0       0      0      0 S   0.0  0.0   8:04.82 rcuos/25
   35 root      20   0       0      0      0 S   0.0  0.0   7:20.06 rcuos/26
   36 root      20   0       0      0      0 S   0.0  0.0   6:23.97 rcuos/27
   37 root      20   0       0      0      0 S   0.0  0.0   6:16.82 rcuos/28
   38 root      20   0       0      0      0 S   0.0  0.0   8:37.39 rcuos/29
   39 root      20   0       0      0      0 S   0.0  0.0   5:28.77 rcuos/30
   40 root      20   0       0      0      0 S   0.0  0.0  15:06.59 rcuos/31
   41 root      20   0       0      0      0 S   0.0  0.0   9:21.85 rcuos/32
   42 root      20   0       0      0      0 S   0.0  0.0   8:41.83 rcuos/33
   43 root      20   0       0      0      0 S   0.0  0.0   7:49.56 rcuos/34
   44 root      20   0       0      0      0 S   0.0  0.0   7:00.99 rcuos/35
   45 root      20   0       0      0      0 S   0.0  0.0   6:13.11 rcuos/36
   46 root      20   0       0      0      0 S   0.0  0.0   5:27.05 rcuos/37
   47 root      20   0       0      0      0 S   0.0  0.0   4:51.97 rcuos/38
   48 root      20   0       0      0      0 S   0.0  0.0   7:07.31 rcuos/39
   49 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/40
   50 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/41
   51 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/42
   52 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/43
   53 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/44
   54 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/45
   55 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/46
   56 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/47
   57 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/48
   58 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/49
   59 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/50
   60 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/51
   61 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/52
   62 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/53
   63 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/54
   64 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/55
   65 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/56
   66 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuos/57

Code:

/dev/mapper/mobile--mon--02--vg-root  5.4T   15G  5.1T   1% /
none                                  4.0K     0  4.0K   0% /sys/fs/cgroup
udev                                   48G  4.0K   48G   1% /dev
tmpfs                                 9.5G 1004K  9.5G   1% /run
none                                  5.0M     0  5.0M   0% /run/lock
none                                   48G     0   48G   0% /run/shm
none                                  100M     0  100M   0% /run/user
/dev/sda2                             237M   39M  187M  18% /boot

Re: Getting "Error: Could not read host and service status".

Posted: Thu Oct 29, 2015 4:57 pm
by tmcdonald
What Core version are you running? It looks like the interface has been modified.

Re: Getting "Error: Could not read host and service status".

Posted: Fri Oct 30, 2015 10:57 am
by gavinh
I am running 3.5.1. Yes, I changed the colors somewhat; it was a straightforward change of the color hex codes.

Re: Getting "Error: Could not read host and service status".

Posted: Fri Oct 30, 2015 1:30 pm
by jdalrymple
gavinh wrote: Script timed out before returning headers: cmd.cgi, referer: http://mobile-mon-02/cgi-bin/nagios3/cm ... orce_check
This is the kind of error we usually see on long-running CGIs, not cmd.cgi. Which page(s) specifically are you trying to load when you get the failure? Is the failure happening after 30 seconds (which I believe is the default httpd script execution timeout) or right away?

It may just be a matter of increasing the script execution timeout in httpd. Alternatively, you might be better served by moving status.dat onto an SSD or tmpfs.
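
Before changing either setting, it may help to confirm the current values. The following is a rough sketch, not a definitive procedure. It assumes that the Nagios main config uses `key=value` lines (as `status_file` and `status_update_interval` do) and that the relevant httpd setting is Apache's core `Timeout` directive, which as far as I know also limits how long mod_cgi waits for CGI output. The helper names `nagios_directive` and `apache_timeout` are hypothetical.

```shell
# Print the value of a key=value directive from a Nagios main config file.
# nagios_directive is a hypothetical helper name used for illustration.
nagios_directive() {
    # $1: config file, $2: directive name (for example status_file)
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# Print the Timeout value (in seconds) set in an Apache config file, if any.
# Commented-out lines are skipped because their first field starts with "#".
apache_timeout() {
    # $1: Apache config file
    awk 'tolower($1) == "timeout" { print $2 }' "$1" | tail -n 1
}
```

On a Debian/Ubuntu nagios3 install this might be called as `nagios_directive /etc/nagios3/nagios.cfg status_file` and `apache_timeout /etc/apache2/apache2.conf`; both paths are assumptions and should be adjusted to the actual layout. Running `df -h` on the directory that holds status.dat then shows whether it already sits on tmpfs.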

Re: Getting "Error: Could not read host and service status".

Posted: Wed Nov 04, 2015 3:18 pm
by gavinh
I'll look into the httpd settings.