
NagVis - NDO claims that nagios did not status update

Posted: Thu Aug 24, 2017 1:16 pm
by ssoliveira
Nagvis NDO claims that nagios did not status update for more than 180 seconds

NagVis often has problems that require the service to be restarted.
How can I investigate the reason?

Code: Select all

[root@st-dc3a-nagios-n01 ~]# ps -ef | grep ndo2db
nagios   27757     1  0 15:08 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   28902 27757  0 15:08 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   28907 28902 12 15:08 ?        00:00:02 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
root     30027 13067  0 15:08 pts/0    00:00:00 grep ndo2db
Are there any logs I can analyze?

service ndo2db stop
service ndo2db start

Thank you

Re: NagVis - NDO claims that nagios did not status update

Posted: Thu Aug 24, 2017 1:27 pm
by scottwilkerson
Searching our forums for this error shows that the problem has been seen before. Can you try the commands in this post?

https://support.nagios.com/forum/viewto ... 317#p18314

Re: NagVis - NDO claims that nagios did not status update

Posted: Thu Aug 24, 2017 1:48 pm
by ssoliveira
I have already read that topic. It suggests restarting the services and comments on NTP time synchronization. All of that is OK here.

My problem is that the error occurs frequently, and each time I have to restart the services.
I would like a way to analyze the problem and identify its cause. There is nothing useful in /var/log/messages.

Re: NagVis - NDO claims that nagios did not status update

Posted: Thu Aug 24, 2017 2:31 pm
by scottwilkerson
What version of Nagios XI are you running?

Re: NagVis - NDO claims that nagios did not status update

Posted: Thu Aug 24, 2017 2:35 pm
by scottwilkerson
Here's a solution for the current XI version

Run the following from the CLI

Code: Select all

sed -i "s/maxtimewithoutupdate=180/maxtimewithoutupdate=86400/g" /usr/local/nagvis/etc/nagvis.ini.php
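
One thing worth checking after running the command: the substitution only rewrites the number, so if the line is commented out with a leading `;` it stays commented. The following is a minimal sketch that runs the same `sed` against a throwaway file (the sample contents are an assumption for illustration, not the real nagvis.ini.php) and counts active versus commented occurrences.

```shell
#!/bin/sh
# Sketch only: works on a temporary sample file, not the real nagvis.ini.php.
# The sample contents below are assumed for illustration.
sample=$(mktemp)
printf '%s\n' \
    '; maximum delay of the NDO Database in seconds' \
    ';maxtimewithoutupdate=180' > "$sample"

# Same substitution as above, applied to the sample copy (GNU sed).
sed -i "s/maxtimewithoutupdate=180/maxtimewithoutupdate=86400/g" "$sample"

# An active setting starts at column one; a commented one starts with ';'.
active=$(grep -c '^maxtimewithoutupdate=' "$sample")
commented=$(grep -c '^;maxtimewithoutupdate=' "$sample")
echo "active=$active commented=$commented"
rm -f "$sample"
```

Running the two `grep -c` commands against the real file would show whether the new value is actually in effect.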

Re: NagVis - NDO claims that nagios did not status update

Posted: Wed Aug 30, 2017 5:48 pm
by ssoliveira
What is the behavior after changing this parameter?

; maximum delay of the NDO Database in seconds
;maxtimewithoutupdate=180

I have verified high CPU utilization by Apache processes.

Code: Select all

Tasks: 529 total,   7 running, 521 sleeping,   0 stopped,   1 zombie
Cpu(s): 78.0%us, 15.8%sy,  0.0%ni,  5.2%id,  0.3%wa,  0.1%hi,  0.7%si,  0.0%st
Mem:  49283936k total, 38816824k used, 10467112k free,   262040k buffers
Swap:  4194300k total,     4332k used,  4189968k free, 17681732k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13646 apache    20   0  459m  39m 8712 R 38.8  0.1   0:22.52 httpd
12861 apache    20   0  459m  39m 8464 R 33.2  0.1   0:43.63 httpd
29545 apache    20   0  458m  38m 8704 S 30.3  0.1   0:05.25 httpd
24815 apache    20   0  459m  38m 8636 R 27.7  0.1   0:13.86 httpd
15329 apache    20   0  457m  37m 8544 S 23.1  0.1   0:06.12 httpd
26758 apache    20   0  440m  27m 5116 S 22.5  0.1   0:03.84 httpd
29183 apache    20   0  442m  29m 4300 S 17.9  0.1   0:03.89 httpd
 3580 apache    20   0  446m  34m 5800 S 16.9  0.1   0:14.58 httpd
14789 nagios    20   0  126m 5208 1956 R 15.6  0.0   0:00.57 process_perfdat
 6901 apache    20   0  456m  36m 8712 S 15.0  0.1   0:21.75 httpd
14783 nagios    20   0  125m 4784 1956 R 14.3  0.0   0:00.55 process_perfdat
30373 apache    20   0  451m  31m 8652 S 13.7  0.1   0:18.44 httpd
15326 apache    20   0  440m  27m 5320 S 12.4  0.1   0:00.38 httpd
28442 apache    20   0  458m  37m 8416 S 10.8  0.1   0:14.80 httpd
 1156 apache    20   0  456m  36m 8680 S 10.1  0.1   0:22.91 httpd
13931 apache    20   0  442m  29m 4284 S 10.1  0.1   0:00.78 httpd
I need help with more in-depth troubleshooting to find out what the problem is.
Is it possible to separate NagVis processing on a separate server?
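
To see at a glance how much CPU each kind of process takes in total, the per-process figures can be summed by user and command. This is a rough sketch, not a Nagios tool: `cpu_by_cmd` is a hypothetical helper name, and the sample rows are copied from the first lines of the top output above. Note that `ps` reports CPU averaged over each process's lifetime, so live numbers will differ from what top shows.

```shell
#!/bin/sh
# Sketch: sum %CPU per user/command pair.
# cpu_by_cmd (hypothetical helper) reads lines of "USER %CPU COMMAND" on stdin
# and prints "USER COMMAND TOTAL", highest total first.
cpu_by_cmd() {
    awk '{ cpu[$1 " " $3] += $2 }
         END { for (k in cpu) printf "%s %.1f\n", k, cpu[k] }' | sort -k3,3nr
}

# Live usage with procps ps:
#   ps -eo user,pcpu,comm --no-headers | cpu_by_cmd | head

# Sample rows taken from the top output above.
sample_total=$(printf '%s\n' \
    'apache 38.8 httpd' \
    'apache 33.2 httpd' \
    'apache 30.3 httpd' \
    'nagios 15.6 process_perfdat' | cpu_by_cmd | head -n 1)
echo "$sample_total"
```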

Re: NagVis - NDO claims that nagios did not status update

Posted: Thu Aug 31, 2017 8:48 am
by scottwilkerson
ssoliveira wrote: What is the behavior after changing this parameter?

; maximum delay of the NDO Database in seconds
;maxtimewithoutupdate=180

Changing this wouldn't change NagVis or the load at all. The only difference is that NagVis was doing an arbitrary check of when NDO last updated a timestamp in a table, and it gave the error if that was more than xxx seconds ago.
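
The kind of freshness check described here can be sketched as below. This is an illustration under assumptions, not NagVis's actual code: `ndo_is_stale` is a hypothetical helper, and the commented query assumes the timestamp is `status_update_time` in the `nagios_programstatus` table of a database named `nagios`.

```shell
#!/bin/sh
# Sketch of a staleness check; not taken from NagVis.
# ndo_is_stale LAST_UPDATE_EPOCH NOW_EPOCH MAX_AGE_SECONDS
# Returns 0 (true) when the last update is older than MAX_AGE_SECONDS.
ndo_is_stale() {
    age=$(( $2 - $1 ))
    [ "$age" -gt "$3" ]
}

# Assumed table, column, and database names; adjust credentials to your setup:
#   last=$(mysql -N -e "SELECT UNIX_TIMESTAMP(status_update_time) FROM nagios_programstatus" nagios)
#   ndo_is_stale "$last" "$(date +%s)" 180 && echo "NDO data is stale"

# Worked example with fixed timestamps: 200 s old, then 100 s old.
if ndo_is_stale 1000 1200 180; then r1=stale; else r1=fresh; fi
if ndo_is_stale 1000 1100 180; then r2=stale; else r2=fresh; fi
echo "$r1 $r2"
```

Running the query by hand while the error is showing would tell you how far behind the NDO data really is.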

Re: NagVis - NDO claims that nagios did not status update

Posted: Thu Aug 31, 2017 1:18 pm
by ssoliveira
Understood. So this modification would not solve the problem; it would only stop NagVis from raising the alarm when communication with the system takes longer than normal.

This problem is becoming critical here in the company.

How can I investigate why the environment presents problems?

* Can this CPU consumption by Apache be the problem?
* Do I need to add more CPU?
* Do I need to add more memory?

Our infrastructure currently monitors only a few servers, but we will soon add many more.

The database runs separately from the core, and each server uses Gearman to offload the load.

What information about my environment should I post here to help with this analysis?

The Core server has:

* 48GB of RAM
* 8 CPU
* Disk = LUNS in VMAX Storage (~ 10000 IOPS)

IO

Code: Select all

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     5.33    0.00    0.67     0.00    48.00    72.00     0.01    7.50    0.00    7.50   7.50   0.50
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    6.00     0.00    48.00     8.00     0.06   10.61    0.00   10.61   0.83   0.50
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-9              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    1.67     0.00     6.67     4.00     0.00    2.40    0.00    2.40   2.40   0.40
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdk               0.00     0.33   11.67  104.33   112.00   842.00     8.22     0.07    0.64    1.54    0.54   0.60   6.93
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.33     0.00     2.33     7.00     0.00    1.00    0.00    1.00   1.00   0.03
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    1.67     0.00    12.00     7.20     0.00    2.20    0.00    2.20   2.20   0.37
sdn               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdp               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdq               0.00     0.00   12.00  110.00   109.33   942.67     8.62     0.06    0.51    1.44    0.41   0.47   5.73
sdr               0.00     0.00    0.00    0.33     0.00     4.67    14.00     0.00    0.00    0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdo               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
VxVM23000         0.00     0.00   23.67  214.67   221.33  1784.67     8.42     0.14    0.59    1.52    0.48   0.50  11.83
sds               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdt               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdu               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdv               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
VxVM11000         0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
VxVM23001         0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
VxVM23002         0.00     0.00    0.00    0.67     0.00     7.00    10.50     0.00    0.50    0.00    0.50   0.50   0.03
VxVM23003         0.00     0.00    0.00    3.33     0.00    18.67     5.60     0.01    2.30    0.00    2.30   1.20   0.40
VxVM23004         0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00


Re: NagVis - NDO claims that nagios did not status update

Posted: Thu Aug 31, 2017 2:35 pm
by ssoliveira
We contacted Nagios Brasil to ask for help.

We were asked to disable one of the brokers, leaving only one running.

We were also given the procedure for enabling debugging in NDO, shown below.

We are reviewing whether the issue continues after these changes.

==========================================

/usr/local/nagios/etc/ndo2db.cfg

debug_level=-1

==========================================

tail -f /usr/local/nagios/var/ndo2db.debug
tail -f /usr/local/nagios/var/nagios.log

==========================================
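
As a rough sketch of applying the setting above without hand-editing, the helper below replaces an existing `key=value` line or appends one if it is missing. `set_cfg` is a hypothetical helper, not part of NDOUtils, and it is demonstrated on a temporary file with assumed contents. On a real system the target would be /usr/local/nagios/etc/ndo2db.cfg, followed by a restart of ndo2db.

```shell
#!/bin/sh
# Sketch: set key=value in a config file, replacing an existing line or appending.
# set_cfg FILE KEY VALUE  (hypothetical helper, not part of NDOUtils)
set_cfg() {
    if grep -q "^$2=" "$1"; then
        sed -i "s|^$2=.*|$2=$3|" "$1"
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}

# Demonstrated on a temporary copy; the initial contents are assumed.
cfg=$(mktemp)
printf '%s\n' 'debug_level=0' 'debug_verbosity=1' > "$cfg"
set_cfg "$cfg" debug_level -1
result=$(grep '^debug_level=' "$cfg")
echo "$result"
rm -f "$cfg"
```

Full debug output can be verbose, so it is probably worth setting the level back once the investigation is finished.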

Re: NagVis - NDO claims that nagios did not status update

Posted: Thu Aug 31, 2017 4:01 pm
by cdienger
Thank you. Please keep us posted with your progress after making this change.