NagVis - NDO claims that nagios did not status update

ssoliveira
Posts: 91
Joined: Wed Dec 07, 2016 6:02 pm

NagVis - NDO claims that nagios did not status update

Post by ssoliveira »

NagVis reports: "NDO claims that Nagios did not status update for more than 180 seconds".

NagVis runs into this problem frequently, and the service has to be restarted each time.
How can I investigate the cause?

Code: Select all

[root@st-dc3a-nagios-n01 ~]# ps -ef | grep ndo2db
nagios   27757     1  0 15:08 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   28902 27757  0 15:08 ?        00:00:00 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
nagios   28907 28902 12 15:08 ?        00:00:02 /usr/local/nagios/bin/ndo2db -c /usr/local/nagios/etc/ndo2db.cfg
root     30027 13067  0 15:08 pts/0    00:00:00 grep ndo2db

Are there any logs I can analyze?

So far the only workaround is to restart the service:

service ndo2db stop
service ndo2db start

Thank you
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: NagVis - NDO claims that nagios did not status update

Post by scottwilkerson »

Searching our forums for this error shows that the problem has been seen before. Can you try the commands in this post?

https://support.nagios.com/forum/viewto ... 317#p18314
Former Nagios employee
ssoliveira
Posts: 91
Joined: Wed Dec 07, 2016 6:02 pm

Re: NagVis - NDO claims that nagios did not status update

Post by ssoliveira »

I have already read that topic. It suggests restarting the services and mentions NTP time synchronization. All of that is OK here.

My problem is that the error occurs frequently, and each time I have to restart the services.
I would like a way to analyze the problem and identify the cause. There is nothing useful in /var/log/messages.
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: NagVis - NDO claims that nagios did not status update

Post by scottwilkerson »

What version of Nagios XI are you running?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: NagVis - NDO claims that nagios did not status update

Post by scottwilkerson »

Here's a solution for the current XI version.

Run the following from the CLI:

Code: Select all

sed -i "s/maxtimewithoutupdate=180/maxtimewithoutupdate=86400/g" /usr/local/nagvis/etc/nagvis.ini.php
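
One thing worth checking before and after running the command: `sed` replaces the text wherever it appears, including on lines commented out with a leading `;`, and a commented line has no effect. A minimal sketch, assuming the usual ini layout; the helper name `show_maxtime` is made up for illustration:

```shell
# Hypothetical helper: list every maxtimewithoutupdate line in a NagVis
# ini file, split into active lines and lines commented out with ';'.
show_maxtime() {
    ini_file="$1"   # e.g. /usr/local/nagvis/etc/nagvis.ini.php
    echo "active:"
    grep -nE '^[[:space:]]*maxtimewithoutupdate=' "$ini_file" || true
    echo "commented:"
    grep -nE '^[[:space:]]*;[[:space:]]*maxtimewithoutupdate=' "$ini_file" || true
}
```

Call it as `show_maxtime /usr/local/nagvis/etc/nagvis.ini.php`. If the option only shows up under "commented:", the new value will not be picked up until the line is uncommented in the right backend section.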
ssoliveira
Posts: 91
Joined: Wed Dec 07, 2016 6:02 pm

Re: NagVis - NDO claims that nagios did not status update

Post by ssoliveira »

What is the behavior after changing this parameter?

; maximum delay of the NDO Database in seconds
;maxtimewithoutupdate=180

I have also observed high CPU utilization by the Apache processes.

Code: Select all

Tasks: 529 total,   7 running, 521 sleeping,   0 stopped,   1 zombie
Cpu(s): 78.0%us, 15.8%sy,  0.0%ni,  5.2%id,  0.3%wa,  0.1%hi,  0.7%si,  0.0%st
Mem:  49283936k total, 38816824k used, 10467112k free,   262040k buffers
Swap:  4194300k total,     4332k used,  4189968k free, 17681732k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
13646 apache    20   0  459m  39m 8712 R 38.8  0.1   0:22.52 httpd
12861 apache    20   0  459m  39m 8464 R 33.2  0.1   0:43.63 httpd
29545 apache    20   0  458m  38m 8704 S 30.3  0.1   0:05.25 httpd
24815 apache    20   0  459m  38m 8636 R 27.7  0.1   0:13.86 httpd
15329 apache    20   0  457m  37m 8544 S 23.1  0.1   0:06.12 httpd
26758 apache    20   0  440m  27m 5116 S 22.5  0.1   0:03.84 httpd
29183 apache    20   0  442m  29m 4300 S 17.9  0.1   0:03.89 httpd
 3580 apache    20   0  446m  34m 5800 S 16.9  0.1   0:14.58 httpd
14789 nagios    20   0  126m 5208 1956 R 15.6  0.0   0:00.57 process_perfdat
 6901 apache    20   0  456m  36m 8712 S 15.0  0.1   0:21.75 httpd
14783 nagios    20   0  125m 4784 1956 R 14.3  0.0   0:00.55 process_perfdat
30373 apache    20   0  451m  31m 8652 S 13.7  0.1   0:18.44 httpd
15326 apache    20   0  440m  27m 5320 S 12.4  0.1   0:00.38 httpd
28442 apache    20   0  458m  37m 8416 S 10.8  0.1   0:14.80 httpd
 1156 apache    20   0  456m  36m 8680 S 10.1  0.1   0:22.91 httpd
13931 apache    20   0  442m  29m 4284 S 10.1  0.1   0:00.78 httpd

I need help with more in-depth troubleshooting to find out what the problem is.
Is it possible to move NagVis processing to a separate server?
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: NagVis - NDO claims that nagios did not status update

Post by scottwilkerson »

ssoliveira wrote: What is the behavior after changing this parameter?

Changing this wouldn't change NagVis's behavior or the load at all. The only difference is that NagVis was doing an arbitrary check of when NDO last updated a timestamp in a table, and raised the error if that was more than the configured number of seconds ago.
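
The check described above can be sketched roughly as follows. This illustrates the comparison only and is not NagVis's actual code; the helper name `ndo_is_stale` is hypothetical, and the query in the comment assumes the default NDOUtils schema (table `nagios_programstatus`, column `status_update_time`) in a database named `nagios`.

```shell
# Hypothetical helper: compare the last NDO status update against the
# current time (both Unix timestamps) and a threshold in seconds.
# Prints a verdict; returns 1 when the data is older than the threshold.
#
# The timestamp could be read with something like (assumed schema and
# database name, substitute your own credentials):
#   mysql -N -B -u <user> -p nagios \
#     -e "SELECT UNIX_TIMESTAMP(status_update_time) FROM nagios_programstatus"
ndo_is_stale() {
    last_update="$1"
    now="$2"
    max_age="$3"    # same meaning as maxtimewithoutupdate
    age=$(( now - last_update ))
    if [ "$age" -gt "$max_age" ]; then
        echo "stale: last update ${age}s ago (limit ${max_age}s)"
        return 1
    fi
    echo "ok: last update ${age}s ago (limit ${max_age}s)"
}
```

Running this every minute from cron with `now` set to `$(date +%s)` would show whether the timestamp really stops advancing, or whether it is only lagging behind when the error appears.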
ssoliveira
Posts: 91
Joined: Wed Dec 07, 2016 6:02 pm

Re: NagVis - NDO claims that nagios did not status update

Post by ssoliveira »

I understand. So this modification would not solve the problem; it would only stop NagVis from raising the alarm when communication with the system takes longer than normal.

This problem is becoming critical here at the company.

How can I investigate why the environment is having problems?

* Can this CPU consumption by Apache be the problem?
* Do I need to add more CPU?
* Do I need to add more memory?

Our infrastructure currently monitors only a few servers, but we will soon add many more.

The core runs separately from the database, and each server uses Gearman for load distribution.

What information about my environment should I post here to help with this analysis?

The Core server has:

* 48GB of RAM
* 8 CPUs
* Disk = LUNs on VMAX storage (~10,000 IOPS)

I/O:

Code: Select all

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sda               0.00     5.33    0.00    0.67     0.00    48.00    72.00     0.01    7.50    0.00    7.50   7.50   0.50
dm-0              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    6.00     0.00    48.00     8.00     0.06   10.61    0.00   10.61   0.83   0.50
dm-4              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-5              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-6              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-7              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-9              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd               0.00     0.00    0.00    1.67     0.00     6.67     4.00     0.00    2.40    0.00    2.40   2.40   0.40
sdh               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdk               0.00     0.33   11.67  104.33   112.00   842.00     8.22     0.07    0.64    1.54    0.54   0.60   6.93
sdg               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdl               0.00     0.00    0.00    0.33     0.00     2.33     7.00     0.00    1.00    0.00    1.00   1.00   0.03
sdj               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sde               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdf               0.00     0.00    0.00    1.67     0.00    12.00     7.20     0.00    2.20    0.00    2.20   2.20   0.37
sdn               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdp               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdq               0.00     0.00   12.00  110.00   109.33   942.67     8.62     0.06    0.51    1.44    0.41   0.47   5.73
sdr               0.00     0.00    0.00    0.33     0.00     4.67    14.00     0.00    0.00    0.00    0.00   0.00   0.00
sdm               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdo               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
VxVM23000         0.00     0.00   23.67  214.67   221.33  1784.67     8.42     0.14    0.59    1.52    0.48   0.50  11.83
sds               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdt               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdu               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdv               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
VxVM11000         0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
VxVM23001         0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
VxVM23002         0.00     0.00    0.00    0.67     0.00     7.00    10.50     0.00    0.50    0.00    0.50   0.50   0.03
VxVM23003         0.00     0.00    0.00    3.33     0.00    18.67     5.60     0.01    2.30    0.00    2.30   1.20   0.40
VxVM23004         0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

ssoliveira
Posts: 91
Joined: Wed Dec 07, 2016 6:02 pm

Re: NagVis - NDO claims that nagios did not status update

Post by ssoliveira »

We contacted Nagios Brasil for help.

We were asked to disable one of the brokers, leaving only one running.

We were also given the procedure for enabling debugging in NDO, shown below.

We are monitoring whether the issue persists after these changes.

==========================================

/usr/local/nagios/etc/ndo2db.cfg

debug_level=-1

==========================================

tail -f /usr/local/nagios/var/ndo2db.debug
tail -f /usr/local/nagios/var/nagios.log

==========================================
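
Before and after editing the file, it can help to confirm which debug settings are actually active. A small sketch; the helper name `show_ndo_debug` is hypothetical, and the option names other than `debug_level` (`debug_verbosity`, `debug_file`, `max_debug_file_size`) are the ones I would expect in a stock ndo2db.cfg, so treat them as assumptions:

```shell
# Hypothetical helper: print the uncommented debug-related settings
# from an ndo2db configuration file.
show_ndo_debug() {
    cfg_file="$1"   # e.g. /usr/local/nagios/etc/ndo2db.cfg
    grep -E '^(debug_level|debug_verbosity|debug_file|max_debug_file_size)=' "$cfg_file" || true
}
```

Call it as `show_ndo_debug /usr/local/nagios/etc/ndo2db.cfg`. ndo2db should need a restart to pick up the change, and since `debug_level=-1` logs everything, the debug file can grow quickly; setting the level back to 0 once the problem has been captured is probably wise.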
cdienger
Support Tech
Posts: 5045
Joined: Tue Feb 07, 2017 11:26 am

Re: NagVis - NDO claims that nagios did not status update

Post by cdienger »

Thank you. Please keep us posted with your progress after making this change.
Locked