Nagvis not working

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Nagvis not working

Post by bosecorp »

NagVis is not working as expected. It reports: 'Problem (Backend:ndomy_1): NDO Claims that nagios did no status update ... Make sure that nagios and NDO daemons are running.'
I need immediate assistance. Below is the status of all services.

# service --status-all
abrt-ccpp hook is installed
abrtd (pid 3343) is running...
abrt-dump-oops (pid 3351) is running...
acpid (pid 3045) is running...
ajaxterm (pid 3513) is running...
atd (pid 3421) is running...
auditd (pid 2789) is running...
automount (pid 3161) is running...
avahi-daemon (pid 2984) is running...
Stopped
cgred is stopped
cpuspeed is stopped
crond (pid 30251) is running...
cupsd (pid 3000) is running...
Service EracentEPAService is running (PID: 17976)
Service EracentEUAService is running (PID: 18655)
firstboot is not scheduled to run
gearmand (pid 3889) is running...
hald (pid 3054) is running...
htcacheclean is stopped
httpd (pid 20052) is running...
ip6tables: Firewall is not running.
IPsec stopped
Table: filter
Chain INPUT (policy ACCEPT)
num target prot opt source destination
1 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:5667
2 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:5667
3 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:5666
4 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 state NEW udp dpt:162
5 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:5666

Chain FORWARD (policy ACCEPT)
num target prot opt source destination

Chain OUTPUT (policy ACCEPT)
num target prot opt source destination

irqbalance (pid 2845) is running...
iscsi is stopped
iscsid is stopped
Kdump is not operational
started
lvmetad is stopped
Checking for mcelog
mcelog (pid 3180) is running...
mdmonitor (pid 2894) is running...
messagebus (pid 2973) is running...
mod_gearman2_worker is running with pid 10367
multipathd is stopped
mysqld (pid 6394) is running...
nagios (pid 11141) is running...

Status of Nagios on: Mon May 15 14:09:46 EDT 2017

Status of Gearman Worker on nagmonus2
+ ssh nagmonus2 -n 'service mod_gearman_worker status'
mod_gearman_worker: unrecognized service
+ ssh gearmandce1 -n 'service mod_gearman_worker status'

Status of Gearman Worker on gearmandce1
mod_gearman_worker: unrecognized service

Status of Gearman Worker on gearmandcn1
+ ssh gearmandcn1 -n 'service mod_gearman_worker status'
mod_gearman_worker: unrecognized service

Status of MySql and PostGres Daemons on nagmondb1
+ ssh nagmondb1 -n 'service mysqld status'
mysqld (pid 673) is running...
+ ssh nagmondb1 -n 'service postgresql status'
postmaster is stopped

Status of ndo2db Daemon On nagmonus1
+ service ndo2db status
ndo2db (pid 3878) is running...

Status of nagios Daemon On nagmonus1
+ service nagios status
nagios (pid 11141) is running...
+ service nagiosxi status

Status of nagiosxi Daemon On nagmonus1

Status of npcd Daemon On nagmonus1
+ service npcd status
NPCD running (pid 4138).
+ service gearmand status

Status of gearmand Daemon On nagmonus1
gearmand (pid 3889) is running...

Status of mod_gearman_worker Daemon On nagmonus1
+ service mod_gearman_worker status
mod_gearman_worker: unrecognized service
+ service httpd status

Status of httpd Daemon On nagmonus1
httpd (pid 20052) is running...
+ service postgresql status

Status of postgresql Daemon On nagmonus1
postmaster (pid 30287) is running...

Status of mysqld Daemon On nagmonus1
+ service mysqld status
mysqld (pid 6394) is running...

Log is: /var/log/nagios/nagios_start_stop
nagios (pid 11141) is running...
ndo2db (pid 3878) is running...
No open transaction
netconsole module not loaded
Active NFS mountpoints:
/ubstmp
/home/el872784
Configured devices:
lo eth0
Currently active devices:
lo eth0
+--o nsrexecd (3432)
rpc.svcgssd is stopped
rpc.mountd is stopped
nfsd is stopped
rpc.rquotad is stopped
rpc.statd (pid 6846) is running...
nmbd is stopped
NPCD running (pid 4138).
ntpd (pid 26878) is running...
portreserve is stopped
master (pid 3319) is running...
postmaster (pid 30287) is running...
Process accounting is disabled.
quota_nld is stopped
rdisc is stopped
rhnsd (pid 3455) is running...
rngd is stopped
rpcbind (pid 5947) is running...
rpc.gssd is stopped
rpc.idmapd is stopped
rpc.svcgssd is stopped
rsyslogd (pid 2814) is running...
sandbox is stopped
saslauthd is stopped
sfcb is not running, but pid file exists
smartd is stopped
smbd is stopped
snmpd (pid 3195) is running...
snmptrapd is stopped
snmptt (pid 20381) is running...
openssh-daemon (pid 3207) is running...
svnserve is stopped
CIM server (3466) is running
vsftpd is stopped
winbindd is stopped
xinetd (pid 13914) is running...
xrdp (pid 3237) is running...
xrdp-sesman (pid 3241) is running...
ypbind (pid 5118) is running...
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagvis not working

Post by tgriep »

First, let's repair the MySQL databases and restart some of the processes to see if that fixes the issue.

Log in to the Nagios server as root and run the following to repair the databases.

Code: Select all

cd /usr/local/nagiosxi/scripts
./repair_databases.sh
Then run the following to restart the processes.

Code: Select all

service nagios stop
service ndo2db stop
service httpd restart
service ndo2db start
service nagios start
Log in to the GUI and see if the error has gone away.
Be sure to check out our Knowledgebase for helpful articles and solutions!
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: Nagvis not working

Post by bosecorp »

NagVis is working fine now. It looks like NagVis reports this error when the lag in getting the status update is greater than 180 seconds.

How can I make sure this issue does not happen again? Typically, when the database is corrupt the backups won't run, but that's not the case here.
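
One way to watch for this condition is to compare the last NDO status update time against the 180-second threshold from the error message. The following is only a rough sketch, not an official Nagios or NagVis tool: the function names, the `NDO_USER`, `NDO_PASS` and `NDO_DB` variables, and the default database name `nagios` are assumptions for illustration, and the query assumes the standard NDOUtils `nagios_programstatus` table with its `status_update_time` column.

```shell
#!/bin/sh
# Hypothetical helper: compare the last NDO status update with "now".
# Arguments: last update (epoch seconds), current time (epoch seconds),
# and an optional threshold in seconds (defaults to 180, the value that
# appears in the NagVis error message).
lag_status() {
    last_update=$1
    now=$2
    threshold=${3:-180}
    lag=$((now - last_update))
    if [ "$lag" -gt "$threshold" ]; then
        echo "STALE lag=${lag}s"
    else
        echo "OK lag=${lag}s"
    fi
}

# Assumption: the NDO database is reachable with the mysql client and
# contains the nagios_programstatus table. NDO_USER, NDO_PASS and NDO_DB
# are placeholder variables that must be set for your own installation.
last_ndo_update() {
    mysql -N -B -u "$NDO_USER" -p"$NDO_PASS" "${NDO_DB:-nagios}" \
        -e "SELECT UNIX_TIMESTAMP(status_update_time) FROM nagios_programstatus LIMIT 1"
}

# Example call (not executed here), e.g. from cron:
#   lag_status "$(last_ndo_update)" "$(date +%s)" 180
```

If this reports STALE while the nagios and ndo2db processes are both running, the problem is more likely a delay in writing status data than a stopped daemon.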
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagvis not working

Post by tgriep »

Without knowing what corrupted the database and how, it is hard to say what caused the issue.
It could also have been a stuck process that caused it, and restarting the processes fixed it.
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: Nagvis not working

Post by bosecorp »

I really doubt it's the database. To me it looks like there may be a lag that is causing this issue.
The error clearly mentions "NDO claims that Nagios did no status Update for more than "180" seconds". I can upload the relevant logs so that you can troubleshoot this issue further.

Let me know what information you need from my end.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagvis not working

Post by tgriep »

A System Profile would be a good start as well as the output of this command.

Code: Select all

ipcs -q
bosecorp
Posts: 929
Joined: Thu Jun 26, 2014 1:00 pm

Re: Nagvis not working

Post by bosecorp »

# ipcs -q

------ Message Queues --------
key msqid owner perms used-bytes messages
0xab000002 19529728 nagios 600 0 0
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagvis not working

Post by tgriep »

The kernel message queue looks good. Can you PM me a System Profile so I can check the server's settings?
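
For repeated checks, the `ipcs -q` output can be reduced to a one-line summary. This is a minimal sketch: `queue_summary` is a hypothetical helper name, and it assumes the Linux column layout shown in the output above (key, msqid, owner, perms, used-bytes, messages). A message count that stays above zero may suggest that the consumer of the queue is falling behind.

```shell
#!/bin/sh
# Hypothetical helper: summarize `ipcs -q` output read from stdin.
# Only rows whose first field looks like a hex key (0x...) are counted,
# which skips the banner and the column header lines.
queue_summary() {
    awk '$1 ~ /^0x/ { queues++; bytes += $5; msgs += $6 }
         END { printf "queues=%d used_bytes=%d messages=%d\n", queues, bytes, msgs }'
}

# Example call (not executed here):
#   ipcs -q | queue_summary
```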