
Monitoring engine and Nagvis not working as expected

Posted: Tue Jun 06, 2017 1:18 pm
by bosecorp
The monitoring engine was in a hung state, and after restarting the gearmand, worker and Nagios services it is still not working as expected.
We can see that the monitoring engine event queue is not updating in a timely manner. Please see the attachment.

We suspect this is why Nagvis is showing the error in the attachment.
We need immediate assistance. We are available for a WebEx session to troubleshoot this issue further.

Re: Monitoring engine and Nagvis not working as expected

Posted: Tue Jun 06, 2017 2:05 pm
by tgriep
It looks like the ndo2db backend is not running on the server.
Let's stop the Nagios daemon, start the backend, and then restart Nagios by running the following as root:

Code:

service nagios stop
killall -9 nagios
service ndo2db start
service nagios start
If you receive any errors on the above, post them here.
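For repeat use, the same sequence could be wrapped in a small shell function. This is a minimal sketch, assuming the SysV-style `nagios` and `ndo2db` init scripts used in this thread; the function name `restart_nagios_with_ndo` is hypothetical. Unlike typing the commands one by one, it refuses to start Nagios when ndo2db fails to start, so that failure is reported rather than hidden.

```shell
# Same steps as the commands above; run as root.
# Assumption: SysV-style "nagios" and "ndo2db" init scripts reachable via `service`.
restart_nagios_with_ndo() {
    # Stop the Nagios daemon through its init script.
    service nagios stop

    # Kill any leftover nagios processes. killall exits non-zero when nothing
    # matched ("nagios: no process killed"), which is fine here.
    killall -9 nagios 2>/dev/null || true

    # Start the NDO2DB backend before Nagios, so ndomod has a data sink.
    if ! service ndo2db start; then
        echo "ndo2db failed to start; not starting nagios" >&2
        return 1
    fi

    # Bring Nagios back up.
    service nagios start
}
```

Source the file as root and call `restart_nagios_with_ndo`; a non-zero exit status means ndo2db did not start.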

Re: Monitoring engine and Nagvis not working as expected

Posted: Tue Jun 06, 2017 2:22 pm
by bosecorp
# killall -9 nagios
nagios: no process killed

# service ndo2db stop
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.

Then I started the ndo2db service and then the nagios service again.

Re: Monitoring engine and Nagvis not working as expected

Posted: Tue Jun 06, 2017 2:55 pm
by tgriep
Did everything go back to normal?

Re: Monitoring engine and Nagvis not working as expected

Posted: Tue Jun 06, 2017 3:19 pm
by bosecorp
It worked for a while, but we are seeing the same issue again. Please find the attachment.

Re: Monitoring engine and Nagvis not working as expected

Posted: Tue Jun 06, 2017 3:23 pm
by avandemore
What is the output of:

Code:

# service ndo2db status
# service nagios status
# service gearmand status
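The three status checks can also be run in one pass. A minimal sketch, assuming each init script follows the common LSB convention of exiting 0 from `status` when its daemon is running (worth confirming on this system); `check_monitoring_services` is a hypothetical name. All three daemons can report as running while the problem persists, so a clean result here only rules out a stopped daemon.

```shell
# Report the status of each service the monitoring engine depends on.
# Assumption: "service NAME status" exits 0 when NAME is running (LSB convention).
check_monitoring_services() {
    status_failed=0
    for svc in ndo2db nagios gearmand; do
        if service "$svc" status >/dev/null 2>&1; then
            echo "OK: $svc is running"
        else
            echo "PROBLEM: $svc is not running (or its status check failed)"
            status_failed=1
        fi
    done
    # Non-zero if any service failed its status check.
    return "$status_failed"
}
```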

Re: Monitoring engine and Nagvis not working as expected

Posted: Wed Jun 07, 2017 7:01 am
by bosecorp
# service ndo2db status
ndo2db (pid 2640) is running...
root@nagmonus1:(06-07 07:40): /root
# service nagios status
nagios (pid 29289) is running...
root@nagmonus1:(06-07 07:40): /root
# service gearmand status
gearmand (pid 13359) is running...

Let me know what logs you need for further troubleshooting. This is really affecting our ability to monitor our environment.
Please handle this case as high priority and let us know the next steps.

Re: Monitoring engine and Nagvis not working as expected

Posted: Wed Jun 07, 2017 8:02 am
by bosecorp
Let me know if we can have a quick WebEx/screen-share session for further troubleshooting.

Re: Monitoring engine and Nagvis not working as expected

Posted: Wed Jun 07, 2017 9:09 am
by bosecorp
In nagios.log we observed the errors below:

[1496844505] ndomod: Successfully reconnected to data sink! 2759 items lost, 5000 queued items to flush.
[1496844505] ndomod: Error writing to data sink! Some output may get lost. 4851 queued items to flush.
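The ndomod messages above carry Unix epoch timestamps. A minimal sketch for pulling them out of nagios.log with readable UTC times and counting the write errors, assuming the log lives at `/usr/local/nagios/var/nagios.log` (inferred from the `/usr/local/nagios/var` path earlier in the thread) and that GNU `date` is available; `summarize_ndomod_errors` and `NAGIOS_LOG` are hypothetical names.

```shell
# Default log location: an assumption, adjust for your install.
NAGIOS_LOG="${NAGIOS_LOG:-/usr/local/nagios/var/nagios.log}"

# Print ndomod lines from a Nagios log with UTC timestamps, then count
# how many "Error writing to data sink" messages appear.
summarize_ndomod_errors() {
    log="${1:-$NAGIOS_LOG}"
    grep 'ndomod:' "$log" | while IFS= read -r line; do
        # Lines look like "[1496844505] ndomod: ...": strip "[" and everything from "]".
        epoch=${line#\[}
        epoch=${epoch%%]*}
        msg=${line#*] }
        # GNU date accepts "@SECONDS" for an epoch timestamp.
        printf '%s  %s\n' "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC')" "$msg"
    done
    printf 'write errors: %s\n' "$(grep -c 'ndomod: Error writing to data sink' "$log")"
}
```

Called with no argument it reads the default path; the timestamp `[1496844505]` above is rendered as `2017-06-07 14:08:25 UTC`.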

Re: Monitoring engine and Nagvis not working as expected

Posted: Wed Jun 07, 2017 9:25 am
by avandemore
In Nagios XI, go to Admin > System Profile > Download Profile.

Please include the zip file in your response. You can also send the profile to me or other support personnel by PM.

Remote sessions are done if our support personnel deem them necessary and request one. If you want priority support, our phone support option allows you to call in at any point and jump the queue. Remote sessions are usually initiated in that setting, since they are generally needed for a fast resolution. If you already have phone support, you can find the contact information here:

https://www.nagios.com/contact/