Nagios Support Forum

Posted: **Tue Jun 06, 2017 1:18 pm**

The monitoring engine was in a hung state. After restarting gearmand, the worker service, and the Nagios service, the monitoring engine is still not working as expected.
We can see that the monitoring engine event queue is not updating in a timely manner. Please see the attachment.

We suspect this is why NagVis is showing the error in the attachment.
We need immediate assistance. We can set up a WebEx session to troubleshoot this issue further.

Posted: **Tue Jun 06, 2017 2:05 pm**

It looks like the ndo2db backend is not running on the server.
Let's stop the nagios daemon, start the backend, and then restart nagios by running the following as root:

Code:

service nagios stop
killall -9 nagios
service ndo2db start
service nagios start

If you receive any errors on the above, post them here.

Posted: **Tue Jun 06, 2017 2:22 pm**

# killall -9 nagios
nagios: no process killed

# service ndo2db stop
Stopping ndo2db: head: cannot open `/usr/local/nagios/var/ndo2db.lock' for reading: No such file or directory
done.

We then started the ndo2db service and the nagios service again.

Posted: **Tue Jun 06, 2017 2:55 pm**

Did everything go back to normal?

Posted: **Tue Jun 06, 2017 3:19 pm**

It worked for some time, but we are seeing the same issue again. Please find the attachment.

Posted: **Tue Jun 06, 2017 3:23 pm**

What is the output of:

Code:

# service ndo2db status
# service nagios status
# service gearmand status
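
The three status checks above can also be run in one pass. This is a minimal sketch, not an official Nagios tool: the function name `check_services` is hypothetical, and it assumes the init scripts follow the usual convention of returning exit code 0 from `status` when the daemon is running.

```shell
#!/bin/sh
# Report whether each named service is running.
# Assumption: `service NAME status` exits 0 when the daemon is up.
# check_services is a hypothetical helper name, not part of Nagios.
check_services() {
    rc=0
    for svc in "$@"; do
        if service "$svc" status >/dev/null 2>&1; then
            echo "$svc: running"
        else
            echo "$svc: NOT running"
            rc=1    # remember that at least one service is down
        fi
    done
    return "$rc"
}

# Usage (as root):
#   check_services ndo2db nagios gearmand
```

The function returns non-zero if any service is down, so it can be used in a larger script or a cron job.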

Posted: **Wed Jun 07, 2017 7:01 am**

# service ndo2db status
ndo2db (pid 2640) is running...
root@nagmonus1:(06-07 07:40): /root
# service nagios status
nagios (pid 29289) is running...
root@nagmonus1:(06-07 07:40): /root
# service gearmand status
gearmand (pid 13359) is running...

Let me know what logs you need for further troubleshooting. This is really affecting our ability to monitor our environment.
Please handle this case as high priority and let us know the next steps.

Posted: **Wed Jun 07, 2017 8:02 am**

Let me know if we can have a quick WebEx/screen share session for further troubleshooting.

Posted: **Wed Jun 07, 2017 9:09 am**

In nagios.log we observed the errors below:

[1496844505] ndomod: Successfully reconnected to data sink! 2759 items lost, 5000 queued items to flush.
[1496844505] ndomod: Error writing to data sink! Some output may get lost. 4851 queued items to flush.
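
To see how often these data sink messages occur, the matching lines can be pulled out of the log with readable timestamps. The following is a rough sketch: the function name `ndomod_sink_events` is hypothetical, the log path in the usage comment is an assumed default location, and the `-d @EPOCH` option requires GNU `date`.

```shell
#!/bin/sh
# Print ndomod data sink messages with a human-readable UTC timestamp.
# Assumptions: GNU date (for `-d @EPOCH`); log lines look like
# "[EPOCH] message". ndomod_sink_events is a hypothetical helper name.
ndomod_sink_events() {
    grep 'ndomod:.*data sink' | while IFS= read -r line; do
        ts=${line#\[}       # strip the leading "["
        ts=${ts%%\]*}       # keep only the epoch seconds
        msg=${line#*\] }    # everything after "] "
        printf '%s UTC %s\n' "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$msg"
    done
}

# Usage (log path is an assumption; adjust to your install):
#   ndomod_sink_events < /usr/local/nagios/var/nagios.log
```

For the second line above, this prints `2017-06-07 14:08:25 UTC ndomod: Error writing to data sink! Some output may get lost. 4851 queued items to flush.`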

Posted: **Wed Jun 07, 2017 9:25 am**

XI > Admin > System Profile > Download Profile

Please include the zip file in your response. You can PM the profile to me or to other support personnel.

Remote sessions are held when our support personnel deem them necessary and request them. If you want priority support, our phone support options allow you to call in at any point and jump the queue. Remote sessions are usually initiated in that setting, as they are generally needed for fast resolution. If you already have phone support, you can find the contact information here:

https://www.nagios.com/contact/

Monitoring engine and Nagvis not working as expected
