Nagios XI stability issues

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Locked
vappukuttan
Posts: 52
Joined: Tue Mar 05, 2019 7:43 am

Nagios XI stability issues

Post by vappukuttan »

Hello,

System: CentOS 7.7, 8 CPUs, 16 GB RAM, enough disk space
Total checks: 8000

I have been having Nagios XI stability issues and am trying to figure out what needs to be done or changed to get it more stable.

I have set max concurrent jobs to 60 and repaired the databases. I tweaked parameters for a large setup (such as reaper frequency/time), switched to the unified tactical overview, increased the refresh multiplier by a factor of 10 over the default, and disabled auto-running reports and metrics on page load. ndo2db has been showing "max retries exceeded", so I set the msgmni value.

It's still not stable. I am also seeing wproc-related messages:
nagios[26891]: wproc: iocache_read() from Core Worker 26901 returned -1: Bad address

I see the monitoring engine stop every midnight. I have seen it stop randomly during the day.

I am not sure what my best approach is to get this stable.

Thank you,
Vinod
jdunitz
Posts: 235
Joined: Wed Feb 05, 2020 2:50 pm

Re: Nagios XI stability issues

Post by jdunitz »

Sorry to hear that you're having trouble!

Could we get a system profile from you?

Once we have that, we can start investigating.


Thanks!
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Be sure to check out our Knowledgebase for helpful articles and solutions!
vappukuttan
Posts: 52
Joined: Tue Mar 05, 2019 7:43 am

Re: Nagios XI stability issues

Post by vappukuttan »

Thank you for your response. Once I get the system profile, what do I do with it? Should I upload it to this thread? Is there anything in that file that I need to remove, such as server IP addresses?

Here is the latest error I got a few minutes ago in /var/log/messages. I wasn't sure which README it is referring to.

ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 256000 of 512000 messages and 262144000 of 262144000 bytes in the queue. See README for kernel tuning options.


Thank you,
Vinod
jdunitz
Posts: 235
Joined: Wed Feb 05, 2020 2:50 pm

Re: Nagios XI stability issues

Post by jdunitz »

You could PM me your profile. There is a lot of IP address and other configuration info in the profile, so if your local security policy doesn't let you share that, you may not be able to get the profile for us after all.

However...

The error you mentioned could be related to this problem:
https://support.nagios.com/kb/article.php?id=139

Have a look at that document, try what it says, and see if that helps your issue.

Thanks!

--Jeffrey
vappukuttan
Posts: 52
Joined: Tue Mar 05, 2019 7:43 am

Re: Nagios XI stability issues

Post by vappukuttan »

Thank you, Jeffrey. I made the kernel changes two days ago, then reverted the kernel.msgmnb and kernel.msgmax values to 131072000 (after Nagios stopped again). kernel.msgmni was set to 512000.

Let me increase the kernel.msgmnb and kernel.msgmax to 262144000 and see if it helps stabilize the ndo2db.
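Before and after a change like this, it can help to confirm what the running kernel actually has. The following is a minimal sketch, assuming a Linux host where these settings are exposed under /proc/sys/kernel; the function names and the TARGETS table are illustrative, and the target numbers are simply the ones mentioned in this thread, not a general recommendation.

```python
from pathlib import Path

# Values taken from this thread (assumption: these are the ones being tested).
TARGETS = {"msgmnb": 262144000, "msgmax": 262144000, "msgmni": 512000}


def read_kernel_param(name, base="/proc/sys/kernel"):
    """Return the integer value of a kernel parameter, or None if unreadable."""
    try:
        return int((Path(base) / name).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def compare(targets, base="/proc/sys/kernel"):
    """Map each parameter name to (current, target, meets_target)."""
    result = {}
    for name, target in targets.items():
        current = read_kernel_param(name, base)
        meets = current is not None and current >= target
        result[name] = (current, target, meets)
    return result


if __name__ == "__main__":
    for name, (current, target, meets) in compare(TARGETS).items():
        status = "ok" if meets else "CHECK"
        print(f"kernel.{name}: current={current} target={target} {status}")
```

A value that reads back lower than expected after a reboot would suggest the setting was applied at runtime only and not persisted.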

Sending the profile may be an issue. Is there anything specific that I can grab from it and provide?

Thank you,
Vinod
vappukuttan
Posts: 52
Joined: Tue Mar 05, 2019 7:43 am

Re: Nagios XI stability issues

Post by vappukuttan »

It stopped again today, even with the recommended values of msgmnb, msgmax, and msgmni.

Mar 20 17:15:23 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 256000 of 512000 messages and 262144000 of 262144000 bytes in the queue. See README for kernel tuning options.

I'm not sure whether I can increase the values any further, or why the message queue is getting maxed out.
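To see which limit is being hit, the numbers can be pulled out of the warning itself. This is a rough sketch, assuming the log line keeps the wording quoted above; the regular expression and the queue_usage helper are illustrative and are not part of ndo2db.

```python
import re

# Matches the "using A of B messages and C of D bytes" part of the warning.
PATTERN = re.compile(r"using (\d+) of (\d+) messages and (\d+) of (\d+) bytes")


def queue_usage(line):
    """Return percent of message and byte limits in use, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    msgs_used, msgs_max, bytes_used, bytes_max = map(int, match.groups())
    if msgs_max == 0 or bytes_max == 0:
        return None  # avoid dividing by zero on a malformed line
    return {
        "messages_pct": 100.0 * msgs_used / msgs_max,
        "bytes_pct": 100.0 * bytes_used / bytes_max,
    }


if __name__ == "__main__":
    sample = (
        "Mar 20 17:15:23 ndo2db: Warning: Retrying message send. "
        "You are currently using 256000 of 512000 messages and "
        "262144000 of 262144000 bytes in the queue."
    )
    print(queue_usage(sample))  # {'messages_pct': 50.0, 'bytes_pct': 100.0}
```

For the line above, the message count is at half of its limit while the byte count is at 100%, so the byte limit looks like the one being exhausted.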

Thank you,
Vinod
scottwilkerson
DevOps Engineer
Posts: 19396
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Nagios XI stability issues

Post by scottwilkerson »

Locking thread as it has moved to ticket #820717