Hello,
System: CentOS 7.7, 8 CPUs, 16 GB, enough disk space
Total checks: 8000
I have been having Nagios XI stability issues and am trying to figure out what needs to be changed to make it more stable.
I have set max concurrent jobs to 60, repaired the databases, and tweaked parameters for a large setup (such as reaper frequency/time). I have also switched to the unified tactical overview, increased the refresh multiplier to 10 times the default, and disabled auto-running reports and metrics on page load. ndo2db has been showing "max retries exceeded", so I set the msgmni value.
It's still not stable. I am also seeing wproc-related messages:
nagios[26891]: wproc: iocache_read() from Core Worker 26901 returned -1: Bad address
I see the monitoring engine stop every midnight. I have seen it stop randomly during the day.
I am not sure what my best approach is to get this stable.
Thank you,
Vinod
Nagios XI stability issues
Re: Nagios XI stability issues
Sorry to hear that you're having trouble!
Could we get a system profile from you?
Once we have that, we can start investigating.
Thanks!
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
vappukuttan
- Posts: 52
- Joined: Tue Mar 05, 2019 7:43 am
Re: Nagios XI stability issues
Thank you for your response. Once I get the system profile, what do I do with it? Do I upload it to this thread? Is there anything in that file that I need to remove, such as server IP addresses?
Here is the latest error I got a few minutes ago in /var/log/messages. I wasn't sure which README it's referring to.
ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 256000 of 512000 messages and 262144000 of 262144000 bytes in the queue. See README for kernel tuning options.
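As a reading aid for the warning above, here is a minimal Python sketch that pulls the four numbers out of the message and turns them into percentages. The function name `queue_usage` and the regular expression are illustrative assumptions, not part of ndo2db or Nagios XI; the pattern only matches the wording quoted in this post.

```python
import re

# The warning exactly as it appears in /var/log/messages in this thread.
WARNING = (
    "ndo2db: Warning: Retrying message send. This can occur because you have "
    "too few messages allowed or too few total bytes allowed in message queues. "
    "You are currently using 256000 of 512000 messages and 262144000 of "
    "262144000 bytes in the queue. See README for kernel tuning options."
)

# Hypothetical pattern: matches only the phrasing shown above.
PATTERN = re.compile(r"using (\d+) of (\d+) messages and (\d+) of (\d+) bytes")


def queue_usage(line):
    """Return message and byte usage as percentages, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    used_msgs, max_msgs, used_bytes, max_bytes = map(int, match.groups())
    if max_msgs == 0 or max_bytes == 0:
        return None  # avoid dividing by zero on an unexpected line
    return {
        "messages_pct": 100.0 * used_msgs / max_msgs,
        "bytes_pct": 100.0 * used_bytes / max_bytes,
    }


if __name__ == "__main__":
    print(queue_usage(WARNING))
```

Read this way, the message count is at half of its limit while the byte count has reached its limit, so the byte limit is the one being hit in this particular warning.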
Thank you,
Vinod
Re: Nagios XI stability issues
You could PM me your profile. There is a lot of IP address and other configuration info in the profile, so if your local security policy doesn't let you share that, you may not be able to get the profile for us after all.
However...
The error you mentioned could be related to this problem:
https://support.nagios.com/kb/article.php?id=139
Have a look at that document, try what it says, and see if that helps your issue.
Thanks!
--Jeffrey
-
vappukuttan
- Posts: 52
- Joined: Tue Mar 05, 2019 7:43 am
Re: Nagios XI stability issues
Thank you, Jeffrey. I had made the kernel changes two days ago, and then reverted the
kernel.msgmnb and kernel.msgmax values to 131072000 (after Nagios stopped again).
kernel.msgmni was set to 512000.
Let me increase kernel.msgmnb and kernel.msgmax to 262144000 and see if it helps stabilize ndo2db.
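To confirm which values the running kernel is actually using after a change like this, a small Python sketch such as the following could read them back. It assumes a Linux host, where the `kernel.msg*` sysctls are exposed as files under `/proc/sys/kernel`; the helper name `read_msg_limits` and its `base` parameter are made up for illustration.

```python
from pathlib import Path

# The three System V message queue settings discussed in this thread.
SETTINGS = ("msgmnb", "msgmax", "msgmni")


def read_msg_limits(base="/proc/sys/kernel"):
    """Read the current message queue limits; None means unreadable."""
    limits = {}
    for name in SETTINGS:
        path = Path(base) / name
        try:
            limits[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            # File missing, unreadable, or not a plain integer.
            limits[name] = None
    return limits


if __name__ == "__main__":
    for name, value in read_msg_limits().items():
        print(f"kernel.{name} = {value}")
```

Comparing this output with the limits quoted in the ndo2db warning shows whether the new values took effect or were reverted.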
Sending the profile may be an issue. Is there anything specific that I can grab from it and provide?
Thank you,
Vinod
-
vappukuttan
- Posts: 52
- Joined: Tue Mar 05, 2019 7:43 am
Re: Nagios XI stability issues
It stopped again today, even with the recommended values for msgmnb, msgmax, and msgmni.
Mar 20 17:15:23 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 256000 of 512000 messages and 262144000 of 262144000 bytes in the queue. See README for kernel tuning options.
I'm not sure whether I can increase the values any further, or why the message queue is getting maxed out.
Thank you,
Vinod
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Nagios XI stability issues
Locking thread as it has moved to ticket #820717