G'day Support,
I am having an issue with my Nagios XI installation and cannot seem to figure out what is going on. I recently implemented nagiosramdisk via your script and all seemed to be working well. This morning I found it in a bad state, with hundreds of thousands of files in the checkresults directory. Obviously, results are not being processed. I've tried everything I can think of, and I cannot get the system stabilized; it eventually craters.
It starts out with the "Monitoring Engine Event Queue" basically not showing any activity and then this error shows up, "May 20 17:03:25 dcom-nagiosxi-p1 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 512000 of 512000 messages and 524288000 of 524288000 bytes in the queue. See README for kernel tuning options."
I really need help troubleshooting this issue; this is the first time I have encountered it. I increased the kernel parameters, but that only prolongs the time until the next failure. Then I have to restart ndo2db and we are back to square one. Please advise, and thanks...
Danny
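For anyone tracking how close the queue is to its limits, the usage figures in the ndo2db warning quoted above can be pulled out with a few lines of Python. This is only a minimal sketch: the regular expression is written against the wording of the message in this post, and the function name `queue_usage` is made up for illustration.

```python
import re

# Matches the usage figures in the ndo2db "Retrying message send" warning,
# based on the wording of the log line quoted in this thread.
USAGE_RE = re.compile(r"using (\d+) of (\d+) messages and (\d+) of (\d+) bytes")


def queue_usage(log_line):
    """Return (message_fraction, byte_fraction), or None if no figures are found."""
    match = USAGE_RE.search(log_line)
    if match is None:
        return None
    used_msgs, max_msgs, used_bytes, max_bytes = map(int, match.groups())
    return used_msgs / max_msgs, used_bytes / max_bytes


if __name__ == "__main__":
    line = (
        "May 20 17:03:25 dcom-nagiosxi-p1 ndo2db: Warning: Retrying message send. "
        "You are currently using 512000 of 512000 messages and "
        "524288000 of 524288000 bytes in the queue."
    )
    print(queue_usage(line))  # both fractions are 1.0 when the queue is full
```

A fraction at or near 1.0 on either figure means the queue is saturated, which matches the state described in this post.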
ndo2db eventually fails
Re: ndo2db eventually fails
Well, I took my latest backup and brought XI up on my DR instance, and it is running as expected, so I figure the configuration itself is OK. I am now trying to uninstall everything on my primary server and then reinstall. I am concerned about how to remove the nagiosramdisk; will the uninstall script handle that as well? Let me know, and thanks...
Any suggestions?
Danny
Re: ndo2db eventually fails
What version of XI is this and what OS version is it installed on? It looks like the uninstall script will take care of it but I would still recommend reviewing the manual steps in https://assets.nagios.com/downloads/nag ... giosXI.pdf afterwards to make sure they all get removed.
When/if you reinstall the ramdisk, monitor the system closely at first to make sure files are not queuing up and if you do notice it start to queue up, get a profile from the system so we can review. It would also be good to gather any files that the ramdisk script touches - see the PDF for OS specific files.
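One way to watch for files queuing up is to count what is sitting in the checkresults directory at regular intervals. The sketch below is an assumption-laden example rather than an official tool: the spool path is a guess at a typical ramdisk layout (check your own nagios.cfg for the real one), and the threshold of 1000 is an arbitrary illustrative value.

```python
import os


def count_pending_results(spool_dir):
    """Count regular files currently waiting in a check-results spool directory."""
    with os.scandir(spool_dir) as entries:
        return sum(1 for entry in entries if entry.is_file())


def is_backing_up(spool_dir, threshold=1000):
    """Return True when more files are waiting than the (hypothetical) threshold."""
    return count_pending_results(spool_dir) > threshold


if __name__ == "__main__":
    # Assumed path; confirm the actual check_result_path in your nagios.cfg.
    spool = "/var/nagiosramdisk/spool/checkresults"
    if os.path.isdir(spool):
        print(count_pending_results(spool), "files pending in", spool)
    else:
        print("Spool directory not found:", spool)
```

Running this from cron every minute or so and logging the count should make a growing backlog visible well before it reaches hundreds of thousands of files.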
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
Re: ndo2db eventually fails
@cdienger
So I started investigating this more deeply and determined that the kernel message queue was hitting its limits. I see that the MySQL database was using too much CPU to keep up with the volume of messages being sent. I modified the kernel message queue (IPC) settings as recommended, but that wasn't enough, so I enabled backendcache, which appears to have helped. I am still concerned that the system is overloaded...
Resources available:
# uname -a
Linux dcom-nagiosxi-p3.xxxx.xxxxxxxxxx.xxx 3.10.0-957.10.1.el7.x86_64 #1 SMP Thu Feb 7 07:12:53 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
# top
top - 10:53:34 up 28 days, 23:03, 2 users, load average: 3.94, 3.55,
Tasks: 439 total, 2 running, 436 sleeping, 0 stopped, 1 zombie
%Cpu(s): 11.9 us, 2.9 sy, 0.0 ni, 85.0 id, 0.0 wa, 0.0 hi, 0.2 si,
KiB Mem : 16266124 total, 2233284 free, 2126932 used, 11905908 buff/c
KiB Swap: 2097148 total, 1243928 free, 853220 used. 13410624 avail
# grep MemTotal /proc/meminfo
MemTotal: 16266124 kB
# grep processor /proc/cpuinfo | wc -l
8
# df -kh /var/nagiosramdisk/
Filesystem Size Used Avail Use% Mounted on
tmpfs 1000M 152M 849M 16% /var/nagiosramdisk
Does anyone have any suggested limitations on the number of monitoring objects that a single instance of Nagios XI can manage?
Let me know and thanks,
Danny
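For reference, the kernel message queue limits mentioned in this post can be read back from /proc to confirm that a tuning change actually took effect. This is a minimal sketch assuming a Linux host, where the System V limits are exposed as `msgmax`, `msgmnb` and `msgmni` under /proc/sys/kernel; the function name and the `base` parameter are illustrative only.

```python
from pathlib import Path

# System V message queue limits exposed by the Linux kernel.
MSG_LIMITS = ("msgmax", "msgmnb", "msgmni")


def read_msg_limits(base="/proc/sys/kernel"):
    """Return a dict of limit name -> integer value (None if unreadable)."""
    limits = {}
    for name in MSG_LIMITS:
        try:
            limits[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            limits[name] = None
    return limits


if __name__ == "__main__":
    for name, value in read_msg_limits().items():
        print(f"kernel.{name} = {value}")
```

Comparing these values before and after a change is a quick check that the new settings are live and not just written to a config file.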
Re: ndo2db eventually fails
30,000 total checks is usually a pretty good cutoff, but the number can vary widely depending on the frequency of checks, the types of checks, tweaks, etc. https://assets.nagios.com/downloads/nag ... ios-XI.pdf covers the tweaking possibilities. In my own testing, I found changing the reaper settings and lowering the frequency of the checks to be pretty beneficial.
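To see why check frequency matters as much as the raw count, it helps to turn the figures into a rate. The sketch below is rough arithmetic only: it assumes every check runs on the same interval, and the 5 and 10 minute intervals are illustrative values, not recommendations from the tuning document.

```python
def checks_per_second(total_checks, interval_minutes):
    """Average number of check results produced per second."""
    return total_checks / (interval_minutes * 60)


if __name__ == "__main__":
    # 30,000 checks is the rough cutoff mentioned above; intervals are hypothetical.
    print(checks_per_second(30000, 5))   # 100.0 results per second
    print(checks_per_second(30000, 10))  # 50.0 results per second
```

Doubling the interval halves the number of results that have to be reaped and written to the database each second.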
Re: ndo2db eventually fails
@cdienger
Well I think I have finally tweaked it enough to keep it running. I ended up creating the /var/nagiosramdisk and /usr/local/nagiosxi/tmp/backendcache ramdisks...
Currently the system is operating as expected with regard to processing events...seems to be acceptable...you can lock this entry.
Thanks for your help/input,
Danny
scottwilkerson
Re: ndo2db eventually fails
onegative wrote:
@cdienger
Well I think I have finally tweaked it enough to keep it running. I ended up creating the /var/nagiosramdisk and /usr/local/nagiosxi/tmp/backendcache ramdisks...
Currently the system is operating as expect with regard to processing events...seems to be acceptable...you can lock this entry.
Thanks for your help/input,
Danny

great!
Locking