Nagios XI interface hangs upon login
Posted: Fri Apr 14, 2017 7:16 pm
by jxk
I'm running into a problem where I'm unable to log into Nagios XI. No timeout occurs, and according to the log, Nagios seemingly hums along fine until the following errors occur:
Apr 14 18:41:10 HOST ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may need to be tuned. See README.
Apr 14 18:41:10 HOST ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 256000 of 512000 messages and 262144000 of 262144000 bytes in the queue. See README for kernel tuning options.
Apr 14 18:41:30 HOST ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may need to be tuned. See README.
Apr 14 18:41:30 HOST ndo2db: Warning: queue send error, retrying...
Apr 14 18:41:50 HOST ndo2db: Error: max retries exceeded sending message to queue. Kernel queue parameters may need to be tuned. See README.
Apr 14 18:41:50 HOST ndo2db: Warning: queue send error, retrying...
At this point, Nagios activity grinds to a halt. The kernel parameters msgmnb and msgmax were already doubled for troubleshooting purposes.
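The usage figures in the warning above show which limit is actually saturated. A minimal Python sketch, assuming the log line keeps the wording quoted above (the helper name `queue_usage` is made up for illustration), that extracts the counters and reports each as a fraction of its limit:

```python
import re

# Matches the usage figures in the ndo2db warning quoted above.
USAGE_RE = re.compile(
    r"using (\d+) of (\d+) messages and (\d+) of (\d+) bytes"
)

def queue_usage(log_line):
    """Return (message_fraction, byte_fraction), or None if no figures are found."""
    match = USAGE_RE.search(log_line)
    if match is None:
        return None
    msgs_used, msgs_max, bytes_used, bytes_max = map(int, match.groups())
    return msgs_used / msgs_max, bytes_used / bytes_max

line = ("Apr 14 18:41:10 HOST ndo2db: Warning: Retrying message send. "
        "You are currently using 256000 of 512000 messages and "
        "262144000 of 262144000 bytes in the queue.")
print(queue_usage(line))  # (0.5, 1.0)
```

For the line from this log, the message count sits at half its limit while the byte limit is completely full.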
It's a fairly large deployment, with over 3200 hosts and ~7500 checks, running on a 16-core server with 32 GB of RAM.
The database, which is offloaded, appears to be fine.
Has anybody experienced this before? I'm wondering if the box has been outgrown even though none of the typical resource bottlenecks have reared their heads.
64-bit RHEL 7.3
Manual XI install on VM
Re: Nagios XI interface hangs upon login
Posted: Fri Apr 14, 2017 11:16 pm
by jxk
I should add that, regardless of the errors in the log, I still can't get past the login screen, even after restarting ndo2db, restarting the web server, or rebooting.
Re: Nagios XI interface hangs upon login
Posted: Sun Apr 16, 2017 8:34 pm
by tacolover101
I haven't seen that error before, but it's probably worth following what the message says about the README. See this:
https://sourceforge.net/p/nagios/ndouti ... ree/README
From what I can tell, it's just a matter of increasing a variable on your system:
************************
TUNING KERNEL PARAMETERS
************************
NDOUTILS uses a single message queue to communicate between the broker
module and the NDO2DB daemon. Depending on the operating system, there
may be parameters that need to be tuned in order for this communication
to work correctly. The discussion below applies specifically to Linux,
but may apply generally to other Unices as well.
There are three Linux kernel parameters that determine the resources
provided to the messaging subsystem:
* kernel.msgmax is the maximum size of a single message in a
message queue
* kernel.msgmni is the maximum number of messages allowed in any
one message queue
* kernel.msgmnb is the total number of bytes allowed in all messages
in any one message queue
To see the current values for any of these parameters, cat
/proc/sys/kernel/msg{max|mni|mnb}.
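As a rough illustration of the step above, the three values can also be read in one go. This is only a sketch: the function name `read_msg_limits` and the `base` parameter are invented here, and it assumes a Linux procfs mounted at /proc.

```python
from pathlib import Path

def read_msg_limits(base="/proc/sys/kernel"):
    """Read msgmax, msgmni and msgmnb from procfs; skip any file that is absent."""
    limits = {}
    for name in ("msgmax", "msgmni", "msgmnb"):
        path = Path(base) / name
        if path.exists():
            limits[name] = int(path.read_text().strip())
    return limits

# On a non-Linux system this prints an empty dict.
print(read_msg_limits())
```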
In order for NDOUTILS to work at all, kernel.msgmax must be greater than
the size of the queue_msg struct (currently 1026 bytes). Most Linux
distributions set kernel.msgmax to a default of 65536.
If there are insufficient resources for sending messages between the
broker and the daemon, you will see an entry similar to the following
in your logs. (This is logged via the syslog facility, using the level
LOG_ERR and the default facility.)
ndo2db: Warning: Retrying message send. This can occur because
you have too few messages allowed or too few total bytes
allowed in message queues. You are currently using 16 of 16
messages and 65536 of 65536 bytes in the queue. See README for
kernel tuning options.
If you see this entry, the message will likely eventually be sent,
but retrying uses system resources, and there is the possibility that
more messages will be queued than can be handled, causing the broker
module to stall.
If you are close to or have exceeded the number of messages, you may
need to increase kernel.msgmni. If you are close to or have exceeded
the number of bytes in the queue, you may need to increase
kernel.msgmnb. In some cases you may need to increase both.
A conservative approach would be to double the necessary value, stop
and restart both the NDO2DB daemon and Nagios Core, and watch for any
further messages. Note that if NDO2DB is started after Nagios Core,
you may see the warning above as the broker module first attempts to
flush its backlog of messages.
To increase a value, echo the value to /proc/sys/kernel/msgmni or
/proc/sys/kernel/msgmnb as appropriate.
For example, to increase the number of messages allowed in the queue
to 32, use the command 'echo 32 > /proc/sys/kernel/msgmni' (without
the quotes).
Once you have determined the correct parameters, you can make them
permanent by editing /etc/sysctl.conf. Add or update the line of
the form 'kernel.msg{mni|mnb} = <value>' with the value(s) determined
above. The next time the system is booted, the values of the
parameters in /etc/sysctl.conf will be loaded.
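The "add or update the line" step for /etc/sysctl.conf can be sketched as below. This is a hypothetical helper (`set_sysctl` is not part of NDOUtils or Nagios) that works on the file's text rather than the file itself, so it can be tried safely before touching a real system. The value 131072 is just an example of doubling the 65536 shown in the README.

```python
import re

def set_sysctl(conf_text, key, value):
    """Return conf_text with `key = value` updated in place, or appended if absent."""
    # Only uncommented assignments match; "# kernel.msgmnb = ..." is left alone.
    pattern = re.compile(r"^\s*" + re.escape(key) + r"\s*=")
    new_line = f"{key} = {value}"
    lines = conf_text.splitlines()
    found = False
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = new_line
            found = True
    if not found:
        lines.append(new_line)
    return "\n".join(lines) + "\n"

original = "# tuning\nkernel.msgmax = 65536\nkernel.msgmnb = 65536\n"
print(set_sysctl(original, "kernel.msgmnb", 131072), end="")
```

Running `sysctl -p` as root should load the edited file without waiting for a reboot.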
Re: Nagios XI interface hangs upon login
Posted: Mon Apr 17, 2017 9:05 am
by jxk
I've read this and heavily increased those parameters to no avail. Restarting ndo2db will clear the message queue, which is expected. I can then see the message queue growing, and it eventually hits the limit which spawns the error. At that point, Nagios stops processing checks.
What doesn't make sense is that no matter what the parameters are set to, I still can't get logged into the GUI. This system was working fine for months, and nothing was changed other than adding some additional monitoring hosts.
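For anyone wanting to watch the queue fill up as described above, here is a small sketch that parses the System V message queue table. It assumes the Linux /proc/sysvipc/msg layout with `msqid`, `cbytes` and `qnum` columns, so check the header on your own system. The function name and the sample row are made up for illustration, with the byte and message counts borrowed from the log earlier in the thread.

```python
def parse_sysvipc_msg(text):
    """Parse /proc/sysvipc/msg-style text into a list of per-queue dicts."""
    rows = text.strip().splitlines()
    if not rows:
        return []
    header = rows[0].split()
    queues = []
    for row in rows[1:]:
        fields = dict(zip(header, row.split()))
        queues.append({
            "msqid": int(fields["msqid"]),
            "bytes": int(fields["cbytes"]),     # bytes currently queued
            "messages": int(fields["qnum"]),    # messages currently queued
        })
    return queues

# Illustrative sample only; on a real host read the text from /proc/sysvipc/msg.
sample = (
    "       key      msqid perms      cbytes       qnum lspid lrpid   uid   gid"
    "  cuid  cgid      stime      rtime      ctime\n"
    "         0      32768   600   262144000     256000  1234  1240   997   995"
    "   997   995 1492209670 1492209650 1492200000\n"
)
print(parse_sysvipc_msg(sample))
```

Polling this every few seconds after restarting ndo2db would show whether the daemon is draining the queue at all or whether it only ever grows.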
Re: Nagios XI interface hangs upon login
Posted: Mon Apr 17, 2017 9:58 am
by jxk
It turns out upgrading and downgrading the VM has resolved this issue. I suppose something in the virtual ether still had its hooks into the kernel.
Re: Nagios XI interface hangs upon login
Posted: Mon Apr 17, 2017 10:54 am
by avandemore