Mar 12 11:00:01 nagiosprodxi1 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 128000 of 23815 messages and 131072000 of 131072000 bytes in the queue. See README for kernel tuning options.
Mar 12 11:00:08 nagiosprodxi1 ndo2db: Message sent to queue.
Mar 12 11:00:08 nagiosprodxi1 ndo2db: Warning: queue send error, retrying...
Mar 12 11:00:09 nagiosprodxi1 ndo2db: Message sent to queue.
Mar 12 11:00:09 nagiosprodxi1 ndo2db: Warning: queue send error, retrying...
Mar 12 11:00:10 nagiosprodxi1 ndo2db: Message sent to queue.
Mar 12 11:00:10 nagiosprodxi1 ndo2db: Warning: queue send error, retrying...
Mar 12 11:00:11 nagiosprodxi1 ndo2db: Message sent to queue.
------
Mar 12 11:04:07 nagiosprodxi1 automount[2273]: key "<" not found in map source(s).
Mar 12 11:04:12 nagiosprodxi1 automount[2273]: create_client: hostname lookup failed: Name or service not known
Mar 12 11:04:12 nagiosprodxi1 automount[2273]: create_client: hostname lookup failed: Name or service not known
Mar 12 11:04:12 nagiosprodxi1 automount[2273]: get_exports: lookup(hosts): exports lookup failed for <
Mar 12 11:04:12 nagiosprodxi1 automount[2273]: key "<" not found in map source(s).
Mar 12 11:09:08 nagiosprodxi1 automount[2273]: key "<" not found in map source(s).
Mar 12 11:09:08 nagiosprodxi1 automount[2273]: create_client: hostname lookup failed: Name or service not known
Mar 12 11:09:08 nagiosprodxi1 automount[2273]: create_client: hostname lookup failed: Name or service not known
Mar 12 11:09:08 nagiosprodxi1 automount[2273]: get_exports: lookup(hosts): exports lookup failed for <
Mar 12 11:09:08 nagiosprodxi1 automount[2273]: key "<" not found in map source(s).
Mar 12 11:14:07 nagiosprodxi1 automount[2273]: key "<" not found in map source(s).
[nagios@nagiosprodxi1 local]$ cat /etc/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and
# sysctl.conf(5) for more details.
# Controls IP packet forwarding
net.ipv4.ip_forward = 0
# Controls source route verification
net.ipv4.conf.default.rp_filter = 1
# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0
# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0
# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1
# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1
# Disable netfilter on bridges.
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
# Controls the default maxmimum size of a mesage queue
kernel.msgmnb = 131072000
# Controls the maximum size of a message, in bytes
kernel.msgmax = 131072000
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 4294967295
# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 268435456
It looks like you may have a hostname resolution (DNS) problem. Other services, such as automount, are reporting hostname lookup failures:
Mar 12 11:04:12 nagiosprodxi1 automount[2273]: create_client: hostname lookup failed: Name or service not known
As for the queue size, the message count looks fine (23.8k of 128k), but the byte count is at its limit (131072000 of 131072000). With the setup you have (if this is the same server with 12 CPU cores, etc.), you should be able to raise the kernel.msgmnb and kernel.msgmax values in /etc/sysctl.conf and then run sysctl -p to apply them. That should help with the ndoutils queue maxing out.
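As a rough illustration of the change described above, the relevant sysctl.conf entries might look like the following. The doubled values are an assumption chosen only for illustration, not a figure recommended anywhere in this thread; size them to the memory available on your server.

```conf
# Hypothetical values: double the current 131072000 (illustration only)
# Controls the default maximum size of a message queue, in bytes
kernel.msgmnb = 262144000
# Controls the maximum size of a single message, in bytes
kernel.msgmax = 262144000
```

After saving the file, running sysctl -p as root reloads it, and sysctl kernel.msgmnb kernel.msgmax should then print the new values.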
You are currently using 128000 of 23815 messages.
Isn't this statement saying the system is using 128k of 23k messages?
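To make the numbers in that warning easier to compare, here is a minimal shell sketch that pulls the four counters out of the log line and reports how full the byte quota is. The function names are hypothetical helpers, not part of ndoutils, and the sketch makes no assumption about which of the two message figures is the limit; it only shows that the byte quota is exhausted.

```shell
#!/bin/sh
# Hypothetical helper: extract the four counters from an ndo2db
# "Retrying message send" warning, in the order they appear in the line.
# Prints: <messages_1st> <messages_2nd> <bytes_1st> <bytes_2nd>
parse_ndo2db_queue_warning() {
    printf '%s\n' "$1" | sed -n \
        's/.*using \([0-9][0-9]*\) of \([0-9][0-9]*\) messages and \([0-9][0-9]*\) of \([0-9][0-9]*\) bytes.*/\1 \2 \3 \4/p'
}

# Hypothetical helper: byte usage as an integer percentage
# (third counter divided by fourth counter).
byte_usage_percent() {
    # "$1" is expanded before set replaces the positional parameters.
    set -- $(parse_ndo2db_queue_warning "$1")
    [ $# -eq 4 ] || return 1
    echo $(( $3 * 100 / $4 ))
}
```

Passing the warning line from the log above to byte_usage_percent shows the byte quota is fully used, which is the limit that kernel.msgmnb controls.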
That's going to be a tough one; check_icmp is usually very reliable. I think the first troubleshooting step is to run it from the command line on the gearman server and analyze both the input you're giving it and the output it returns. It would also be useful to see a ping to the host in question right after it fails:
It sounds like the problem occurs very infrequently? That is going to make it even tougher to sort out.
Any chance you could find one of the false alerts in your nagios.log and share its exact contents? As I mentioned, check_icmp is pretty solid, so I doubt the plugin itself is where the problem lies. I'm wondering whether some useful output is coming back.