Hi Team,
I am getting the error messages below in /var/log/messages, and my Nagios service stops abnormally because of them:
Jul 21 23:59:29 localhost ndo2db: Message sent to queue.
Jul 21 23:59:29 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:30 localhost ndo2db: Message sent to queue.
Jul 21 23:59:30 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:31 localhost ndo2db: Message sent to queue.
Jul 21 23:59:31 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:32 localhost ndo2db: Message sent to queue.
Jul 21 23:59:32 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:33 localhost ndo2db: Message sent to queue.
Jul 21 23:59:33 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:34 localhost ndo2db: Message sent to queue.
Jul 21 23:59:34 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:59:35 localhost ndo2db: Message sent to queue.
Jul 21 23:59:35 localhost ndo2db: Warning: queue send error, retrying...
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost ndo2db: Error: queue recv error.
Jul 21 23:04:42 localhost rsyslogd-2177: imuxsock begins to drop messages from pid 53259 due to rate-limiting
sysctl -p
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
kernel.sysrq = 0
kernel.core_uses_pid = 1
net.ipv4.tcp_syncookies = 1
kernel.msgmnb = 262144000
kernel.msgmax = 262144000
kernel.shmmax = 7294967295
kernel.shmall = 768435456
kernel.msgmni = 512000
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.ip_forward = 0
kernel.exec-shield = 1
kernel.randomize_va_space = 1
I followed the link below but had no luck. Please help.
https://support.nagios.com/kb/article.php?id=139
ndo2db: Error: queue recv error.
Re: ndo2db: Error: queue recv error.
Please run these commands:
service npcd stop
service nagios stop
service ndo2db stop
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then service postgresql stop; fi;
service mysqld restart
rm -rf /usr/local/nagios/var/rw/nagios.cmd
rm -rf /usr/local/nagios/var/nagios.lock
rm -rf /usr/local/nagios/var/ndo.sock
rm -rf /usr/local/nagios/var/ndo2db.lock
rm -rf /usr/local/nagiosxi/var/reconfigure_nagios.lock
rm -rf /var/lib/mrtg/mrtg_l
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
pkill -9 -u nagios
if grep --quiet pgsql /usr/local/nagiosxi/html/config.inc.php; then service postgresql start; fi;
service httpd restart
service ndo2db start
service nagios start
service npcd start
service crond restart
Then validate if it's now working.
Then attach these files:
/etc/init.d/nagios
And send the full output of these commands:
chage -l nagios
grep nag /etc/group
ipcs -q
tail -n50 /var/log/messages /var/log/mysqld.log /var/log/mariadb/mariadb.log
Also, please include a screenshot of Admin > Performance Settings > Databases.
Thank you
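The cleanup loop above selects queues with `grep nagios`, which matches that string anywhere on the line. As a minimal sketch, assuming the usual util-linux `ipcs -q` column layout (key, msqid, owner, perms, used-bytes, messages), the helper below matches only the owner column and also shows how many bytes and messages are waiting in each queue. The function name `nagios_queues` is made up for illustration.

```shell
# Hypothetical helper: read `ipcs -q` output on stdin and print
# "msqid used-bytes messages" for every queue owned by the given user.
# Assumes the column order: key msqid owner perms used-bytes messages.
nagios_queues() {
    awk -v owner="${1:-nagios}" \
        '$3 == owner && $2 ~ /^[0-9]+$/ { print $2, $5, $6 }'
}

# Usage: inspect the queues first.
# ipcs -q | nagios_queues nagios
# Remove them only while nagios and ndo2db are stopped:
# ipcs -q | nagios_queues nagios | while read -r id _; do ipcrm -q "$id"; done
```

A queue whose used-bytes value sits close to kernel.msgmnb is likely full, which would be consistent with the "queue send error" warnings.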
Re: ndo2db: Error: queue recv error.
Hi SSAX,
The error is still occurring. Please find the /etc/init.d/nagios file and the command output attached.
Re: ndo2db: Error: queue recv error.
Please PM a copy of your profile. You can download it by going to Admin > System Config > System Profile and clicking the Download Profile button in the top right corner.
Did this just start occurring or did it come about from an upgrade or some other maintenance?
Re: ndo2db: Error: queue recv error.
Hi ssax,
I am not sure when it started occurring, but we did upgrade to the latest version of Nagios XI. Please find the profile attached.
-
dwhitfield
- Former Nagios Staff
- Posts: 4583
- Joined: Wed Sep 21, 2016 10:29 am
- Location: NoLo, Minneapolis, MN
- Contact:
Re: ndo2db: Error: queue recv error.
You should increase the following:
kernel.msgmnb = 262144000
kernel.msgmax = 262144000
Those are what we suggest in our kb article, but we have at least one customer that uses 10x the default (5x what you have). Please let us know if increasing them further does not resolve the issue.
Also, I note that you have about 20k Hosts + Services. This is about the place where you need to think seriously about getting a second XI box. Alternatively, you can decrease your check frequency.
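To confirm that the running kernel actually picked up new values, a quick check along these lines may help. It is only a sketch: the function name `check_msg_limit` and the `PROC_KERNEL_DIR` override are illustrative, and the targets in the usage lines are simply the values quoted above.

```shell
# Hypothetical helper: compare a live kernel limit against a wanted minimum.
# Reads /proc/sys/kernel/<name> (Linux). PROC_KERNEL_DIR is only an override
# for testing and is not a real system setting.
check_msg_limit() {
    name="$1"
    wanted="$2"
    dir="${PROC_KERNEL_DIR:-/proc/sys/kernel}"
    current=$(cat "$dir/$name") || return 2
    if [ "$current" -ge "$wanted" ]; then
        echo "$name ok ($current)"
    else
        echo "$name too low ($current < $wanted)"
        return 1
    fi
}

# Usage with the values quoted above:
# check_msg_limit msgmnb 262144000
# check_msg_limit msgmax 262144000
```

If a value is too low, `sysctl -w kernel.msgmnb=<value>` changes it at runtime, and the matching line in /etc/sysctl.conf followed by `sysctl -p` makes it persistent.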
Re: ndo2db: Error: queue recv error.
Hi dwhitfield,
I increased kernel.msgmnb and kernel.msgmax further, but I am still getting the same error. The CPU load is also very high; it sometimes goes above 200.
Also, we have one Nagios server and two workers.
Nagios Server:
bash-4.1# free -m
total used free shared buffers cached
Mem: 64416 36899 27517 389 59 1494
-/+ buffers/cache: 35344 29072
Swap: 61439 521 60918
Worker1
[root@SRVBAN19NGLBVM1 ~]# free -m
total used free shared buffers cached
Mem: 32101 30636 1465 0 255 4496
-/+ buffers/cache: 25884 6216
Swap: 16119 0 16119
Worker2
[root@SRVBAN19NGLBVM2 ~]# free -m
total used free shared buffers cached
Mem: 15942 13025 2917 0 14 98
-/+ buffers/cache: 12912 3030
Swap: 8039 8 8031
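A load average only means something relative to the number of CPU cores, which the output above does not show. As a rough sketch (the helper name `load_per_core` is hypothetical), the 1-minute load from /proc/loadavg can be divided by the core count reported by `nproc`:

```shell
# Hypothetical helper: print the load average per CPU core.
# Arguments: 1-minute load average, number of cores (must be non-zero).
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Usage on a Linux host:
# load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A sustained value well above 1 per core generally means more tasks are runnable or waiting on I/O than the cores can serve.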
-
dwhitfield
Re: ndo2db: Error: queue recv error.
If your load is getting that high, you need to take a look at https://assets.nagios.com/downloads/nag ... ios-XI.pdf
RAM is clearly not your issue.
You could try decreasing the service check timeout: service_check_timeout=60
More checks will time out, but you'll get better performance overall.
Your host templates all have a default check interval of 5. You could try increasing that, as I mentioned yesterday. I also checked the services on a couple of hosts in your profile and they were all set to 5 minutes. You really need to go through and decide what doesn't need to be checked every 5 minutes, or you need to offload some of this onto a Core box or another XI box.
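To see how the check intervals are distributed before deciding what to relax, something like the following could be run against the object configuration. This is a sketch: the function name `tally_check_intervals` is made up, and the directory in the usage line is an assumption about a default install.

```shell
# Hypothetical helper: count how often each check_interval value appears
# in Nagios object config files (given as arguments, or read from stdin).
# Output: one "<interval> <count>" line per value, sorted by interval.
tally_check_intervals() {
    awk '$1 == "check_interval" { count[$2]++ }
         END { for (v in count) print v, count[v] }' "$@" | sort -n
}

# Usage (path is an assumption; adjust to your install):
# find /usr/local/nagios/etc -name '*.cfg' -exec cat {} + | tally_check_intervals
```

Objects that inherit their interval from a template do not carry their own check_interval line, so the counts reflect explicit settings only.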
It's really helpful if you use code blocks for output. For example:
bash-4.1# free -m
             total       used       free     shared    buffers     cached
Mem:         64416      36899      27517        389         59       1494
-/+ buffers/cache:      35344      29072
Swap:        61439        521      60918