ndo2db & rsyslogd ERRORS
Our Nagios environment has been less than stable lately. This morning Nagios was hung, and the following were the prevailing errors in /var/log/messages:
Jul 7 08:52:56 usalfd0nagxi01 ndo2db: Error: queue recv error.
Jul 7 08:52:56 usalfd0nagxi01 rsyslogd-2177: imuxsock begins to drop messages from pid 27855 due to rate-limiting
Had to perform a server reboot to recover, but we are still seeing these errors and fear another outage.
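To gauge how often the two messages above recur, a small helper can count them in the syslog file. This is only a sketch: `count_log_errors` is a made-up name, and the match strings are taken from the log lines quoted in this post.

```shell
# Count ndo2db queue errors and rsyslog rate-limit notices in a syslog file.
# count_log_errors is a hypothetical helper name, not part of Nagios XI.
count_log_errors() {
    log_file="$1"
    # grep -c prints 0 when nothing matches, so both counters are always set.
    ndo_count=$(grep -c 'ndo2db: Error: queue recv error' "$log_file")
    drop_count=$(grep -c 'imuxsock begins to drop messages' "$log_file")
    echo "ndo2db_queue_errors=$ndo_count rsyslog_drop_notices=$drop_count"
}

# Usage (needs read access to the log):
#   count_log_errors /var/log/messages
```

Running it before and after a change gives a rough sense of whether the error rate is moving.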
Re: ndo2db & rsyslogd ERRORS
Can you PM over a profile for us to look at? (Admin -> System Profile -> Download Profile)
Additionally, how many CPUs do you have allocated to this machine?
Former Nagios Employee
Re: ndo2db & rsyslogd ERRORS
Regarding uploading the profile.zip: "The file is too big, maximum allowed size is 1 MiB"
I've checked, and my profile.zip is 1.30 MB.
Re: ndo2db & rsyslogd ERRORS
Ah, are you authorized to email into [email protected]? Otherwise, try splitting it into two and PM'ing it over.
EDIT: Profile received.
Former Nagios Employee
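For anyone else hitting the 1 MiB attachment limit, splitting the archive might look like the sketch below. `split_for_upload` and the `.part-` suffix are illustrative names only; `split -b` is the standard byte-count option.

```shell
# Split a file into pieces no larger than max_bytes so each fits an upload limit.
# split_for_upload and the ".part-" suffix are hypothetical names for illustration.
split_for_upload() {
    file="$1"
    max_bytes="$2"
    # Produces file.part-aa, file.part-ab, ... next to the original.
    split -b "$max_bytes" "$file" "$file.part-"
}

# Usage:
#   split_for_upload profile.zip 1000000
# The recipient can reassemble the pieces with:
#   cat profile.zip.part-* > profile.zip
```

With a 1.30 MB archive and a 1,000,000-byte piece size this yields two parts, each under the limit.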
Re: ndo2db & rsyslogd ERRORS
Looks like you have quite a few checks going; this could be why we're seeing the issues. What is the output of these commands?
Code: Select all
ipcs -q
df -i
ulimit -a
To add to that, it looks like SQL is / was crashed:
Code: Select all
160707 10:01:33 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
160707 10:01:33 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
160707 10:01:35 [Note] /usr/libexec/mysqld: Normal shutdown
Try running /usr/local/nagiosxi/scripts/repair_databases.sh and let us know how that goes.
Former Nagios Employee
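To see which tables mysqld has flagged, the error log can be filtered for the "marked as crashed" message shown in this post. The helper below is a sketch: `crashed_tables` is a hypothetical name, and the log location is an assumption that varies by install, so the path is passed in by the caller.

```shell
# List the unique table paths that mysqld reported as crashed.
# crashed_tables is a hypothetical helper; the caller supplies the log path.
crashed_tables() {
    # Keep only the quoted table path from each matching line, then de-duplicate.
    sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" "$1" | sort -u
}

# Usage (the path is an assumption; check your MySQL configuration):
#   crashed_tables /var/log/mysqld.log
```

If the list is empty after running the repair script, no crashed-table messages are present in that log file.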
Re: ndo2db & rsyslogd ERRORS
Regarding MySQL: yes, after the reboot the logentries table was broken. I ran the repair script just after 10 AM to fix it (not sure which came first, the chicken or the egg).
Code: Select all
[root@usalfd0nagxi01 local]# ipcs -q
------ Message Queues --------
key msqid owner perms used-bytes messages
0xd5000002 32768 nagios 600 0 0
[root@usalfd0nagxi01 local]# df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/vg_usalfd0nagxi0-lv_root
128000 17985 110015 15% /
tmpfs 16503809 1 16503808 1% /dev/shm
/dev/sda1 64000 44 63956 1% /boot
/dev/mapper/vg_usalfd0nagxi0-lv_home
128000 63 127937 1% /home
/dev/mapper/vg_usalfd0nagxi0-lv_opt
256000 32373 223627 13% /opt
/dev/mapper/vg_usalfd0nagxi0-lv_tmp
711312 7942 703370 2% /tmp
/dev/mapper/vg_usalfd0nagxi0-lv_usr
384272 56598 327674 15% /usr
/dev/mapper/vg_usalfd0nagxi0-lv_usr_local
6540800 93947 6446853 2% /usr/local
/dev/mapper/vg_usalfd0nagxi0-lv_var
1095584 5271 1090313 1% /var
/dev/mapper/vg_usalfd0nagxi0-lv_mysqldata
6553600 576 6553024 1% /mysqldata
/dev/mapper/vg_usalfd0nagxi0-lv_store
6553600 294625 6258975 5% /store
tmpfs 16503809 12 16503797 1% /var/nagiosramdisk
[root@usalfd0nagxi01 local]# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 515650
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 125000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 515650
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Last edited by tmcdonald on Thu Jul 07, 2016 3:59 pm, edited 1 time in total.
Reason: Please use [code][/code] tags around long output
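When reading `ipcs -q` output like the above, the column to watch is the last one (messages). A rough filter that prints only queues currently holding messages could look like this; `queue_backlog` is a hypothetical name, and the column positions are assumed to match the header shown in the output above.

```shell
# Print message queues that currently hold messages, reading `ipcs -q` output on stdin.
# queue_backlog is a hypothetical helper; it assumes the column order
# key, msqid, owner, perms, used-bytes, messages.
queue_backlog() {
    awk '$1 ~ /^0x/ && $6 > 0 {
        printf "msqid=%s owner=%s used_bytes=%s messages=%s\n", $2, $3, $5, $6
    }'
}

# Usage:
#   ipcs -q | queue_backlog
```

For the output posted above it prints nothing, since the single nagios queue shows 0 messages at the time of capture.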
Re: ndo2db & rsyslogd ERRORS
When you say that "Nagios was hung" do you mean the XI interface was not updating? Or did you see processes in top for example taking up 100% CPU? If it was just the XI interface, I would like you to log in to the Core interface instead and see if things are properly updating there. The Core interface is the same URL as XI, but without the "xi" at the end: http://192.168.1.100/nagios
If things are properly updating there, it's an NDO issue. If they are not, it's with Core. This will help guide us down the right troubleshooting path.
Former Nagios employee
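A command-line complement to looking at the Core interface is to check whether Core's status file is still being rewritten. This is a different technique from the browser check described above, and only a sketch: `is_stale` is a hypothetical helper, and the status file path is an assumption (the `status_file` setting in nagios.cfg gives the real location).

```shell
# Report whether a file was last modified more than max_age_minutes ago.
# is_stale is a hypothetical helper; it relies on find's -mmin test.
is_stale() {
    file="$1"
    max_age_minutes="$2"
    # find prints the path only when the file is older than the threshold.
    [ -n "$(find "$file" -mmin +"$max_age_minutes" 2>/dev/null)" ]
}

# Usage (the path is an assumption; confirm it via status_file in nagios.cfg):
#   if is_stale /usr/local/nagios/var/status.dat 5; then
#       echo "status file is not being updated: look at Core"
#   else
#       echo "Core is writing status: look at NDO"
#   fi
```

A status file that keeps changing while XI shows stale data would point the same way as a healthy Core interface.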
Re: ndo2db & rsyslogd ERRORS
Specifically, we only saw 1000 of our 6000 hosts on the XI home screen, the Monitoring Engine Status was red, and no amount of restarting it would turn it on (nor did restarting the nagios service). Troubleshooting the difference between Core and XI is very interesting and I will add that to my procedural docs; however, since we already rebooted the server, we won't know unless it happens again.
Re: ndo2db & rsyslogd ERRORS
Let us know if you run into the same issue again so that we can investigate. Do you want us to keep this thread open for the time being, or can we close it?
Re: ndo2db & rsyslogd ERRORS
I still have the NDO messages and the rsyslog messages; the only difference is that Nagios isn't hung...