ndo2db & rsyslogd ERRORS

Post by **jpelley** » Thu Jul 07, 2016 10:13 am

Our Nagios environment has been less than stable lately. This morning Nagios was hung, and the following were the prevailing errors in /var/log/messages:

Jul 7 08:52:56 usalfd0nagxi01 ndo2db: Error: queue recv error.
Jul 7 08:52:56 usalfd0nagxi01 rsyslogd-2177: imuxsock begins to drop messages from pid 27855 due to rate-limiting

Had to perform a server reboot to recover, but we are still seeing these errors and fear another outage.
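
If the errors keep appearing, it helps to measure how often each one fires and which process rsyslog is throttling. The following is a minimal bash sketch, not part of Nagios XI (summarize_errors and rate_limited_pids are hypothetical helper names), that counts the two signatures quoted above and lists the PIDs named in the rate-limiting lines. The throttled PID may well be ndo2db itself, but that is an assumption to confirm with ps.

```shell
#!/usr/bin/env bash
# Count the two error signatures from /var/log/messages.
# Reads the files given as arguments, or stdin if none are given.
summarize_errors() {
  awk '
    /ndo2db: Error: queue recv error/  { ndo++ }
    /imuxsock begins to drop messages/ { rl++ }
    END {
      printf "ndo2db queue recv errors: %d\n", ndo
      printf "rsyslog rate-limit events: %d\n", rl
    }
  ' "$@"
}

# List the distinct PIDs that rsyslog reported as rate-limited.
rate_limited_pids() {
  grep -o 'drop messages from pid [0-9]*' "$@" | awk '{ print $NF }' | sort -un
}

# Example (as root on the XI server):
#   summarize_errors /var/log/messages
#   for pid in $(rate_limited_pids /var/log/messages); do ps -o pid=,comm= -p "$pid"; done
```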

Post by **rkennedy** » Thu Jul 07, 2016 12:01 pm

Can you PM over a profile for us to look at? (Admin -> System Profile -> Download Profile)

Additionally, how many CPUs do you have allocated to this machine?
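
One way to answer the CPU question from a shell, sketched in bash under the assumption of a Linux host with coreutils (cpu_summary is a hypothetical name): nproc reports the CPUs available to the current process, and /proc/loadavg gives the load averages to compare against that count.

```shell
#!/usr/bin/env bash
# Print the number of CPUs available to this process and the current load averages.
# A sustained load well above the CPU count means work is queuing
# (on Linux the load figure also counts tasks waiting on disk I/O).
cpu_summary() {
  local cpus load1 load5 load15
  cpus=$(nproc)
  # /proc/loadavg: 1, 5 and 15-minute averages, running/total tasks, last PID
  read -r load1 load5 load15 _ < /proc/loadavg
  echo "CPUs: ${cpus}"
  echo "Load average (1/5/15 min): ${load1} ${load5} ${load15}"
}

cpu_summary
```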

Post by **jpelley** » Thu Jul 07, 2016 12:40 pm

Regarding uploading profile.zip: the upload fails with "The file is too big, maximum allowed size is 1 MiB".

I've checked, and my profile.zip is 1.30 MB.

Post by **rkennedy** » Thu Jul 07, 2016 12:42 pm

Ah, are you authorized to email into [email protected]? Otherwise, try splitting it into two and PM'ing it over.

EDIT: Profile received.
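
Splitting the profile, as suggested above, can be done with coreutils split. A minimal bash sketch; the 900K piece size is an assumption chosen to stay under the 1 MiB (1,048,576-byte) limit, and split_for_upload and join_parts are hypothetical names. For a 1.30 MB file this yields two pieces.

```shell
#!/usr/bin/env bash
# Split a file into pieces below the forum's 1 MiB attachment limit.
# 900K means 900 * 1024 = 921,600 bytes per piece (GNU split units).
# Pieces are named <file>.part-aa, <file>.part-ab, ...
split_for_upload() {
  local file=$1 size=${2:-900K}
  split -b "$size" "$file" "${file}.part-"
}

# Reassemble the pieces in suffix order (aa, ab, ...) into <out>, default <file>.
join_parts() {
  local file=$1 out=${2:-$1}
  cat "${file}".part-* > "$out"
}

# Example:
#   split_for_upload profile.zip    # sender: profile.zip.part-aa, profile.zip.part-ab
#   join_parts profile.zip          # receiver, after downloading both parts
```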

Post by **rkennedy** » Thu Jul 07, 2016 1:21 pm

It looks like you have quite a lot of checks running; that could be why we're seeing these issues. What is the output of these commands?

Code:

ipcs -q
df -i
ulimit -a

To add to that, it looks like MySQL is or was in a crashed state:

Code:

160707 10:01:33 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
160707 10:01:33 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
160707 10:01:35 [Note] /usr/libexec/mysqld: Normal shutdown

Try running /usr/local/nagiosxi/scripts/repair_databases.sh and let us know how that goes.
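
Before or after running the repair script, it can help to list every table mysqld has flagged rather than reading the log by eye. A minimal bash sketch; crashed_tables is a hypothetical helper, and /var/log/mysqld.log is only the usual CentOS/RHEL default, so check the log-error setting in /etc/my.cnf for the real path.

```shell
#!/usr/bin/env bash
# Print each distinct table that mysqld has flagged as crashed,
# reading a mysqld error log given as arguments (or on stdin).
crashed_tables() {
  sed -n "s/.*Table '\([^']*\)' is marked as crashed.*/\1/p" "$@" | sort -u
}

# Example (log path is an assumption, see log-error in /etc/my.cnf):
#   crashed_tables /var/log/mysqld.log
#   /usr/local/nagiosxi/scripts/repair_databases.sh
```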

Post by **jpelley** » Thu Jul 07, 2016 1:41 pm

Regarding MySQL: yes, after the reboot the logentries table was broken, so I ran the repair script just after 10 AM to fix it (not sure which came first, the chicken or the egg).

Code:

[root@usalfd0nagxi01 local]# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xd5000002 32768      nagios     600        0            0

[root@usalfd0nagxi01 local]# df -i
Filesystem             Inodes  IUsed    IFree IUse% Mounted on
/dev/mapper/vg_usalfd0nagxi0-lv_root
                       128000  17985   110015   15% /
tmpfs                16503809      1 16503808    1% /dev/shm
/dev/sda1               64000     44    63956    1% /boot
/dev/mapper/vg_usalfd0nagxi0-lv_home
                       128000     63   127937    1% /home
/dev/mapper/vg_usalfd0nagxi0-lv_opt
                       256000  32373   223627   13% /opt
/dev/mapper/vg_usalfd0nagxi0-lv_tmp
                       711312   7942   703370    2% /tmp
/dev/mapper/vg_usalfd0nagxi0-lv_usr
                       384272  56598   327674   15% /usr
/dev/mapper/vg_usalfd0nagxi0-lv_usr_local
                      6540800  93947  6446853    2% /usr/local
/dev/mapper/vg_usalfd0nagxi0-lv_var
                      1095584   5271  1090313    1% /var
/dev/mapper/vg_usalfd0nagxi0-lv_mysqldata
                      6553600    576  6553024    1% /mysqldata
/dev/mapper/vg_usalfd0nagxi0-lv_store
                      6553600 294625  6258975    5% /store
tmpfs                16503809     12 16503797    1% /var/nagiosramdisk

[root@usalfd0nagxi01 local]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515650
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 125000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 515650
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
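
The output above looks healthy: the nagios-owned queue holds 0 messages and no filesystem is above 15% inode use, so neither explains the errors at this moment. Since ndo2db appears to hand data through that SysV message queue (the 0xd5000002 entry owned by nagios), the checks worth repeating when the errors return are the queue backlog and the kernel's queue limits: kernel.msgmnb (maximum bytes in one queue), kernel.msgmax (maximum size of a single message) and kernel.msgmni (maximum number of queues). A minimal bash sketch with hypothetical helper names, assuming a Linux host that exposes those limits under /proc/sys/kernel:

```shell
#!/usr/bin/env bash
# Print one line per SysV message queue from `ipcs -q` output read on stdin.
# Columns in ipcs -q: key, msqid, owner, perms, used-bytes, messages.
queue_backlog() {
  awk '$1 ~ /^0x/ { printf "%s msqid=%s owner=%s used-bytes=%s messages=%s\n", $1, $2, $3, $5, $6 }'
}

# Print the kernel's SysV message-queue limits.
# The directory defaults to /proc/sys/kernel; it is overridable only for testing.
msg_limits() {
  local dir=${1:-/proc/sys/kernel} name
  for name in msgmnb msgmax msgmni; do
    printf 'kernel.%s = %s\n' "$name" "$(cat "$dir/$name")"
  done
}

# Flag filesystems whose inode use is at or above a percentage (default 80).
# Expects `df -iP` output on stdin; -P keeps each filesystem on one line,
# unlike the wrapped LVM device names in plain `df -i` output.
inode_alerts() {
  local limit=${1:-80}
  awk -v limit="$limit" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= limit) print $6, $5 }'
}

# Example:
#   ipcs -q | queue_backlog
#   msg_limits
#   df -iP | inode_alerts 80
```

A queue whose messages count keeps climbing would suggest ndo2db is not draining it fast enough; comparing its used-bytes with kernel.msgmnb shows how close that queue is to full.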

Post by **tmcdonald** » Thu Jul 07, 2016 4:47 pm

When you say that "Nagios was hung," do you mean the XI interface was not updating? Or did you see processes in top, for example, taking up 100% CPU? If it was just the XI interface, I would like you to log in to the Core interface instead and see whether things are updating properly there. The Core interface is at the same URL as XI, but without the "xi" at the end: http://192.168.1.100/nagios

If things are updating properly there, it's an NDO issue; if they are not, the problem is with Core. This will help guide us down the right troubleshooting path.
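
The Core-versus-XI test can also be approximated from the shell: a healthy Core keeps rewriting its status file, so a status file that has not changed for a few minutes points at Core rather than NDO. A minimal bash sketch; the default path /usr/local/nagios/var/status.dat and the 120-second threshold are assumptions (the status_file and status_update_interval settings in nagios.cfg are authoritative, and on this server the file may live on the /var/nagiosramdisk RAM disk), and file_age and core_status_check are hypothetical names.

```shell
#!/usr/bin/env bash
# Seconds since a file was last modified (GNU stat).
file_age() {
  local now mtime
  now=$(date +%s)
  mtime=$(stat -c %Y "$1") || return 1
  echo $(( now - mtime ))
}

# Report whether Core's status file looks fresh.
# Returns 0 if fresh, 1 if stale, 2 if the file cannot be read.
# Default path and threshold are assumptions; check nagios.cfg.
core_status_check() {
  local file=${1:-/usr/local/nagios/var/status.dat} max=${2:-120} age
  age=$(file_age "$file") || { echo "cannot stat $file"; return 2; }
  if (( age > max )); then
    echo "STALE: $file last written ${age}s ago"
    return 1
  fi
  echo "OK: $file last written ${age}s ago"
}

# Example:
#   core_status_check /var/nagiosramdisk/status.dat 120
```

An OK result while the XI home screen is stale points toward NDO; a STALE result points toward Core.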

Post by **jpelley** » Fri Jul 08, 2016 7:00 am

Specifically, we saw only 1000 of our 6000 hosts on the XI home screen and the Monitoring Engine Status was red; no amount of restarting it would bring it back up (nor did restarting the nagios service). Troubleshooting the difference between core and XI is very interesting and I will add that to my procedural docs, however, since we already rebooted the server, we won't know unless it happens again.
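
To tell whether a 1000-of-6000 gap lives in Core or in the XI/NDO database layer, one rough check is to count how many hosts Core itself lists in status.dat, which holds one "hoststatus {" block per host, and compare that with the XI home screen. A minimal bash sketch; count_status_objects is a hypothetical name and the default path is an assumption (status_file in nagios.cfg is authoritative).

```shell
#!/usr/bin/env bash
# Count host and service status blocks in a Nagios Core status.dat file.
count_status_objects() {
  local file=${1:-/usr/local/nagios/var/status.dat}
  awk '
    /^hoststatus [{]/    { hosts++ }
    /^servicestatus [{]/ { services++ }
    END { printf "hosts: %d\nservices: %d\n", hosts, services }
  ' "$file"
}

# Example:
#   count_status_objects /var/nagiosramdisk/status.dat
```

If Core reports roughly 6000 hosts while XI shows 1000, the gap is more likely in the NDO/database path than in Core.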

Post by **lmiltchev** » Fri Jul 08, 2016 1:14 pm

> Troubleshooting the difference between core and XI is very interesting and I will add that to my procedural docs, however, since we already rebooted the server, we won't know unless it happens again.

Let us know if you run into the same issue again so that we can investigate. Do you want us to keep this thread open for the time being, or can we close it?

Post by **jpelley** » Fri Jul 08, 2016 1:30 pm

I'm still getting the NDO and rsyslog messages; the only difference is that Nagios isn't hung...

Nagios Support Forum
