ndo2db & rsyslogd ERRORS

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
jpelley
Posts: 44
Joined: Wed Jul 22, 2015 12:14 pm

ndo2db & rsyslogd ERRORS

Post by jpelley »

Our Nagios environment has been less than stable lately. This morning Nagios was hung, and the following were the prevailing errors in /var/log/messages:

Jul 7 08:52:56 usalfd0nagxi01 ndo2db: Error: queue recv error.
Jul 7 08:52:56 usalfd0nagxi01 rsyslogd-2177: imuxsock begins to drop messages from pid 27855 due to rate-limiting

Had to perform a server reboot to recover, but we are still seeing these errors and fear another outage.
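
To gauge how often the two messages quoted above recur, a small script can tally them. The following is only a minimal sketch: the pattern names and the `summarize` helper are hypothetical, and the regular expressions are derived solely from the two sample lines in this post, so other variants of these messages may not match.

```python
import re
from collections import Counter

# Patterns derived from the two log lines quoted in the post (assumption:
# real logs use the same wording).
PATTERNS = {
    "ndo2db_queue_recv": re.compile(r"ndo2db: Error: queue recv error"),
    "rsyslog_rate_limit": re.compile(r"imuxsock begins to drop messages from pid (\d+)"),
}

def summarize(lines):
    """Count matching lines per pattern and collect the rate-limited PIDs."""
    counts = Counter()
    pids = set()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[name] += 1
                if name == "rsyslog_rate_limit":
                    pids.add(int(match.group(1)))
    return counts, pids

# The two sample lines from the post.
SAMPLE = [
    "Jul 7 08:52:56 usalfd0nagxi01 ndo2db: Error: queue recv error.",
    "Jul 7 08:52:56 usalfd0nagxi01 rsyslogd-2177: imuxsock begins to drop "
    "messages from pid 27855 due to rate-limiting",
]

counts, pids = summarize(SAMPLE)
print(dict(counts), sorted(pids))
```

On a live system the same function could be fed an open file handle for /var/log/messages instead of `SAMPLE`. Comparing the collected PIDs against the running ndo2db process would show whether ndo2db itself is the process being rate-limited.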
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: ndo2db & rsyslogd ERRORS

Post by rkennedy »

Can you PM over a profile for us to look at? (Admin -> System Profile -> Download Profile)

Additionally, how many CPUs do you have allocated to this machine?
Former Nagios Employee
jpelley
Posts: 44
Joined: Wed Jul 22, 2015 12:14 pm

Re: ndo2db & rsyslogd ERRORS

Post by jpelley »

Regarding uploading the profile.zip: "The file is too big, maximum allowed size is 1 MiB"

I've checked, and my profile.zip is 1.30MB
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: ndo2db & rsyslogd ERRORS

Post by rkennedy »

Ah, are you authorized to email into [email protected]? Otherwise, try splitting it into two and PM'ing it over.

EDIT: Profile received.
Former Nagios Employee
rkennedy
Posts: 6579
Joined: Mon Oct 05, 2015 11:45 am

Re: ndo2db & rsyslogd ERRORS

Post by rkennedy »

Looks like you have quite a few checks running, which could be why we're seeing the issues. What is the output of these commands?

ipcs -q
df -i
ulimit -a
To add to that, it looks like MySQL has crashed (or had crashed at some point):

160707 10:01:33 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
160707 10:01:33 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' is marked as crashed and last (automatic?) repair failed
160707 10:01:35 [Note] /usr/libexec/mysqld: Normal shutdown
Try running /usr/local/nagiosxi/scripts/repair_databases.sh and let us know how that goes.
Former Nagios Employee
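
Before running the repair script, it can help to know exactly which tables MySQL has flagged. The sketch below extracts crashed table names from error-log lines in the format quoted above. The `crashed_tables` helper is hypothetical, and the regular expression is an assumption based only on the sample lines in this post.

```python
import re

# Matches the "marked as crashed" message format shown in the quoted log
# (assumption: other MySQL versions may word this differently).
CRASHED_RE = re.compile(r"Table '([^']+)' is marked as crashed")

def crashed_tables(lines):
    """Return the set of table paths reported as crashed."""
    tables = set()
    for line in lines:
        match = CRASHED_RE.search(line)
        if match:
            tables.add(match.group(1))
    return tables

# Sample lines copied from the post.
SAMPLE_LOG = [
    "160707 10:01:33 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' "
    "is marked as crashed and last (automatic?) repair failed",
    "160707 10:01:33 [ERROR] /usr/libexec/mysqld: Table './nagios/nagios_logentries' "
    "is marked as crashed and last (automatic?) repair failed",
    "160707 10:01:35 [Note] /usr/libexec/mysqld: Normal shutdown",
]

print(sorted(crashed_tables(SAMPLE_LOG)))
```

Using a set collapses the repeated error lines, so each affected table is listed once no matter how many times MySQL logged it.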
jpelley
Posts: 44
Joined: Wed Jul 22, 2015 12:14 pm

Re: ndo2db & rsyslogd ERRORS

Post by jpelley »

Regarding MySQL: yes, after the reboot the logentries table was broken. I ran the repair script just after 10 AM to fix it (not sure which came first, the chicken or the egg).

[root@usalfd0nagxi01 local]# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xd5000002 32768      nagios     600        0            0

[root@usalfd0nagxi01 local]# df -i
Filesystem             Inodes  IUsed    IFree IUse% Mounted on
/dev/mapper/vg_usalfd0nagxi0-lv_root
                       128000  17985   110015   15% /
tmpfs                16503809      1 16503808    1% /dev/shm
/dev/sda1               64000     44    63956    1% /boot
/dev/mapper/vg_usalfd0nagxi0-lv_home
                       128000     63   127937    1% /home
/dev/mapper/vg_usalfd0nagxi0-lv_opt
                       256000  32373   223627   13% /opt
/dev/mapper/vg_usalfd0nagxi0-lv_tmp
                       711312   7942   703370    2% /tmp
/dev/mapper/vg_usalfd0nagxi0-lv_usr
                       384272  56598   327674   15% /usr
/dev/mapper/vg_usalfd0nagxi0-lv_usr_local
                      6540800  93947  6446853    2% /usr/local
/dev/mapper/vg_usalfd0nagxi0-lv_var
                      1095584   5271  1090313    1% /var
/dev/mapper/vg_usalfd0nagxi0-lv_mysqldata
                      6553600    576  6553024    1% /mysqldata
/dev/mapper/vg_usalfd0nagxi0-lv_store
                      6553600 294625  6258975    5% /store
tmpfs                16503809     12 16503797    1% /var/nagiosramdisk

[root@usalfd0nagxi01 local]# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515650
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 125000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 515650
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
Last edited by tmcdonald on Thu Jul 07, 2016 3:59 pm, edited 1 time in total.
Reason: Please use [code][/code] tags around long output
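
The `ipcs -q` output above shows a single queue owned by nagios with nothing pending. To watch for a backlog over time, the output could be parsed as sketched below. This is a rough illustration: the `parse_ipcs_queues` and `backlogged` helpers are hypothetical, the six-column layout is assumed from the output in this post, and whether a backlog on this queue relates to the "queue recv error" is an assumption, not something the thread confirms.

```python
def parse_ipcs_queues(text):
    """Parse `ipcs -q` output into a list of dicts, one per message queue."""
    queues = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have six columns and a hexadecimal key such as 0xd5000002;
        # the banner and header lines fail this check and are skipped.
        if len(fields) == 6 and fields[0].startswith("0x"):
            key, msqid, owner, perms, used_bytes, messages = fields
            queues.append({
                "key": key,
                "msqid": int(msqid),
                "owner": owner,
                "perms": perms,
                "used_bytes": int(used_bytes),
                "messages": int(messages),
            })
    return queues

def backlogged(queues):
    """Return only the queues that currently hold pending messages."""
    return [q for q in queues if q["messages"] > 0]

# Sample output copied from the post.
SAMPLE_OUTPUT = """\
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xd5000002 32768      nagios     600        0            0
"""

queues = parse_ipcs_queues(SAMPLE_OUTPUT)
print(queues)
print(backlogged(queues))
```

On a live system the text could come from `subprocess.run(["ipcs", "-q"], capture_output=True, text=True).stdout`, sampled periodically so that a growing `messages` count stands out.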
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: ndo2db & rsyslogd ERRORS

Post by tmcdonald »

When you say that "Nagios was hung", do you mean the XI interface was not updating? Or did you see processes in top, for example, taking up 100% CPU? If it was just the XI interface, I would like you to log in to the Core interface instead and see if things are properly updating there. The Core interface is at the same URL as XI, but without the "xi" at the end: http://192.168.1.100/nagios

If things are properly updating there, it's an NDO issue. If they are not, the issue is with Core. This will help guide us down the right troubleshooting path.
Former Nagios employee
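
The check described above can be written down as a tiny helper for procedural docs. This is just a sketch: the function names are hypothetical, and the XI URL ending in "nagiosxi" is an assumption inferred from the remark that the Core URL is the XI URL without the trailing "xi".

```python
def core_url_from_xi(xi_url):
    """Derive the Core interface URL by dropping the trailing 'xi' from the XI URL."""
    trimmed = xi_url.rstrip("/")
    if not trimmed.endswith("xi"):
        raise ValueError("expected an XI URL ending in 'xi': %r" % xi_url)
    return trimmed[:-2]

def likely_culprit(core_is_updating):
    """Apply the rule from the post: Core updating means NDO, otherwise Core."""
    return "NDO" if core_is_updating else "Core"

# Assumed XI URL for the example host used in the post.
print(core_url_from_xi("http://192.168.1.100/nagiosxi/"))
print(likely_culprit(True))
```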
jpelley
Posts: 44
Joined: Wed Jul 22, 2015 12:14 pm

Re: ndo2db & rsyslogd ERRORS

Post by jpelley »

Specifically, we only saw 1000 of our 6000 hosts on the XI home screen, the Monitoring Engine Status was red, and no amount of restarting it would turn it on (nor did restarting the nagios service). Troubleshooting the difference between core and XI is very interesting and I will add that to my procedural docs. However, since we already rebooted the server, we won't know unless it happens again.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: ndo2db & rsyslogd ERRORS

Post by lmiltchev »

Troubleshooting the difference between core and XI is very interesting and I will add that to my procedural docs, however, since we already rebooted the server, we won't know unless it happens again.
Let us know if you run into the same issue again so that we can investigate. Do you want us to keep this thread open for the time being, or can we close it?
Be sure to check out our Knowledgebase for helpful articles and solutions!
jpelley
Posts: 44
Joined: Wed Jul 22, 2015 12:14 pm

Re: ndo2db & rsyslogd ERRORS

Post by jpelley »

I still have the NDO messages and the rsyslog messages; the only difference is that Nagios isn't hung...
Locked