NdoUtils stop working

algomas123
Posts: 16
Joined: Mon Jun 27, 2016 8:58 am

NdoUtils stop working

Post by algomas123 »

Hi everybody.

I have had Nagios 4.1.1 and NDOUtils 2.0 running for some months without problems. I also use ndo2db to write all information to a MySQL database on the same server. I monitor about 30 hosts and about 300 services.

A few days ago, after installing NagiosQL on the same MySQL database (I am not sure this is the cause, as it would not make sense), I started having problems:

After 3-4 hours of running, the Linux message queue starts growing rapidly and /var/log/messages is flooded with:
ndo2db: Error: queue recv error.

and no more data is inserted into the database! Both services keep running, but it looks like the socket is closed (even though the socket file is still there).

My kernel parameters are set as recommended (131072000), and the MySQL database has enough RAM and disk space to perform well.

I would appreciate any clue or help; I don't know what else to try!

Thanks in advance.
ssax
Dreams In Code
Posts: 7682
Joined: Wed Feb 11, 2015 12:54 pm

Re: NdoUtils stop working

Post by ssax »

What is the output of this command:

Code: Select all

ipcs -q
If you see more than one nagios message queue, run these commands:

Code: Select all

service nagios stop
killall -9 nagios
service ndo2db stop
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
service ndo2db start
service nagios start
Then validate again:

Code: Select all

ipcs -q
See if that resolves the issue.
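
If you want to check for extra queues without reading the table by eye, something like the following might help. It is only a sketch: it assumes the usual util-linux `ipcs -q` column layout (key, msqid, owner, perms, used-bytes, messages) and that the queues are owned by a user named `nagios`. The function names are made up for this example.

```shell
#!/bin/sh
# Print the msqid of every message queue owned by a user (default: nagios).
# Reads `ipcs -q` output on stdin, so it also works on saved output.
# Assumed columns: key msqid owner perms used-bytes messages
nagios_queue_ids() {
    owner="${1:-nagios}"
    awk -v owner="$owner" '$3 == owner { print $2 }'
}

# Count the queues owned by that user.
count_nagios_queues() {
    nagios_queue_ids "$1" | wc -l | tr -d ' '
}

# Example (not run here):
#   ipcs -q | count_nagios_queues
```

A count above 1 is the situation described above, where the stale queues need to be removed.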
algomas123
Posts: 16
Joined: Mon Jun 27, 2016 8:58 am

Re: NdoUtils stop working

Post by algomas123 »

Thanks for your reply!

This is my queue:
[Attachment: Capture.JPG]
but it increases by 20 every second!

ndo2db is not querying any more.
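
To put numbers on how fast the queue grows, the backlog can be sampled from `ipcs -q` at intervals. This is a rough sketch, again assuming the util-linux column layout and a queue owner named `nagios`; `queue_backlog` is a name invented for this example.

```shell
#!/bin/sh
# Sum the "messages" and "used-bytes" columns of `ipcs -q` output (stdin)
# for all queues owned by a user (default: nagios).
# Assumed columns: key msqid owner perms used-bytes messages
queue_backlog() {
    awk -v owner="${1:-nagios}" '
        $3 == owner { bytes += $5; msgs += $6 }
        END { printf "%d messages, %d bytes\n", msgs, bytes }
    '
}

# Example (not run here): log the backlog every 5 seconds.
#   while sleep 5; do
#       printf '%s ' "$(date '+%H:%M:%S')"
#       ipcs -q | queue_backlog
#   done
```

A steadily rising message count means Nagios is still writing to the queue while ndo2db has stopped reading from it.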
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: NdoUtils stop working

Post by tgriep »

Is the MySQL server local to the Nagios server or is it remote?
Are you seeing any errors in the MySQL logs?
What OS and version is the Core system running on?
Can you post this file so we can view it?

Code: Select all

/etc/sysctl.conf
Be sure to check out our Knowledgebase for helpful articles and solutions!
algomas123
Posts: 16
Joined: Mon Jun 27, 2016 8:58 am

Re: NdoUtils stop working

Post by algomas123 »

Hello!

Yes, it is a local database.
No, there are no errors in the MySQL logs, and it has enough RAM and disk space.
It is running Nagios 4.1.1 on CentOS 7.

Code: Select all

# Controls the default maximum size of a message queue, in bytes
kernel.msgmnb = 131072000

# Controls the maximum size of a single message, in bytes
kernel.msgmax = 131072000

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 4294967295

# Controls the total amount of shared memory, in pages
kernel.shmall = 268435456

# Controls the maximum number of message queues system-wide
kernel.msgmni = 256000

# Controls the maximum number of shared memory segments system-wide
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 6815744
# net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
fs.aio-max-nr = 1048576
Now it is failing again, and ndo2db is using a lot of CPU (so is systemd-journal, I guess because it is writing lots of "queue recv error" log entries).
[Attachment: Capture2.JPG]

Thanks for your help
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: NdoUtils stop working

Post by tgriep »

One thing you can try is to edit the /etc/sysctl.conf file and change this from

Code: Select all

kernel.msgmni = 256000
to

Code: Select all

kernel.msgmni = 512000
Run this to apply the change

Code: Select all

sysctl -p
See if that helps out.
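
To confirm the new value was actually picked up, you can compare what is written in the file with what the kernel reports. A small sketch follows; `sysctl_conf_value` is a helper name invented here, and it only handles simple `key = value` lines.

```shell
#!/bin/sh
# Print the last value assigned to KEY in a sysctl.conf-style FILE.
# Comment lines (starting with # or ;) are skipped and whitespace
# around the key and value is trimmed.
sysctl_conf_value() {
    key="$1"; file="$2"
    awk -F= -v key="$key" '
        /^[ \t]*[#;]/ { next }
        {
            k = $1; v = $2
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) last = v
        }
        END { if (last != "") print last }
    ' "$file"
}

# Example (not run here): the two numbers should match after `sysctl -p`.
#   sysctl_conf_value kernel.msgmni /etc/sysctl.conf
#   sysctl -n kernel.msgmni
```

If the two differ, the file was probably not reloaded or the setting is overridden elsewhere.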

You could also update NDOUtils to the master branch, which may help too.
Here is the process for doing that.

Code: Select all

cd /tmp
wget https://github.com/NagiosEnterprises/ndoutils/archive/master.tar.gz
tar xvfz master.tar.gz
cd  ndoutils-master
./configure
make 
make install
service nagios stop
service ndo2db restart
service nagios start
Let us know if either of these changes works for you.
algomas123
Posts: 16
Joined: Mon Jun 27, 2016 8:58 am

Re: NdoUtils stop working

Post by algomas123 »

OK, I just tried both solutions. I will report back in a few hours (it usually fails after 4-5 hours).

Many thanks for your help.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: NdoUtils stop working

Post by tgriep »

No problem, let us know how it works out.
algomas123
Posts: 16
Joined: Mon Jun 27, 2016 8:58 am

Re: NdoUtils stop working

Post by algomas123 »

Hello,

Now it is even stranger!
I did what you said, with one minor modification:
I added code to queue.c, at the point where it fails, to print the reason for the error.

Code: Select all

char* pop_from_queue(void) {
        struct ndo2db_queue_msg msg;
        char *buf;
        ssize_t received;
        size_t buf_size;

        received = msgrcv(queue_id, &msg, NDO_MAX_MSG_SIZE, NDO_MSG_TYPE, MSG_NOERROR);
        if (received < 0) {
                /* Save errno first: syslog() may overwrite it. */
                int errno_save = errno;
                /* Added strerror() so the log shows why msgrcv() failed. */
                syslog(LOG_ERR, "Error: queue recv error: %s", strerror(errno_save));
                errno = errno_save;
                received = 0;
        }

        /* On error, received is 0, so an empty string is returned. */
        buf_size = strnlen(msg.text, (size_t)received);
        buf = malloc(buf_size+1);
        strncpy(buf, msg.text, buf_size);
        buf[buf_size] = '\0';

        return buf;
}
OK, so after a few hours, /var/log/messages shows this error:

Code: Select all

ndo2db: Error: queue recv error: Invalid argument
The strange thing is that I no longer have any message queue and ndo2db is not querying... but Nagios is still working! (Maybe only in memory, I don't know.)

EDIT: OK, obviously the "Invalid argument" error is because the queue is not there any more (msgrcv() fails with EINVAL when the queue ID no longer exists)... but why?
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: NdoUtils stop working

Post by tgriep »

You may want to enable debugging in the ndo2db.cfg file and see what errors show up there when the issue happens again.
That might give us more details on what is failing, which the developers could use.
Nagios Core only uses the MySQL database to store its information and status for third-party tools to use; it doesn't need the database in order to run.
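
For reference, here is one way the debug options could be switched on from the shell. Treat it as a sketch under assumptions: the option names `debug_level` and `debug_verbosity` and the values used below are what I recall from the sample ndo2db.cfg (check the comments in your own file), the config path is the usual source-install location, and `set_cfg_option` is a helper invented for this example.

```shell
#!/bin/sh
# Set KEY=VALUE in a key=value config FILE: rewrite the first existing
# assignment (commented-out lines are left alone), or append the
# setting if no assignment is present.
set_cfg_option() {
    file="$1"; key="$2"; value="$3"
    tmp="$(mktemp)" || return 1
    awk -v key="$key" -v value="$value" '
        !done && index($0, key "=") == 1 { print key "=" value; done = 1; next }
        { print }
        END { if (!done) print key "=" value }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (not run here; path and values are assumptions):
#   cfg=/usr/local/nagios/etc/ndo2db.cfg
#   set_cfg_option "$cfg" debug_level -1
#   set_cfg_option "$cfg" debug_verbosity 2
#   service ndo2db restart
```

Writing back with `cat` rather than `mv` keeps the original file's owner and permissions.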