NdoUtils stop working
-
algomas123
- Posts: 16
- Joined: Mon Jun 27, 2016 8:58 am
NdoUtils stop working
Hi everybody.
I have had Nagios 4.1.1 and ndoutils 2.0 running for some months without problems. I also use ndo2db to write all the information to a MySQL database on the same server. I have about 30 hosts and about 300 services.
A few days ago, after installing NagiosQL on the same MySQL database (I am not sure this is the cause, as it would not make sense), I started having problems:
After running for some hours (3-4), the Linux message queue starts growing rapidly and /var/log/messages is flooded with:
ndo2db: Error: queue recv error.
and no more data is inserted into the database! Both services keep running, but it looks like the socket is closed (even though the socket file is still there).
My kernel parameters are set as recommended (131072000), and the MySQL database has enough RAM and disk space to perform well.
I would appreciate any clue or help; I don't know what else to try!
Thanks in advance.
Re: NdoUtils stop working
What is the output of this command:
Code: Select all
ipcs -q
If you see more than one nagios message queue, run these commands:
Code: Select all
service nagios stop
killall -9 nagios
service ndo2db stop
for i in `ipcs -q | grep nagios |awk '{print $2}'`; do ipcrm -q $i; done
service ndo2db start
service nagios start
Then validate again:
Code: Select all
ipcs -q
See if that resolves the issue.
-
algomas123
- Posts: 16
- Joined: Mon Jun 27, 2016 8:58 am
Re: NdoUtils stop working
Thanks for your reply!
This is my queue, but it increases by 20 every second!
ndo2db is not querying any more.
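To put numbers on the growth described here, the output of `ipcs -q` can be summarised with a couple of small helpers. This is only a sketch: the function names `count_queues` and `queued_messages` are made up for illustration, and it assumes the usual util-linux column layout (key, msqid, owner, perms, used-bytes, messages).

```shell
# Both helpers read `ipcs -q` output on stdin.
# Optional first argument: the queue owner to look for (default: nagios).

# Count the message queues that belong to the owner.
count_queues() {
    awk -v owner="${1:-nagios}" \
        '$1 ~ /^0x/ && $3 == owner { n++ } END { print n + 0 }'
}

# Sum the "messages" column (6th field) over the owner's queues.
queued_messages() {
    awk -v owner="${1:-nagios}" \
        '$1 ~ /^0x/ && $3 == owner { n += $6 } END { print n + 0 }'
}
```

For example, `ipcs -q | count_queues` should print 1 on a healthy system, and running `ipcs -q | queued_messages` every few seconds shows whether ndo2db is still draining the queue.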
Re: NdoUtils stop working
Is the MYSQL server local to the Nagios server or is it remote?
Are you seeing any errors in the MYSQL logs?
What OS and version is the Core system running on?
Can you post this file so we can view it?
Code: Select all
/etc/sysctl.conf
Be sure to check out our Knowledgebase for helpful articles and solutions!
-
algomas123
- Posts: 16
- Joined: Mon Jun 27, 2016 8:58 am
Re: NdoUtils stop working
Hello!
Yes, it is a local database.
No, there are no errors in the MySQL logs, and it has enough RAM and disk.
It is running Nagios 4.1.1 on CentOS 7.
Now it is crashing again, and ndo2db is using a lot of CPU (so is systemd-journal, I guess because it is writing lots of "queue recv error" log entries).
Code: Select all
# Controls the default maximum size of a message queue, in bytes
kernel.msgmnb = 131072000
# Controls the maximum size of a message, in bytes
kernel.msgmax = 131072000
# Controls the maximum shared segment size, in bytes
kernel.shmmax = 4294967295
# Controls the total amount of shared memory, in pages
kernel.shmall = 268435456
# The maximum number of message queues allowed system-wide
kernel.msgmni = 256000
# Controls the maximum number of shared memory segments
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 6815744
# net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
fs.aio-max-nr = 1048576
Thanks for your help
Re: NdoUtils stop working
One thing you can try is to edit the /etc/sysctl.conf file and change this from
Code: Select all
kernel.msgmni = 256000
to
Code: Select all
kernel.msgmni = 512000
Run this to apply the change
Code: Select all
sysctl -p
See if that helps out.
You could update the ndoutils to the master branch which may help out too.
Here is the process for doing that.
Code: Select all
cd /tmp
wget https://github.com/NagiosEnterprises/ndoutils/archive/master.tar.gz
tar xvfz master.tar.gz
cd ndoutils-master
./configure
make
make install
service nagios stop
service ndo2db restart
service nagios start
Let us know if either of these changes works for you.
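After editing /etc/sysctl.conf it can be worth confirming that the running kernel actually took the new value. The helper below is a rough sketch, not part of ndoutils: `conf_value` is a hypothetical name, and it assumes simple `key = value` lines where the last definition of a key wins.

```shell
# Print the value assigned to a key in sysctl.conf-style text read on stdin.
# Comment lines are skipped; if the key appears several times, the last one wins.
conf_value() {
    awk -F= -v key="$1" '
        /^[[:space:]]*#/ { next }
        { k = $1; gsub(/[[:space:]]/, "", k) }
        k == key { v = $2; sub(/^[[:space:]]+/, "", v); sub(/[[:space:]]+$/, "", v) }
        END { print v }'
}
```

Comparing `conf_value kernel.msgmni < /etc/sysctl.conf` with the output of `sysctl -n kernel.msgmni` shows whether the configured value and the live value agree.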
-
algomas123
- Posts: 16
- Joined: Mon Jun 27, 2016 8:58 am
Re: NdoUtils stop working
OK, I just tried both solutions. I will report back in a few hours (it usually fails after 4-5 hours).
Many thanks for your help.
Re: NdoUtils stop working
No problem, let us know how it works out.
-
algomas123
- Posts: 16
- Joined: Mon Jun 27, 2016 8:58 am
Re: NdoUtils stop working
Hello,
Now it is even stranger!
I did what you said, with a minor modification:
I added code to the queue.c file to print the error reason at the point where it fails:
Code: Select all
char* pop_from_queue(void) {
    struct ndo2db_queue_msg msg;
    char *buf;
    ssize_t received;
    size_t buf_size;

    received = msgrcv(queue_id, &msg, NDO_MAX_MSG_SIZE, NDO_MSG_TYPE, MSG_NOERROR);
    if (received < 0) {
        int errno_save = errno;
        /* Added strerror() so the log shows why msgrcv() failed */
        syslog(LOG_ERR, "Error: queue recv error: %s", strerror(errno));
        errno = errno_save;
        received = 0;
    }

    buf_size = strnlen(msg.text, (size_t)received);
    buf = malloc(buf_size+1);
    strncpy(buf, msg.text, buf_size);
    buf[buf_size] = '\0';
    return buf;
}
After some hours, /var/log/messages shows me this error:
Code: Select all
ndo2db: Error: queue recv error: Invalid argument
The strange thing is that I do not have any message queue and ndo2db is not querying, BUT NAGIOS IS WORKING! (Maybe only in memory, I don't know.)
EDIT: OK, obviously the "Invalid argument" is because the queue is not there anymore, but why?
Re: NdoUtils stop working
You may want to enable debugging in the ndo2db.cfg file and see what error shows up there when the issue happens again.
We might get more details on what is failing, which the developers could use.
Nagios Core only uses the MySQL database to store its information and status for third-party tools to use; it doesn't need it to run.
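Before changing anything, it may help to see what the debug options in ndo2db.cfg are currently set to. The sketch below assumes the option names used in the sample ndo2db.cfg shipped with ndoutils (`debug_level`, `debug_verbosity`, `debug_file`, `max_debug_file_size`); the helper name `debug_settings` is made up for illustration.

```shell
# Print the active (uncommented) debug options from ndo2db.cfg text read on stdin.
# The option names are an assumption based on the sample config shipped with ndoutils.
debug_settings() {
    grep -E '^[[:space:]]*(debug_level|debug_verbosity|debug_file|max_debug_file_size)[[:space:]]*='
}
```

A typical call would be `debug_settings < /usr/local/nagios/etc/ndo2db.cfg`, where the path is the usual source-install location and may differ on your system. ndo2db has to be restarted after the file is edited for a new debug level to take effect.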