Nagios XI stops working continuously

Locked
Pablogc
Posts: 16
Joined: Fri Feb 26, 2016 6:15 am

Nagios XI stops working continuously

Post by Pablogc »

Nagios XI version 5.2.3 repeatedly stops working.


The last line of nagios.log says:
wproc: 'Core Worker 18951' seems to be choked. ret = -1; bufsize = 1250: errno = 11 (Resource temporarily unavailable)


The last line of /var/log/messages says:
ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 128000 of 16 messages and 131072000 of 131072000 bytes in the queue. See README for kernel tuning options.


The last lines of npcd.log say:
[04-07-2016 10:32:19] NPCD: ERROR: Executed command exits with return code '7'
[04-07-2016 10:32:19] NPCD: ERROR: Command line was '/usr/local/nagios/libexec/process_perfdata.pl -n -b /usr/local/nagios/var/spool/perfdata//1460041080.perfdata.service'


Any ideas?

Thanks.
tmcdonald
Posts: 9117
Joined: Mon Sep 23, 2013 8:40 am

Re: Nagios XI stops working continuously

Post by tmcdonald »

How many hosts and services do you have? The system might simply be overloaded. What is the output of ipcs -q on the command line?
Former Nagios employee
Pablogc
Posts: 16
Joined: Fri Feb 26, 2016 6:15 am

Re: Nagios XI stops working continuously

Post by Pablogc »

tmcdonald wrote: How many hosts and services do you have? The system might simply be overloaded. What is the output of ipcs -q on the command line?

Hello, I have 108 hosts and 6547 services.

"ipcs -q" output:

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x16000002 6029312    nagios     600        131072000    128000
0xdf000002 6127617    nagios     600        0            0
0xab000002 6160386    nagios     600        0            0
0x58000002 6193155    nagios     600        0            0
0xc8000002 6258692    nagios     600        3072         3
0x52000002 6291461    nagios     600        0            0
0x3c000002 6324230    nagios     600        0            0
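
For reading output like this, a minimal sketch may help. It assumes the column order shown above (used-bytes in the fifth column, messages in the sixth) and that the per-queue byte limit is the value of kernel.msgmnb, which is the usual case on Linux. The function name queue_usage is hypothetical and not part of Nagios XI or ipcs.

```shell
#!/bin/sh
# Hypothetical helper: report how full each message queue is relative to
# a byte limit (normally the value of kernel.msgmnb).
# Reads `ipcs -q` style output on stdin; the limit is the first argument.
queue_usage() {
    limit="$1"
    awk -v limit="$limit" '
        # Data rows start with a hex key such as 0x16000002;
        # the banner and header lines are skipped.
        $1 ~ /^0x/ {
            pct = (limit > 0) ? int($5 * 100 / limit) : 0
            printf "%s %s bytes, %s messages, %d%% of limit\n", $2, $5, $6, pct
        }
    '
}

# Usage on a live system:
#   ipcs -q | queue_usage "$(sysctl -n kernel.msgmnb)"
```

With a limit of 131072000 bytes, as reported in the ndo2db warning, queue 6029312 would show as 100% full, which matches the "Retrying message send" message in the log.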
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: Nagios XI stops working continuously

Post by hsmith »

What resources have you allocated to this machine?

Also, it looks like you're running into some kernel message queue issues.
https://support.nagios.com/wiki/index.php/Nagios_XI:FAQs wrote:If you're experiencing any of the following issues after an upgrade from Nagios XI 2011r2.x to 3.x:

Missing hosts or services or status data
Takes a VERY long time to Apply Configuration or restart the Nagios process
Unusually high CPU load
A flood of messages in the /var/log/messages related to ndo2db

Then you may need to manually set a few kernel settings on your system. In Nagios XI 2011r3.x+ the Ndoutils subcomponent now uses asynchronous writes to log status information to the database, and these messages are sent to the Linux kernel's message queue. Our upgrade scripts will tune the kernel settings automatically as of 2011r3.2, but in the event that you see the above symptoms on your system, we recommend applying the following settings to your system.

Open /etc/sysctl.conf with a text editor. Edit the file to match the following values:

# Controls the default maximum size of a message queue, in bytes
kernel.msgmnb = 131072000

# Controls the maximum size of a message, in bytes
kernel.msgmax = 131072000

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 4294967295

# Controls the total amount of shared memory, in pages
kernel.shmall = 268435456

# The maximum number of message queues allowed system-wide
kernel.msgmni = 256000


Note: If you don't have these entries in the "/etc/sysctl.conf" file, just add them to the end of the file.
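
To see which of the five keys above are already present before editing, something like the following sketch could be used. The function name check_sysctl_conf is hypothetical, and the parsing assumes simple "key = value" lines with "#" comments, as in the listing above.

```shell
#!/bin/sh
# Hypothetical helper: show which of the five keys above are set in a
# sysctl configuration file, and to what value.
# $1 is the file to inspect (defaults to /etc/sysctl.conf).
check_sysctl_conf() {
    conf="${1:-/etc/sysctl.conf}"
    for key in kernel.msgmnb kernel.msgmax kernel.shmmax kernel.shmall kernel.msgmni; do
        # Keep the last uncommented assignment, since later lines win.
        value=$(awk -F= -v k="$key" '
            /^[ \t]*#/ { next }
            {
                name = $1
                gsub(/[ \t]/, "", name)
                if (name == k) { v = $2; gsub(/[ \t]/, "", v); found = v }
            }
            END { print found }
        ' "$conf")
        if [ -n "$value" ]; then
            echo "$key = $value"
        else
            echo "$key is not set"
        fi
    done
}

# Usage:
#   check_sysctl_conf /etc/sysctl.conf
```

Any key reported as "is not set" would need to be appended to the file.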

After these settings are saved to the file, run:

sysctl -p

This applies the new settings. If the system still does not work properly, reboot the machine.
Former Nagios Employee.
me.