
Re: Notifications are disabled

Posted: Thu Sep 25, 2014 5:37 pm
by tonyleatwork
abrist wrote:What version of XI are you running?
Are any of your db tables crashed?

Code: Select all

grep crashed /var/log/mysqld.log

How big are your tables?

Code: Select all

ls -lahS /var/lib/mysql/nagios | head -10

Version:

Note that this system was built from a VM template directly from Nagios ~1 month ago.

Code: Select all

Nagios XI Version : 2014R1.4
nwd2ng01 2.6.32-358.2.1.el6.x86_64 x86_64
CentOS release 6.5 (Final)
Gnome is not installed

No crashed tables (grep returned no results).

Tables are small (fewer than 60 hosts with about 10 checks each, configured as host group checks).

Code: Select all

total 36M
-rw-rw---- 1 mysql mysql  11M Sep 25 18:23 nagios_logentries.MYD
-rw-rw---- 1 mysql mysql 6.5M Sep 25 18:23 nagios_notifications.MYD
-rw-rw---- 1 mysql mysql 6.3M Sep 25 18:23 nagios_logentries.MYI
-rw-rw---- 1 mysql mysql 2.5M Sep 25 18:23 nagios_notifications.MYI
-rw-rw---- 1 mysql mysql 2.1M Sep 25 18:23 nagios_contactnotifications.MYI
-rw-rw---- 1 mysql mysql 1.6M Sep 25 18:23 nagios_statehistory.MYD
-rw-rw---- 1 mysql mysql 1.4M Sep 25 18:23 nagios_contactnotificationmethods.MYI
-rw-rw---- 1 mysql mysql 1.3M Sep 25 18:23 nagios_contactnotificationmethods.MYD
-rw-rw---- 1 mysql mysql 1.2M Sep 25 18:23 nagios_contactnotifications.MYD

Another observation: it takes about 3 hours for the notifications to come back, and now the Monitoring Engine is down (red exclamation mark). I'm just going to let it sit and see what happens.

The latest /var/log/messages output shows this:

Code: Select all

Sep 25 18:31:52 nwd2ng01 ndo2db: Message sent to queue.
Sep 25 18:31:52 nwd2ng01 ndo2db: Warning: queue send error, retrying...
Sep 25 18:32:10 nwd2ng01 ndo2db: Message sent to queue.
Sep 25 18:32:10 nwd2ng01 ndo2db: Warning: queue send error, retrying...
Sep 25 18:32:28 nwd2ng01 ndo2db: Message sent to queue.
Sep 25 18:32:28 nwd2ng01 ndo2db: Warning: queue send error, retrying...
Sep 25 18:32:46 nwd2ng01 ndo2db: Message sent to queue.
Sep 25 18:32:46 nwd2ng01 ndo2db: Warning: queue send error, retrying...
Sep 25 18:33:04 nwd2ng01 ndo2db: Message sent to queue.
Sep 25 18:33:04 nwd2ng01 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 128000 of 15736 messages and 131072000 of 131072000 bytes in the queue. See README for kernel tuning options.

Re: Notifications are disabled

Posted: Thu Sep 25, 2014 10:09 pm
by tonyleatwork
Searching online, I found a tool to check the ndo2db queue. The queue is constantly growing, and according to the post where I found the tool, 26K messages is on the high side for this small number of monitored items.

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0xa1000002 0          nagios     600        26942464     26311
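
The output above has the column layout that `ipcs -q` prints, so a rough way to watch the queue is to total the bytes and messages for the nagios owner. This is only a sketch: the function name `queue_usage` is made up for illustration, and it assumes the six-column layout shown above.

```shell
#!/bin/sh
# Summarize message-queue usage for one owner from `ipcs -q`-style output.
# Assumption: columns are key, msqid, owner, perms, used-bytes, messages.
# queue_usage is a hypothetical helper name, not part of Nagios or ndoutils.
queue_usage() {
    awk -v owner="$1" '
        $3 == owner { bytes += $5; msgs += $6 }
        END {
            avg = (msgs > 0) ? bytes / msgs : 0
            printf "%d bytes in %d messages (%d bytes/message)\n", bytes, msgs, avg
        }'
}

# Typical use on the monitoring host:
#   ipcs -q | queue_usage nagios
```

For the figures posted here, 26942464 bytes over 26311 messages works out to 1024 bytes per message. At that size a 131072000-byte queue limit is reached at 128000 messages, which matches the counts in the ndo2db warning from /var/log/messages.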

Re: Notifications are disabled

Posted: Fri Sep 26, 2014 9:32 am
by tonyleatwork
Now the queue looks completely full and I'm getting the warning messages again. Something seems to be clogging it: the queue builds up until it reaches the kernel limit, and at that point checks eventually stop working.

Re: Notifications are disabled

Posted: Fri Sep 26, 2014 11:38 am
by Box293
I'll get you to private message me your system profile:
  • In Nagios XI
    Click the Admin menu
    System Config > System Profile
    Click the Download Profile button and pm me that file

Re: Notifications are disabled

Posted: Fri Sep 26, 2014 12:28 pm
by Box293
OK, when looking through your logs I noticed these entries in nagios.log:
[1411705896] Warning: A system time change of 1913 seconds (0d 0h 31m 53s forwards in time) has been detected. Compensating...
[1411711224] Warning: A system time change of 3896 seconds (0d 1h 4m 56s forwards in time) has been detected. Compensating...
[1411718709] Warning: A system time change of 7485 seconds (0d 2h 4m 45s forwards in time) has been detected. Compensating...
[1411724952] Warning: A system time change of 6243 seconds (0d 1h 44m 3s forwards in time) has been detected. Compensating...
[1411731152] Warning: A system time change of 6200 seconds (0d 1h 43m 20s forwards in time) has been detected. Compensating...
[1411737868] Warning: A system time change of 6716 seconds (0d 1h 51m 56s forwards in time) has been detected. Compensating...
[1411744294] Warning: A system time change of 6426 seconds (0d 1h 47m 6s forwards in time) has been detected. Compensating...
[1411750693] Warning: A system time change of 6399 seconds (0d 1h 46m 39s forwards in time) has been detected. Compensating...
This is more than likely the cause of your issues.

If this is a virtual machine, how is your time syncing configured?
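
To get a quick picture of how often and how far the clock is jumping, something like the following could be run against nagios.log. It is a sketch only: `time_jumps` is a hypothetical helper name, and it assumes the warning lines keep the exact wording shown above.

```shell
#!/bin/sh
# List each "system time change" warning as "<epoch> <seconds>" and print a total.
# Assumption: the line format matches the nagios.log entries quoted above.
# time_jumps is a hypothetical helper name.
time_jumps() {
    awk '/A system time change of/ {
        ts = $1
        gsub(/\[|\]/, "", ts)              # strip the brackets around the epoch
        for (i = 1; i <= NF; i++)
            if ($i == "of") { secs = $(i + 1); break }
        total += secs
        n++
        print ts, secs
    }
    END { printf "%d jumps, %d seconds total\n", n, total }'
}

# Typical use (log path is an assumption for a default Nagios XI install):
#   time_jumps < /usr/local/nagios/var/nagios.log
```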

Re: Notifications are disabled

Posted: Fri Sep 26, 2014 2:20 pm
by tonyleatwork
Let me look into that. I believe we have ntpd configured against our internal time server.

Re: Notifications are disabled

Posted: Fri Sep 26, 2014 2:40 pm
by lmiltchev
Let us know if you have any more questions/issues.

Re: Notifications are disabled

Posted: Fri Sep 26, 2014 2:45 pm
by tonyleatwork
I checked and the server time was off by half a minute. I forced a sync against our time server with ntpdate, and all servers now show the same time.

Unfortunately the ndo2db queue looks like it's still growing and pretty quickly...

Re: Notifications are disabled

Posted: Fri Sep 26, 2014 3:03 pm
by Box293
I'm starting to scratch my head.

I did find this in the FAQs:
http://support.nagios.com/wiki/index.php/Nagios_XI:FAQs


If you're experiencing any of the following issues after an upgrade from Nagios XI 2011r2.x to 3.x:

Missing hosts or services or status data
Takes a VERY long time to Apply Configuration or restart the Nagios process
Unusually high CPU load
A flood of messages in the /var/log/messages related to ndo2db

Then you may need to manually set a few kernel settings on your system. In Nagios XI 2011r3.x+ the Ndoutils subcomponent now uses asynchronous writes to log status information to the database, and these messages are sent to the Linux kernel's message queue. Our upgrade scripts will tune the kernel settings automatically as of 2011r3.2, but in the event that you see the above symptoms on your system, we recommend applying the following settings to your system.

Open /etc/sysctl.conf with a text editor. Edit the file to match the following values:

# Controls the maximum total size of a single message queue, in bytes
kernel.msgmnb = 131072000

# Controls the maximum size of a single message, in bytes
kernel.msgmax = 131072000

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 4294967295

# Controls the total amount of shared memory allowed system-wide, in pages
kernel.shmall = 268435456

# Controls the maximum number of message queues (queue identifiers) system-wide
kernel.msgmni = 256000


Note: If you don't have these entries in the "/etc/sysctl.conf" file, just add them to the end of the file.

After these settings are saved to the file, apply them by running:

sysctl -p

If the system still appears to be working improperly, reboot the machine.
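
To confirm that the running kernel actually picked up the message-queue values, the limits can be read back from /proc/sys/kernel, where these sysctls are exposed. A minimal sketch, with `show_msg_limits` as a hypothetical helper name; the optional directory argument exists only so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Print the message-queue limits as the running kernel reports them.
# show_msg_limits is a hypothetical helper; the directory defaults to
# /proc/sys/kernel and can be overridden for testing.
show_msg_limits() {
    dir="${1:-/proc/sys/kernel}"
    for key in msgmnb msgmax msgmni; do
        if [ -r "$dir/$key" ]; then
            printf 'kernel.%s = %s\n' "$key" "$(cat "$dir/$key")"
        else
            printf 'kernel.%s = (unreadable)\n' "$key"
        fi
    done
}

# Typical use:
#   show_msg_limits
```

If the printed values differ from what is in /etc/sysctl.conf, the file was probably not reloaded or another setting is overriding it.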

Re: Notifications are disabled

Posted: Fri Oct 03, 2014 9:14 am
by tonyleatwork
I applied the kernel settings, but they did not help. The kernel message queue fills up much faster than it is drained, and once it hits the limit I see ndo2db errors.

One thing that did help (temporarily) was the database repair, but the issue came back about an hour later.

http://www.google.com/url?sa=t&rct=j&q= ... 2529,d.aWw

Since the last update, I went and created another Virtual Machine using the template provided by Nagios XI. Everything seemed to be working OK until I started adding the Windows systems. Now the ndo2db issue is happening on the new VM as well.

The service checks are duplicates of the ones created by the wizard, with one difference: instead of putting the account password in the service definition, I placed it in resource.cfg. The password itself contains an exclamation mark.

Can you let me know if either of those things is a known issue?
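
For anyone comparing setups, a quick way to see which resource.cfg macros hold a value containing an exclamation mark, without printing the passwords themselves, might look like this. It is a sketch under assumptions: `macros_with_bang` is a made-up helper name, and the file is assumed to use the usual `$USERn$=value` lines.

```shell
#!/bin/sh
# Print the names (never the values) of $USERn$ macros whose value contains "!".
# macros_with_bang is a hypothetical helper name.
macros_with_bang() {
    grep -E '^\$USER[0-9]+\$=.*!' "$1" | cut -d= -f1
}

# Typical use (path is an assumption for a default install):
#   macros_with_bang /usr/local/nagios/etc/resource.cfg
```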