Notifications are disabled

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
tonyleatwork
Posts: 91
Joined: Mon Jul 07, 2014 8:55 am

Re: Notifications are disabled

Post by tonyleatwork »

abrist wrote: What version of XI are you running?
Are any of your db tables crashed?

Code:

grep crashed /var/log/mysqld.log
How big are your tables?

Code:

ls -lahS /var/lib/mysql/nagios | head -10

Version:

Note that this system was built from a VM template directly from Nagios ~1 month ago.

Code:

Nagios XI Version : 2014R1.4
nwd2ng01 2.6.32-358.2.1.el6.x86_64 x86_64
CentOS release 6.5 (Final)
Gnome is not installed
No crashed tables (grep returned no results).

The tables are small (fewer than 60 hosts with about 10 checks each, although the checks are configured as host group checks).

Code:

total 36M
-rw-rw---- 1 mysql mysql  11M Sep 25 18:23 nagios_logentries.MYD
-rw-rw---- 1 mysql mysql 6.5M Sep 25 18:23 nagios_notifications.MYD
-rw-rw---- 1 mysql mysql 6.3M Sep 25 18:23 nagios_logentries.MYI
-rw-rw---- 1 mysql mysql 2.5M Sep 25 18:23 nagios_notifications.MYI
-rw-rw---- 1 mysql mysql 2.1M Sep 25 18:23 nagios_contactnotifications.MYI
-rw-rw---- 1 mysql mysql 1.6M Sep 25 18:23 nagios_statehistory.MYD
-rw-rw---- 1 mysql mysql 1.4M Sep 25 18:23 nagios_contactnotificationmethods.MYI
-rw-rw---- 1 mysql mysql 1.3M Sep 25 18:23 nagios_contactnotificationmethods.MYD
-rw-rw---- 1 mysql mysql 1.2M Sep 25 18:23 nagios_contactnotifications.MYD
Another observation is that it takes about 3 hours for the notifications to come back, and now the Monitoring Engine is down (red exclamation mark). I'm just going to let it sit and see what happens.

The latest entries in /var/log/messages show this:

Code:

Sep 25 18:31:52 nwd2ng01 ndo2db: Message sent to queue.
Sep 25 18:31:52 nwd2ng01 ndo2db: Warning: queue send error, retrying...
Sep 25 18:32:10 nwd2ng01 ndo2db: Message sent to queue.
Sep 25 18:32:10 nwd2ng01 ndo2db: Warning: queue send error, retrying...
Sep 25 18:32:28 nwd2ng01 ndo2db: Message sent to queue.
Sep 25 18:32:28 nwd2ng01 ndo2db: Warning: queue send error, retrying...
Sep 25 18:32:46 nwd2ng01 ndo2db: Message sent to queue.
Sep 25 18:32:46 nwd2ng01 ndo2db: Warning: queue send error, retrying...
Sep 25 18:33:04 nwd2ng01 ndo2db: Message sent to queue.
Sep 25 18:33:04 nwd2ng01 ndo2db: Warning: Retrying message send. This can occur because you have too few messages allowed or too few total bytes allowed in message queues. You are currently using 128000 of 15736 messages and 131072000 of 131072000 bytes in the queue. See README for kernel tuning options.
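To gauge how quickly these warnings pile up, a filter like the following could count ndo2db warnings per minute. It is a minimal sketch that assumes the classic syslog layout shown above (month, day, HH:MM:SS, host, program); the function name ndo2db_warnings_per_minute is hypothetical.

```shell
#!/bin/sh
# Count ndo2db warnings per minute in syslog-formatted input read from stdin.
# Assumes the layout shown above: "Mon DD HH:MM:SS host prog: message".
ndo2db_warnings_per_minute() {
    # Keep only ndo2db warning lines, reduce each to "Mon DD HH:MM",
    # then count runs of identical minutes (syslog is already time-ordered).
    grep 'ndo2db: Warning' |
        awk '{ print $1, $2, substr($3, 1, 5) }' |
        uniq -c
}
```

For example: ndo2db_warnings_per_minute < /var/log/messages. Because uniq -c only merges adjacent lines, this relies on the log being in time order, which syslog files normally are.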

Re: Notifications are disabled

Post by tonyleatwork »

Searching online, I found a tool to check the ndo2db queue. The queue is constantly growing, and according to the post where I found this, 26K messages is on the high side for so few monitored items.

------ Message Queues --------
key msqid owner perms used-bytes messages
0xa1000002 0 nagios 600 26942464 26311
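The output above is in the format printed by ipcs -q. A rough sketch for watching whether the nagios-owned queue keeps growing over time, assuming the Linux ipcs column order shown (key, msqid, owner, perms, used-bytes, messages); queue_messages and watch_queue are hypothetical helper names.

```shell
#!/bin/sh
# Sum the "messages" column (field 6) of `ipcs -q` output read from stdin,
# for queues owned by the given user (default: nagios).
queue_messages() {
    awk -v owner="${1:-nagios}" '$3 == owner { n += $6 } END { print n + 0 }'
}

# Print a timestamped message count every N seconds (default 10).
# Runs until interrupted with Ctrl-C.
watch_queue() {
    while true; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$(ipcs -q | queue_messages "${1:-nagios}")"
        sleep "${2:-10}"
    done
}
```

For example, watch_queue nagios 30 prints the count every 30 seconds; a count that only ever rises means messages are being added faster than they are removed.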

Re: Notifications are disabled

Post by tonyleatwork »

Now the queue looks completely full and I'm getting the warning messages again. Something seems to be clogging it: it builds up until it reaches the kernel limit, and at that point checks eventually stop working.
Box293
Too Basu
Posts: 5126
Joined: Sun Feb 07, 2010 10:55 pm
Location: Deniliquin, Australia

Re: Notifications are disabled

Post by Box293 »

I'll get you to private message me your system profile:
  • In Nagios XI, click the Admin menu
  • Go to System Config > System Profile
  • Click the Download Profile button and PM me that file
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.

Re: Notifications are disabled

Post by Box293 »

OK, when looking through your logs I noticed these entries in nagios.log:
[1411705896] Warning: A system time change of 1913 seconds (0d 0h 31m 53s forwards in time) has been detected. Compensating...
[1411711224] Warning: A system time change of 3896 seconds (0d 1h 4m 56s forwards in time) has been detected. Compensating...
[1411718709] Warning: A system time change of 7485 seconds (0d 2h 4m 45s forwards in time) has been detected. Compensating...
[1411724952] Warning: A system time change of 6243 seconds (0d 1h 44m 3s forwards in time) has been detected. Compensating...
[1411731152] Warning: A system time change of 6200 seconds (0d 1h 43m 20s forwards in time) has been detected. Compensating...
[1411737868] Warning: A system time change of 6716 seconds (0d 1h 51m 56s forwards in time) has been detected. Compensating...
[1411744294] Warning: A system time change of 6426 seconds (0d 1h 47m 6s forwards in time) has been detected. Compensating...
[1411750693] Warning: A system time change of 6399 seconds (0d 1h 46m 39s forwards in time) has been detected. Compensating...
This is more than likely the cause of your issues.

If this is a virtual machine, how is your time syncing configured?
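Because nagios.log prefixes each entry with a Unix timestamp, converting the time-change warnings to readable dates can show when the jumps happen. A sketch assuming GNU date (standard on CentOS) and the default Nagios XI log location /usr/local/nagios/var/nagios.log; pass a different path as the first argument if yours differs. show_time_jumps is a hypothetical name.

```shell
#!/bin/sh
# Print "system time change" warnings from a Nagios log with the leading
# [epoch] timestamp converted to a readable UTC date (requires GNU date).
show_time_jumps() {
    grep 'system time change' "${1:-/usr/local/nagios/var/nagios.log}" |
        while IFS= read -r line; do
            epoch=${line%%]*}   # strip from the first "]": "[1411705896"
            epoch=${epoch#\[}   # drop the leading "[": "1411705896"
            msg=${line#*] }     # text after "] ": "Warning: A system time change ..."
            printf '%s %s\n' "$(date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S UTC')" "$msg"
        done
}
```

Dates are printed in UTC; to use the server's local time zone instead, drop the -u flag and the literal UTC in the format string.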

Re: Notifications are disabled

Post by tonyleatwork »

Box293 wrote: If this is a virtual machine, how is your time syncing configured?
Let me look into that. I believe we have ntpd configured against our internal time server.
lmiltchev
Bugs find me
Posts: 13589
Joined: Mon May 23, 2011 12:15 pm

Re: Notifications are disabled

Post by lmiltchev »

Let us know if you have any more questions/issues.
Be sure to check out our Knowledgebase for helpful articles and solutions!

Re: Notifications are disabled

Post by tonyleatwork »

I checked, and the server time was off by about half a minute. I forced a time sync against our time server with ntpdate, and all servers now show the same time.

Unfortunately, the ndo2db queue still seems to be growing, and pretty quickly...
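To confirm that ntpd itself (rather than a one-off ntpdate run) is keeping the clock in sync, ntpq -p lists the configured peers and marks the selected sync source with an asterisk. A minimal sketch that extracts that peer's offset in milliseconds, assuming the standard ten-column ntpq -p layout (remote, refid, st, t, when, poll, reach, delay, offset, jitter); selected_peer_offset is a hypothetical name.

```shell
#!/bin/sh
# Read `ntpq -p` output from stdin and print the offset (ms, field 9) of the
# peer marked '*' (the current sync source), or "none" if no peer is selected.
selected_peer_offset() {
    awk '/^\*/ { print $9; found = 1 } END { if (!found) print "none" }'
}
```

For example: ntpq -p | selected_peer_offset. A result of none means ntpd has not selected a source yet (this can take several minutes after it starts) or is not running; on CentOS 6, chkconfig ntpd on makes sure it starts at boot.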

Re: Notifications are disabled

Post by Box293 »

I'm starting to scratch my head.

I did find this in the FAQs:
http://support.nagios.com/wiki/index.php/Nagios_XI:FAQs


If you're experiencing any of the following issues after an upgrade from Nagios XI 2011r2.x to 2011r3.x:

  • Missing hosts, services, or status data
  • Applying Configuration or restarting the Nagios process takes a VERY long time
  • Unusually high CPU load
  • A flood of ndo2db-related messages in /var/log/messages

Then you may need to manually set a few kernel settings on your system. In Nagios XI 2011r3.x+ the Ndoutils subcomponent now uses asynchronous writes to log status information to the database, and these messages are sent to the Linux kernel's message queue. Our upgrade scripts will tune the kernel settings automatically as of 2011r3.2, but in the event that you see the above symptoms on your system, we recommend applying the following settings to your system.

Open /etc/sysctl.conf with a text editor. Edit the file to match the following values:

# Default maximum size of a message queue, in bytes
kernel.msgmnb = 131072000

# Maximum size of a single message, in bytes
kernel.msgmax = 131072000

# Maximum size of a single shared memory segment, in bytes
kernel.shmmax = 4294967295

# Total amount of shared memory allowed system-wide, in pages
kernel.shmall = 268435456

# Maximum number of message queue identifiers (queues) system-wide
kernel.msgmni = 256000


Note: If you don't have these entries in the "/etc/sysctl.conf" file, just add them to the end of the file.

After saving these settings to the file, run the following to apply them:

sysctl -p

If the system still appears to be working improperly, reboot the machine.
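To confirm the settings actually took effect after sysctl -p, the live values can be read back from /proc/sys/kernel (the same values sysctl -n reports). A sketch that compares them with the recommended numbers above; check_ipc_limits is a hypothetical name.

```shell
#!/bin/sh
# Compare live kernel IPC limits in /proc/sys/kernel with the values
# recommended above, printing one line per setting.
check_ipc_limits() {
    for pair in msgmnb=131072000 msgmax=131072000 msgmni=256000 \
                shmmax=4294967295 shmall=268435456; do
        name=${pair%%=*}   # setting name, e.g. "msgmnb"
        want=${pair#*=}    # recommended value
        cur=$(cat "/proc/sys/kernel/$name" 2>/dev/null || echo missing)
        if [ "$cur" = "$want" ]; then status=OK; else status=DIFFERS; fi
        printf 'kernel.%s current=%s recommended=%s %s\n' "$name" "$cur" "$want" "$status"
    done
}
```

DIFFERS only means the live value is not exactly the number above; a larger limit is not necessarily a problem.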

Re: Notifications are disabled

Post by tonyleatwork »

I applied the kernel settings, but they did not help. The kernel message queue fills up faster than it can be drained, and once it hits the limit, the ndo2db errors appear.

One thing that did help, temporarily, was the database repair, but the issue came back about an hour later.

http://www.google.com/url?sa=t&rct=j&q= ... 2529,d.aWw

Since my last update, I created another virtual machine from the Nagios XI VM template. Everything seemed to work until I started adding the Windows systems; now the ndo2db issue is happening on the new VM as well.

The service checks are duplicates of the ones created by the wizard, with two differences: instead of putting the account password in the service definition, I placed it in resource.cfg, and the password itself contains an exclamation mark.

Can you let me know if either of these is a known issue?