Odd behaviors w/ using ram disk.
Hello All,
I moved nagios.cmd, status.dat and objects.cache to a ramdisk to test out performance, after which I am able to get amazing performance (well over 4000 checks processed in a little under a minute). But I am having one small annoying issue.
On reboot, before nagios starts, I copy the backup files of my ram disk back into ram; then nagios starts. But every time nagios comes up on reboot, my host checks are all disabled. If I restart nagios after booting up, all the host checks become enabled again.
Nothing specific shows up in the log during this, just the normal nagios startup output.
Any ideas or any further info needed?
Re: Odd behaviors w/ using ram disk.
Actually, I think the problem has cleared up. I added a link at the default nagios.cmd location that points to nagios.cmd on the ramdisk. That appears to have fixed it.
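For readers trying the same workaround, here is a minimal Python sketch of the linking step. The function name `link_command_file` and any paths passed to it are illustrative assumptions; the thread does not say where the ramdisk is mounted or how the poster created the link.

```python
import os


def link_command_file(default_path, ramdisk_path):
    """Make default_path a symlink that points at ramdisk_path.

    Both arguments are supplied by the caller; nothing here is specific
    to the poster's system.
    """
    # lexists() is used instead of exists() so that a dangling symlink
    # left over from an earlier boot is also detected and replaced.
    if os.path.lexists(default_path):
        os.remove(default_path)
    os.symlink(ramdisk_path, default_path)
    return default_path
```

It would be called with the default command file location as the first argument and the ramdisk copy as the second, before nagios starts.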
Re: Odd behaviors w/ using ram disk.
Thanks for posting the issue and resolution! That's very helpful for people who may be experiencing a similar issue.
Nicholas Scott
Former Nagios employee
Re: Odd behaviors w/ using ram disk.
OK, I'm going to bump this thread back up... The issue still persists. I thought I had cleared it up with the nagios.cmd link, but it's still an issue.
Here is what I have:
CentOS 5.5 64bit
nagiosxi 2011r1.3
256 MB ramdisk that holds all of /usr/local/nagios/var
rsync script that syncs the ram disk to disk (once a minute)
startup script that copies the disk backup back to ram
The issue: on server reboot, all host checks are disabled and half the service checks don't show up in the count on the all service checks status. This persists until nagios is restarted. Once I restart nagios, all the host/service checks start reactivating and everything works like it should. The only problem I see in the log is that on the initial start of nagios, ndo fails to connect to the data sink; when ndo retries, it does connect. When I restart nagios after the reboot, it connects to the data sink on the first try.
Last, it appears that in nagios core everything is fine. These checks are actually being processed; it's just that nagiosxi does not pick them up.
Startup log entries...
Code:
[1309464197] Nagios 3.2.3 starting... (PID=4143)
[1309464197] Local time is Thu Jun 30 15:03:17 CDT 2011
[1309464197] LOG VERSION: 2.0
[1309464197] Event broker module '/usr/local/nagios/lib/dnxPlugin.so' initialized successfully.
[1309464197] ndomod: NDOMOD 1.4b9 (10-27-2009) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1309464197] ndomod: Could not open data sink! I'll keep trying, but some output may get lost...
[1309464197] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1309464198] Finished daemonizing... (New PID=4144)
[1309464213] ndomod: Successfully connected to data sink. 21701 items lost, 5000 queued items to flush.
[1309464218] ndomod: Successfully flushed 5000 queued items to data sink.
Entries after restarting nagios...
Code:
[1309464519] Nagios 3.2.3 starting... (PID=7193)
[1309464519] Local time is Thu Jun 30 15:08:39 CDT 2011
[1309464519] LOG VERSION: 2.0
[1309464519] Event broker module '/usr/local/nagios/lib/dnxPlugin.so' initialized successfully.
[1309464519] ndomod: NDOMOD 1.4b9 (10-27-2009) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1309464519] ndomod: Successfully connected to data sink. 0 queued items to flush.
[1309464519] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1309464520] Finished daemonizing... (New PID=7195)
Any ideas or what further information is needed?
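The restore step described in this post (copying the disk backup back into the ramdisk before nagios starts) could look roughly like the sketch below. It is not the poster's script. The function name `restore_to_ramdisk` is hypothetical, and the choice to skip special files is an assumption based on nagios.cmd being a named pipe that Nagios normally creates itself at startup.

```python
import os
import shutil
import stat


def restore_to_ramdisk(backup_dir, ramdisk_dir):
    """Copy regular files from backup_dir into ramdisk_dir, keeping the layout.

    Returns the sorted list of relative paths that were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(backup_dir):
        rel = os.path.relpath(root, backup_dir)
        dest_root = os.path.normpath(os.path.join(ramdisk_dir, rel))
        # Recreate the directory tree even when a directory holds no files.
        os.makedirs(dest_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            # Skip FIFOs, sockets, symlinks and other non-regular files.
            # Opening a FIFO such as the command file for reading would block.
            if not stat.S_ISREG(os.lstat(src).st_mode):
                continue
            shutil.copy2(src, os.path.join(dest_root, name))
            copied.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(copied)
```

Returning the list of copied paths makes it easy to log what was restored on each boot and to compare it against what the rsync job last wrote.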
Re: Odd behaviors w/ using ram disk.
In the meantime, I have a cron hack on reboot that kills nagios after 120 seconds and then restarts the service. Not very elegant, but it works.
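A workaround along these lines can be sketched in Python as follows. The poster's actual cron entry is not shown in the thread, so the function name `delayed_restart` and the use of `service nagios stop` / `service nagios start` are assumptions (a SysV-style init script is assumed, and a clean stop is used where the post says "kills").

```python
import subprocess
import time


def delayed_restart(delay_seconds=120, run=subprocess.call, sleep=time.sleep):
    """Wait delay_seconds, then stop and start the nagios service.

    `run` and `sleep` are injectable so the logic can be exercised
    without touching a real service. Returns (stop_rc, start_rc).
    """
    sleep(delay_seconds)
    # Assumed commands: an init script reachable through `service`.
    stop_rc = run(["service", "nagios", "stop"])
    start_rc = run(["service", "nagios", "start"])
    return stop_rc, start_rc
```

Run once at boot, this waits out the window in which the first start misbehaves and then performs the manual restart that the poster found fixes the display.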
Re: Odd behaviors w/ using ram disk.
From the looks of those logs it looks like Nagios isn't able to contact the MySQL database and just dumps the information. Do you have a custom database setup or are you using the stock MySQL on the same Nagios server setup?
Nicholas Scott
Former Nagios employee
Re: Odd behaviors w/ using ram disk.
nscott wrote: From the looks of those logs it looks like Nagios isn't able to contact the MySQL database and just dumps the information. Do you have a custom database setup or are you using the stock MySQL on the same Nagios server setup?
Stock database, but it runs on a second server.
Also, there are no issues with connecting to the database. At any given time I see around 34-35 connections from nagiosql and ndoutils to the database.
Re: Odd behaviors w/ using ram disk.
Code:
[1309464197] ndomod: Could not open data sink! I'll keep trying, but some output may get lost...
...snip...
[1309464213] ndomod: Successfully connected to data sink. 21701 items lost, 5000 queued items to flush.
In order to actually address the problem, could you enable debugging in ndo2db.cfg? Set debug_level = -1 and then watch the debug_file (defaults to nagios/var/ndo2db.debug).
I do find it very interesting that this bug is related to the RAM disk. Perhaps the data sink error isn't the main issue, but it's the clearest lead I see to follow.
Also, if you're running that many checks, there is a section in the Maximizing Performance in XI document that talks about the reaper frequency. You will definitely need to turn that down. This may also be a source of your problem if you haven't adjusted it already.
http://assets.nagios.com/downloads/nagi ... rmance.pdf
Nicholas Scott
Former Nagios employee
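The ndomod lines quoted in this thread can be summarized with a small parser, which helps when comparing a bad boot against a clean restart. This is only a sketch: the function name `summarize_sink` is hypothetical, and the patterns are written against the two message formats that appear in the log excerpts here, not against every message ndomod may emit.

```python
import re

# Patterns match the message shapes quoted in this thread only.
SINK_FAIL = re.compile(r"^\[(\d+)\] ndomod: Could not open data sink!")
SINK_OK = re.compile(
    r"^\[(\d+)\] ndomod: Successfully connected to data sink\.\s+"
    r"(?:(\d+) items lost, )?(\d+) queued items to flush\."
)


def summarize_sink(log_lines):
    """Return one dict per successful data sink connection.

    Each dict has the connection timestamp, the lost and queued item
    counts, and the seconds elapsed since the first preceding failure.
    """
    first_failure = None
    events = []
    for line in log_lines:
        fail = SINK_FAIL.match(line)
        if fail:
            if first_failure is None:
                first_failure = int(fail.group(1))
            continue
        ok = SINK_OK.match(line)
        if ok:
            ts = int(ok.group(1))
            down = ts - first_failure if first_failure is not None else 0
            events.append({
                "timestamp": ts,
                "lost": int(ok.group(2) or 0),  # absent means nothing lost
                "queued": int(ok.group(3)),
                "seconds_down": down,
            })
            first_failure = None
    return events
```

Fed the startup entries from the reboot, this reports 21701 lost items and a 16-second gap between the failed and successful connection; for the entries after the manual restart it reports zero lost and zero queued.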
Re: Odd behaviors w/ using ram disk.
Changed the reap settings and enabled debugging for ndo. The only thing I get from that is the SQL insert statements; there are no other entries in ndo2db.debug.
Status is the same...
Also, I think you are correct that this isn't related to the ramdisk. I removed the ram disk, rebooted, and the same thing happens.
Re: Odd behaviors w/ using ram disk.
1.3 did have an issue where on rare occasions it could spawn a second instance of nagios, which would result in an error like that. I'd recommend upgrading to 1.5 if you haven't already. I would also run our DB repair procedure if you haven't already just to verify that there isn't any table corruption.
http://assets.nagios.com/downloads/nagi ... irdatabase
Is your RAM disk large enough for your status and objects file?