High End Server with RAID + SSD

Posted: Fri Jul 13, 2012 10:37 am
by HessianKnight
Hi Hardware-Tweakers,

We run Nagios 3 with 3000+ checks, and this sends the check latency sky-high. It's due to disk access, I was told in this forum. I was able to relocate the status file to a RAM disk but the latency still sucks. :(
Now we are going to build a 19" server with SSDs for that purpose.

Has anyone managed to do that, and does he/she have any hints for us?

Our requirements are:
- Debian Linux, testing branch
- High availability --> 2 disks on a RAID 1 controller (mirroring)
- 19" 1U case with front access to the disks
- 2 power supplies for redundancy
- Relatively low budget

I am especially unsure about these points:
- Which RAID controller for Linux? (Software RAID is nice, but we have decided on a controller card: in case of failure the disk must be replaceable by a Window$-Admin, sorry guys...)
- We will use consumer SATA SSDs because we don't want to spend $1000+ on a single SAS SSD.
- Which case to choose with hot-swappable SATA 2.5" bays for quick replacement?
- RAID with SSDs means no TRIM. Is that true with every controller?
- A single CPU with 2 or 4 cores should probably be more than enough.

Thanks!
Hessian Knight

Re: High End Server with RAID + SSD

Posted: Sun Jul 15, 2012 7:36 pm
by jsmurphy
Whoa, SSDs are a little bit overkill for ~3000 checks... we currently run ~11,000 checks on 4 GB RAM, 2 procs and 15k disks on virtual infrastructure without using a RAM disk. As long as you aren't using 7.2k disks I think all you need is a little bit of performance tuning.

The best articles are Mike's awesome guides on improving performance: http://labs.nagios.com/2012/01/30/nagio ... g-disk-io/ and http://labs.nagios.com/2011/11/15/nagio ... periments/

The highlights reel:
1. Offload MySQL to a different server if you are using NDOUtils, etc.
2. Don't use the VMware/check_esx3.pl plugin.
3. Use a RAM disk for status.dat, objects.cache, host-perfdata, and service-perfdata.
4. Enable rrdcached.
5. Enable large installation tweaks in the nagios config file.
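
For points 3 and 5, a minimal nagios.cfg sketch could look roughly like this. The directive names are the standard ones from the Nagios 3 main config; the /dev/shm/nagios paths are only an assumption, so point them at wherever your RAM disk is actually mounted:

```cfg
# Assumed RAM disk location (hypothetical path): /dev/shm/nagios
status_file=/dev/shm/nagios/status.dat
object_cache_file=/dev/shm/nagios/objects.cache
host_perfdata_file=/dev/shm/nagios/host-perfdata
service_perfdata_file=/dev/shm/nagios/service-perfdata

# Point 5: large installation tweaks
use_large_installation_tweaks=1
```

The directory has to exist and be writable by the nagios user before the daemon starts, since a RAM disk comes up empty after every reboot.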

Outside of that if you are SNMP trap heavy and using SNMPTT I strongly recommend running it in daemon mode instead of standalone. If none of the above helps I can't really offer much advice on storage as it's not really my area of expertise, but hopefully you won't need to go that far :).

Re: High End Server with RAID + SSD

Posted: Mon Jul 23, 2012 7:01 am
by HessianKnight
Hi JsMurphy,

- Not using NDOUtils or MySQL.

- RAM disk only used for status.dat + objects.cache.

I am a bit afraid of using RAM disks, because we use the PNP4Nagios data and Nagios status data for essential business reporting and can't afford to lose such data in case of an OS failure or a kernel panic. Of course we could have a cron job backing the data up regularly, but there is still a time interval that's never backed up.
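
Such a cron job could be sketched roughly like this in Python. It is only an illustration: the function name and the paths in the usage comment are hypothetical, and it narrows the unprotected window to the cron interval without closing it.

```python
import os
import shutil
import tempfile
from pathlib import Path


def snapshot(src_dir, dst_dir):
    """Copy every regular file in src_dir (the RAM disk) to dst_dir (real disk).

    Each file is first copied to a temporary name inside dst_dir and then
    renamed into place, so a crash mid-copy never leaves a half-written backup.
    Returns the list of file names that were copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(src.iterdir()):
        if not item.is_file():
            continue
        # Temp file lives in dst so os.replace stays on one filesystem.
        fd, tmp_name = tempfile.mkstemp(dir=dst, prefix=item.name + ".")
        os.close(fd)
        try:
            shutil.copy2(item, tmp_name)
            os.replace(tmp_name, dst / item.name)
        except BaseException:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
            raise
        copied.append(item.name)
    return copied


# Example call (hypothetical paths), e.g. from a script run by cron:
# snapshot("/dev/shm/nagios", "/var/backups/nagios-ramdisk")
```

Anything written between two runs is still lost on a kernel panic, which is exactly the gap I am worried about.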

- Enabled rrdcached with a 10 min update time; this cut the latency roughly in half, from 120 s to about 60 s. Thanks for the hint!

- Large installation tweaks already implemented.

3000 checks, nearly all running with PNP4Nagios, and the check latency is about 60 sec. This still sucks. We are going to get SSDs + a physical server instead of running under VMware ESX 3. I already tweaked the VM (tried fewer / more CPUs, reservations etc.), but the latency stays high no matter what.
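
To see whether a change really moves that number, the latency can be read straight out of status.dat. A rough Python sketch, assuming the Nagios 3 status.dat layout with one check_latency=... line per hoststatus/servicestatus block (the function name is made up for illustration):

```python
import re
import statistics

# Matches lines such as "	check_latency=0.123" inside status.dat blocks.
LATENCY_RE = re.compile(r"^\s*check_latency=(\d+(?:\.\d+)?)\s*$", re.MULTILINE)


def latency_summary(status_text):
    """Summarise check latency (in seconds) over all host and service blocks.

    Returns None if the text contains no check_latency lines.
    """
    values = [float(v) for v in LATENCY_RE.findall(status_text)]
    if not values:
        return None
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


# Example call (the path is an assumption, use your own status_file setting):
# with open("/dev/shm/nagios/status.dat") as f:
#     print(latency_summary(f.read()))
```

Note that this lumps host and service checks together, so it is only a coarse before/after indicator.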

Re: High End Server with RAID + SSD

Posted: Mon Jul 23, 2012 8:45 pm
by jsmurphy
Ahhh, ESX 3. When I first deployed Nagios we were running ESX 3.5 with Nagios 2.6 or 2.7, and it suffered pretty chronically from performance problems (though it was also pretty loosely optimized) with roughly the same amount of monitoring as you have now, so it's starting to make a bit more sense.

I can't really offer much help in the way of advice on SSD and RAID configuration but it should straight up kill any performance issue you were having :lol: