Nagios XI - XI Server Considerations

Overview

This best-practices guide covers the Nagios XI server and the considerations to weigh when configuring it.

XI Server Considerations

Date and Timezone

Make sure your XI server has its timezone correctly defined.

Configure Timezone
Admin > Manage System Config
Documentation - Changing The System Time

When using NTP, make sure the server is synchronized with a trusted time source such as pool.ntp.org.

If this is a virtual machine, don't sync its time with the hypervisor (ESXi, Hyper-V, etc.)!

Incorrect or drifting time can be the source of confusing problems such as:
- Apply configuration throwing errors
- Performance data not being processed
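
The timezone and synchronization state can be checked from the command line. The following is a minimal sketch, assuming a systemd-based host where `timedatectl` is available; the helper name `ntp_status_message` is made up for illustration and is not part of Nagios XI.

```shell
#!/bin/sh
# Sketch: report the configured timezone and whether the clock is NTP-synchronized.
# Assumption: the host runs systemd and provides `timedatectl show`.

# Turn the NTPSynchronized value ("yes"/"no") into a readable message.
ntp_status_message() {
    case "$1" in
        yes) echo "OK: clock is synchronized" ;;
        no)  echo "WARNING: clock is NOT synchronized" ;;
        *)   echo "UNKNOWN: could not determine sync state" ;;
    esac
}

# Only query timedatectl when it exists on this host.
if command -v timedatectl >/dev/null 2>&1; then
    echo "Timezone: $(timedatectl show --property=Timezone --value 2>/dev/null)"
    ntp_status_message "$(timedatectl show --property=NTPSynchronized --value 2>/dev/null)"
fi
```

On hosts without systemd, the same information would have to come from whichever NTP client is installed.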

CPU

Having the right number of CPU cores is important, but so is the speed of those cores. Not all plugins and processes are multi-threaded, so a higher clock speed helps. A 3.4GHz CPU will do a lot more than a 2.2GHz one.

Refer to the XI Hardware Requirements guide.

Memory

How much memory does an XI system need? When all the hosts and services in XI are healthy, far less memory is used than during a major system outage. Event handlers consume memory when XI fires them, so an outage that triggers many event handlers consumes a lot of memory. It doesn’t take long for 6GB of memory to be used.

Generally speaking you should have at least 50% more memory than needed.
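
As a rough illustration of the 50% rule, the sketch below applies it to a measured figure. Reading "needed" as the peak usage observed during an outage is an assumption, and the function name `recommended_mem_mb` is hypothetical.

```shell
#!/bin/sh
# Sketch: apply the "at least 50% more than needed" rule to a measured peak.
# Assumption: "needed" means the peak memory usage you have observed, in MB.
recommended_mem_mb() {
    peak_mb=$1
    # Integer arithmetic: peak * 1.5, rounded down.
    echo $(( peak_mb * 3 / 2 ))
}

recommended_mem_mb 6144   # a 6GB peak; prints 9216
```

For a 6GB peak this suggests provisioning at least 9GB.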

Refer to the XI Hardware Requirements guide.

RAM Disk

Configuring Nagios XI with a RAM Disk is highly recommended as the number of monitored objects increases. The more things you monitor, the more disk I/O occurs. Directing this traffic to a RAM Disk makes those I/O operations complete drastically faster.

Reduces disk I/O & load
Speeds up processing of performance data
Speeds up processing of spooled check results
Speeds up Nagios restarts

Refer to the XI RAM Disk procedure.

Solid State Disk (SSD)

Greatly improves overall performance

Complements RAM Disk
Helps reads/writes with:
- Logs
- Database
- Performance Graphs
- Reports

SSD vs RAID?

RAID allows for much larger disk capacities than SSD can provide; however, it would be very hard for a spinning-disk RAID set to beat the performance of SSD.

Keep in mind that if you implement SSD, you should use RAID 1 sets for redundancy.

rrdcached

rrdcached accumulates the received performance data and then processes it as a batch job. It helps with larger installations and can reduce I/O; however, it can also result in performance graphs lagging behind the real-time results.

Refer to the XI rrdcached procedure.

Offloaded MySQL / MariaDB

On larger installations there can be a lot more data being written to the databases, which in turn can result in a lot of CPU usage directed away from actual monitoring.

Offloading to a separate server will remove this CPU usage from your monitoring server.

Of course, make sure you monitor the offloaded server!

- Disk / CPU / Memory / Tables / Service
- Refer to the guide Monitoring the Nagios XI "localhost" for notes about services

Refer to the XI Offloaded DB procedure.

Disaster Recovery

Define what is important to you in a disaster. Once you have clearly defined goals and outcomes, you can plan appropriately and test.

These presentations cover DR options:

Andy Brist: High Availability and Failover Solutions for Nagios XI
- https://www.youtube.com/watch?v=KW5Qkl8brcA
Jeremy Rust & Devin Vance: Scaling Across Data Centers Using High Availability
- https://www.youtube.com/watch?v=EVMbTbh9zV4

Backups

Have you scheduled your backups in Nagios XI?

Admin > System Backups
Schedule backups of XI
- Location can be local, FTP, SSH
- Remote location recommended. Storing them on storage that is not local to the XI file system is important - make sure you can get to your backups if your XI server dies.
Manual Backups
- Local Backup Archives via Admin menu
- From the command line:
```
/usr/local/nagiosxi/scripts/backup_xi.sh
```
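
A manual backup can be combined with an off-server copy, in line with the remote-location recommendation above. The following is only a sketch: the archive directory `/store/backups/nagiosxi`, the remote target, and the helper `newest_archive` are assumptions for illustration, and the script assumes it runs with sufficient privileges and that SSH access to the remote host is already set up.

```shell
#!/bin/sh
# Sketch: run a manual backup, then copy the newest archive off the XI server.
BACKUP_SCRIPT=/usr/local/nagiosxi/scripts/backup_xi.sh
# Assumption: directory where backup archives are written; adjust to your system.
BACKUP_DIR=${BACKUP_DIR:-/store/backups/nagiosxi}
# Hypothetical remote target; replace with your own backup host and path.
REMOTE=${REMOTE:-backup@backuphost.example.com:/srv/xi-backups/}

# Print the most recently modified .tar.gz archive in the given directory.
# Prints nothing when the directory has no archives.
newest_archive() {
    ls -t "$1"/*.tar.gz 2>/dev/null | head -n 1
}

# Only act when the backup script exists on this host and completes successfully.
if [ -x "$BACKUP_SCRIPT" ] && "$BACKUP_SCRIPT"; then
    archive=$(newest_archive "$BACKUP_DIR")
    if [ -n "$archive" ]; then
        scp "$archive" "$REMOTE"
    fi
fi
```

Scheduled backups configured under Admin > System Backups remain the simpler option; a script like this is mainly useful for ad hoc copies before maintenance.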

Restoring Backups

The backup and restore procedure is straightforward and allows for a full recovery of your Nagios XI system.

Another good use of it is to migrate XI from one server to another.

Refer to the XI Backup and Restore procedure.

Final Thoughts

For any support-related questions, please visit the Nagios Support Forums at:

http://support.nagios.com/forum/