We have multiple data centers we'd like to monitor. Here's what we have right now:
- ONE Nagios XI VM monitoring our essential systems across ALL data centers
- ONE Nagios XI VM monitoring our core network across ALL data centers
- ONE Nagios Core VM (Linux load average of 25-30+!!) monitoring all CPEs across the entire network, with T1s groomed to all our various data centers, plus MPLS circuits.
Currently my idea is to have a physical Nagios server at each location and monitor them all using the Nagios Fusion product.
Question 1: Do I really need to separate the network and systems servers, or can they share a Nagios installation? I feel like an R410 or R610 with sufficient CPU and RAM should be able to handle our core network and systems hosts, especially if we're installing one at EACH colo facility. The CPE monitoring needs its own Nagios boxes, since it uses heavily modified code that won't play nice with Nagios XI, and we can't even consider bringing all three functions into one box per colo.
Concept:
Colo A:
Dell R410 - Nagios XI, monitors systems and networks living at Colo A
Dell Rx10 - Nagios Core, monitors client CPE networks (T1, tunnel, or MPLS) groomed to Colo A
Colo B:
Dell R410 - Nagios XI, monitors systems and networks living at Colo B
Dell Rx10 - Nagios Core, monitors client CPE networks (T1, tunnel, or MPLS) groomed to Colo B (This colo is MOSTLY MPLS and not physical T1s. Unlikely to stop growing in size)
Colo C:
Dell R410 - Nagios XI, monitors systems and networks living at Colo C
Dell R610 - Nagios Core, monitoring MPLS only (no point to point T1s live here) *AND* hosting Nagios Fusion to tie all other sites together.
On top of this monitoring, DNX would be implemented on specially provisioned VMs as needed. The idea is that if a box is getting bogged down, we turn up some DNX VM nodes and let them start taking on some work.
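To make the "turn up DNX nodes when a box is bogged down" idea concrete, here is a rough sketch of the decision I have in mind. The function name `dnx_workers_needed` and the sizing rule (one extra worker per 4.0 points of load above the core count) are assumptions made up for illustration, not anything DNX itself provides.

```python
import math


def dnx_workers_needed(load_1min: float, cores: int,
                       per_worker_capacity: float = 4.0) -> int:
    """Estimate how many extra worker VMs to turn up.

    Hypothetical rule: treat load above the core count as "excess" and
    assume each worker absorbs `per_worker_capacity` points of it.
    """
    excess = load_1min - cores
    if excess <= 0:
        return 0  # the box is keeping up; no extra workers
    return math.ceil(excess / per_worker_capacity)
```

On a real box the inputs could come from `os.getloadavg()[0]` and `os.cpu_count()`. The per-worker capacity would have to be measured, since it depends entirely on which checks get offloaded.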
Each Dell R410 or R610 will use four 10k RPM 600 GB drives in RAID 10, dual CPUs, and at least 8 GB of RAM. I'm looking for critique and help with any necessary redesign of this scheme, as this is my first time building out a distributed Nagios implementation. Is the hardware adequate? Is more CPU prudent? Will disk I/O be an issue, and if so, would it make sense to use 15k drives?
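For the sizing question, the number I'd compare across boxes is the average check rate. Below is a back-of-the-envelope sketch; the host count, services per host, and interval in the usage note are placeholders, not our real figures.

```python
def checks_per_second(hosts: int, services_per_host: int,
                      interval_seconds: float) -> float:
    """Average active checks per second a server must execute.

    Assumes every host has the same number of services, one host check
    per host, and a single shared check interval (a simplification).
    """
    total_checks = hosts * (services_per_host + 1)  # +1 for the host check
    return total_checks / interval_seconds
```

For example, `checks_per_second(2000, 5, 300)` gives 40.0 checks per second. This says nothing about disk I/O from performance data or logging, which would need to be measured separately.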
Additionally, very important: Is there any way to use this setup to handle failover? Clearly we can monitor each Nagios box from the other Nagios boxes, and Nagios Fusion itself will tell us when a box isn't responding, but is there any way to make sure that, say, Colo A's Nagios kicks in if Colo B's Nagios dies? *OR* can we set up two identical Nagios servers and fail over to the hot spare if the primary dies?
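A rough sketch of the hot-spare variant, as I currently picture it: the spare runs with checks and notifications disabled, polls the primary, and enables itself after several consecutive failed polls by writing Nagios external commands to its own command file. The role names, the threshold of 3, and the helper functions are my own assumptions. How `primary_ok` is determined (for example, a remote process check) is left out.

```python
import time

# Nagios external commands, written to the command file as "[timestamp] NAME".
TAKEOVER_COMMANDS = ("ENABLE_NOTIFICATIONS", "START_EXECUTING_SVC_CHECKS")
STANDBY_COMMANDS = ("DISABLE_NOTIFICATIONS", "STOP_EXECUTING_SVC_CHECKS")


def format_command(name, now=None):
    """Format one external command line for the Nagios command file."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] {name}\n"


def next_role(current_role, primary_ok, failures, threshold=3):
    """Return (new_role, new_failure_count) for the spare after one poll."""
    if primary_ok:
        return "standby", 0  # primary is healthy: spare stays or becomes passive
    failures += 1
    if failures >= threshold:
        return "active", failures  # primary presumed dead: spare takes over
    return current_role, failures


def commands_for_transition(old_role, new_role, now=None):
    """Command lines to write when the spare changes role (empty if unchanged)."""
    if old_role == new_role:
        return []
    names = TAKEOVER_COMMANDS if new_role == "active" else STANDBY_COMMANDS
    return [format_command(name, now) for name in names]
```

The resulting lines would be appended to the spare's command file (often `/usr/local/nagios/var/rw/nagios.cmd` on a source install, but that path is an assumption here). Keeping the two servers' configs identical is a separate problem that this does not address.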
I appreciate any and all help. Diagrams would be immensely useful from those of you who have already implemented similar systems.

Thank you all SO MUCH!