Distributed systems monitoring
Posted: Tue Jul 09, 2013 2:45 pm
I have already configured Nagios to monitor several of our hosts that behave normally, but some of them are set up in a slightly awkward way:
In the current setup we have several monitoring hosts, say A-F, which all monitor hosts 0-50. At the moment, A-F also submit passive checks for their own services. With our current, primitive monitoring system, each monitoring host notifies us when a service on one of the monitored hosts is down, so we know, for example, that host C is reporting a failure of service TEST on host 2. In theory this lets us trace back which monitoring host is sending anomalous reports. While this design is probably not ideal, for now I am looking to mirror it in Nagios. Distributed monitoring allows multiple monitoring servers to report to a central server, essentially sending the OCSP passive check "on behalf" of the monitored server, but I am looking for a way to consolidate information about both the monitored host and the middleman in the web interface, if possible. In short, I am looking for some awkward combination of parent/child host relationships and distributed monitoring.
So the question is: is there an easy way for Nagios to do this? I can design several nasty ways of achieving the goals of this server-monitoring migration, but if there is a simple Nagios plugin or something similar, it would be helpful. Judging by the web interface, it doesn't look like we could get something like cascading trees of parent hosts that also function as distributed monitoring servers, but I would be glad to be proven wrong (maybe there is a way to get complex output formatting into NSCA passive checks?).
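To make the NSCA idea concrete, here is a minimal sketch of one of the "nasty ways", not a tested solution. It builds the tab-delimited line that send_nsca reads on standard input (host name, service description, return code, plugin output) and encodes the reporting monitor in both the service description and the output. The function name format_passive_check, the "SERVICE@MONITOR" naming scheme, and the host names are all assumptions made up for illustration.

```python
# Hypothetical sketch: the function name and the "SERVICE@MONITOR"
# naming scheme are assumptions, not anything Nagios provides.

# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def format_passive_check(monitor, host, service, state, output):
    """Build one line in the default send_nsca input format:
    host<TAB>service description<TAB>return code<TAB>plugin output<NEWLINE>
    """
    if state not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError(f"invalid return code: {state!r}")
    # Collapse tabs and newlines so the output cannot break the delimiters.
    clean_output = " ".join(output.split())
    # Encode the reporting monitor in the service description, so that each
    # monitor's view of the same service shows up as its own service entry.
    description = f"{service}@{monitor}"
    return f"{host}\t{description}\t{state}\t[{monitor}] {clean_output}\n"


if __name__ == "__main__":
    # Monitor C reports that service TEST on host2 is down.
    print(format_passive_check("C", "host2", "TEST", CRITICAL,
                               "connection refused"), end="")
```

The printed line could then be piped into send_nsca on the monitoring host. The catch is that the central server would need a passive service definition for every such combination (TEST@A through TEST@F on each of hosts 0-50), since results for undefined services are not accepted, which is part of why this feels like a Frankenstein.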
Any help/thoughts/pointers are appreciated; in the meantime, I will begin constructing these Frankensteins.