Is there a document that describes which components of the overall Nagios service should be monitored to ensure that Nagios itself is functioning?
We ran into a situation the other day: after a configuration apply, the NDO service appeared to abend, and this went unnoticed for a significant amount of time, during which Nagios didn't execute any host or service checks. The following errors were logged:
[1338387007] ndomod: Error writing to data sink! Some output may get lost...
[1338387007] ndomod: Please check remote ndo2db log, database connection or SSL Parameters
I have set up some monitors on an alternate Nagios server to monitor our primary installation, including the standard checks:
cpu, memory, disk, load, swap, etc.
as well as process checks for:
nagios, ndo2db, gearmand, mysqld, npcd, mod_gearman_wor
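For reference, the process-presence check can be sketched in Python. This is a minimal, Linux-only sketch that assumes process names are read from /proc/<pid>/comm; the function names and the REQUIRED list are hypothetical, and the stock check_procs plugin already covers the same ground.

```python
import os

OK, CRITICAL = 0, 2

# Process names from the list above. Linux truncates a process's comm name to
# 15 characters, which is likely why the mod_gearman worker shows up as
# "mod_gearman_wor". Trim this list per host: in a distributed setup, mysqld
# and the workers run on other machines.
REQUIRED = ["nagios", "ndo2db", "gearmand", "mysqld", "npcd", "mod_gearman_wor"]


def running_process_names(proc_root="/proc"):
    """Collect process names from /proc/<pid>/comm (Linux only)."""
    names = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                names.add(fh.read().strip())
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
    return names


def check_processes(required, running):
    """Return (exit_code, message) using the Nagios plugin exit-code convention."""
    missing = [name for name in required if name not in running]
    if missing:
        return CRITICAL, "CRITICAL - not running: " + ", ".join(missing)
    return OK, "OK - all %d processes running" % len(required)


def main(required=REQUIRED):
    code, message = check_processes(required, running_process_names())
    print(message)
    return code

# As a plugin entry point: import sys; sys.exit(main())
```

Keeping the decision logic in check_processes, separate from reading /proc, makes it easy to test without a live system.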
Log monitoring (check_logfiles) has also been configured against nagios.log to alert on the above errors.
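The log check boils down to matching the bracketed Unix timestamp and message in each nagios.log line. Below is a minimal Python sketch assuming the line format shown in the ndomod errors above; the regexes, function names, and the log path are hypothetical and should be adjusted to your install.

```python
import re
import time

OK, CRITICAL = 0, 2

# nagios.log lines start with a bracketed Unix timestamp, e.g. "[1338387007] ...".
LINE_RE = re.compile(r"^\[(\d+)\]\s+(.*)$")
# Hypothetical pattern; extend it with any other messages worth alerting on.
ERROR_RE = re.compile(r"ndomod: Error writing to data sink")


def recent_errors(lines, now, window_seconds):
    """Return messages matching ERROR_RE logged within the last window_seconds."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp, message = int(match.group(1)), match.group(2)
        if now - stamp <= window_seconds and ERROR_RE.search(message):
            hits.append(message)
    return hits


def check_log(lines, now, window_seconds=300):
    """Return (exit_code, message): CRITICAL if any matching error is recent."""
    hits = recent_errors(lines, now, window_seconds)
    if hits:
        return CRITICAL, "CRITICAL - %d ndomod error(s) in last %ds, latest: %s" % (
            len(hits), window_seconds, hits[-1])
    return OK, "OK - no ndomod errors in last %ds" % window_seconds


def main(path="/usr/local/nagios/var/nagios.log"):  # assumed path; check your install
    with open(path, errors="replace") as fh:
        code, message = check_log(fh, time.time())
    print(message)
    return code

# As a plugin entry point: import sys; sys.exit(main())
```

check_logfiles keeps track of how far it has read between runs; this sketch instead rescans the whole file and relies on the timestamp window, which is simpler but slower on a large log.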
Are there any additional checks that could indicate that Nagios may not be operational? Ideally, it would be nice to somehow alert on the number of host/service checks Nagios executed in a certain time period, i.e. raise an alert if there were 0 checks in five minutes. Our environment for monitoring Windows is distributed (1 Nagios XI/Gearman server, 5 Gearman worker servers, 1 MySQL server). A separate, single Nagios server is in production for Unix monitoring; this is where I monitor the previously mentioned Nagios environment from.
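One way to approximate "0 checks in five minutes" is to read Nagios's status.dat and count host and service entries whose last_check timestamp falls inside the window. Here is a minimal Python sketch; the status.dat path is an assumption (see the status_file setting in nagios.cfg), and the function names are hypothetical.

```python
import time

OK, CRITICAL, UNKNOWN = 0, 2, 3

# Assumed location; confirm against status_file in nagios.cfg.
STATUS_FILE = "/usr/local/nagios/var/status.dat"


def recent_check_count(lines, now, window_seconds):
    """Count hoststatus/servicestatus blocks whose last_check is inside the window."""
    count = 0
    block = None
    for raw in lines:
        line = raw.strip()
        if line.endswith("{"):
            block = line[:-1].strip()  # e.g. "servicestatus"
        elif line == "}":
            block = None
        elif block in ("hoststatus", "servicestatus") and line.startswith("last_check="):
            stamp = int(line.split("=", 1)[1])
            # last_check=0 means the object has never been checked.
            if stamp > 0 and now - stamp <= window_seconds:
                count += 1
    return count


def check_activity(lines, now, window_seconds=300):
    """Return (exit_code, message) with perfdata; CRITICAL if no recent checks."""
    count = recent_check_count(lines, now, window_seconds)
    if count == 0:
        return CRITICAL, "CRITICAL - 0 host/service checks in last %ds" % window_seconds
    return OK, "OK - %d host/service checks in last %ds | checks=%d" % (
        count, window_seconds, count)


def main(path=STATUS_FILE, window_seconds=300):
    try:
        with open(path) as fh:
            code, message = check_activity(fh, time.time(), window_seconds)
    except OSError as exc:
        code, message = UNKNOWN, "UNKNOWN - cannot read %s: %s" % (path, exc)
    print(message)
    return code

# As a plugin entry point: import sys; sys.exit(main())
```

This counts objects checked recently rather than literal check executions, which should be close enough to detect a stall. If Nagios hangs, it stops rewriting status.dat, the last_check values go stale, and the count drops to 0 once the window passes.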
Thanks,
Paul
Monitoring the nagios service itself
Re: Monitoring the nagios service itself
It appears there is a check_nagios plugin, which looks like what I am looking for.
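As far as I know, check_nagios checks that the Nagios status file has been updated within a given number of minutes and that a matching Nagios process is running. The freshness half of that idea can be sketched in Python as below; this is an illustration under that assumption, not the plugin's actual implementation, and check_status_age is a hypothetical name.

```python
import os
import time

OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_status_age(path, expire_minutes, now=None):
    """Return (exit_code, message): CRITICAL if path is older than expire_minutes."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        return UNKNOWN, "UNKNOWN - cannot stat %s: %s" % (path, exc)
    if age > expire_minutes * 60:
        return CRITICAL, "CRITICAL - %s not updated for %d seconds" % (path, age)
    return OK, "OK - %s updated %d seconds ago" % (path, age)
```

The real plugin is typically invoked along the lines of `check_nagios -F /usr/local/nagios/var/status.dat -e 5 -C /usr/local/nagios/bin/nagios` (paths assumed; see `check_nagios --help`). Because it inspects local files and processes, it needs to run on the Nagios XI server itself, for example via NRPE, when polled from the separate Unix Nagios server.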
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
Re: Monitoring the nagios service itself
Yep, that's the one 