Hi
I am currently running six Nagios instances across multiple projects. My approach is to run one Nagios instance per project, each responsible for the machines in that project and for project-specific checks.
Yes, I could roll all of these into one Nagios instance, but that is not the question.
I have searched Google and Nagios Exchange for a plugin; if there is nothing out there, I will build one myself. I would like to know whether anyone has experience with this.
Question: Is there a Nagios plugin that checks the overall status of another, remote Nagios instance, either through NRPE and a local script or over authenticated HTTP(S) to the cgi-bin, and simply reports how many services are OK / Warning / Critical / Unknown, etc. in each checked instance? HTTP(S) would be preferred.
If not, can someone point me toward how to query a single Nagios instance and interpret its responses? If there are no existing plugins, I will start by looking at Nagstamon for guidance on how to achieve this.
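For the NRPE-plus-local-script route, one possible starting point is to count service states straight from the status file that Nagios Core writes on the monitored instance. The following is only a rough sketch under assumptions: the default path `/usr/local/nagios/var/status.dat` is the usual source-install location and may differ on your systems, the function names `count_service_states` and `summarise` are made up for illustration, and the "worst state wins" policy is just one choice.

```python
#!/usr/bin/env python3
"""Sketch: summarise service states found in a Nagios status.dat file."""
import re
import sys

# State numbers used in status.dat match the standard plugin exit codes.
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# One "servicestatus { ... }" block, ending at a line holding only "}".
BLOCK_RE = re.compile(r"^servicestatus\s*\{(.*?)^\s*\}", re.S | re.M)
STATE_RE = re.compile(r"^\s*current_state=(\d+)\s*$", re.M)


def count_service_states(status_text):
    """Return a dict mapping state name to the number of services in it."""
    counts = {name: 0 for name in STATE_NAMES.values()}
    for block in BLOCK_RE.finditer(status_text):
        match = STATE_RE.search(block.group(1))
        if match is None:
            continue  # block without a state line; skip it
        state = int(match.group(1))
        counts[STATE_NAMES.get(state, "UNKNOWN")] += 1
    return counts


def summarise(counts):
    """Pick an exit code (assumed policy: worst state wins) and a status line."""
    if counts["CRITICAL"]:
        code = 2
    elif counts["WARNING"]:
        code = 1
    elif counts["UNKNOWN"]:
        code = 3
    else:
        code = 0
    detail = " ".join(f"{name}={counts[name]}" for name in STATE_NAMES.values())
    return code, f"{STATE_NAMES[code]} - {detail}"


if __name__ == "__main__":
    # Assumed default path; pass the real one as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/usr/local/nagios/var/status.dat"
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            text = handle.read()
    except OSError as exc:
        print(f"UNKNOWN - cannot read {path}: {exc}")
        sys.exit(3)
    code, line = summarise(count_service_states(text))
    print(line)
    sys.exit(code)
```

Run through NRPE on each project's Nagios host, this would give the central instance one service per remote instance. It does not cover the HTTP(S) route; for that, the CGIs on the remote instance would have to be queried instead.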
Nagios script to check Nagios
Re: Nagios script to check Nagios
Monitor everything you have time to configure, plus anything that is really critical.
On all of my Linux systems I monitor:
Current Load
Current Users
Disk Space
NTP Time
Ping
Swap
Total Processes
backup has run
crond running
mail queues
md raid
munin
ntpd running
postfix running
puppetd running
ssh
sshd running
And on particular machines I monitor such things as:
Various websites: checking that they are reachable, that the page retrieved contains certain content, and that it responds within a certain amount of time.
That certain processes or events have occurred within a certain amount of time.
That certain values, such as the average of a number of fields in a database table over the last hour, are within certain tolerances.
And various and sundry other things that can happen (or NOT happen), as the case may be. It is pretty easy to write custom scripts to monitor whatever you want. I graph many thousands of items with Munin in a similar way. I set it up once, write a Puppet manifest for it, and then Puppet handles it on new machines from then on. It is pretty simple to deploy this stuff now. In just one of my Nagios installations (I have a few) I have 80 hosts and 1120 services being monitored, and that's nothing compared to what some people have.
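To give an idea of how small such a custom script can be, here is a minimal sketch of an "event has occurred within a certain amount of time" check. It assumes the job being watched (a backup, say) touches a marker file when it finishes; the function name `check_marker_age`, the marker file convention, and the thresholds are all hypothetical and not taken from my actual setup. The only fixed part is the plugin convention of exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN and a one-line status message.

```python
#!/usr/bin/env python3
"""Sketch: alert when a marker file has not been updated recently."""
import os
import sys
import time


def check_marker_age(path, warn_seconds, crit_seconds, now=None):
    """Return (exit_code, message) based on the file's modification age."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        # Missing or unreadable marker: report UNKNOWN rather than guess.
        return 3, f"UNKNOWN - cannot stat {path}: {exc}"
    if age >= crit_seconds:
        return 2, f"CRITICAL - {path} is {age:.0f}s old"
    if age >= warn_seconds:
        return 1, f"WARNING - {path} is {age:.0f}s old"
    return 0, f"OK - {path} is {age:.0f}s old"


if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("UNKNOWN - usage: check_marker_age.py PATH WARN_SECONDS CRIT_SECONDS")
        sys.exit(3)
    try:
        warn, crit = int(sys.argv[2]), int(sys.argv[3])
    except ValueError:
        print("UNKNOWN - thresholds must be whole seconds")
        sys.exit(3)
    code, message = check_marker_age(sys.argv[1], warn, crit)
    print(message)
    sys.exit(code)
```

Something like this would be wired in as a normal command definition, with the path and thresholds passed as arguments per service.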