How to monitor an XI server disk space from another XI?
I tried to install NRPE on a Nagios server and it blew it up and took 2 days for support to help me get it running again.
What are the Nagios best practices for monitoring other Nagios XI servers? I have a few with limited disk space and I need to know when they are filling up. I monitor several remote Nagios XI servers with another Nagios XI server in a CoLo.
-
scottwilkerson
- DevOps Engineer
- Posts: 19396
- Joined: Tue Nov 15, 2011 3:11 pm
- Location: Nagios Enterprises
- Contact:
Re: How to monitor an XI server disk space from another XI?
I would install the XI Linux Agent on the XI server and run the Linux Monitoring Wizard.
I would also run the Nagios XI monitoring wizard against the servers as well.
Re: How to monitor an XI server disk space from another XI?
The Linux agent is the NRPE agent that disabled my Nagios server for 2 days. Can you verify that this is the agent you wanted me to install? I just want to be sure before I try it on any production machines, as the last time I tried the install on an OVA Nagios image, it disabled the Nagios server entirely and took quite a bit of work from support to recover the server.
I did try this on a backup machine and Nagios does seem to be running still but I am getting a few errors with NRPE.
CHECK_NRPE: Error - Could not complete SSL handshake.
Re: How to monitor an XI server disk space from another XI?
scottwilkerson wrote: "I would install the XI Linux Agent on the XI server and run the Linux Monitoring Wizard. I would also run the Nagios XI monitoring wizard against the servers as well."
This creates a separate service for each host, which gets unmanageable after a few minutes. A better approach is to use the wizard to make one service, make it generic, use hostgroups to add hosts, and use templates to define host and service parameters.
The above is compact and easy to maintain and update.
-
scottwilkerson
Re: How to monitor an XI server disk space from another XI?
gormank wrote: "This creates a separate service for each host, which gets unmanageable after a few minutes. A better approach is to use the wizard to make one service, make it generic, use hostgroups to add hosts, and use templates to define host and service parameters. The above is compact and easy to maintain and update."
@gormank the OP is looking to monitor several XI servers. How many XI servers are you managing?
Re: How to monitor an XI server disk space from another XI?
An XI host is just another Linux host (other than the services related to Nagios and the DB) so I don't see the difference.
I currently have 4 pairs of NXI hosts w/ 3 more on the way. I also monitor Nagios instances between locations so Nagios itself is monitored.
Using templates and hostgroups to make a system modular is the Nagios Core way, and XI wizards just allow a user to not learn a few simple techniques, which will surely become a problem as the number of hosts/services grows.
Doing things my way allows you to add a host and monitoring for it by simply creating the host and adding it to the right hostgroup.
Re: How to monitor an XI server disk space from another XI?
gormank, I am monitoring 8 Nagios XI hosts. How would I go about doing it your way? I am pretty good at getting Nagios up and running and tuning it, but we use wizards to monitor everything, which sounds like a bad idea. Would you be willing to chat about this some more? I would be interested in engaging your services if you are interested.
Re: How to monitor an XI server disk space from another XI?
Also, I figured out the SSL handshake issue. Turns out that NRPE runs SSL on a non-standard port and my firewall was blocking it. All fixed now.
I have heard a lot of gurus mention a single check for multiple services but have not been able to find enough info to wrap my head around how that would work. Very interested in doing so though.
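Since the handshake error above came down to a blocked port, here is a minimal Python sketch for testing reachability from the monitoring server before digging into NRPE itself. It assumes the agent listens on NRPE's default TCP port, 5666; the helper name `port_is_reachable` is hypothetical. A successful connect only shows that the network path is open. It says nothing about SSL settings or the agent's `allowed_hosts` list.

```python
import socket

NRPE_DEFAULT_PORT = 5666  # NRPE's default TCP port; adjust if your agent uses another


def port_is_reachable(host: str, port: int = NRPE_DEFAULT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and performs the TCP handshake only
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
```

Calling `port_is_reachable("remote-xi.example.com")` (a placeholder hostname) from the monitoring server and getting `False` would point at the firewall or the agent not listening, rather than at the check definition.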
Re: How to monitor an XI server disk space from another XI?
Basically, you just create a hostgroup with all your, for example, Linux hosts.
Next, create a service with the wizard, or copy an existing one, remove the host-specific info, rename it (since the names are usually related to the host), then apply the hostgroup above. Save and apply, and if there are no errors the new service will start on all hosts in the group.
To use a template, take the common info in the check settings and alert settings tabs and add it to a new template for the service. Add the template to the service and remove the settings now in the template from the service.
The same thing applies to hosts and their templates.
Here's an example. Names and things have been changed.
Here's a host. Note that it has very little info in it because it uses the template default_host.
Code:
# A real host: only identity and icons; everything else comes from default_host
define host {
    host_name file001
    use default_host
    alias File Server (Backups)
    address x.x.x.x
    icon_image win_server.png
    statusmap_image win_server.png
    register 1
}
Here's the template sourced above. Note that a template looks very much like a host, except it isn't registered. Also note that this template uses the template base_host. The reason for multiple templates here is that I also have some other templates that source the base template to make things more generic.
Code:
# Host template (register 0): notification settings, inherits from base_host
define host {
    name default_host
    alias Template for most hosts
    use base_host
    flap_detection_enabled 1
    process_perf_data 1
    retain_status_information 1
    retain_nonstatus_information 1
    contacts mail,sms
    notification_interval 60
    first_notification_delay 15
    notification_options d,u,f,
    notifications_enabled 1
    register 0
}
Here's the base template. Note that it contains the host ping check.
Code:
# Base host template: the ping host check plus check and notification intervals
define host {
    name base_host
    alias Template containing notification and check intervals. Used by templates
    check_command check_ping!3000.0,80%!5000.0,100%!!!!!!
    max_check_attempts 4
    check_interval 5
    retry_interval 1
    check_period 24x7
    event_handler_enabled 1
    flap_detection_enabled 1
    process_perf_data 1
    retain_status_information 1
    retain_nonstatus_information 1
    notification_interval 60
    notification_period 24x7
    first_notification_delay 15
    register 0
}
Much the same as the example host, here's a sample FS check service. It uses a default template and a couple of hostgroups. See how it only contains info about the service and its arguments, but no info about alerting, intervals, or notifications.
Code:
# One service definition, applied to every host in the listed hostgroups
define service {
    service_description FS_Unix_Usage
    use default_service
    hostgroup_name Linux_Physical,Linux_Virtual
    display_name File system Usage
    check_command check_nrpe!check_disk!-a "-w 20% -c 10%"!!!!!!
    register 1
}
Here's the template, and it also uses a base template.
Code:
# Service template (register 0): notification settings, inherits from base_service
define service {
    name default_service
    service_description default_service
    display_name Template for most services
    use base_service
    active_checks_enabled 1
    process_perf_data 1
    retain_status_information 1
    retain_nonstatus_information 1
    first_notification_delay 15
    notification_options w,c,u,f,
    notifications_enabled 1
    contacts mail,sms
    register 0
}
The base service template.
Code:
# Base service template: check intervals, retries, and notification defaults
define service {
    name base_service
    service_description Base service sourced by others
    display_name Base template for most templates
    is_volatile 0
    max_check_attempts 4
    check_interval 5
    retry_interval 1
    active_checks_enabled 1
    passive_checks_enabled 1
    check_period 24x7
    parallelize_check 1
    obsess_over_service 0
    check_freshness 0
    event_handler_enabled 1
    flap_detection_enabled 1
    process_perf_data 1
    retain_status_information 1
    retain_nonstatus_information 1
    notification_interval 60
    first_notification_delay 15
    notification_period 24x7
    notification_options w,c,u,
    register 0
}
I didn't provide a hostgroup example, but it's pretty easy, and hostgroups can contain hostgroups.
The point is I'm not duplicating info. If I want to change the host check to something other than ping, I can do so in one place. If I want to change the monitoring interval, check delay, number of retries, etc., it's in one location and can be changed in seconds.
To add a host, all I do is add the host, apply the template, save, add the host to the correct hostgroup, save, and apply. Then it's a matter of waiting a few minutes to verify the host and all its services are green.
Unfortunately, while changing services to a single instance is pretty easy, changing hosts to use templates is more difficult on a system with lots of hosts.
All of this lends itself to creating hosts by generating their configs with a list of hostnames and addresses using a script. I've literally configured monitoring for a system in an hour after NXI was running by importing config files to the import dir one at a time and running the reconfig script between each file in the correct order.
The newer way to do this is with the API, but I haven't gotten around to investigating it much.
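As a rough illustration of the script-generated configs described above, here is a minimal Python sketch that turns a list of hostnames and addresses into host definitions plus one hostgroup. It is not the poster's actual script. The function names are hypothetical, the `default_host` template and `Linux_Virtual` hostgroup names are borrowed from the example definitions, and the hostnames and addresses in the usage note are placeholders. It assumes the templates already exist on the server and that the hostgroup is defined in the generated file.

```python
from typing import Iterable, List, Tuple


def render_host(host_name: str, address: str, template: str = "default_host") -> str:
    """Render one host definition that inherits everything else from a template."""
    return (
        "define host {\n"
        f"    host_name    {host_name}\n"
        f"    use          {template}\n"
        f"    alias        {host_name}\n"
        f"    address      {address}\n"
        "    register     1\n"
        "}\n"
    )


def render_hostgroup(name: str, members: Iterable[str]) -> str:
    """Render a hostgroup; services applied to the group run on every member."""
    return (
        "define hostgroup {\n"
        f"    hostgroup_name    {name}\n"
        f"    alias             {name}\n"
        f"    members           {','.join(members)}\n"
        "}\n"
    )


def render_config(hosts: List[Tuple[str, str]], hostgroup: str) -> str:
    """Build host definitions plus one hostgroup from (host_name, address) pairs."""
    blocks = [render_host(name, address) for name, address in hosts]
    blocks.append(render_hostgroup(hostgroup, [name for name, _ in hosts]))
    return "\n".join(blocks)
```

For example, `render_config([("xi-east-01", "192.0.2.10"), ("xi-west-01", "192.0.2.11")], "Linux_Virtual")` returns text that could be written to a `.cfg` file and dropped into the import directory. Because the hosts join a hostgroup that the FS_Unix_Usage service already targets, no per-host service definitions are needed.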