How to monitor an XI server disk space from another XI?

dfmco · Post by **dfmco** » Fri Sep 08, 2017 10:31 pm

I tried to install NRPE on a Nagios server and it broke the server; it took support 2 days to help me get it running again.

What are the Nagios best practices for monitoring other Nagios XI servers? I have a few with limited disk space and I need to know when they are filling up. I monitor several remote Nagios XI servers with another Nagios XI server in a CoLo.

scottwilkerson · Post by **scottwilkerson** » Mon Sep 11, 2017 8:41 am

I would install the XI Linux Agent on the XI server and run the Linux Monitoring Wizard.

I would also run the Nagios XI monitoring wizard against the servers as well.

dfmco · Post by **dfmco** » Mon Sep 11, 2017 9:15 am

The Linux agent is the NRPE agent that disabled my Nagios server for 2 days. Can you verify that this is the agent you wanted me to install? I just want to be sure before I try on any production machines, as the last time I tried the install on an OVA Nagios image, it disabled the Nagios server entirely and took quite a bit of work from support to recover the server.

I did try this on a backup machine and Nagios does seem to be running still but I am getting a few errors with NRPE.

CHECK_NRPE: Error - Could not complete SSL handshake.

gormank · Post by **gormank** » Mon Sep 11, 2017 10:21 am

> scottwilkerson wrote: I would install the XI Linux Agent on the XI server and run the Linux Monitoring Wizard.
>
> I would also run the Nagios XI monitoring wizard against the servers as well.

This creates a separate service for each host, which gets unmanageable after a few minutes. A better approach is to use the wizard to make one service, make it generic, use hostgroups to add hosts, and templates to define host and service parameters.

The above is compact, easy to maintain and update.

scottwilkerson · Post by **scottwilkerson** » Mon Sep 11, 2017 10:36 am

> gormank wrote:
> > scottwilkerson wrote: I would install the XI Linux Agent on the XI server and run the Linux Monitoring Wizard.
> >
> > I would also run the Nagios XI monitoring wizard against the servers as well.
>
> This creates a separate service for each host, which gets unmanageable after a few minutes. A better approach is to use the wizard to make one service, make it generic, use hostgroups to add hosts, and templates to define host and service parameters.
>
> The above is compact, easy to maintain and update.

@gormank the OP is looking to monitor several XI servers. How many XI servers are you managing?

gormank · Post by **gormank** » Mon Sep 11, 2017 11:03 am

An XI host is just another Linux host (other than the services related to Nagios and the DB) so I don't see the difference.
I currently have 4 pairs of NXI hosts w/ 3 more on the way. I also monitor Nagios instances between locations so Nagios itself is monitored.
Using templates and hostgroups to make a system modular is the Nagios Core way, and XI wizards just allow a user to avoid learning a few simple techniques, which will surely become a problem as the number of hosts/services grows.
Doing things my way allows you to add a host and monitoring for it by simply creating the host and adding it to the right hostgroup.

dfmco · Post by **dfmco** » Mon Sep 11, 2017 11:31 am

gormank, I am monitoring 8 Nagios XI hosts. How would I go about doing it your way? I am pretty good at getting Nagios up and running and tuning it, but we use wizards to monitor everything, which sounds like a bad idea. Would you be willing to chat about this some more? I would be interested in engaging your services if you are interested.

dfmco · Post by **dfmco** » Mon Sep 11, 2017 11:33 am

Also, I figured out the SSL handshake issue. Turns out that NRPE runs SSL on a non-standard port and my firewall was blocking it. All fixed now.
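For anyone hitting the same thing, a quick way to rule the firewall in or out is to test whether the NRPE port is reachable from the monitoring server at all. Below is a minimal Python sketch. It assumes NRPE is listening on its default TCP port 5666 (change the port if `server_port` in nrpe.cfg says otherwise), and the function name is just something made up for this example. Note that a successful TCP connect only shows the port is open; it does not prove the SSL handshake itself will succeed.

```python
import socket

# NRPE's default TCP port; adjust if nrpe.cfg sets server_port to something else.
NRPE_DEFAULT_PORT = 5666


def nrpe_port_reachable(host: str, port: int = NRPE_DEFAULT_PORT, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened.

    This is a hypothetical helper for illustration. It checks reachability
    only and does not speak the NRPE protocol or negotiate SSL.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `nrpe_port_reachable("<address of the remote XI server>")` from the monitoring server and getting `False` would point at a firewall or a stopped NRPE daemon rather than at the check command.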

I have heard a lot of gurus mention a single check for multiple services but have not been able to find enough info to wrap my head around how that would work. Very interested in doing so though.

gormank · Post by **gormank** » Mon Sep 11, 2017 12:52 pm

Basically, you just create a hostgroup with all your, for example, Linux hosts.
Next, create a service with the wizard, or copy an existing one, remove the host-specific info, rename it (since the names are usually related to the host), then apply the hostgroup above. Save and apply, and if there are no errors the new service will start on all hosts in the group.
To use a template, take the common info in the check settings and alert settings tabs and add it to a new template for the service. Add the template to the service and remove from the service the settings that are now in the template.
The same thing applies to hosts and host templates.

Here's an example. Names and things have been changed.

Here's a host. Note that it has very little info in it because it uses the template default_host.

define host {
        host_name                       file001
        use                             default_host
        alias                           File Server (Backups)
        address                         x.x.x.x
        icon_image                      win_server.png
        statusmap_image                 win_server.png
        register                        1
        }

Here's the template sourced above. Note that a template looks very much like a host, except it isn't registered. Also note that this template uses the template base_host. The reason for multiple templates here is because I also have some other templates that source the base template to make things more generic.

define host {
       name                                     default_host
       alias                                    Template for most hosts
       use                                      base_host
       flap_detection_enabled                   1
       process_perf_data                        1
       retain_status_information                1
       retain_nonstatus_information             1
       contacts                                 mail,sms
       notification_interval                    60
       first_notification_delay                 15
       notification_options                     d,u,f,
       notifications_enabled                    1
       register                                 0
}

Here's the base template. Note that it contains the host ping check.

define host {
       name                                     base_host
       alias                                    Template containing notification and check intervals. Used by templates
       check_command                            check_ping!3000.0,80%!5000.0,100%!!!!!!
       max_check_attempts                       4
       check_interval                           5
       retry_interval                           1
       check_period                             24x7
       event_handler_enabled                    1
       flap_detection_enabled                   1
       process_perf_data                        1
       retain_status_information                1
       retain_nonstatus_information             1
       notification_interval                    60
       notification_period                      24x7
       first_notification_delay                 15
       register                                 0
}
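To make the layering concrete, here is a rough Python sketch of how the `use` chain plays out for the host above: a value set on the object itself wins, otherwise the value comes from the nearest template that defines it. This only illustrates the idea and is not how Nagios is implemented. It assumes a single template per `use` line, and the `resolve` function and `OBJECTS` dictionary are names invented for this example.

```python
# Trimmed copies of the definitions above, keyed by host_name / template name.
# "name" and "register" are left out because they are not inherited.
OBJECTS = {
    "base_host": {"check_interval": "5", "max_check_attempts": "4", "notification_interval": "60"},
    "default_host": {"use": "base_host", "contacts": "mail,sms", "notification_interval": "60"},
    "file001": {"use": "default_host", "address": "x.x.x.x"},
}


def resolve(name: str, objects: dict[str, dict[str, str]]) -> dict[str, str]:
    """Merge an object with its template chain; nearer definitions override farther ones."""
    chain = []
    seen = set()
    current = name
    while current is not None:
        if current in seen:
            raise ValueError(f"template loop at {current!r}")
        seen.add(current)
        chain.append(objects[current])
        current = objects[current].get("use")

    merged: dict[str, str] = {}
    for definition in reversed(chain):  # apply the base first, the object itself last
        merged.update(definition)
    merged.pop("use", None)  # the pointer to the parent is not a setting
    return merged
```

With this, `resolve("file001", OBJECTS)` ends up with the check interval from base_host and the contacts from default_host, even though the host definition mentions neither.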

Much the same as the example host, here's a sample filesystem check service. It uses a default template and a couple of hostgroups. See how it only contains info about the service and its arguments, but no info about alerting, intervals, or notifications.

define service {
        service_description             FS_Unix_Usage
        use                             default_service
        hostgroup_name                  Linux_Physical,Linux_Virtual
        display_name                    File system Usage
        check_command                   check_nrpe!check_disk!-a "-w 20% -c 10%"!!!!!!
        register                        1
        }

Here's the template and it also uses a base template.

define service {
       name                                     default_service
       service_description                      default_service
       display_name                             Template for most services
       use                                      base_service
       active_checks_enabled                    1
       process_perf_data                        1
       retain_status_information                1
       retain_nonstatus_information             1
       first_notification_delay                 15
       notification_options                     w,c,u,f,
       notifications_enabled                    1
       contacts                                 mail,sms
       register                                 0
}

The base service template.

define service {
       name                                     base_service
       service_description                      Base service sourced by others
       display_name                             Base template for most templates
       is_volatile                              0
       max_check_attempts                       4
       check_interval                           5
       retry_interval                           1
       active_checks_enabled                    1
       passive_checks_enabled                   1
       check_period                             24x7
       parallelize_check                        1
       obsess_over_service                      0
       check_freshness                          0
       event_handler_enabled                    1
       flap_detection_enabled                   1
       process_perf_data                        1
       retain_status_information                1
       retain_nonstatus_information             1
       notification_interval                    60
       first_notification_delay                 15
       notification_period                      24x7
       notification_options                     w,c,u,
       register                                 0
}

I didn't provide a hostgroup example, but it's pretty easy, and hostgroups can contain hostgroups.

The point is I'm not duplicating info. If I want to change the host check to something other than ping, I can do so in one place. If I want to change the monitoring interval, check delay, number of retries, etc., it's in one location and can be changed in seconds.

To add a host, all I do is add the host, apply the template, save, add the host to the correct hostgroup, save, and apply. Then it's a matter of waiting a few minutes to verify the host and all its services are green.

Unfortunately, while changing services to a single instance is pretty easy, changing hosts to use templates is more difficult on a system with lots of hosts.

All of this lends itself to creating hosts by generating their configs with a list of hostnames and addresses using a script. I've literally configured monitoring for a system in an hour after NXI was running by importing config files to the import dir one at a time and running the reconfig script between each file in the correct order.
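As a rough idea of what such a script could look like, here is a short Python sketch that turns a list of hostnames and addresses into host definitions using the default_host template from above. Everything specific in it is an assumption for illustration: the CSV column layout, the inventory rows, and the function name are made up, and it puts each host into a hostgroup through the `hostgroups` directive, which only works if that hostgroup is already defined.

```python
import csv
import io

# Mirrors the example host above; doubled braces are literal braces for str.format.
HOST_TEMPLATE = """define host {{
        host_name                       {host_name}
        use                             default_host
        alias                           {alias}
        address                         {address}
        hostgroups                      {hostgroup}
        register                        1
        }}
"""


def render_hosts(csv_text: str) -> str:
    """Turn CSV rows (host_name,address,alias,hostgroup) into host definitions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    blocks = []
    for row in reader:
        blocks.append(
            HOST_TEMPLATE.format(
                host_name=row["host_name"].strip(),
                alias=row["alias"].strip(),
                address=row["address"].strip(),
                hostgroup=row["hostgroup"].strip(),
            )
        )
    return "\n".join(blocks)


# Hypothetical inventory; the names and addresses are placeholders.
INVENTORY = """host_name,address,alias,hostgroup
xi-east-01,192.0.2.11,Nagios XI East,Linux_Virtual
xi-west-01,192.0.2.12,Nagios XI West,Linux_Virtual
"""

print(render_hosts(INVENTORY))
```

The output would be saved as a .cfg file and imported the same way as described above. Because the services are attached to hostgroups, the generated hosts pick up their checks without any per-host service definitions.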

The newer way to do this is with the API, but I haven't gotten around to investigating it much.

tgriep · Post by **tgriep** » Mon Sep 11, 2017 4:13 pm

Thanks @gormank for the help.
@dfmco, if you have any further questions, let us know.

Nagios Support Forum
