Bulk Discovery

rexmundo
Posts: 29
Joined: Mon Jul 20, 2020 3:28 pm

Bulk Discovery

Post by rexmundo »

Hi

We are looking to deploy the NCPA agent on thousands of hosts. I know it's possible to run auto-discovery on a single host and then say "Do the same for the following X hosts" and list the hosts you want Nagios to onboard.

The problem is that the servers are not all the same. Some have C: and D: drives, some have C: and E:, and so on. Can Nagios XI do auto-discovery in bulk? That is, can I give it 1000 hosts and have it monitor the CPU, RAM, and disk drives on each one?

Thanks!
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: Bulk Discovery

Post by dchurch »

The Auto-Discovery feature is limited to monitoring "external facing" metrics, such as "is port 22 open" and "what's the RTT for a ping to this server" - it's not possible to make it auto-configure checks to get more "private" data such as RAM usage and Disk space free.

In the next version of Nagios XI, there will be more options, with a "Deploy Agent" wizard being developed which may lend itself better to this scenario.

The best routes at this time are as follows:

Option 1: Host Groups

What you could do is add all your hosts with a D drive to a Host Group called "With Drive D" and assign the active NCPA check "Disk Usage on Drive D" to that group (don't assign it individually to each host, only to the host group), then do the same for the rest of the drive letters.

The check would look something like this:

Code:

/usr/local/nagios/libexec/check_ncpa.py -H 127.0.0.1 -t 'some-shared-token-bla-blu-blarg' -P 5693 -M 'disk/logical/D:|' -w '70' -c '90'

Then on each of the monitored machines:

Code:

# C:\Program Files (x86)\Nagios\NCPA\etc\ncpa.cfg (Windows)
# or /usr/local/ncpa/etc/ncpa.cfg (Linux)
[api]
community_string = some-shared-token-bla-blu-blarg

This requires the community_string (in ncpa.cfg) to be the same on every monitored host.

The advantage of this is that it centralizes setup and management. If a host loses its D: drive, simply remove it from the "With Drive D" group, and it will no longer receive that check.
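
If a drive inventory is available from some other source, the group membership described above could be derived from it rather than assembled by hand. Here is a minimal Python sketch, assuming a hypothetical `inventory` mapping of host name to drive letters. The inventory, the function name, and the host names are illustrative and not part of Nagios XI; the group names follow the "With Drive D" pattern used above.

```python
from collections import defaultdict


def group_hosts_by_drive(inventory):
    """Map a host group name such as 'With Drive D' to its sorted member hosts.

    `inventory` is a hypothetical dict: host name -> iterable of drive letters.
    """
    groups = defaultdict(set)
    for host, drives in inventory.items():
        for letter in drives:
            groups[f"With Drive {letter.upper()}"].add(host)
    # Sort groups and members so the output is stable between runs.
    return {name: sorted(members) for name, members in sorted(groups.items())}


if __name__ == "__main__":
    # Made-up example hosts, not real data.
    inventory = {
        "web01": ["C", "D"],
        "db01": ["C", "E"],
        "app01": ["C", "D", "E"],
    }
    for name, members in group_hosts_by_drive(inventory).items():
        print(f"{name}: {', '.join(members)}")
```

The resulting lists could then be used to populate the host groups in Nagios XI, by whatever import method suits your environment.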

Option 2: Passive Checks

Another option is to use passive checks and have Nagios XI auto-setup hosts when it sees them.

One caveat here is if the Nagios XI server ever changes network location, each monitored host's configuration will need to be updated to send the passive check data to a new location.

The advantage here is that it's very easy to add new hosts, with almost no administration needed on the Nagios XI side of things.

Here's an example of how you'd do it. (Edit the config on the monitored Windows host)

Code:

# C:\Program Files (x86)\Nagios\NCPA\etc\ncpa.cfg (Windows)
# or /usr/local/ncpa/etc/ncpa.cfg (Linux)
[nrdp]
parent = http://my-ip-of-nagios-xi-server/nrdp
token = some-shared-token-bla-blu-blarg
# set hostname to this (Windows) host's IP address
hostname = 192.168.x.x

Code:

# C:\Program Files (x86)\Nagios\NCPA\etc\ncpa.cfg.d\nrdp.cfg (Windows)
# or /usr/local/ncpa/etc/ncpa.cfg.d/nrdp.cfg (Linux)
[passive checks]
%HOSTNAME%|__HOST__ = system/agent_version
%HOSTNAME%|Disk Usage = disk/logical/C:|/used_percent --warning 80 --critical 90 --units Gi
%HOSTNAME%|CPU Usage = cpu/percent --warning 60 --critical 80 --aggregate avg
%HOSTNAME%|Swap Usage = memory/swap --warning 60 --critical 80 --units Gi
%HOSTNAME%|Memory Usage = memory/virtual --warning 80 --critical 90 --units Gi
%HOSTNAME%|Process Count = processes --warning 300 --critical 400
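
Because drive letters differ from host to host, the disk lines of the `[passive checks]` section could be generated per host instead of written by hand. This is a rough Python sketch that reuses the line format and thresholds from the example config above. The helper name `passive_checks_section` and the per-drive service name `Disk Usage <letter>` are assumptions made for illustration.

```python
def passive_checks_section(drive_letters, warning=80, critical=90):
    """Return the text of a [passive checks] section for one host.

    Generates the host check plus one disk check per drive letter,
    modelled on the lines in the example config above.
    """
    lines = [
        "[passive checks]",
        "%HOSTNAME%|__HOST__ = system/agent_version",
    ]
    # De-duplicate and normalise the letters so each drive appears once.
    for letter in sorted({d.upper() for d in drive_letters}):
        lines.append(
            f"%HOSTNAME%|Disk Usage {letter} = "
            f"disk/logical/{letter}:|/used_percent"
            f" --warning {warning} --critical {critical} --units Gi"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example: a host with C: and E: drives.
    print(passive_checks_section(["C", "E"]), end="")
```

The output would still need to be copied to each monitored host's nrdp.cfg by your own deployment tooling.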

Option 3: Hybrid

You could use active checks for the things every host has in common, such as CPU usage, RAM usage, and Drive C usage. See Option 1 for a basic overview of simplifying active check setup by applying checks to entire host groups.

In this case, you'd make ONE host group called "Windows Hosts" or something similar, add ALL your hosts to it, then add the common active checks (CPU usage, RAM usage, and Drive C usage) to that host group (don't add them to the individual hosts, only to the host group). This requires the community_string (in ncpa.cfg) to be the same on every monitored host.

Then, for each of the monitored hosts with a D drive, configure a passive check on the monitored host to send that to Nagios XI.

It would look something like this (edit the config on the monitored Windows host):

Code:

# C:\Program Files (x86)\Nagios\NCPA\etc\ncpa.cfg (Windows)
# or /usr/local/ncpa/etc/ncpa.cfg (Linux)
[nrdp]
parent = http://my-ip-of-nagios-xi-server/nrdp
token = some-shared-token-bla-blu-blarg
# set hostname to this (Windows) host's IP address
hostname = 192.168.x.x

Code:

# C:\Program Files (x86)\Nagios\NCPA\etc\ncpa.cfg.d\nrdp.cfg (Windows)
# or /usr/local/ncpa/etc/ncpa.cfg.d/nrdp.cfg (Linux)
[passive checks]
%HOSTNAME%|__HOST__ = system/agent_version
%HOSTNAME%|Disk Usage = disk/logical/D:|/used_percent --warning 80 --critical 90 --units Gi
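
The split between active and passive checks in Option 3 can be sketched in a few lines of Python. The function name `split_drives` and the choice of C as the only "common" drive are assumptions made for illustration: drives in the common set are left to the host-group active checks, and the rest get a passive check on that host.

```python
def split_drives(drive_letters, common=("C",)):
    """Split a host's drives into (active, passive) lists for the hybrid setup.

    Drives in `common` are covered by host-group active checks;
    every other drive needs a passive check configured on the host.
    """
    common_set = {c.upper() for c in common}
    drives = sorted({d.upper() for d in drive_letters})
    active = [d for d in drives if d in common_set]
    passive = [d for d in drives if d not in common_set]
    return active, passive


if __name__ == "__main__":
    active, passive = split_drives(["C", "D", "E"])
    print("active (host group):", active)
    print("passive (per host):", passive)
```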

Notes

If you use active checks, the hostname in the passive check configuration (hostname in ncpa.cfg) and the Nagios XI host definition should both match the address Nagios XI uses to reach the host for an active check (read: the IP address of the monitored host).
Last edited by dchurch on Thu Nov 12, 2020 11:14 am, edited 1 time in total.
rexmundo
Posts: 29
Joined: Mon Jul 20, 2020 3:28 pm

Re: Bulk Discovery

Post by rexmundo »

Thanks @dchurch for the detailed response!

I understand the options. Unfortunately, they all need some data gathering, especially Option 1.

Since you are suggesting Host Groups, what are the performance implications of putting thousands of hosts in a single host group?

Thanks
George
dchurch
Posts: 858
Joined: Wed Oct 07, 2020 12:46 pm
Location: Yo mama

Re: Bulk Discovery

Post by dchurch »

There really isn't a performance implication of having 1000 hosts in a host group vs. individually configuring 1000 hosts. Nagios XI will still be sending out and processing thousands of active checks every N minutes. Host groups are just an abstraction provided by Nagios XI to quickly tell Nagios Core about many hosts with the same configuration.

With passive checks, there will still be a fairly heavy load on the Nagios XI server, since it still has to process thousands of check results every N minutes.

If you want to monitor CPU, RAM, Swap, and 2 drives for each of 1000 hosts, that's 5000 check results for Nagios to process every N minutes.
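
To put rough numbers on that, here is a small Python sketch of the arithmetic. The function name is hypothetical, and the 5- and 10-minute intervals are just example values for N.

```python
def check_results_per_minute(hosts, checks_per_host, interval_minutes):
    """Average number of check results the server processes per minute."""
    return hosts * checks_per_host / interval_minutes


if __name__ == "__main__":
    # CPU, RAM, Swap, and 2 drives: 5 checks per host.
    checks = ["CPU", "RAM", "Swap", "Drive 1", "Drive 2"]
    for interval in (5, 10):
        rate = check_results_per_minute(1000, len(checks), interval)
        print(f"{interval}-minute interval: {rate:.0f} results/minute")
```

Doubling the interval halves the average rate of results the server has to process.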

At least with Option 1 you can dial it back more easily, for example by raising the check interval from 5 minutes (the default) to 10 minutes if you find your XI server is overloaded. Dialing it back with Option 2 or Option 3 would mean editing ncpa.cfg on each of the machines to send the passive checks less often.