Hi
We are looking to deploy the NCPA agent on thousands of hosts. I know it's possible to do autodiscovery on a single host and then say "Do the same for the following X hosts" and list the hosts you want Nagios to onboard.
The problem is that the servers are not all the same. Some have C: and D: drives, some C: and E:, etc. Can Nagios XI do autodiscovery in bulk? That is, can I give it 1000 hosts and have it monitor the CPU, RAM, and disk drives of each host?
Thanks!
Bulk Discovery
Re: Bulk Discovery
The Auto-Discovery feature is limited to monitoring "external facing" metrics, such as "is port 22 open" and "what's the RTT for a ping to this server" - it's not possible to make it auto-configure checks to get more "private" data such as RAM usage and Disk space free.
In the next version of Nagios XI, there will be more options, with a "Deploy Agent" wizard being developed which may lend itself better to this scenario.
The best routes at this time are as follows:
Option 1: Host Groups
What you could do is add all your hosts with a D drive to a Host Group called "With Drive D" and assign the active NCPA check "Disk Usage on Drive D" to that group (don't assign it individually to each host, only to the host group), then do the same for the rest of the drive letters.
The check would look something like this (127.0.0.1 stands in for the address of the monitored host):
Code:
/usr/local/nagios/libexec/check_ncpa.py -H 127.0.0.1 -t 'some-shared-token-bla-blu-blarg' -P 5693 -M 'disk/logical/D:|' -w '70' -c '90'
Then on each of the monitored machines:
Code:
# C:\Program Files (x86)\Nagios\NCPA\etc\ncpa.cfg (Windows)
# or /usr/local/ncpa/etc/ncpa.cfg (Linux)
[api]
community_string = some-shared-token-bla-blu-blarg
This requires the community_string (in ncpa.cfg) to be the same on each of the monitored hosts.
The advantage of this is that it centralizes the setup and management. If a host drops its Drive D:, simply remove it from the "With Drive D" group, and it will no longer receive that check.
Option 2: Passive Checks
Another option is to use passive checks and have Nagios XI auto-setup hosts when it sees them.
One caveat here is that if the Nagios XI server ever changes network location, each monitored host's configuration will need to be updated to send the passive check data to the new location.
The advantage here is that it's very easy to add new hosts, with almost no administration needed on the Nagios XI side of things.
Here's an example of how you'd do it (edit the config on the monitored Windows host):
Code:
# C:\Program Files (x86)\Nagios\NCPA\etc\ncpa.cfg (Windows)
# or /usr/local/ncpa/etc/ncpa.cfg (Linux)
[nrdp]
parent = http://my-ip-of-nagios-xi-server/nrdp
token = some-shared-token-bla-blu-blarg
# set hostname to my (Windows) IP address
hostname = 192.168.x.x
Code:
# C:\Program Files (x86)\Nagios\NCPA\etc\ncpa.cfg.d\nrdp.cfg (Windows)
# or /usr/local/ncpa/etc/ncpa.cfg.d/nrdp.cfg (Linux)
[passive checks]
%HOSTNAME%|__HOST__ = system/agent_version
%HOSTNAME%|Disk Usage = disk/logical/C:|/used_percent --warning 80 --critical 90 --units Gi
%HOSTNAME%|CPU Usage = cpu/percent --warning 60 --critical 80 --aggregate avg
%HOSTNAME%|Swap Usage = memory/swap --warning 60 --critical 80 --units Gi
%HOSTNAME%|Memory Usage = memory/virtual --warning 80 --critical 90 --units Gi
%HOSTNAME%|Process Count = processes --warning 300 --critical 400
Option 3: Hybrid
What you could do is use active checks for the common things like CPU Usage, RAM Usage, and Drive C usage. See Option 1 for a basic overview of simplifying the setup of active checks by applying them to entire host groups.
In this case, you'd make ONE host group called "Windows Hosts" or something similar, add ALL your hosts to it, then add the common active checks (CPU Usage, RAM Usage, and Drive C usage) to that host group (don't add them to the individual hosts, only to the host group). This requires the community_string (in ncpa.cfg) to be the same on each of the monitored hosts.
Then, for each of the monitored hosts with a D drive, configure a passive check on the monitored host to send that to Nagios XI.
It would look something like this (edit the config on the monitored Windows host):
Code:
# C:\Program Files (x86)\Nagios\NCPA\etc\ncpa.cfg (Windows)
# or /usr/local/ncpa/etc/ncpa.cfg (Linux)
[nrdp]
parent = http://my-ip-of-nagios-xi-server/nrdp
token = some-shared-token-bla-blu-blarg
# set hostname to my (Windows) IP address
hostname = 192.168.x.x
Code:
# C:\Program Files (x86)\Nagios\NCPA\etc\ncpa.cfg.d\nrdp.cfg (Windows)
# or /usr/local/ncpa/etc/ncpa.cfg.d/nrdp.cfg (Linux)
[passive checks]
%HOSTNAME%|__HOST__ = system/agent_version
%HOSTNAME%|Disk Usage = disk/logical/D:|/used_percent --warning 80 --critical 90 --units Gi
Notes
If you also use active checks, both the hostname in the passive check configuration (hostname in ncpa.cfg) and the host definition in Nagios XI should match the address Nagios XI needs to reach for an active check (read: the IP address of the monitored host).
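Option 1 depends on knowing which hosts have which drive letters before the "With Drive X" host groups can be filled in. The following is only a rough sketch of that data-gathering step, not something Nagios XI provides: it assumes each agent's disk/logical data has already been collected and parsed into a dictionary shaped like {"logical": {"C:|": {...}, "D:|": {...}}}. The function names, the sample IP addresses, and that response shape are assumptions made for illustration.

```python
from collections import defaultdict


def drive_letters(response):
    """Extract drive letters such as 'C' and 'D' from a disk/logical response.

    Assumes the response looks like {"logical": {"C:|": {...}, "D:|": {...}}}.
    """
    letters = []
    for name in response.get("logical", {}):
        # Keys are expected to look like "C:|"; anything else is skipped.
        if len(name) >= 2 and name[0].isalpha() and name[1] == ":":
            letters.append(name[0].upper())
    return sorted(set(letters))


def group_hosts_by_drive(responses):
    """Map 'With Drive X' group names to the sorted hosts that have drive X."""
    groups = defaultdict(list)
    for host, response in responses.items():
        for letter in drive_letters(response):
            groups[f"With Drive {letter}"].append(host)
    return {group: sorted(hosts) for group, hosts in sorted(groups.items())}


# Hypothetical sample data standing in for real agent responses.
sample = {
    "10.0.0.11": {"logical": {"C:|": {}, "D:|": {}}},
    "10.0.0.12": {"logical": {"C:|": {}, "E:|": {}}},
}

for group, hosts in group_hosts_by_drive(sample).items():
    print(group, hosts)
```

The resulting mapping could then be used as the member list for each host group. How the responses are collected is left out on purpose; querying each agent on port 5693 with the shared token is one possibility, but check the exact URL and response format against your NCPA version first.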
Last edited by dchurch on Thu Nov 12, 2020 11:14 am, edited 1 time in total.
If you didn't get an 8% raise over the course of the pandemic, you took a pay cut.
Discussion of wages is protected speech under the National Labor Relations Act, and no employer can tell you you can't disclose your pay with your fellow employees.
Re: Bulk Discovery
Thanks @dchurch for the detailed response!
I understand the options. Unfortunately, they all need some data gathering, especially Option 1.
As you are suggesting using Host Groups, what are the performance implications of putting thousands of hosts in a single host group?
Thanks
George
Re: Bulk Discovery
There really isn't a performance implication of having 1000 hosts in a host group vs. individually configuring 1000 hosts. Nagios XI will still be sending out and processing thousands of active checks every N minutes. Host groups are just an abstraction provided by Nagios XI to quickly tell Nagios Core about many hosts with the same configuration.
With passive checks, there will still be a pretty hefty load on the Nagios XI server, since it'll still have to process thousands of check results every N minutes.
If you want to monitor CPU, RAM, Swap, and 2 drives for each of 1000 hosts, that's 5000 check results for Nagios to process every N minutes.
At least if you go with Option 1 you can dial it back more easily, for example by raising the check interval from 5 minutes (the default) to 10 minutes if you find your XI server is overloaded. Dialing it back with Option 2 or Option 3 would involve editing ncpa.cfg on each of the machines to send the passive checks less often.
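To put rough numbers on that, here is a small back-of-the-envelope sketch. It only computes an average rate and assumes the checks are spread evenly across the interval, which real scheduling will not do exactly. The function name is made up for this example.

```python
def check_load(hosts, checks_per_host, interval_minutes):
    """Return (results per interval, average results per second)."""
    total = hosts * checks_per_host
    return total, total / (interval_minutes * 60)


# 1000 hosts with 5 checks each (CPU, RAM, Swap, 2 drives),
# at the 5-minute default and at a relaxed 10-minute interval.
for interval in (5, 10):
    total, rate = check_load(1000, 5, interval)
    print(f"{total} results every {interval} min = {rate:.1f} per second on average")
```

Doubling the interval halves the average rate, which is why changing the interval centrally in Option 1 is the quickest way to relieve an overloaded server.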