Distributed Monitoring with Windows Environments

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
domsch1988
Posts: 32
Joined: Tue Aug 15, 2017 1:20 am

Post by domsch1988 »

Hello everyone,

I'm looking for a solution you might be able to help me with. At our company we run Nagios XI. Our primary use case is not monitoring our internal infrastructure but that of our customers. So far so good: we currently monitor about 650 hosts and 2500 services across 90 customers, and that's all working great.
Since we mostly use NCPA with active checks, we also have to maintain 90 VPN tunnels for the checks to work. While this does work, it has some drawbacks.
First, when a VPN tunnel goes down or restarts, all hosts and services for that customer are (of course) reported down. Second, maintaining 90 VPN tunnels on our firewall isn't exactly ideal either.

To mitigate this, I'd ideally like to have several hosts at each customer's site that run the checks and report the results to us over a single public IP. Since 99% of our customers' servers run Windows, I'd prefer something that runs on Windows (otherwise I'd have to set up Linux VMs for it).

Any tips on what to look at for moving the monitoring out to our customers' sites?
kyang

Re: Distributed Monitoring with Windows Environments

Post by kyang »

domsch1988 wrote: To mitigate those situations I'd ideally want to have several hosts at our customers' site that run the checks and report the results over one single public IP to us
Since you already have NCPA running for active checks, you can use its passive checks instead, which send the results to XI over NRDP.

XI ships with NRDP; just make sure to put your NRDP URL and token into the ncpa.cfg file.

Each host (or customer Windows PC) can report under its own hostname, but all of them need your XI NRDP URL and token.
The passive handler also has to be set to nrdp.
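To illustrate what the passive path looks like on the wire, here is a minimal Python sketch that builds one NRDP `submitcheck` request body. It assumes NRDP's JSON input form (`token`, `cmd=submitcheck`, `JSONDATA`); the token, hostname and service name are placeholders, and NCPA's own nrdp handler may encode its submission differently (for example as XML). The relevant point for the VPN question is that the agent only needs outbound HTTP(S) to the XI NRDP URL.

```python
import json
from urllib.parse import urlencode, parse_qs


def build_nrdp_submission(token, hostname, servicename, state, output):
    """Build a form-encoded NRDP 'submitcheck' body for one passive service result.

    Assumption: NRDP's JSONDATA input format; all values passed in are placeholders.
    """
    payload = {
        "checkresults": [
            {
                # checktype "1" marks the result as passive
                "checkresult": {"type": "service", "checktype": "1"},
                "hostname": hostname,        # must match the host name configured in XI
                "servicename": servicename,  # must match the service description in XI
                "state": str(state),         # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
                "output": output,
            }
        ]
    }
    return urlencode({
        "token": token,
        "cmd": "submitcheck",
        "JSONDATA": json.dumps(payload),
    })


if __name__ == "__main__":
    # Placeholder values; the body would be POSTed to https://<xi-server>/nrdp/
    body = build_nrdp_submission("YOUR_NRDP_TOKEN", "customer01-srv01",
                                 "CPU Usage", 0, "OK - CPU usage is 12%")
    fields = parse_qs(body)  # round-trip to show what the server would receive
    print(fields["cmd"][0])
    print(json.loads(fields["JSONDATA"][0])["checkresults"][0]["servicename"])
```

Because every site pushes its results to the same NRDP URL, the XI server sees them arriving from each customer's public IP rather than needing a tunnel into each network.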

Here's our documentation on setting up passive checks. You can create passive checks using the NCPA GUI.

https://assets.nagios.com/downloads/ncp ... Checks.pdf

Let me know if this works for you.
domsch1988

Re: Distributed Monitoring with Windows Environments

Post by domsch1988 »

Yeah, I'm currently trying passive checks out, and they might be an option. They do bring some inconveniences, though. Not being able to change warning and critical values from Nagios is a bummer. Also, we currently use one service for all Windows CPU checks, with all relevant hosts assigned to it. Going passive will inflate the number of services massively.

Nonetheless, it might be an option. One thing I can't figure out:
I'm configuring the checks, and everything works except spaces. The LAN interface, for example, is called "Lan-Verbindung 3", and I have not found a way to get NCPA to handle the space. The same problem occurs with many performance counters. Is there a substitute for a space in the Windows nrdp.cfg?
kyang

Re: Distributed Monitoring with Windows Environments

Post by kyang »

Yes, with passive checks the warning and critical levels are defined in the check definition on the agent.
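To make the structure explicit: the thresholds are `--warning`/`--critical` options after the node path in each passive check definition. The hypothetical Python helper below renders such definitions from a table, in the `%HOSTNAME%|<service> = <node> --warning W --critical C` form used in this thread. The `THRESHOLDS` table and `passive_check_lines` name are illustrative, the `[passive checks]` section header is an assumption based on NCPA 2.x config files, and `/cpu/percent` and `/memory/virtual/percent` are example NCPA nodes.

```python
# Hypothetical per-customer threshold table: service name -> (NCPA node, warning, critical)
THRESHOLDS = {
    "CPU Usage": ("/cpu/percent", 80, 90),
    "Memory Usage": ("/memory/virtual/percent", 85, 95),
}


def passive_check_lines(thresholds):
    """Render passive check definitions in the
    '%HOSTNAME%|<service> = <node> --warning W --critical C' form.

    The '[passive checks]' section header is an assumption based on NCPA 2.x.
    """
    lines = ["[passive checks]"]
    for service, (node, warning, critical) in thresholds.items():
        lines.append(
            f"%HOSTNAME%|{service} = {node} --warning {warning} --critical {critical}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(passive_check_lines(THRESHOLDS))
```

Even when kept in one table like this, the thresholds still live on each agent, so changing them means regenerating the file and restarting the passive service there.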
domsch1988 wrote: I'm configuring the checks. All is great but I can't get spaces to work. The LAN interface for example is called "Lan-Verbindung 3". I have not found a way to get NCPA to process the space. The same problem occurs with many performance counters. Is there a substitution in the Windows nrdp.cfg for a space?
Could you provide the command you are using for this check, and a list of the performance counters that have the same issue with spaces?

Any examples of what NCPA is actually processing would also help.

Thanks!
domsch1988

Re: Distributed Monitoring with Windows Environments

Post by domsch1988 »

Sure. The network check is defined in nrdp.cfg as follows:

%HOSTNAME%|Network Usage = interface/LAN-Verbindung 3/bytes_recv
This isn't processed correctly because of the space before the "3". The log states:

WARNING:ncpacheck:Unable to parse all arguments from instruction. Mis-paired option: 3/bytes_recv
The same happens with performance counters such as:

MSExchangeTransport SMTPReceive(_total)\Messages Received/sec
It's always processed only up to the first space. I already tried single and double quotes. In URLs a space is encoded as %20, but that doesn't work here either.
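For comparison, this is what the %20 form looks like when a node with a space is addressed through the NCPA API URL directly. A minimal sketch with Python's `urllib.parse.quote`, assuming the `/api/<node path>` URL layout of NCPA 2.x; the helper name `ncpa_api_path` is hypothetical. As reported above, this encoding does not carry over to the passive check definitions.

```python
from urllib.parse import quote


def ncpa_api_path(node_path):
    """Percent-encode an NCPA node path for an API URL (assumed /api/<path> layout).

    '/' is kept as the node separator; a space becomes %20.
    """
    return "/api/" + quote(node_path.strip("/"), safe="/")


if __name__ == "__main__":
    print(ncpa_api_path("interface/LAN-Verbindung 3/bytes_recv"))
    # -> /api/interface/LAN-Verbindung%203/bytes_recv
```

The full request would then be something like `https://<windows-host>:5693/api/interface/LAN-Verbindung%203/bytes_recv?token=<token>`, with host and token as placeholders.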
kyang

Re: Distributed Monitoring with Windows Environments

Post by kyang »

Is this in your nrdp.cfg, or do you mean your ncpa.cfg?

Are you grabbing this network check from the NCPA GUI?

This is what my passive check looks like:

%HOSTNAME%|<service name> = /interface/Local Area Connection/bytes_recv --warning 1000 --critical 2000
Actually, what version of NCPA are you on?
domsch1988

Re: Distributed Monitoring with Windows Environments

Post by domsch1988 »

Yes, in my nrdp.cfg. That's where services are defined according to the document here: https://assets.nagios.com/downloads/ncp ... Checks.pdf

No, I'm not getting those from the GUI. The CPU, RAM and HDD checks were created during the NCPA installation; the rest I'm writing by hand.
I noticed that I had left out the leading "/". Adding it stopped the errors in passive.log. Nagios now says:
The node (LAN-Verbindung) requested does not exist. You may be trying to access the 'LAN-Verbindung 3' node.
The NCPA version is 2.1.1.

Edit: Thinking about it, the difference seems to be that my name contains a number and yours doesn't. Maybe NCPA treats a number after a space as the value of the preceding argument (like --warning 1000)?
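One way to reason about the "Mis-paired option: 3/bytes_recv" log line is a parser that splits the instruction on whitespace, takes the first token as the node path, and expects the rest to be `--option value` pairs. The Python sketch below is a hypothetical model of that behaviour, not NCPA's source; `naive_parse` is an illustrative name.

```python
def naive_parse(instruction):
    """Hypothetical whitespace-based parser (NOT NCPA's source): the first
    token is the node path, the remaining tokens must be '--option value' pairs."""
    tokens = instruction.split()
    if not tokens:
        raise ValueError("Empty instruction")
    path, rest = tokens[0], tokens[1:]
    options = {}
    i = 0
    while i < len(rest):
        flag = rest[i]
        # A token that is not an option, or an option with no value, cannot be paired
        if not flag.startswith("--") or i + 1 >= len(rest):
            raise ValueError(f"Mis-paired option: {flag}")
        options[flag[2:]] = rest[i + 1]
        i += 2
    return path, options


if __name__ == "__main__":
    for line in (
        "/cpu/percent --warning 80 --critical 90",  # -> ('/cpu/percent', {'warning': '80', 'critical': '90'})
        "interface/LAN-Verbindung 3/bytes_recv",    # -> Mis-paired option: 3/bytes_recv
        "/interface/LAN Verbindung/bytes_recv --warning 20 --critical 30",  # -> Mis-paired option: Verbindung/bytes_recv
    ):
        try:
            print(naive_parse(line))
        except ValueError as err:
            print(err)
```

Under this model the node name is cut at the first whitespace whether or not a digit follows, which would also fit the later report that "LAN Verbindung" (no digit) fails the same way.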
kyang

Re: Distributed Monitoring with Windows Environments

Post by kyang »

Ah okay. I normally just define checks in my ncpa.cfg, since the NRDP connection settings are in there, but either way works.
domsch1988 wrote: The node (LAN-Verbindung) requested does not exist. You may be trying to access the 'LAN-Verbindung 3' node.
You should try using the GUI. It is a big help for finding the correct definitions for passive and/or active checks.
domsch1988 wrote: Thinking of it, the difference seems to be that my name contains a number and yours doesn't. Maybe NCPA processes numbers after spaces as values to the argument before it (like --warning 1000)?
I'll have to test this out and see for myself.

Let me know what you find out. The GUI could help, since it essentially lists all the nodes that are available to check.
domsch1988

Re: Distributed Monitoring with Windows Environments

Post by domsch1988 »

So, I gave the web GUI a try and quite like it. I created the command through the API tab, and it gave me:

%HOSTNAME%|<service name> = /interface/LAN Verbindung/bytes_recv --warning 20 --critical 30 --units Gi
I pasted that into my nrdp.cfg and am getting the same error:

The node (LAN) requested does not exist. You may be trying to access the 'LAN Verbindung' node.
I also tried it without the "3" in the name and with the space in a different position, but that doesn't change anything. To be honest, I have no clue :?
kyang

Re: Distributed Monitoring with Windows Environments

Post by kyang »

Yes, the GUI is pretty awesome.

Hmm, that is interesting... When you added the passive check to nrdp.cfg, did you restart the ncpa_passive service?

Can you PM or post your nrdp.cfg file for me?

Do you currently have NCPA installed on all of those Windows machines you are trying to get passive checks from?