check_rrdtraf with mod_gearman2

Post by **bennyboy** » Wed Aug 30, 2017 3:37 pm

Hi,

I have 2 questions in 1 thread.

1) Can you help me understand how the SNMP bandwidth check works in Nagios XI? I want to use it with mod_gearman and send that check to a specific server.
The check definition is $USER1$/check_rrdtraf -f /var/lib/mrtg/$ARG1$ -w $ARG2$ -c $ARG3$ -l $ARG4$, but I don't understand what generates the RRD file in /var/lib/mrtg/
For the moment, the only way to use that type of monitoring is to run the check on the main Nagios XI server. I have already prepared a server to run those checks and set everything up on it, but I don't know how to generate the RRD files the way the Nagios XI wizard does.

2) I see the official mod_gearman repo has had a couple of updates, and I want to know if you plan to integrate those updates into the package you support.

Thank you!

Post by **tacolover101** » Wed Aug 30, 2017 4:30 pm

this is one check that cannot run on another server. there are MRTG configs created, that coincide with creating the RRD files. this has been answered before over here - https://support.nagios.com/forum/viewto ... 10#p131432

i think the Nagios wizards are working on a new concept that would be similar to mod_gearman, and it would be awesome to see the ability to have a 'network poller' option, which 'syncs' up MRTG configs to assigned pollers. (could rsync MRTG configs, but dynamic RRD data between servers would get messy unless they had a shared NFS mount point with high IO. one way to tackle this could be push the RRD data over REST to absorb in the XI interface.)

Post by **scottwilkerson** » Wed Aug 30, 2017 4:45 pm


The problem is that it is REALLY inefficient to pass the check to a remote worker only to have it poll the XI server. You would be adding a ton of work; it's much more efficient to do it on the server that stores the RRD file.

Post by **tacolover101** » Wed Aug 30, 2017 5:08 pm


i don't disagree with you at all. it adds a lot of unneeded overhead. i propose a few thoughts:
1. allow full network / RRD offloading to a specific 'MRTG' gearman worker which holds its own set of the data, and reports back results as the normal process would.
2. if workers are on the same network, allow a shared mount to be used for updating the RRD files together. i don't believe more than one RRD file would be written to at the same time.
3. allow workers to report data back to XI over REST so all XI has to do is absorb it and write to the RRDs. probably not the most efficient.

this is all just rough shenanigans - it would all take time to write all this out, and the question then becomes - is it worth it for this single command?

Post by **scottwilkerson** » Thu Aug 31, 2017 8:46 am


I don't want to continue to derail @bennyboy's question, but I'm going to say we've tested this and it isn't worth it.

Running the rrd commands across a shared drive is really slow, and rsyncing all the files to all the workers is not only inefficient, it also isn't accurate, because the copies aren't always up to date.
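To illustrate the "not always up to date" problem with synced copies, here is a minimal sketch of a freshness guard a worker could run before trusting a local copy of an RRD file. The function name `rrd_is_fresh` and the 600-second threshold are hypothetical, nothing like this ships with Nagios XI, and it assumes GNU `stat`.

```shell
# Hypothetical helper (not part of Nagios XI): succeed only if FILE was
# modified within the last MAX_AGE seconds.
# Assumes GNU stat, where `stat -c %Y` prints the mtime as epoch seconds.
# Returns 0 = fresh, 1 = stale, 2 = file missing or unreadable.
rrd_is_fresh() {
    local file="$1" max_age="$2"
    local now mtime
    [ -f "$file" ] || return 2
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 2
    [ $(( now - mtime )) -le "$max_age" ]
}
```

A worker could call it as `rrd_is_fresh "/var/lib/mrtg/$1" 600 || echo "UNKNOWN - local RRD copy is stale"`. This only detects a stale copy; it does nothing about the sync overhead described above.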

Post by **bennyboy** » Thu Aug 31, 2017 9:27 am

We already have about 1,500 hosts and about 15,000 service checks at the moment, and we plan to triple that number. Telco wants to add switches, routers, firewalls, NetScalers, WLCs and access points. That equipment represents about 4,000 devices and a lot of services. We are using mod_gearman to distribute the load, and I think that type of setup is necessary. Can you help me choose the best solution to handle that number of checks?

Thank you!

Post by **tmcdonald** » Thu Aug 31, 2017 3:23 pm

The most we usually recommend a single XI server to handle is about 30,000 total checks, running on a 5-minute interval, and with the following performance tweaks:

There are of course many variables to consider (type of check, how many hosts/services are down, whether event handlers are running, etc.) but that is a rough suggestion. Beyond 30,000 checks, you start to run into several issues:

Config management - 30,000 checks is a lot to keep track of even in a well-documented setup. It is far easier to separate checks based on geographical location, owner, importance, etc.
Reports / interface - When you try to run a report on 30,000 objects it can take quite a while, and the same applies to loading many results on a web page. This effect is magnified when you have many users logged in at once.
Single point of failure - If you have only a single XI machine and it goes down, loses network access, etc. then all your checks are being missed. Splitting checks between two or more servers minimizes this risk.

20,000 is generally a safe number, 30,000 is reaching the limits of what we recommend. Any more than that and you end up spending more time tuning the performance of your XI server than you do actually configuring checks or responding to alerts.
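As a rough worked example of those numbers: the sketch below computes the average check rate and how many XI servers a given total would need. The function names are made up for illustration, and the 45,000 figure is simply the 15,000 checks mentioned earlier in the thread multiplied by 3.

```shell
# Hypothetical helpers for back-of-the-envelope sizing (integer arithmetic only).

# checks_per_second TOTAL_CHECKS INTERVAL_SECONDS
# Average number of checks that must complete each second.
checks_per_second() {
    echo $(( $1 / $2 ))
}

# servers_needed TOTAL_CHECKS CHECKS_PER_SERVER
# Number of servers required, rounded up.
servers_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

checks_per_second 30000 300   # 30,000 checks every 5 minutes -> prints 100
servers_needed 45000 20000    # at the "safe" 20,000 per server -> prints 3
servers_needed 45000 30000    # at the 30,000 upper limit       -> prints 2
```

This ignores every variable listed above (check type, number of problems, event handlers), so treat the result as a starting point rather than a sizing guarantee.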

Nagios Fusion can be used to tie all your XI (and Core) systems together into a single web interface.

Post by **tacolover101** » Thu Aug 31, 2017 7:22 pm

as of now, it doesn't sound like there is a clean way to do this. i do agree that all of the methods are pretty wasteful in a resource sense as the Nagios team mentioned.

here's a simple script that you could modify, to leverage check_by_ssh. modify it to your needs, and test - by no means is it guaranteed to work or be a solution.

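#!/bin/bash
# Wrapper: run check_rrdtraf directly on the Nagios XI server, or via check_by_ssh from a worker.
# Replace ip.of.nagios with the address of your Nagios XI server.
nagios_ip="ip.of.nagios"
plugin_dir="/usr/local/nagios/libexec"
# awk splits on runs of whitespace, so the indented ifconfig output is parsed correctly
# (cut -d ' ' -f 2 returns an empty field when the line starts with spaces)
if ifconfig | awk '/inet / && /broadcast/ {print $2}' | grep -qxF "$nagios_ip"; then
    # i am nagios, run normal check
    "$plugin_dir/check_rrdtraf" -f "/var/lib/mrtg/$1" -w "$2" -c "$3" -l "$4"
else
    # i am worker, use ssh to exec
    "$plugin_dir/check_by_ssh" -H "$nagios_ip" -C "$plugin_dir/check_rrdtraf -f /var/lib/mrtg/$1 -w $2 -c $3 -l $4"
fi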

this is very redundant, and could cause instability, as it could throw off the scheduler internal to Nagios (since technically the check is then hauled back to Nagios).

for a large setup, Fusion and a segregated Nagios is a thought to consider.

Post by **tgriep** » Fri Sep 01, 2017 1:32 pm

Thanks @tacolover101 for the help.
Another option is to use a different bandwidth plugin that does not use the RRD files, so it can be run remotely.
Take a look at the Exchange site for a standalone plugin that you could use.
https://exchange.nagios.org/
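
For a sense of what such a standalone plugin has to do without RRD files, here is a minimal sketch of the core calculation: turning two samples of an interface octet counter into a bits-per-second rate. This is not taken from any specific Exchange plugin; the function name is hypothetical, and it assumes 32-bit counters such as IF-MIB `ifInOctets` (a 64-bit counter like `ifHCInOctets` would need different wrap handling).

```shell
# Hypothetical helper: counter_rate_bps PREV_OCTETS CUR_OCTETS ELAPSED_SECONDS
# Prints the average rate in bits per second between two samples of a
# 32-bit octet counter, allowing for a single counter wrap.
counter_rate_bps() {
    local prev="$1" cur="$2" secs="$3"
    local delta
    [ "$secs" -gt 0 ] || return 1       # avoid division by zero
    if [ "$cur" -ge "$prev" ]; then
        delta=$(( cur - prev ))
    else
        # The counter wrapped past 2^32 - 1 between the two samples.
        delta=$(( 4294967296 - prev + cur ))
    fi
    echo $(( delta * 8 / secs ))
}
```

For example, `counter_rate_bps 1000 2000 10` prints 800. A real plugin would also have to store the previous sample somewhere between runs and compare the rate against its warning and critical thresholds.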
