check_rrdtraf with mod_gearman2
Posted: Wed Aug 30, 2017 3:37 pm
by bennyboy
Hi,
I have 2 questions in 1 thread.
1) Can you help me understand how the SNMP bandwidth check works in Nagios XI? I want to use it with mod_gearman and send that check to a specific server.
I see the check definition: $USER1$/check_rrdtraf -f /var/lib/mrtg/$ARG1$ -w $ARG2$ -c $ARG3$ -l $ARG4$ but I don't understand what generates the RRD file in /var/lib/mrtg/
For the moment, the only way to use that type of monitoring is to run the check on the main Nagios XI server. I have already prepared a server to run those checks and set everything up on it, but I don't know how to generate the RRD file the way the Nagios XI wizard does.
2) I see the official mod_gearman repo has had a couple of updates, and I want to know if you plan to integrate those updates into the package you support.
Thank you!
Re: check_rrdtraf with mod_gearman2
Posted: Wed Aug 30, 2017 4:30 pm
by tacolover101
this is one check that cannot run on another server. the wizard creates MRTG configs, and the RRD files are created along with them. this has been answered before over here -
https://support.nagios.com/forum/viewto ... 10#p131432
i think the Nagios wizards are working on a new concept that would be similar to mod_gearman, and it would be awesome to see the ability to have a 'network poller' option, which 'syncs' up MRTG configs to assigned pollers. (could rsync MRTG configs, but dynamic RRD data between servers would get messy unless they had a shared NFS mount point with high IO. one way to tackle this could be to push the RRD data over REST for the XI interface to absorb.)
Re: check_rrdtraf with mod_gearman2
Posted: Wed Aug 30, 2017 4:45 pm
by scottwilkerson
The problem is that it is REALLY inefficient to pass the check to a remote worker only to have it poll the XI server. You would be adding a ton of work. It's much more efficient to do it on the server that is storing the RRD file.
Re: check_rrdtraf with mod_gearman2
Posted: Wed Aug 30, 2017 5:08 pm
by tacolover101
i don't disagree with you at all. it adds a lot of unneeded overhead. i propose a few thoughts:
1. allow full network / RRD offloading to a specific 'MRTG' gearman worker which holds its own set of the data, and reports back results as the normal process would.
2. if workers are on the same network, allow a shared mount to be used for updating the RRD files together. i don't believe more than one RRD file would be written to at the same time.
3. allow workers to report data back to XI over REST so all XI has to do is absorb it and write to the RRDs. probably not the most efficient.
this is all just rough shenanigans - it would all take time to write all this out, and the question then becomes - is it worth it for this single command?
Re: check_rrdtraf with mod_gearman2
Posted: Thu Aug 31, 2017 8:46 am
by scottwilkerson
I don't want to continue to derail @bennyboy's question, but I'm going to say we've tested this and it isn't worth it.
Running the rrd commands across a shared drive is really slow, and rsyncing all the files to all the workers is not only inefficient, it isn't accurate because the copies aren't always up to date.
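To illustrate the "not always up to date" problem with synced copies, here is a minimal sketch of a staleness test a worker could apply to its local copy of an RRD file. The function name rrd_is_fresh and the example path are hypothetical and not part of Nagios XI or MRTG; it assumes GNU stat and a 5-minute polling cycle.

```shell
#!/bin/bash
# Hypothetical helper, not part of Nagios XI or MRTG.
# Returns 0 if FILE was modified within the last MAX_AGE seconds,
# 1 if it is older, and 2 if it is missing or unreadable.
rrd_is_fresh() {
    local file=$1 max_age=$2 now mtime
    [ -e "$file" ] || return 2
    now=$(date +%s)
    mtime=$(stat -c %Y "$file") || return 2   # GNU stat: mtime in epoch seconds
    [ $(( now - mtime )) -le "$max_age" ]
}

# Example (placeholder path): flag a copy older than two 5-minute cycles.
# rrd_is_fresh /var/lib/mrtg/example.rrd 600 || echo "stale or missing copy"
```

A check like this only detects a stale copy. It does not make the synced data any more accurate, which is the point made above.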
Re: check_rrdtraf with mod_gearman2
Posted: Thu Aug 31, 2017 9:27 am
by bennyboy
We already have roughly 1,500 hosts and 15,000 service checks at the moment, and we plan to triple that number. Telco wants to add switches, routers, firewalls, NetScalers, WLCs and access points. That equipment represents roughly 4,000 devices and a lot of services. We are using mod_gearman to distribute the load, and I think that type of setup is necessary. Can you help me choose the best solution to handle that number of checks?
Thank you!
Re: check_rrdtraf with mod_gearman2
Posted: Thu Aug 31, 2017 3:23 pm
by tmcdonald
The most we usually recommend a single XI server to handle is about 30,000 total checks, running on a 5-minute interval, and with the following performance tweaks:
There are of course many variables to consider (type of check, how many hosts/services are down, whether event handlers are running, etc.) but that is a rough suggestion. Beyond 30,000 checks, you start to run into several issues:
- Config management - 30,000 checks is a lot to keep track of even in a well-documented setup. It is far easier to separate checks based on geographical location, owner, importance, etc.
- Reports / interface - When you try to run a report on 30,000 objects it can take quite a while, and the same applies to loading many results on a web page. This effect is magnified when you have many users logged in at once.
- Single point of failure - If you have only a single XI machine and it goes down, loses network access, etc. then all your checks are missed. Splitting checks between two or more servers reduces this risk.
20,000 is generally a safe number, 30,000 is reaching the limits of what we recommend. Any more than that and you end up spending more time tuning the performance of your XI server than you do actually configuring checks or responding to alerts.
Nagios Fusion can be used to tie all your XI (and Core) systems together into a single web interface.
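As a rough worked example using the figures quoted in this thread (about 1,500 hosts and 15,000 service checks, tripled), the arithmetic can be sketched as below. Counting host checks toward the total is an assumption, and servers_needed is a hypothetical helper that only does ceiling division.

```shell
#!/bin/bash
# Ceiling division: servers needed for TOTAL checks when each server
# handles at most PER_SERVER checks.
servers_needed() {
    local total=$1 per_server=$2
    echo $(( (total + per_server - 1) / per_server ))
}

hosts=1500; services=15000; growth=3
# Assumption: host checks count toward the total alongside service checks.
total=$(( (hosts + services) * growth ))
echo "projected checks: $total"
echo "servers at 20,000 checks each: $(servers_needed "$total" 20000)"
echo "servers at 30,000 checks each: $(servers_needed "$total" 30000)"
```

With these inputs the projection is 49,500 checks, which works out to three servers at the "safe" 20,000 figure or two at the 30,000 limit.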
Re: check_rrdtraf with mod_gearman2
Posted: Thu Aug 31, 2017 7:22 pm
by tacolover101
as of now, it doesn't sound like there is a clean way to do this. i do agree that all of the methods are pretty wasteful in a resource sense as the Nagios team mentioned.
here's a simple script that you could modify, to leverage check_by_ssh. modify it to your needs, and test - by no means is it guaranteed to work or be a solution.
Code:
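#!/bin/bash
# Wrapper: run check_rrdtraf locally on the Nagios XI server, or over SSH from a worker.
# Arguments: <rrd file name> <warning> <critical> <label>
nagios_ip="ip.of.nagios"   # placeholder: replace with the XI server's address
plugin="/usr/local/nagios/libexec/check_rrdtraf"

# hostname -I prints every address assigned to this host, separated by spaces
if hostname -I | tr ' ' '\n' | grep -qxF "$nagios_ip"; then
    # i am nagios, run normal check
    "$plugin" -f "/var/lib/mrtg/$1" -w "$2" -c "$3" -l "$4"
else
    # i am worker, use ssh to exec
    /usr/local/nagios/libexec/check_by_ssh -H "$nagios_ip" -C "$plugin -f /var/lib/mrtg/$1 -w $2 -c $3 -l $4"
fi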
this is very redundant, and could cause instability as it could throw off the scheduler internal to Nagios. (since technically the check is then backhauled to Nagios)
for a large setup, Fusion and a segregated Nagios is a thought to consider.
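One detail worth sketching for the check_by_ssh branch of the script above: the remote command is a single string, so an argument containing a space or shell metacharacter would be re-split on the XI side. A minimal sketch of quoting the arguments first follows. build_remote_cmd is a hypothetical helper, and it assumes bash on the worker and a bash-compatible login shell on the XI server.

```shell
#!/bin/bash
# Assumption: same plugin path as in the wrapper script above.
plugin="/usr/local/nagios/libexec/check_rrdtraf"

# Hypothetical helper: print a check_rrdtraf command line with every
# argument shell-quoted (bash printf %q), for use with check_by_ssh -C.
build_remote_cmd() {
    local rrd=$1 warn=$2 crit=$3 label=$4
    printf '%q -f %q -w %q -c %q -l %q\n' \
        "$plugin" "/var/lib/mrtg/$rrd" "$warn" "$crit" "$label"
}

# Example (ip.of.nagios is the placeholder from the script above):
# /usr/local/nagios/libexec/check_by_ssh -H ip.of.nagios \
#     -C "$(build_remote_cmd "$1" "$2" "$3" "$4")"
```

This does not change the efficiency concerns raised earlier. It only makes the pass-through less fragile.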
Re: check_rrdtraf with mod_gearman2
Posted: Fri Sep 01, 2017 1:32 pm
by tgriep
Thanks @tacolover101 for the help.
Another option is to use a different bandwidth plugin that does not use the RRD files, so it can be run remotely.
Take a look at the Exchange site for a standalone plugin that you could use.
https://exchange.nagios.org/
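For context on how a plugin can measure bandwidth without RRD files: the usual approach is to sample an interface octet counter twice and turn the difference into a rate. The sketch below shows only that arithmetic. It is not the code of any particular Exchange plugin, rate_bps is a hypothetical name, and it assumes a 32-bit counter such as ifInOctets that wraps at most once between samples.

```shell
#!/bin/bash
# Bits per second from two samples of a 32-bit octet counter.
# Arguments: previous value, current value, seconds between samples.
rate_bps() {
    local prev=$1 cur=$2 secs=$3 delta
    if [ "$cur" -ge "$prev" ]; then
        delta=$(( cur - prev ))
    else
        # Assumption: the counter wrapped past 2^32 exactly once.
        delta=$(( cur + 4294967296 - prev ))
    fi
    echo $(( delta * 8 / secs ))   # octets to bits, integer division
}

rate_bps 1000 2000 10   # prints 800
```

A plugin built this way has to keep the previous sample somewhere on the worker that runs it, so each interface should be polled by the same worker every time.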