
Questions about network monitoring

Posted: Thu Mar 22, 2018 3:31 am
by jszuetta
Hello,

We have a Nagios monitoring project for network devices with the following base set of checks:

SCOPE                    SENSOR
-------------------------------------------
every device             ICMP
every device             CPU
every device             Memory
every device             Fans
every device             PowerSupply
every device             Temperature
every device             Interface
firewalls                State of Failover
We have two issues:

The first (and the most troublesome) is the interface check. The Nagios wizard only registers the interfaces that are up at the moment it runs. Our network team requires that down/offline interfaces be added by default as well. Is there a workaround for this?

Secondly, our network team demands a more precise CPU metric than we can provide with a plugin that takes the average of the last 5 minutes. To meet their needs, the best approach would be to lower the check interval to 30 seconds; however, Nagios accepts no less than 1 minute by default, and no fractions. Is it safe to lower the check_intervals in nagios.cfg to 30 seconds (and double all the other check intervals)? Are there any recommended values for that? What are the risks/limitations (if any)?

Thanks in advance,

Jozsef

Re: Questions about network monitoring

Posted: Thu Mar 22, 2018 3:37 pm
by npolovenko
Hello, @jszuetta. I assume you were talking about the network switch/router wizard.
"the first (and the most troublesome) is the interface check. The Nagios Wizard only registers the ones which are currently online/has up state at the moment. Our network team's demand is to add down/offline interfaces by default. Is there a workaround about this?"
I made a modification to the standard wizard. Please navigate to the Admin menu, then System Extensions, then Manage Config Wizards in the left column. In the list of wizards, find the "Network Switch / Router wizard" and delete it. Then, in the same menu, upload and install the wizard that I'm attaching to this post.
"Secondly, our network team demands a more precise cpu metric than we could provide them via a plugin that takes the average of last 5 minutes. To meet their needs the best practice would be lower the check interval to 30 seconds however nagios only takes no less than 1 minute by default, no fractions. Is it safe to lower the check_intervals in the nagios.cfg to 30 seconds (and double all the other check intervals)? Is there any advised values for that? What are the risks/limitations (if there's any)?"
Are you referring to the check_load plugin? It reports the load averages over the last 1, 5, and 15 minutes. Checking the CPU more often might actually increase the load.
I wouldn't set the check interval to 30 seconds. Every check takes some time and some resources for Nagios to run. Imagine you had 5000 services: how much load would you be putting on the system if you ran 5000 checks every 30 seconds?
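
To put rough numbers on that, here is a small back-of-the-envelope sketch in Python. It is only an illustration, not a sizing tool. It assumes the time unit in nagios.cfg is the interval_length directive (60 seconds by default), that services currently use a check_interval of 5 (an assumed value, yours may differ), and that checks are spread evenly over time. The function names are made up for this example.

```python
def effective_seconds(check_interval, interval_length=60):
    """Seconds between two runs of a check.

    check_interval is counted in units of interval_length seconds.
    """
    return check_interval * interval_length


def checks_per_second(num_services, seconds_between_checks):
    """Average check executions per second, assuming an even spread."""
    return num_services / seconds_between_checks


# Assumed current setup: check_interval 5 with the default 60 s unit.
default_gap = effective_seconds(5)                         # 300 seconds

# Halving the unit to 30 s silently halves every existing interval...
halved_gap = effective_seconds(5, interval_length=30)      # 150 seconds

# ...so each check_interval has to be doubled to keep its old spacing.
restored_gap = effective_seconds(10, interval_length=30)   # 300 seconds

# A service checked every 30 s would then use check_interval 1.
fast_gap = effective_seconds(1, interval_length=30)        # 30 seconds

print(round(checks_per_second(5000, default_gap), 1))  # 16.7
print(round(checks_per_second(5000, fast_gap), 1))     # 166.7
```

Under these assumptions, moving 5000 services from a 5-minute to a 30-second interval means roughly ten times as many check executions per second, which is why I would only shorten the interval for the few services that really need it.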