Hardware monitoring for multitenant environment

maid3n_55 · Post by **maid3n_55** » Fri Sep 30, 2016 1:27 am

Hi Guys,

I am new to Nagios and need some guidance from people who have experience with it. I know that, even if what I want to achieve is feasible, it will take a lot of time spent reading documentation, testing, and so on.

My goals at the moment are:

-> Mainly, gather hardware status (overall, disks, power supplies, CPU failures, RAM failures, sensor failures, fans, and so on). I am not interested in gathering performance data from the systems at the moment.
-> Get rid of tools like HPSIM, IBM Director, OpenManage Essentials, Lenovo XClarity, HP OpenView
-> Get rid of SCOM 2007 R2 and all its management packs for monitoring server/storage hardware
-> Be able to configure the tool with a central server and hubs/gateways in all the remote sites and in different domains.
-> Be able to use the central server (preferably a cluster of 2 for redundancy) as the master server from which to deploy all the needed scripts to the hubs/gateways.
-> Gather all the alerts from all hubs/gateways and be able to store them in a DB located on a machine at our central site.

Note: hardware currently present in my company:

Servers:

- HP Servers from G6 to G9
- IBM Servers System X
- Lenovo x3250 M6
- Dell PowerEdge

Storage:

- HP MSA, HP EVA 4400, HP 3PAR
- IBM DS3250, IBM DS4700, IBM Storwize 3700
- NetApp FAS2220
- EMC Clariion, VNX

Switches:

- Brocade for FC
- HP, Cisco for LAN

From what I've seen in the Nagios community, this can be achieved with federated mode.

Considering all of the above, can someone give me some advice on how to get started on a solution to monitor all my hardware equipment?

Thank you in advance for your feedback.

John

rkennedy · Post by **rkennedy** » Fri Sep 30, 2016 1:43 pm

There's a lot of information here, so I'm going to break down your post and add notes. If you feel I missed anything, or have more questions, feel free to post them.

-> Mainly, gather hardware status (overall, disks, power supplies, CPU failures, RAM failures, sensor failures, fans, and so on). I am not interested in gathering performance data from the systems at the moment.

Your best friend is going to be SNMP as that's more than likely how you'll want to monitor all of this.
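As a rough illustration of what an SNMP-based hardware check boils down to: the plugin polls a health value, then translates it into one of the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below covers only the translation step. The `CONDITION_TO_STATE` table and the `evaluate` helper are hypothetical, and the integer values are an assumption; each vendor MIB defines its own values, so check the MIB for the device before relying on any mapping like this.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Hypothetical mapping from a polled integer "condition" value to a Nagios state.
# Real values differ per vendor MIB; these are assumptions for illustration only.
CONDITION_TO_STATE = {
    1: UNKNOWN,   # e.g. "other"
    2: OK,        # e.g. "ok"
    3: WARNING,   # e.g. "degraded"
    4: CRITICAL,  # e.g. "failed"
}

def evaluate(component, raw_value):
    """Return (exit_code, status_line) for one polled value.

    raw_value is whatever the SNMP GET returned; fetching it is not shown here.
    """
    try:
        value = int(str(raw_value).strip())
    except ValueError:
        return UNKNOWN, f"UNKNOWN - {component}: unparsable value {raw_value!r}"
    # Any value missing from the table is reported as UNKNOWN, never as OK.
    state = CONDITION_TO_STATE.get(value, UNKNOWN)
    return state, f"{LABELS[state]} - {component}: condition={value}"
```

A real plugin would print the status line and exit with the returned code. Many of the device-specific plugins on the Exchange already do this kind of mapping for you.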

-> Be able to configure the tool with a central server and hubs/gateways in all the remote sites and in different domains.
-> Be able to use the central server (preferably a cluster of 2 for redundancy) as the master server from which to deploy all the needed scripts to the hubs/gateways.

Are you looking to have mod_gearman, NRPE, or check_by_ssh proxy all of your checks at different sites, and then report back to Nagios? Would need clarification here, but here are a few documents that are relevant.
https://assets.nagios.com/downloads/nag ... ios_XI.pdf
https://support.nagios.com/kb/article.php?id=484

-> Gather all the alerts from all hubs/gateways and be able to store them in a DB located on a machine at our central site.

Metrics will be stored in XI's database, but you should be able to offload your database. I'm not sure how well this would work with the database being at a different site.
https://assets.nagios.com/downloads/nag ... Server.pdf

Servers:

- HP Servers from G6 to G9
- IBM Servers System X
- Lenovo x3250 M6
- Dell PowerEdge

Storage:

- HP MSA, HP EVA 4400, HP 3PAR
- IBM DS3250, IBM DS4700, IBM Storwize 3700
- NetApp FAS2220
- EMC Clariion, VNX

Switches:

- Brocade for FC
- HP, Cisco for LAN

Provided they all have SNMP support, you should be able to gain the metrics you're after. You may be able to find plugins that are specific for these devices, over at https://exchange.nagios.org. Most of the items you've listed click in my head as having a plugin written for them already.

rkennedy · Post by **rkennedy** » Fri Sep 30, 2016 1:45 pm

One more thing to add: if you're after an HA setup, take a look at our partner Linbit, who can offer that on top of XI: http://www.linbit.com/en/resources/tech ... centos-6-5

maid3n_55 · Post by **maid3n_55** » Mon Oct 03, 2016 1:39 am

Hi Rkennedy,

Thanks a lot for the provided answers, but I am getting back to this as I want to make sure that you can help me with the right solution.

"Your best friend is going to be SNMP as that's more than likely how you'll want to monitor all of this."

To be honest, I want to use SNMP only in cases where the device does not have a CLI; where there is a CLI, I want to use it. All but fewer than 10 of my devices can be queried for hardware status via CLI.
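For the CLI route, a check is usually a small wrapper: run the vendor command, look at its output, and return a Nagios exit code. A minimal sketch, assuming a CLI that prints plain-text health lines. The keyword lists and the helper names `classify` and `run_check` are hypothetical, and the substring matching is deliberately naive (for example, it would flag "0 failed disks" as critical), so a real check needs parsing tailored to each CLI.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = ("OK", "WARNING", "CRITICAL", "UNKNOWN")

# Hypothetical keywords; every vendor CLI words its health output differently.
CRITICAL_WORDS = ("failed", "fault", "critical")
WARNING_WORDS = ("degraded", "warning", "predictive")

def classify(cli_output):
    """Map raw CLI text to a Nagios state using naive substring matching."""
    text = cli_output.lower()
    if not text.strip():
        return UNKNOWN          # no output at all tells us nothing
    if any(word in text for word in CRITICAL_WORDS):
        return CRITICAL
    if any(word in text for word in WARNING_WORDS):
        return WARNING
    return OK

def run_check(command, timeout=30):
    """Run a command (list of arguments) and return (exit_code, status_line)."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return UNKNOWN, f"UNKNOWN - could not run command: {exc}"
    if proc.returncode != 0:
        return UNKNOWN, f"UNKNOWN - command exited with {proc.returncode}"
    state = classify(proc.stdout)
    lines = proc.stdout.strip().splitlines()
    summary = lines[0] if lines else "no output"
    return state, f"{LABELS[state]} - {summary}"
```

The command passed to `run_check` would be whatever the device's CLI provides, run either locally on the hub or over SSH; that part depends entirely on the device.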

"Are you looking to have mod_gearman, NRPE, or check_by_ssh proxy all of your checks at different sites, and then report back to Nagios? Would need clarification here, but here are a few documents that are relevant.
https://assets.nagios.com/downloads/nag ... ios_XI.pdf
https://support.nagios.com/kb/article.php?id=484"

From what I've noticed the Gearman mode was mainly designed to offload the checks to "workers" and offer redundancy in case of failure. Can you please let me know if the Gearman mode can be used only to offload the checks? I don't want to have a workload offloaded to another worker.

Let me try to make the picture a bit clearer:

1. Central site - let's call it NOC
2. Remote location 1 - different GEO location from NOC and all other locations, different IP ranges, site-to-site VPN ONLY with NOC
3. Remote location 2 - different GEO location from NOC and all other locations, different IP ranges, site-to-site VPN ONLY with NOC
.
.
N. Remote location n - different GEO location from NOC and all other locations, different IP ranges, site-to-site VPN ONLY with NOC

So, what I want to achieve is:

- Have a central Nagios machine (preferably redundant). Can it be configured as a root management server from which we can:
* Install all the scripts/customization for all the devices that we want to monitor
* Deploy a specific set of the above on a specific Nagios hub/worker/gateway in a remote location
* Receive alerts from a Nagios hub/worker/gateway and store them locally in its own DB (maybe clustered)?

- Nagios hub/worker/gateway (not redundant)
* Can this machine receive all the configuration specific to its location from the central machine?
* Can this machine send all the alerts (basically the output of the CLI commands or SNMP queries that indicates a hardware failure) to the central machine, where those alerts will be stored long term?

For example, at the moment I have this functionality with SCOM 2007 R2 and a series of management packs (especially for storage devices) + a series of custom management packs developed "in house" + IBM Director + HPSIM + Dell OpenManage.

So, considering all of the above, can someone let me know whether I can achieve the desired functionality with Nagios and, if so, which mode or combination of Nagios products I should use?

Thank you in advance for the provided feedback!

John

tmcdonald · Post by **tmcdonald** » Mon Oct 03, 2016 4:51 pm

maid3n_55 wrote:From what I've noticed the Gearman mode was mainly designed to offload the checks to "workers" and offer redundancy in case of failure. Can you please let me know if the Gearman mode can be used only to offload the checks? I don't want to have a workload offloaded to another worker.

Typically yes, mod_gearman is used for spreading the workload and for redundancy, but it is also good in cases where you have firewall rules or other network restrictions in place that would prevent Nagios from checking a remote site directly.
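To make the remote-site case more concrete: with mod_gearman the central Nagios puts check jobs into queues on the gearman server, and a worker at each site connects in and pulls only the queues it is configured for. Hostgroups listed in the `hostgroups` option get their own queue named `hostgroup_<name>`; everything else goes to the generic `host` and `service` queues. The sketch below is only a simplified Python model of that routing rule, not mod_gearman code. It ignores servicegroups, local groups and eventhandlers, and the site and host names are made up.

```python
# Hypothetical hostgroups, one per remote site; the host names are made up.
SITE_HOSTGROUPS = {
    "site_remote1": {"hp-g9-01", "msa-01"},
    "site_remote2": {"dell-r730-01"},
}

def queue_for(host, dedicated_hostgroups, is_service=True):
    """Return the queue a check for `host` would land in (simplified model).

    dedicated_hostgroups plays the role of the `hostgroups` list in the
    mod_gearman configuration: the first listed group containing the host wins.
    """
    for group in dedicated_hostgroups:
        if host in SITE_HOSTGROUPS.get(group, set()):
            return f"hostgroup_{group}"
    # Hosts outside every dedicated hostgroup use the generic queues.
    return "service" if is_service else "host"
```

In this model a worker at remote location 1 would subscribe only to `hostgroup_site_remote1`, so checks for that site run from inside the site, and the only connection needed is from the worker to the gearman server at the NOC.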

maid3n_55 wrote:So, what I want to achieve is:

- Have a central Nagios machine (preferably redundant). Can it be configured as a root management server from which we can:
* Install all the scripts/customization for all the devices that we want to monitor
* Deploy a specific set of the above on a specific Nagios hub/worker/gateway in a remote location
* Receive alerts from a Nagios hub/worker/gateway and store them locally in its own DB (maybe clustered)?

- Nagios hub/worker/gateway (not redundant)
* Can this machine receive all the configuration specific to its location from the central machine?
* Can this machine send all the alerts (basically the output of the CLI commands or SNMP queries that indicates a hardware failure) to the central machine, where those alerts will be stored long term?

It almost sounds like multiple Nagios XI servers with a central Nagios Fusion is what you want. At the moment Fusion does not do config/agent push to remote XI/Core systems, but it will be undergoing some changes in the future to better tie it into config management. Currently it collects data from several remote systems and displays it all in a single pane of glass, but you still need to configure the remote worker machines.

maid3n_55 · Post by **maid3n_55** » Wed Oct 05, 2016 4:35 am

Hi Guys,

Thanks a lot for sharing ideas with me.

"Typically yes, mod_gearman is used for spreading the workload and for redundancy, but it is also good in cases where you have firewall rules or other network restrictions in place that would prevent Nagios from checking a remote site directly."

In that case I will try a proof of concept with this mode.

"It almost sounds like multiple Nagios XI servers with a central Nagios Fusion is what you want. At the moment Fusion does not do config/agent push to remote XI/Core systems, but it will be undergoing some changes in the future to better tie it into config management. Currently it collects data from several remote systems and displays it all in a single pane of glass, but you still need to configure the remote worker machines."

We already have a tool developed "in house" that connects to all our monitoring tools and brings the alerts into a single console, and we want to keep using that console because all our front-line people are used to it.

If the Nagios-based POC is a success, we will most probably extend monitoring to servers and applications, but until then we want to see whether we can achieve the desired setup with Nagios Core as hubs + Nagios XI (as main).
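One possible shape for the hub-to-central part of such a setup (an assumption on my side, not something confirmed in this thread) is passive check results: the hub runs the check and submits the result to the central server as a `PROCESS_SERVICE_CHECK_RESULT` external command. The sketch below only builds that command line. The helper name and the host/service values are hypothetical, and the transport (NSCA, NRDP, or something else) is not shown.

```python
import time

def passive_service_result(host, service, return_code, output, timestamp=None):
    """Build a Nagios external command line for a passive service check result.

    Format: [time] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
    """
    if return_code not in (0, 1, 2, 3):
        raise ValueError(f"invalid Nagios return code: {return_code}")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    # The command must fit on one line, so collapse multi-line plugin output.
    clean_output = " ".join(output.splitlines())
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{clean_output}"

# Example with made-up values:
# passive_service_result("hp-g9-01", "Hardware Health", 2, "CRITICAL - PSU 2 failed")
```

The host and service named in the command must already be defined on the central server, so this does not remove the need to configure both ends.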

I will share all my findings with you guys, but to be honest I do not have a date for when I will start this. First I need to read up a lot on Nagios.

In the meantime, if someone has another idea and can share it, that would be great.

Thank you,

John

tmcdonald · Post by **tmcdonald** » Wed Oct 05, 2016 2:13 pm

We can keep this open for community comment, but for the sake of our workflow I will ask that no comment be made unless it is on-topic.
