Multiple IP Address per Host Strategies

wltjr · Post by **wltjr** » Mon Jul 13, 2015 4:27 pm

This is likely something that is asked a lot, and I have been researching it myself.

Scenario
Each host has 4 IP addresses, at minimum. Public and private IPv4 and IPv6 addresses.

I am aware of the following approaches, and there likely are more.

1. Host per IP address (what I am using now and not really ideal)
2. Use plugin like check_multiaddr or create my own
3. Use a check_dummy, or single IP main check for host, and have all other IPs checked as services
4. Custom host macros for the other IP addresses.

There are pros and cons to each approach. With #1, what I do not like is that it looks like I have a lot more hosts than I really do. It also does not really tell me whether a host is actually up/down, or whether there are just issues with one connection. What I do like is that for services you can be specific about what you are checking. Some services are only available locally via the LAN/VPN. Having a host per IP makes it straightforward to check services on each. That gets more complex with options #2 and #4, where I have to modify the commands to be aware of the multiple addresses via plugin or macros.

I think #3 might be the best option, but it has its own drawbacks. With #3, in theory I could have all IPs checked as services, and then somehow reference that, or have those checks be part of the host state. If they all fail, 4 in this case, the host is down; otherwise, warning or something. I would like to see fewer hosts, and still be aware of issues with individual IP addresses on those hosts.

Now the drawback with #3: while I can make other services dependent on, say, the ping service check, I am not sure how I would pass those dependent services the right IP address, short of using some macro. I guess I could just set something like _ADDRESS in the service, which becomes $_SERVICEADDRESS$ or something like that, and have other commands use that. If you could override the HOSTADDRESS macro within a service and have any dependent services use that, it would work and have other uses, but I am not sure that is possible.

It seems others would have run into this same issue. I guess most are just going with #1, which works, but I do not like having a single host show up as 4; it basically means I appear to have 4 times the number of hosts I actually have. I am curious what strategies others have taken with this same scenario.

jdalrymple · Post by **jdalrymple** » Mon Jul 13, 2015 4:46 pm

Is the goal to monitor whether layer 3 is up? You're right, an IP address shouldn't correlate with a host, and doesn't.

Each device is going to be different depending on use case. With say a Cisco L3 (which might have hundreds of IPs), I'd just use SNMP ifOperStatus. If say a DMZ host is doing weird innie outtie stuff, I might do a double hop check_ping to make sure that each interface can get to an important place. Say the public side needs to get to 8.8.8.8 and the private side needs to get to its secure SQL server - whatever the case may be.

Point is, while you never directly said it, it sounds to me like you need to make sure that layer 3 is up in multiple places on some boxes. Layer 3 is a service, not a host and should be treated as such, it's up to circumstance to determine how you measure the "upness" of that service.

Did I miss the point entirely?

** EDIT **

Just as a point of opinion - $HOSTADDRESS$ should be the address you're most likely to reach from the Nagios process to get valid results. That seems like a no-brainer to me.

wltjr · Post by **wltjr** » Mon Jul 13, 2015 5:00 pm

Sorry, let me provide more information. The host is a cloud server, with a remote Nagios server. The main way to check whether it is up is likely going to be via Layer 3. If I can't ping, I can't really get any info via SNMP either. I could have checks done locally that call home to the Nagios server to report their statuses, etc. That, I guess, would be another approach. Though really, if I can't reach the host on any of its 4 IPs, in my scenario that host is down. Even if it really is up, it's useless to me, so it's the same as being down, and I would want that state reflected. That includes the case where it's up and was able to report back to Nagios that all is well, but Nagios still could not reach it for some reason or another. Not likely, but it could happen in some odd scenario.

Going with checking IPs as services, which is part of the #3 approach, I still have to come up with some way to set the host status. Ideally I would like those 4 IP/Layer 3 based service checks to be checked as part of the host status. But I do not think a host status can be dependent on or come directly from service checks on that same host. Not to mention passing the IP from that service check on to any dependent service checks, as they would not be using HOSTADDRESS but SERVICEADDRESS if I set that macro, service { _address }. So any services dependent on a ping check to a private LAN IP would use the IP of the parent service they depend on, not that of the host.
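To make the "all 4 fail means down, otherwise warning" idea concrete, here is a minimal sketch of a wrapper that folds several per-address probes into one result. It is only an illustration: the function name check_any_address and the PROBE_CMD variable are made up for this example (they are not Nagios settings), and the default probe assumes a Linux ping that accepts -c and -W and handles both IPv4 and IPv6.

```shell
#!/bin/bash
# Sketch only: fold several per-address probes into one Nagios-style state.
# PROBE_CMD is a hypothetical knob for this example; it defaults to a single
# ping and can be replaced by any command that takes an address as last arg.
PROBE_CMD=${PROBE_CMD:-"ping -c 1 -W 2"}

# check_any_address ADDR...
# Prints one status line and returns:
#   0 = every address answered, 1 = some failed, 2 = all failed, 3 = usage error
check_any_address() {
    local total=$# nfail=0 failed="" addr
    if [ "${total}" -eq 0 ]; then
        echo "UNKNOWN - no addresses given"
        return 3
    fi
    for addr in "$@"; do
        # PROBE_CMD is deliberately unquoted so its options split into words.
        if ! ${PROBE_CMD} "${addr}" >/dev/null 2>&1; then
            failed="${failed} ${addr}"
            nfail=$((nfail + 1))
        fi
    done
    if [ "${nfail}" -eq 0 ]; then
        echo "OK - all ${total} addresses reachable"
        return 0
    elif [ "${nfail}" -lt "${total}" ]; then
        echo "WARNING - ${nfail}/${total} unreachable:${failed}"
        return 1
    fi
    echo "CRITICAL - all ${total} addresses unreachable"
    return 2
}
```

Called as check_any_address followed by the four addresses, it only goes CRITICAL when nothing answers. How Nagios treats exit code 1 from a host check depends on its configuration, so that is worth verifying before wiring something like this in as a host check_command.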

jdalrymple · Post by **jdalrymple** » Mon Jul 13, 2015 5:11 pm

I'm still not sure I'm catching the logic. Sounds like you want 4 IPs ANDed together to create host state? I can make sense of the fact that if all 4 IPs aren't up then the host is useless to you. I can also tell you though that if you have 1 IP that you use to monitor the host and 4 services, one for each ifOperStatus that you'll get better usability out of Nagios:

CRITICAL - INTERFACE 3 on HOST WHATEVER IS DOWN

is far more useful and quick to recover from IMO than:

CRITICAL - SOMETHING IS WRONG WITH HOST WHATEVER

And in my Inbox, CRITICAL is CRITICAL. It doesn't matter if it's a service or a host, the E-mail reads just as clearly and loudly.
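As a rough illustration of the per-interface service idea, the sketch below turns an IF-MIB ifOperStatus value into a service result that names the interface. The function name ifoper_to_nagios is invented for this example, and treating testing/dormant as WARNING is just one possible policy. How the value is fetched is also an assumption; with net-snmp it might be something like the snmpget line shown in the comment.

```shell
#!/bin/bash
# Sketch: translate an IF-MIB ifOperStatus value into a Nagios service result.
# Fetching the value is left out; with net-snmp it might look like this
# (community string, host and ifIndex 3 are placeholders):
#   snmpget -v2c -c public -Oqve "$host" IF-MIB::ifOperStatus.3

# ifoper_to_nagios HOST IFINDEX STATUS
# STATUS may be the numeric enum or its name, as defined in IF-MIB.
ifoper_to_nagios() {
    local host="$1" ifindex="$2" status="$3"
    case "${status}" in
        1|up)
            echo "OK - interface ${ifindex} on ${host} is up"
            return 0 ;;
        2|down|7|lowerLayerDown)
            echo "CRITICAL - interface ${ifindex} on ${host} is down"
            return 2 ;;
        3|testing|5|dormant)
            # Assumed policy: not carrying traffic, but not a hard failure.
            echo "WARNING - interface ${ifindex} on ${host} is in state ${status}"
            return 1 ;;
        *)
            echo "UNKNOWN - interface ${ifindex} on ${host} reported '${status}'"
            return 3 ;;
    esac
}
```

With one such service per interface, the alert text already says which interface is the problem.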

wltjr wrote:I still have to come up with some way to set the host status.

And again I'd argue that whatever the absolute most reliable way to identify whether that host is powered on or not is going to be ideal. In most environments, it's the LANniest IP, but sometimes that isn't the case. Sometimes ICMP doesn't work but HTTPS does. It's all circumstantial. Sounds like your circumstances are not normal, but in order to be able to interpret how Nagios is best able to mold to your specific circumstances we need to know more about why they're not normal. To this point all I can glean is enough information to suggest the generalizations:

1) Your host check command should verify if the host is up or not
2) You should specify services that are granular enough to help you quickly resolve problems but not so granular as to induce false positives

Good luck

wltjr · Post by **wltjr** » Mon Jul 13, 2015 5:41 pm

jdalrymple wrote:I'm still not sure I'm catching the logic. Sounds like you want 4 IPs ANDed together to create host state?

If I am going with all IP checks being services, then yes I would like to combine those 4 checks to determine the host state.

jdalrymple wrote:I can make sense of the fact that if all 4 IPs aren't up then the host is useless to you. I can also tell you though that if you have 1 IP that you use to monitor the host and 4 services, one for each ifOperStatus that you'll get better usability out of Nagios:

The problem with using a single IP for the host, is that any dependent services use that address. I still have the issue of passing IPs in any service checks to dependent service checks. I am doing different checks on the different IPs. Thus having each as different hosts in #1, any service check is using the correct IP. But I end up with way more hosts, 4 times the amount.
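For reference, a sketch of what a per-service address could look like with a single host. It is written as a shell function that prints a Nagios object fragment so it can be redirected into a config file. The host name web1, the address, the generic-service template and the command names are all placeholders, and the thresholds are arbitrary. As far as the custom variable mechanism goes, a _ADDRESS directive on a service is exposed to that service's command as $_SERVICEADDRESS$; a service dependency does not hand it on, so each service repeats its own _ADDRESS.

```shell
#!/bin/bash
# Sketch: print a Nagios object fragment that uses a per-service custom variable.
# All names, addresses and thresholds below are placeholders for illustration.
service_address_fragment() {
    cat <<'EOF'
# Commands read the address from the service, not from the host.
define command {
    command_name    check_ping_svcaddr
    command_line    $USER1$/check_ping -H $_SERVICEADDRESS$ -w 100.0,20% -c 500.0,60%
}

define command {
    command_name    check_http_svcaddr
    command_line    $USER1$/check_http -I $_SERVICEADDRESS$
}

define service {
    use                     generic-service
    host_name               web1
    service_description     PING LAN
    check_command           check_ping_svcaddr
    _ADDRESS                10.0.0.10
}

define service {
    use                     generic-service
    host_name               web1
    service_description     HTTP LAN
    check_command           check_http_svcaddr
    _ADDRESS                10.0.0.10
}

# HTTP LAN is skipped and silenced while PING LAN is critical or unknown.
define servicedependency {
    host_name                       web1
    service_description             PING LAN
    dependent_host_name             web1
    dependent_service_description   HTTP LAN
    execution_failure_criteria      c,u
    notification_failure_criteria   c,u
}
EOF
}
```

Something like service_address_fragment > lan_services.cfg would write it out; the file name is arbitrary. The duplication of _ADDRESS across services is the main cost of this layout.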

jdalrymple wrote:And in my Inbox, CRITICAL is CRITICAL. It doesn't matter if it's a service or a host, the E-mail reads just as clearly and loudly.

Which is what I have now, with a host per IP, so 4 hosts per actual host. Also, in the web interface I would rather not have a long list of hosts; the count of hosts, etc. is all misleading.

jdalrymple wrote:And again I'd argue that whatever the absolute most reliable way to identify whether that host is powered on or not is going to be ideal.

There is no reliability of anything in clouds, not public ones. It is one of the drawbacks. Any reliability is your own making. You do not really power on a cloud virtual server. Even if the server is up and running, it might not be reachable, etc. There might be an issue with IPv4 that does not affect IPv6 or vice versa.

jdalrymple wrote:In most environments, it's the LANniest IP, but sometimes that isn't the case.

The LAN depends on the VPN, which depends on public access. In one sense that would say use the public addresses. OK, but I still have 2. You could say IPv4 might be the most pressing, but clients could be accessing via IPv6. I would rather not say which I care about more; I care about them both really.

jdalrymple wrote:Sometimes ICMP doesn't work but HTTPS does. It's all circumstantial. Sounds like your circumstances are not normal, but in order to be able to interpret how Nagios is best able to mold to your specific circumstances we need to know more about why they're not normal. To this point all I can glean is enough information to suggest the generalizations:

Things happen in cloud envs that are not normal; it's part of the nature of clouds. You should expect the unexpected, and plan accordingly. I have things working now, I just do not like how things are laid out. I do not like having 4 hosts for every 1 host, just so I can pass on the right IP to dependent services. That doesn't really give me an overall picture of the server state, or what might need to be addressed.

jdalrymple · Post by **jdalrymple** » Tue Jul 14, 2015 7:17 am

wltjr wrote:There is no reliability of anything in clouds, not public ones. It is one of the drawbacks. Any reliability is your own making. You do not really power on a cloud virtual server. Even if the server is up and running, it might not be reachable, etc. There might be an issue with IPv4 that does not affect IPv6 or vice versa.

Amazon CloudWatch? Azure perfcounters? I'm not sure who your provider is, but they should provide an API-based way of monitoring your host's up/downness. And if their API goes down you should find another provider.

Incidentally, with a little bit of digging you'd find that the board you're typing this on is cloud hosted. Seeing a comparison of API-based up/downness vs check_ping over a long period of time would be an interesting metric. For ours I'm fairly certain they'd be within the 5 9s margin of each other. If your public cloud is unreliable you might think to shop around.

Back to the topic though, as I reread through this all I can see is that you are fixed on ping&&ping&&ping&&ping. I've shared the typical and recommended approach and also explained why, but in your case the ping^4 approach is better suited to your needs. What other advice are you seeking from the board, or is it OK to call this topic a wrap?

wltjr · Post by **wltjr** » Tue Jul 14, 2015 11:01 am

jdalrymple wrote: Incidentally, with a little bit of digging you'd find that the board you're typing this on is cloud hosted.

Yes, support.nagios.com is hosted at Linode, where I have most of my cloud servers hosted, but not all. Also I would like a consistent way to check hosts, regardless of the cloud provider. Hooking into their API, if they provide such, is more specific to each cloud provider. Then again, just because they say the host is up, I am not sure what that means to me per se. The host might be up but not usable, so to me it's down in a sense. However, after looking at the Linode API, I am not seeing anything status related. I likely am missing something, but I did look around a bit.
https://www.linode.com/api

jdalrymple wrote: Seeing a comparison of API-based up/downness vs check_ping over a long period of time would be an interesting metric. For ours I'm fairly certain they'd be within the 5 9s margin of each other. If your public cloud is unreliable you might think to shop around.

Well, just last night I got alerts from a DNS server in Fremont, CA that was having some issues. My monitoring server is in Dallas at Rackspace, and the rest of my nodes are in Atlanta. Interestingly enough ping worked, and some service checks worked, but others had socket timeouts: one spanned 10 minutes for the VPN IPv6 ULA, and the other 5 minutes for public IPv6. That somewhat shows the oddities I was speaking of. I had other issues with Linode's internal DNS servers, as did others, but I run my own, which I had some funkiness with last night only affecting IPv6.

jdalrymple wrote: Back to the topic though, as I reread through this all I can see is that you are fixed on ping&&ping&&ping&&ping. I've shared the typical and recommended approach and also explained why, but in your case the ping^4 approach is better suited to your needs. What other advice are you seeking from the board, or is it OK to call this topic a wrap?

I was looking for others' strategies for multiple IP addresses per host. Maybe this is not the best place to seek such, but I really did not get much of that. The replies are more focused on how I am determining whether a host is up or down, not addressing the multiple IP address issue at all, nor passing the addresses to dependent services without doing it as I am now, host per IP.

I am not that keen on a ping check, as for public IPv4 I block that entirely. I am not doing that on IPv6 at the moment. Then again I think I always relied on ping checks in my old private cloud where it was all taking place over a physical LAN. The default check-host-alive command is ping based.

I am experimenting with check_multi, and I have a single host now that has 2 IP addresses, 1 via macro. check_multi allows me to ping both, and if either is down, it reflects the host as down. I can still tweak and dial that in. Not sure if I will continue on with that approach and have services use the different addresses, the default and the macros I am adding. I still have to see about service commands using the new macros so they are checking against the right IP for the host.

I do feel I need to reduce the number of hosts, as it's making the map crazy, with each server showing up 4 times, once for each IP. I guess I might have to keep on with the macro and check_multi approach or something.

wltjr · Post by **wltjr** » Tue Jul 14, 2015 12:22 pm

wltjr wrote: However after looking at the Linode API, I am not seeing anything status related. I likely am missing something, but I did look around a bit.
https://www.linode.com/api

It does have the ability to show node status via the linode.list() operation. I made a quick crude bash script for this. But I still have the issue of multiple IP addresses to contend with. I guess this way I can make the IP checks services, though I am not sure how I will pass on the right IP to the various service checks. I might stick with host macros and use them in the services, rather than service macros.

check_linode.sh

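#!/bin/bash
# Nagios plugin: report a Linode's state via the API's linode.list action.
# Usage: check_linode.sh <linode_id>

API_KEY=replace_with_your_api_key
EXIT_CODE=0

[[ -z ${1} ]] && echo "UNKNOWN - Linode ID not specified, aborting" && exit 3

# Extract the numeric STATUS field; allow a leading minus so -1 is matched,
# and print nothing when the field is missing so the case falls through to *.
STATUS=$(curl -s "https://api.linode.com/?api_key=${API_KEY}&api_action=linode.list&linodeid=${1}" \
        | sed -n 's/.*"STATUS":\(-\{0,1\}[0-9]\{1,\}\).*/\1/p')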

case ${STATUS} in
        -1) echo "WARNING - Linode Being Created"
                EXIT_CODE=1
                ;;
        0) echo "WARNING - Brand New Linode"
                EXIT_CODE=1
                ;;
        1) echo "OK - Linode Running"
                EXIT_CODE=0
                ;;
        2) echo "CRITICAL - Linode Powered Off"
                EXIT_CODE=2
                ;;
        *) echo "UNKNOWN - Unexpected or missing Linode status"
                EXIT_CODE=3
                ;;
esac

exit ${EXIT_CODE}

jdalrymple · Post by **jdalrymple** » Tue Jul 14, 2015 4:21 pm

wltjr wrote:I am not that keen on a ping check, as for public IPv4 I block that entirely.

I don't know of an alternative. To me https, ssh, 500/4500, those should be services. If you have no better way to do host up/down, those would be it I guess, otherwise you're stuck with check_dummy. I guess what I'm saying is that besides API into your cloud I don't know of anything to help you besides echoreq - but maybe someone else has some magic. Passive of course... but I guess that wasn't in your original 4 options so I don't know where it fits in for you.

I will still firmly stand that IP interfaces being up (especially if you have more than one) should be a service if there is no echoreq available.

wltjr wrote:I do feel I need to reduce the amount of hosts, as its making the map crazy, with each server showing up 4 times, once for each IP.

I do feel your pain - but I have nothing better to offer than what's here. Like I said - think of it in terms of a N7k with 2000 IP interfaces - you're not going to do 2000 hosts, but you will do 2000 ifOperStatus's - because they DO matter.

wltjr · Post by **wltjr** » Wed Jul 15, 2015 12:55 pm

For some reason the focus is on the host checks, which misses the bigger picture. The topic is not host checks, but dealing with multiple IPs per host and strategies there. I already implemented the Linode API check_linode for the host status. Though there could be issues there that might make it seem as if the host is not running, like an inability to reach the API, etc. Anyway, the host check is no longer bound to a "service", though it is technically talking to a service to get the host state.

That said the main issue was and still is having MULTIPLE IP addresses per host and passing those on. I have gone with #4 and implemented custom host macros, _address_ipv4, _address_lan, and _address_ula. The regular hostaddress gets an IPv6 address, the _lan one gets a private IPv4, and the other two are self explanatory. I then went and made different check commands for each, _ipv4, _lan, _ula, to work with the new macros.
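A rough sketch of what that layout could look like, written as a shell function that prints the Nagios object fragment. The custom variable names follow the ones above; everything else (host name web1, the documentation-range addresses, the generic-host and generic-service templates, the exact command names and thresholds) is a placeholder for illustration. Nagios exposes a host directive such as _ADDRESS_LAN to commands as $_HOSTADDRESS_LAN$.

```shell
#!/bin/bash
# Sketch: print a Nagios object fragment with one host and several addresses
# held in custom host variables. All values below are placeholders.
multi_address_fragment() {
    cat <<'EOF'
define host {
    use             generic-host
    host_name       web1
    address         2001:db8::10      ; public IPv6, seen as $HOSTADDRESS$
    _ADDRESS_IPV4   203.0.113.10      ; seen as $_HOSTADDRESS_IPV4$
    _ADDRESS_LAN    10.0.0.10         ; seen as $_HOSTADDRESS_LAN$
    _ADDRESS_ULA    fd00::10          ; seen as $_HOSTADDRESS_ULA$
}

# One command per address family, each reading its own host macro.
define command {
    command_name    check_http_ipv4
    command_line    $USER1$/check_http -I $_HOSTADDRESS_IPV4$ -4
}

define command {
    command_name    check_ping_lan
    command_line    $USER1$/check_ping -H $_HOSTADDRESS_LAN$ -w 100.0,20% -c 500.0,60% -4
}

define command {
    command_name    check_ping_ula
    command_line    $USER1$/check_ping -H $_HOSTADDRESS_ULA$ -w 100.0,20% -c 500.0,60% -6
}

define service {
    use                     generic-service
    host_name               web1
    service_description     PING LAN
    check_command           check_ping_lan
}
EOF
}
```

The addresses live in one place on the host, and the choice of address is made by picking the command variant, at the cost of one command definition per address type.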

The result is basically what I wanted, a single host with multiple IPs. All ping, http, etc. checks are done as service checks now. Though that is somewhat moot. The real issue is having multiple IPs per host and working with those. As for the host status coming from the cloud API now, I do not really see how that is any better than a ping check or other. After all, the API could say the VM is up and running, yet it's totally unreachable. In that case all the service checks will continue, and the host will be shown as up, when really it is down...

IMHO the host check should be something that reflects the true state of the host in terms of usability. Part of the whole point of monitoring is making sure services are available. Who cares if a host is up if none of its services are available? In my opinion that host is down, even if it's powered on and running. From a usability standpoint, it would not be usable, and would need to be taken out of active inventory till it's fixed and services are available again.

What if the host is just a virtual host? Like just a virtual website, one of several, on a server. In that case the only checks would be ping and like http. Is the IP reachable, are web pages accessible? If both yes, then the virtual host, NOT virtual machine, is up. Also, I am monitoring a remote 3rd party server for my LUG. The provider, Chicago VPS, does not have an API. Pings are filtered for IPv4, and there is no option for IPv6. Thus the ONLY host check I can do on that one is http, and it has no services. In fact some hosts, like a virtual host, will never have services. After all, it's just a website that might at best have a dedicated IP address, but it's running on a shared server/host, etc.
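For the http-only case, the stock check_http plugin is the obvious tool. Purely to show the logic, here is a small curl-based sketch. The function names are made up for this example, the 10 second timeout is arbitrary, and the mapping of HTTP status classes to states is an assumed policy, not anything Nagios prescribes.

```shell
#!/bin/bash
# Sketch: use an HTTP request as the only reachability test for a virtual host.

# http_code_to_state CODE
# Classify an HTTP status code as a Nagios-style state (assumed policy).
http_code_to_state() {
    case "$1" in
        2??|3??) echo "OK - HTTP $1";       return 0 ;;
        4??)     echo "WARNING - HTTP $1";  return 1 ;;
        5??)     echo "CRITICAL - HTTP $1"; return 2 ;;
        # curl reports 000 when it got no HTTP response at all.
        *)       echo "CRITICAL - no HTTP response"; return 2 ;;
    esac
}

# check_vhost_http URL
# Fetch the URL, discard the body, and classify the status code.
check_vhost_http() {
    local url="$1" code
    if [ -z "${url}" ]; then
        echo "UNKNOWN - URL not specified"
        return 3
    fi
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "${url}")
    http_code_to_state "${code}"
}
```

check_vhost_http https://example.org/ would then stand in for the ping-based host check on such a host; the URL is a placeholder.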

I guess you could say make those service checks on the host they are running on. That is one way to go. But then again for others without a VM, inability to ping, etc. You are limited, so in that case it makes less sense to have a dummy check on the host, and move all real checks to services.

Anyway, I addressed my problem, though I still have no clue how others are addressing this same issue, and I know others are running into it. I have seen the check_multiaddr and check_multi commands, different but similar just the same. I think systems with more than one IP are becoming increasingly common, and with IPv6 a system could have several: IPv6 ULA, IPv6 Global, IPv6 Global Temporary, and that's not even including IPv4, which if in use could be 2 more, public and private. Assuming a host only has 1 IP/address is pretty limiting. Then again, macros let you do whatever. It just seems a bit hackish and not so elegant, but it works.

Nagios Support Forum
