check_cluster not seeing all hosts in delimited list
Posted: Thu Nov 17, 2016 2:42 pm
I'm sorry if this is not the correct place to post this. I could not post in the Enterprise customer support forum; maybe that is because my account is new?
We are currently setting up cluster checks as part of our monitoring infrastructure and, for some reason, we cannot get the service to display all the hosts/services in the comma-delimited list. Say we have over 150 hosts listed along with a service. That is fine, but then the check reports maybe 120 hosts. If we reverse-sort that same list, we might see only 8 hosts listed in the OK state. All other states stay at zero.
The number of missing hosts differs depending on which end of the full list we sort from, and we have checked and double-triple-quadruple-checked the syntax. We see no reason why the check is refusing to see all hosts. We have also tried removing a single random host from the list, just to see what would happen, and suddenly the service could see only a single host. This baffles us.
We've not been able to see any rhyme or reason to this strangeness, and we've otherwise been able to set up monitoring across our SDLC environments without running into any issues.
Is it something with the code of the check_cluster check itself...?