Monitor sub-groups of computers and not create host groups?

Posted: Sat Aug 11, 2012 12:30 pm
by jbruyet
Hey all, is there a way to monitor a sub-group of a group of computers without that sub-group being put into a Host Group of its own? Here’s one example of what I’m talking about: I have 60 computers in Host Group Workstations for monitoring basic stuff like CPU, Memory, etc. Some of these computers have an imaging utility installed on them to facilitate disaster recovery. I’d like to monitor these computers to make sure the utility is running but I don’t want that to create another Host Group. I have other Processes and Services to monitor in more sub-groups and I don’t want each of these sub-groups in their own Host Group—I’d like everything in the Workstations Host Group. Yeah, I could do each computer individually but that’s a little more labor-intensive than what I want. Any and all ideas, suggestions and recommendations gladly accepted.

Thanks,

Joe B

Re: Monitor sub-groups of computers and not create host grou

Posted: Sun Aug 12, 2012 6:44 pm
by jsmurphy
You aren't going to like this but... More Hostgroups! There isn't really an alternative and I'm not really sure why it's an issue.

You can assign multiple hostgroups to a single host (e.g. my Exchange server is an Exchange server and a Windows server and a mail server, so it has three hostgroups); this way you aren't doubling up on the config and having to maintain two sets of identical checks. If you want to get real fancy you can use hostgroup nesting: if you assign a device as an Exchange server, it is nested under Windows server and picks up all those checks without your having to assign it to two hostgroups.
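
As a rough sketch of the two approaches described above (several hostgroups per host, and nesting), the object definitions might look something like the following. The host name, group names, address, and process name are made up for illustration, and the `windows-server` and `generic-service` templates and the `check_nt` command are assumed to exist as in the sample configuration files shipped with Nagios Core; adjust them to your own setup.

```cfg
# Parent group: the basic checks (CPU, memory, ...) are attached here.
# hostgroup_members pulls the hosts of the listed groups into this one (nesting).
define hostgroup {
    hostgroup_name      workstations
    alias               Workstations
    hostgroup_members   imaging-workstations
}

# Sub-group (hypothetical name): only machines with the imaging utility installed.
define hostgroup {
    hostgroup_name      imaging-workstations
    alias               Workstations with imaging utility
}

# Without nesting, a host could instead list several groups directly:
#   hostgroups  workstations,imaging-workstations
define host {
    use                 windows-server        ; assumed template
    host_name           ws-042                ; hypothetical host
    address             192.0.2.42            ; placeholder address
    hostgroups          imaging-workstations
}

# One service definition covers every host in the sub-group.
define service {
    use                 generic-service       ; assumed template
    hostgroup_name      imaging-workstations
    service_description Imaging Utility
    check_command       check_nt!PROCSTATE!-d SHOWALL -l imaging.exe
}
```

With this layout every imaging machine still shows up under Workstations, but note that the sub-group will also be listed as a hostgroup of its own in the hostgroup views.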

Re: Monitor sub-groups of computers and not create host grou

Posted: Tue Aug 14, 2012 5:02 pm
by jbruyet
Sigh, no I'm not going to like it. I guess the only reason it's an issue is I was hoping to consolidate things down to a single screen showing everything I want to monitor. I guess I'll get to work on the Hostgroups.

Thanks,

Joe B

Re: Monitor sub-groups of computers and not create host grou

Posted: Tue Aug 14, 2012 7:37 pm
by jsmurphy
jbruyet wrote: I guess the only reason it's an issue is I was hoping to consolidate things down to a single screen showing everything I want to monitor.
Hmmmmm, why would multiple host groups be preventing that? I'm assuming when you say "everything you want to monitor"... that's everything regardless of if it is up or down? I'm curious now.

Re: Monitor sub-groups of computers and not create host grou

Posted: Wed Aug 15, 2012 6:51 pm
by jbruyet
Hey jsmurphy, on my Nagios web page I have about four "screen heights" of scrolling to do in order to see everything. I've attached a screenshot of my Nagios page to give you an idea of what I'm talking about. AND, I couldn't get everything to show because Firefox won't zoom out far enough. I like the grouping but I have to scroll to see that everything is up. I'm working on changing the alert configuration because my current install constantly pumps out alerts to me for machines that are in an Event Status (like someone on vacation). Until then I've taken to looking at the screen from time to time to make sure everything is good. Like I just discovered that our web site is down. Any suggestions? I hear people talk about monitoring MANY MORE workstations than I am at this time and I'm wondering how they do it.

Thanks,

Joe B

Re: Monitor sub-groups of computers and not create host grou

Posted: Wed Aug 15, 2012 7:07 pm
by jbruyet
OOOOOOHHHHHH!!!!!!! Are you thinking Tactical Overview? That won't work because I'm never all green; there's always someone whose computer is off or a server with minimal free memory (SQL) or a workstation/server at the far end of our VLAN with a high enough ping RTA to trigger an alert or etc... That's one of the reasons I want to do some reconfiguring with Hostgroups -- to group devices with similar, higher alert trigger points and move them away from devices with similar, lower alert trigger points (and reconfiguring Nagios' vigorous default alerting system as well). Does this help make things clearer or murkier? This situation is another reason I was asking in a different post about Nagios Best Practices.

Thanks,

Joe B

Re: Monitor sub-groups of computers and not create host grou

Posted: Thu Aug 16, 2012 7:11 pm
by jsmurphy
Much clearer :), the way most people do it (as far as I know) is just to use the root problems page to display down hosts and services rather than try to see everything. The thinking is "who cares if something is alive?", I don't have to act on a machine that's working correctly, so I only want to see the machines that do require me to act. Your situation is a bit more unique so you would probably want to play with the status CGI URL to display services down when host is up.

I vaguely remember the best practice post... my talk at the Nagios conference this year is sort of about best practices, but I avoid using that term as it's sort of a dirty word when it comes to Nagios. I believe there are good models for approaching configuration but the best practice is really any implementation that scales and does the job you want in your environment without causing you undue grief.

I feel like you may be falling into a classic trap of overly judicious monitoring, that is to say you have one host close to the nagios server with an rta of 10 and maybe the one far away has an rta of 50 and occasionally it bumps up to 70... you've got your monitoring threshold set to 60 so it's triggering a warning when it bumps over. But was there any impact? If the rta increased to 100 for either device is there any impact? How about 150?

I guess the point I am trying to make is set the thresholds at values where you begin to see impact and you will find those values are likely to be more ubiquitous across all your devices. I don't know if any of this actually helps, probably not, it's all simple answers to complex questions :p
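
To make the threshold point concrete, here is a minimal sketch of a single ping service shared by all workstations, with warning and critical levels placed where impact would (hypothetically) begin instead of just above the normal round-trip time. The 150 ms and 300 ms figures and the packet-loss percentages are placeholder values, not recommendations, and the `check_ping` command and `generic-service` template are assumed to be defined as in the sample configuration files (warning passed as `$ARG1$`, critical as `$ARG2$`).

```cfg
# Threshold format is <rta in ms>,<packet loss>% (warning first, then critical).
# Values below are illustrative only; pick the point where users notice impact.
define service {
    use                 generic-service       ; assumed template
    hostgroup_name      workstations          ; hypothetical group name
    service_description PING
    check_command       check_ping!150.0,20%!300.0,60%
}
```

With one set of values like this, a host that normally sits at 10 ms and one that drifts between 50 and 70 ms can share the same definition, and neither raises a warning until the latency reaches a level that matters.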

Re: Monitor sub-groups of computers and not create host grou

Posted: Thu Aug 16, 2012 8:54 pm
by jbruyet
jsmurphy wrote: Much clearer :), the way most people do it (as far as I know) is just to use the root problems page to display down hosts and services rather than try to see everything. The thinking is "who cares if something is alive?", I don't have to act on a machine that's working correctly, so I only want to see the machines that do require me to act. Your situation is a bit more unique so you would probably want to play with the status CGI URL to display services down when host is up.
Ahhhh, that's a really good point; I guess I just like to see the green. I'm only concerned about the "problems" that pop up from time to time so I may look into the "status CGI URL," or I may just stick with the "root problems page" and leave it at that.
jsmurphy wrote: I vaguely remember the best practice post... my talk at the Nagios conference this year is sort of about best practices, but I avoid using that term as it's sort of a dirty word when it comes to Nagios. I believe there are good models for approaching configuration but the best practice is really any implementation that scales and does the job you want in your environment without causing you undue grief.

I feel like you may be falling into a classic trap of overly judicious monitoring, that is to say you have one host close to the nagios server with an rta of 10 and maybe the one far away has an rta of 50 and occasionally it bumps up to 70... you've got your monitoring threshold set to 60 so it's triggering a warning when it bumps over. But was there any impact? If the rta increased to 100 for either device is there any impact? How about 150?
I understand that something that's a best practice for someone else may not be a best practice for me. It's ALL going to be (well, already is) application-centered, and I'm referring to the "application" of Nagios to each person's network. I'm pretty sure I could step back two or three steps and not lose anything.
jsmurphy wrote: I guess the point I am trying to make is set the thresholds at values where you begin to see impact and you will find those values are likely to be more ubiquitous across all your devices. I don't know if any of this actually helps, probably not, it's all simple answers to complex questions :p
I will start by rethinking my threshold values. Sometimes answers may not directly resolve a problem, but I feel like now I have a better perspective of what I'm trying to do. Or maybe it's more like an attitude adjustment. Thanks very much for the enlightenment.

A Padawan Learner,

Joe B

Re: Monitor sub-groups of computers and not create host grou

Posted: Sun Aug 19, 2012 6:29 pm
by jsmurphy
Glad you got something out of it ;)