Q&A for host/service/command best practices

AGray
Posts: 15
Joined: Wed May 29, 2013 1:34 pm

Post by AGray »

I have been using Nagios for about two weeks. I read a lot of threads online and decided the only way to learn how to implement it in our environment was to get my hands dirty, and likely revisit the implementation once I know more.

So far, I have a cfg_dir defined where I store all my .cfg files.
Hosts/Groups are defined in one file. I have 150 hosts so far and many defined groups.
Services are defined in another file.
Commands are defined in another file.

I re-use the same services for entire host groups. This seemed like a great idea initially, but I have found that the arguments I pass to the command via the service vary across many hosts.

That got me wondering whether I would be better off with a separate hostname.cfg, and perhaps hostname-services.cfg and hostname-commands.cfg (basically hostname-{X}.cfg for my various needs). I could easily create some template files and a shell script to on-board a new host. I just don't know whether others out there use this approach successfully. I was also thinking that I couldn't reuse the same service description, as that must be unique. That would likely mean thousands of services that are near duplicates except that the arguments I pass differ slightly (i.e. the warning and critical thresholds are different for host-A vs. host-B).

I was thinking I could almost re-use the same services and commands as I am doing now *IF* there was a way to define the arguments the service passes to the commands as variables, specified on the host level?

I know my concerns make sense in my head, but they are hard to convey. Hopefully some of this makes sense to you guys and you can help me understand how to implement this with the most flexibility, ease of use, etc.
slansing
Posts: 7698
Joined: Mon Apr 23, 2012 4:28 pm
Location: Travelling through time and space...

Post by slansing »

While we generally recommend leaving the directory structure and config file names at their defaults to make troubleshooting easier, in this case I would recommend choosing a naming convention for your infrastructure that lets you be flexible with your services (so that you do not end up titling them all nearly the same). There really is no best practice for this, as the beauty of Nagios is that you can extend or modify just about everything you deal with inside the system. Most of these questions are addressed in Nagios XI (for example, bulk modifications and cloning of hosts/services while being able to change their host address, etc.), but in Nagios Core this must be done manually.
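
On the earlier question about setting arguments at the host level: Nagios Core supports custom object variables, and the fragment below is a minimal sketch of how they might be used here. A host directive whose name starts with an underscore becomes a macro of the form $_HOST<NAME>$, so one service attached to a hostgroup can pick up different thresholds for each host. Note also that service_description only has to be unique per host, not across the whole configuration. The host names, addresses, hostgroup name, variable names, and command name are hypothetical, and the linux-server and generic-service templates are assumed to exist as in the sample configuration.

```cfg
# Hypothetical hosts; names, addresses, and thresholds are placeholders.
define host {
    use                   linux-server        ; assumed template from the sample config
    host_name             host-a
    address               192.0.2.10
    hostgroups            ping-checked
    _PING_WARN            100.0,20%           ; custom variable (leading underscore)
    _PING_CRIT            500.0,60%
}

define host {
    use                   linux-server
    host_name             host-b
    address               192.0.2.11
    hostgroups            ping-checked
    _PING_WARN            300.0,40%           ; looser thresholds for a slower link
    _PING_CRIT            900.0,80%
}

define hostgroup {
    hostgroup_name        ping-checked
    alias                 Hosts sharing the PING service
}

# One command, reading its thresholds from the host's custom variables.
define command {
    command_name          check_ping_by_host_vars
    command_line          $USER1$/check_ping -H $HOSTADDRESS$ -w $_HOSTPING_WARN$ -c $_HOSTPING_CRIT$ -p 5
}

# One service definition, applied to every host in the group.
define service {
    use                   generic-service     ; assumed template from the sample config
    hostgroup_name        ping-checked
    service_description   PING
    check_command         check_ping_by_host_vars
}
```

With this layout, on-boarding a host means adding one host definition with its own values, and the service and command files stay untouched.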

Post by AGray »

I had thought that I could come up with a naming convention for the services that is easy to follow and implement. It had me thinking, though, that I will end up with hundreds, possibly approaching a thousand, services. Technically the same number of commands would be run, though, as the number of hosts hasn't increased. Is this known to be an issue with load? I would have thought the load depends on the number of commands being run and the load on the host being monitored, but is having hundreds or many thousands of services defined a known limitation in itself?

Post by slansing »

It all depends on the type of checks (active or passive), how often they run, and what type of data they pull. For instance, an active SNMP check pulling router bandwidth along with performance data naturally generates more load, as it is one of the more intensive checks. Active checks take more load because Nagios triggers them, whereas passive checks are initiated by the remote host you are monitoring, which then sends its data to Nagios. Nagios does take a small load hit from this, but it is significantly less. You are well under any sort of limitation for your Nagios system, assuming it has a moderate 4-core processor, 6+ GB of memory, and a decent-sized drive setup.

Of course, other things will eat load on your server, as Nagios is only one piece of software running on the Linux server.
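
To make the active/passive distinction concrete, here is a rough sketch of what a passive-only service could look like. Nagios never schedules this check itself; it only runs the check_command if no result has arrived within the freshness threshold. The host name, service name, command name, and threshold are hypothetical, and the generic-service template is assumed to exist as in the sample configuration.

```cfg
define service {
    use                     generic-service     ; assumed template from the sample config
    host_name               host-a              ; hypothetical host
    service_description     Backup Status       ; hypothetical service
    active_checks_enabled   0                   ; Nagios does not schedule this check
    passive_checks_enabled  1                   ; results are submitted by the remote side
    check_freshness         1                   ; watch for results that stop arriving
    freshness_threshold     90000               ; seconds (25 hours) before a result is stale
    check_command           check_dummy_stale   ; run only when the result has gone stale
}

# Returns UNKNOWN (state 3) so a silent host does not go unnoticed.
define command {
    command_name            check_dummy_stale
    command_line            $USER1$/check_dummy 3 "No passive result received recently"
}
```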

Post by AGray »

My question is more about the number of services defined, even if they are rarely executed: is there any limitation in Nagios? Let's say I only check each service once a week but I happen to define 100,000 services (which I won't; this is an extreme example). Is there a maximum number of services that the app can read in as it starts? I am still keeping my load down in terms of the number of active checks; there would just be a lot more services defined. When someone looks in the GUI, they won't really want to look at the Services view anymore, but instead look at the host and then the services defined for that host. That is how I would envision it. Also, basic services that have no need for customization on a per-host level I will probably associate with a host group.

Thank you again for your help. I know I seem ignorant, I'm learning...

Post by slansing »

You don't seem ignorant at all. You are hitting all the points we raise when suggesting that admins use logical hostgroups and grouping strategies. The only limitation with Nagios is your system, and how many threads, forks, etc. Nagios is opening alongside other processes like apache, or mysqld and ndo2db. Some companies have well over 100,000 service checks running, and some of those are mostly active checks at that.