I cannot BELIEVE the progress I am making. I am actually quite shocked, really. In truth, the biggest mistake ANY n00b can make is to erase the templates provided by Nagios.
So, what made this time so different from the other attempts, you ask? Well, I went into every single .cfg file in the /etc/nagios3/conf.d directory, copied each line of each file, and pasted it into a good ol' fashioned Word file. Then I separated the lines out by file name and highlighted each area underneath its respective file name. This provided me with three MAIN things:
First, it enabled me to SEE the lines of config next to what is being displayed on-screen, which showed me which files control what.
Second, it shows how the lines of configuration interact with one another and what their dependencies are. This really is helpful in that you can see which files are required and which aren't.
Lastly, and this is what has wreaked havoc on me in the past, it gave me a fail-safe: if I completely screw up the configs, I can just copy and paste the originals back into all of them and start all over from "factory default".
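For anyone who would rather not copy and paste by hand, the same kind of dump could probably be scripted. Below is a minimal Python sketch, assuming the configs live in /etc/nagios3/conf.d as described above. The function name dump_configs, the "=====" header format, and the output file name are all made up for illustration; none of this comes with Nagios.

```python
from pathlib import Path


def dump_configs(conf_dir, out_path):
    """Concatenate every .cfg file in conf_dir into one text file.

    Each file's contents are preceded by a header line with its name,
    so the dump doubles as a map of which file holds which definitions.
    Returns the list of file names that were copied, in sorted order.
    """
    conf_dir = Path(conf_dir)
    names = []
    with open(out_path, "w", encoding="utf-8") as out:
        for cfg in sorted(conf_dir.glob("*.cfg")):
            # Header format is an arbitrary choice for this sketch.
            out.write(f"===== {cfg.name} =====\n")
            text = cfg.read_text(encoding="utf-8", errors="replace")
            out.write(text)
            # Make sure each file ends on its own line, then add a blank line.
            if not text.endswith("\n"):
                out.write("\n")
            out.write("\n")
            names.append(cfg.name)
    return names
```

Calling dump_configs("/etc/nagios3/conf.d", "nagios3-conf-backup.txt") before touching anything should leave a single text file to restore from. Keep the output file outside conf.d so it never gets picked up as a config.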
As I promised all of you, I plan to pay it forward to all of the other n00bs that come along. That said, I have attached the Word doc I created to this message to save you the hassle of copying and pasting it all into one place.
OK, on to the next stage: best practices and ease of use.
I have a few questions to continue this thread.
1) First, should I group hosts that perform the same task under one .cfg file? e.g. POS Hosts.cfg, District Servers.cfg, etc.
2) If I want all of them to have an active ping check every 30 seconds, how do I accomplish that? Currently, my generic-host_nagios2.cfg file looks like this:
# Generic host definition template - This is NOT a real host, just a template!
define host{
        name                            generic-host    ; The name of this host template
        notifications_enabled           1               ; Host notifications are enabled
        event_handler_enabled           1               ; Host event handler is enabled
        flap_detection_enabled          1               ; Flap detection is enabled
        failure_prediction_enabled      1               ; Failure prediction is enabled
        process_perf_data               1               ; Process performance data
        retain_status_information       1               ; Retain status information across program restarts
        retain_nonstatus_information    1               ; Retain non-status information across program restarts
        check_command                   check-host-alive
        max_check_attempts              10
        notification_interval           0
        notification_period             24x7
        notification_options            d,u,r
        contact_groups                  admins
        register                        0               ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
        }
Is check-host-alive the same as ping? Is this the file to manipulate, or should I create another similar file?
3) Is there an easier way of grouping all hosts into one "member" rather than typing each host into a hostgroup or any other grouping type?
I think that is all for now....
More progress in the near future, all!!!
