Nagios reporting same results for all hosts
Posted: Thu Jan 09, 2014 12:20 pm
by SledWrecker
I have Nagios Core 4.0.2 installed on Scientific Linux 6.4. I have about 70 Red Hat workstation and server clients added to the hosts and localhosts.cfg files. I have all my services defined for each host, and everything looks the way it's supposed to.
The problem is that the metrics are identical for every host; I believe Nagios is reporting the stats of the local server that hosts Nagios Core. I can't figure this out. I've been through all my configuration files numerous times, and I pulled in another sysadmin who has this working at his site, and he can't figure it out either. I'm sure it's a very simple fix and I'm just missing something. Being relatively new to Linux, I've found this process painful, with a very steep learning curve.
Any direction would be greatly appreciated.

Re: Nagios reporting same results for all hosts
Posted: Thu Jan 09, 2014 12:27 pm
by slansing
Can you share one of your Host and Service configuration files with us that is showing the same metrics? We are happy to help!

Re: Nagios reporting same results for all hosts
Posted: Thu Jan 09, 2014 2:28 pm
by SledWrecker
The files are quite lengthy, so I will give you a snippet.
Code:
## Default linux Host Template ##
define host{
name linux-box ; Name of this template
use generic-host ; Inherit default values
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period 24x7
notification_interval 30
notification_options d,r
contact_groups admins
register 0 ; DONT REGISTER THIS - ITS A TEMPLATE
}
## Default
define host{
use linux-box ; Inherit default values from a template
host_name localhost ; The name we're giving to this server
alias Red Hat Enterprise Linux 5.9 ; A longer name for the server
address 127.0.0.1 ; IP address of Remote Linux host
}
## mnlcfmaster1 ##
#define host{
#use generic-host
#host_name mlcfmaster1
#alias mlcfmaster1
#address 11.115.14.63
#}
## mnlcfmaster2 ##
define host{
use generic-host
host_name mlcfmaster2
alias mlcfmaster2
address 11.115.14.64
}
and
Code:
##############
## LOCALHOST ##
###############
define service{
use local-service ; Name of service template to use
host_name localhost
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Current Users
check_command check_local_users!20!50
}
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
define service{
use local-service ; Name of service template to use
host_name localhost
service_description Swap Usage
check_command check_local_swap!20!10
}
define service{
use local-service ; Name of service template to use
host_name localhost
service_description SSH
check_command check_ssh
notifications_enabled 0
}
define service{
use local-service ; Name of service template to use
host_name localhost
service_description HTTP
check_command check_tcp!80
notifications_enabled 0
}
define service{
use generic-service
host_name localhost
service_description CPU Load
check_command check_nrpe!check_load
}
#define service{
# use generic-service
# host_name localhost
# service_description Total Processes
# check_command check_nrpe!check_total_procs
# }
define service{
use generic-service
host_name localhost
service_description Current Users
check_command check_nrpe!check_users
}
##################
## mlcfmaster1 ##
##################
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster1
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster1
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster1
service_description Current Users
check_command check_local_users!20!50
}
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster1
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster1
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster1
service_description Swap Usage
check_command check_local_swap!20!10
}
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster1
service_description SSH
check_command check_ssh
notifications_enabled 0
}
##################
## mlcfmaster2 ##
##################
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster2
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster2
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster2
service_description Current Users
check_command check_local_users!20!50
}
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster2
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster2
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster2
service_description Swap Usage
check_command check_local_swap!20!10
}
define service{
use local-service ; Name of service template to use
host_name mnlcfmaster2
service_description SSH
check_command check_ssh
notifications_enabled 0
}
Also, I do have the /localhosts.cfg file configured with all the client machines and their IPs. However, in nagios.cfg I have this line commented out, as shown below. Unless I comment it out, I cannot get the service to start or restart (I get errors).
Code:
# OBJECT CONFIGURATION FILE(S)
# These are the object configuration files in which you define hosts,
# host groups, contacts, contact groups, services, etc.
# You can split your object definitions across several config files
# if you wish (as shown below), or keep them all in a single config file.
# You can specify individual object config files as shown below:
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
# Definitions for monitoring a Windows machine
cfg_file=/usr/local/nagios/etc/objects/windows.cfg
# Definitions for monitoring a router/switch
#cfg_file=/usr/local/nagios/etc/objects/switch.cfg
# Definitions for monitoring a network printer
cfg_file=/usr/local/nagios/etc/objects/printer.cfg
Re: Nagios reporting same results for all hosts
Posted: Thu Jan 09, 2014 2:36 pm
by abrist
You are using the "local-service" templates for your remote system checks. The local-service template should only be used for localhost. You will need to use nrpe or another agent of your choice to check the remote systems. Use the "generic-service" template for the remote service checks as well.
http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf
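As a rough sketch of what that could look like for one remote host: the service below uses the generic-service template and runs the check through NRPE. It assumes a check_nrpe command is defined on the Nagios server (the second block shows the form usually given in the NRPE documentation) and that the remote host's nrpe.cfg defines a command named check_users. The host name is taken from the snippet above; treat everything else as an assumption about your setup.

```
# Remote check via NRPE: the plugin runs on the remote host, not on the Nagios server.
define service{
    use                 generic-service
    host_name           mlcfmaster2
    service_description Current Users
    check_command       check_nrpe!check_users
}

# Assumed command definition on the Nagios server; skip it if commands.cfg already has one.
define command{
    command_name check_nrpe
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
```

With this layout the warning and critical thresholds are set in the remote nrpe.cfg, not in the service definition, so the !20!50 style arguments are dropped here.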
Re: Nagios reporting same results for all hosts
Posted: Wed Jan 22, 2014 4:57 pm
by SledWrecker
abrist wrote:You are using the "local-service" templates for your remote system checks. The local-service template should only be used for localhost. You will need to use nrpe or another agent of your choice to check the remote systems. Use the "generic-service" template for the remote service checks as well.
http://nagios.sourceforge.net/docs/nrpe/NRPE.pdf
OK, I modified my templates and changed local-service to generic-service. I also confirmed on a couple of clients that I can run check_nrpe from the server and get the NRPE version returned. That suggests NRPE is installed correctly on the remote Linux clients.
Still, the machines are all reporting the localhost metrics. I did restart the service after making all the changes.
Any advice?
Re: Nagios reporting same results for all hosts
Posted: Wed Jan 22, 2014 5:16 pm
by slansing
Sorry to have to ask again, but can you get a fresh copy of one of the remote host configurations, including its host definition and service definitions? Though they are lengthy, we really need to take a look at a full example. Thanks a ton!
Re: Nagios reporting same results for all hosts
Posted: Tue Feb 04, 2014 11:15 am
by SledWrecker
Sorry it has taken so long for me to get back to everyone. Here are the examples. I still have not been able to get this working. I have a fresh install of Nagios 4.0.2 on Scientific Linux 6.4 x64. The web front end is up and running. The two hosts defined in my hosts file are configured with NRPE, and I can run check_nrpe against them and it returns the version.
Nagios starts with no errors, and verifying the configuration before starting the service reports no errors either. You can see I am using the "generic-service" template now as well.
Still, all the metrics reported in Nagios are for the local host.
Code:
[root@monitor02 etc]# /usr/local/nagios/libexec/check_nrpe -H 10.100.14.101
NRPE v2.15
This is my "hosts.cfg" file
Code:
## Default Linux Host Template ##
define host{
name linux-box ; Name of this template
use generic-host ; Inherit default values
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period 24x7
notification_interval 30
notification_options d,r
contact_groups admins
register 0 ; DONT REGISTER THIS - ITS A TEMPLATE
}
## cfmaster1 ##
define host{
use linux-box
host_name cfmaster1
alias cfmaster1
address 10.100.14.63
}
## linux01 ##
define host{
use linux-box
host_name linux01
alias linux01
address 10.100.14.101
}
This is my "services.cfg" file
Code:
################
# CFMASTER1 #
################
define service{
use generic-service ; Name of service template to use
host_name cfmaster1
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use generic-service ; Name of service template to use
host_name cfmaster1
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
define service{
use generic-service ; Name of service template to use
host_name cfmaster1
service_description Current Users
check_command check_local_users!20!50
}
define service{
use generic-service ; Name of service template to use
host_name cfmaster1
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}
define service{
use generic-service ; Name of service template to use
host_name cfmaster1
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
define service{
use generic-service ; Name of service template to use
host_name cfmaster1
service_description Swap Usage
check_command check_local_swap!20!10
}
define service{
use generic-service ; Name of service template to use
host_name cfmaster1
service_description SSH
check_command check_ssh
notifications_enabled 0
}
#############
# LINUX01 #
#############
define service{
use generic-service ; Name of service template to use
host_name linux01
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use generic-service ; Name of service template to use
host_name linux01
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
define service{
use generic-service ; Name of service template to use
host_name linux01
service_description Current Users
check_command check_local_users!20!50
}
define service{
use generic-service ; Name of service template to use
host_name linux01
service_description Total Processes
check_command check_local_procs!250!400!RSZDT
}
define service{
use generic-service ; Name of service template to use
host_name linux01
service_description Current Load
check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
}
define service{
use generic-service ; Name of service template to use
host_name mdlinux01
service_description Swap Usage
check_command check_local_swap!20!10
}
define service{
use generic-service ; Name of service template to use
host_name linux01
service_description SSH
check_command check_ssh
notifications_enabled 0
}
Thanks!
Re: Nagios reporting same results for all hosts
Posted: Tue Feb 04, 2014 3:05 pm
by SledWrecker
bump, thanks!
Re: Nagios reporting same results for all hosts
Posted: Tue Feb 04, 2014 3:23 pm
by SledWrecker
Through some trial and error I figured out that any change I make to localhost.cfg is reflected across all my remote hosts. So even though my services.cfg uses "generic-service" and my hosts.cfg uses "generic-host", Nagios still seems to be using the localhost.cfg file for all my hosts.
I'm not sure how to rectify this. Can someone please help?
Re: Nagios reporting same results for all hosts
Posted: Tue Feb 04, 2014 4:03 pm
by abrist
SledWrecker wrote: Can someone please help?
Yep. Your issue is that you are using the "check_local_*" commands to check your remote systems. If you look at the command definitions, you will notice that the check_local_* commands all check the local host. You need to remove the string "_local" from the check_command in your service definitions for the remote hosts.
For example, change:
Code:
define service{
use generic-service ; Name of service template to use
host_name cfmaster1
service_description Current Users
check_command check_local_users!20!50
}
To:
Code:
define service{
use generic-service ; Name of service template to use
host_name cfmaster1
service_description Current Users
check_command check_users!20!50
}
Otherwise, all checks are run against localhost instead of the remote host. One warning: do not change the check_local_* command definitions themselves, or your localhost checks will stop working. Change only the check_command directive in the service definitions for the remote hosts.
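To see why, compare two command definitions as they typically appear in the sample commands.cfg (reproduced from memory here, so check them against your own file). check_local_users never receives the host address, so it always runs on the Nagios server. check_ping passes $HOSTADDRESS$, which is why a check like PING can differ per host while the check_local_* results cannot.

```
# No $HOSTADDRESS$: the plugin always inspects the machine Nagios runs on.
define command{
    command_name check_local_users
    command_line $USER1$/check_users -w $ARG1$ -c $ARG2$
}

# -H $HOSTADDRESS$: the plugin targets whichever host the service is attached to.
define command{
    command_name check_ping
    command_line $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}
```

Whatever check_command you switch to has to exist as a command definition. For metrics such as users, load or disk it also has to reach the remote machine through an agent. If NRPE is that agent and the remote nrpe.cfg defines check_users (an assumption about your setup), the directive would read check_nrpe!check_users.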