Re: Nagios and off site Windows monitoring
Posted: Thu Jul 30, 2015 3:35 pm
by jdalrymple
Jam1987 wrote:
Error: Service check command 'check_nrpe!alias_cpu' specified in service 'CPU Load' for host 'windowshost' not defined anywhere!
Error: Service check command 'check_nrpe!alias_disk' specified in service 'Free Space' for host 'windowshost' not defined anywhere!
That is odd. Can you post the service definitions? The ! should be interpreted as a separator between the command name and its arguments (obviously).
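For anyone following along, here is a simplified model of how that lookup behaves: the text before the first ! is taken as the command_name, and the remaining fields become $ARG1$, $ARG2$ and so on. The Python below is only a sketch of that behaviour, not Nagios source; resolve_check_command and the commands dictionary are hypothetical. It illustrates how the error can quote the whole check_command string when it is really the check_nrpe command definition that is missing.

```python
def resolve_check_command(check_command, commands):
    """Split a check_command on '!' and look up the command name.

    Hypothetical helper modelling Nagios: the first field is the
    command_name, the remaining fields become $ARG1$, $ARG2$, ...
    """
    name, *args = check_command.split("!")
    if name not in commands:
        # Only `name` is looked up, but the message quotes the full string.
        raise LookupError(f"command '{name}' used in '{check_command}' is not defined")
    return commands[name], args


commands = {}  # no `define command { command_name check_nrpe ... }` yet
try:
    resolve_check_command("check_nrpe!alias_cpu", commands)
except LookupError as err:
    print(err)  # command 'check_nrpe' used in 'check_nrpe!alias_cpu' is not defined

commands["check_nrpe"] = "$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ $ARG2$"
print(resolve_check_command("check_nrpe!alias_cpu", commands)[1])  # ['alias_cpu']
```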
Re: Nagios and off site Windows monitoring
Posted: Thu Jul 30, 2015 3:45 pm
by Jam1987
Here is my whole windows.cfg file. I tried to keep it simple.
Code:
define host{
use tpl-windows-servers ; Inherit default values from a template
host_name windowshost ; The name we're giving to this server
alias My First Windows Server ; A longer name for the server
address 10.0.0.2 ; IP address of the server
active_checks_enabled 0 ; Active host checks are disabled
passive_checks_enabled 1 ; Passive host checks are enabled/accepted
}
###############################################################################
###############################################################################
#
# HOST GROUP DEFINITIONS
#
###############################################################################
###############################################################################
# Define a hostgroup for Windows machines
# All hosts that use the windows-server template will automatically be a member of this group
define hostgroup{
hostgroup_name windows-servers ; The name of the hostgroup
alias Windows Servers ; Long name of the group
}
###############################################################################
define service{
use generic-service
host_name windowshost
service_description CPU Load
check_command check_nrpe!alias_cpu
active_checks_enabled 0 ; Active service checks are disabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
}
define service{
use generic-service
host_name windowshost
service_description Free Space
check_command check_nrpe!alias_disk
active_checks_enabled 0 ; Active service checks are disabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
}
I'm following this tutorial:
http://docs.nsclient.org/tutorial/nagios/nsca.html
Re: Nagios and off site Windows monitoring
Posted: Thu Jul 30, 2015 3:46 pm
by tgriep
Just to clarify: do you want passive-only checks of the Windows system, or both active and passive checks?
To fix your check_nrpe error (check_nrpe is an active check), you can define the check_nrpe command as follows:
Code:
define command {
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ $ARG2$
}
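To show what that definition does with check_nrpe!alias_cpu: before running the command, Nagios substitutes macros such as $USER1$, $HOSTADDRESS$ and $ARGn$ into command_line. The sketch below is a simplified Python model of that substitution, not Nagios code; expand_command_line is hypothetical, and /usr/local/nagios/libexec for $USER1$ is an assumed value (it is normally set in resource.cfg). In this model an $ARGn$ with no matching argument expands to an empty string.

```python
import re


def expand_command_line(command_line, macros, args):
    """Replace $NAME$ macros; $ARGn$ comes from the '!'-separated arguments."""
    def substitute(match):
        name = match.group(1)
        if name.startswith("ARG") and name[3:].isdigit():
            index = int(name[3:])
            # Missing arguments (e.g. $ARG2$ below) become an empty string.
            return args[index - 1] if 1 <= index <= len(args) else ""
        # Leave unknown macros untouched so they are easy to spot.
        return macros.get(name, match.group(0))

    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)


line = "$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ $ARG2$"
macros = {
    "USER1": "/usr/local/nagios/libexec",  # assumed resource.cfg value
    "HOSTADDRESS": "10.0.0.2",              # address from the windowshost definition
}
print(expand_command_line(line, macros, ["alias_cpu"]))
```

Keeping unknown macros as-is (rather than blanking them) is a deliberate choice in this sketch, so a typo in a macro name stays visible in the expanded line.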
To get passive checks working, you would have to set up the checks on the Nagios system. I am running Nagios XI, so the examples below might have to be edited to work on your system.
You would have to create a passive_host template similar to the one below:
Code:
define host {
name passive_host
check_command check_dummy!0!"No data received yet."
use generic_host
max_check_attempts 1
active_checks_enabled 0
passive_checks_enabled 1
register 0
}
Then a passive_service template:
Code:
define service {
name passive_service
service_description Passive Service
use generic_service
check_command check_dummy!0!"No data received yet."
is_volatile 0
initial_state o
max_check_attempts 1
active_checks_enabled 0
passive_checks_enabled 1
flap_detection_enabled 0
stalking_options o,w,u,c
register 0
}
Then the host definition for your Windows system:
Code:
define host {
host_name win_host
use passive_host
address win_host
max_check_attempts 5
check_interval 5
retry_interval 1
check_period 24x7
contacts nagiosadmin
notification_interval 60
notification_period 24x7
register 1
}
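A rough way to picture how these definitions fit together: an object with use pulls in every directive from the named template (and from that template's own template), and anything set locally wins. The Python below is a simplified model of that resolution, assuming one template per use line and that name, use and register are not inherited. resolve, template_values and the trimmed-down template contents are illustrative only, not the real generic-host.

```python
NOT_INHERITED = {"name", "use", "register"}  # assumed non-inherited directives


def template_values(name, templates, chain=()):
    """Collect directives from a template and everything it inherits from."""
    if name in chain:
        raise ValueError(f"circular template chain: {' -> '.join(chain + (name,))}")
    if name not in templates:
        raise KeyError(f"Template '{name}' could not be found")
    tpl = templates[name]
    values = template_values(tpl["use"], templates, chain + (name,)) if "use" in tpl else {}
    values.update({k: v for k, v in tpl.items() if k not in NOT_INHERITED})
    return values


def resolve(obj, templates):
    """Effective directives of obj: inherited values first, then local overrides."""
    values = template_values(obj["use"], templates) if "use" in obj else {}
    values.update({k: v for k, v in obj.items() if k != "use"})
    return values


# Trimmed-down, illustrative templates (the real generic-host has more directives).
templates = {
    "generic-host": {"name": "generic-host", "notifications_enabled": "1", "register": "0"},
    "passive_host": {"name": "passive_host", "use": "generic-host",
                     "max_check_attempts": "1", "active_checks_enabled": "0",
                     "passive_checks_enabled": "1", "register": "0"},
}
win_host = {"host_name": "win_host", "use": "passive_host",
            "max_check_attempts": "5", "register": "1"}

effective = resolve(win_host, templates)
print(effective["max_check_attempts"], effective["active_checks_enabled"])  # 5 0
```

For example, pointing passive_host at generic_host instead of generic-host makes resolve raise KeyError, because no template with that exact name exists.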
Then the service checks for the Windows host:
Code:
define service {
host_name win_host
service_description cpu
use passive_service
max_check_attempts 1
check_interval 1
retry_interval 1
check_period 24x7
notification_interval 60
notification_period 24x7
contacts nagiosadmin
stalking_options n
register 1
}
define service {
host_name win_host
service_description disk
use passive_service
max_check_attempts 1
check_interval 1
retry_interval 1
check_period 24x7
notification_interval 60
notification_period 24x7
contacts nagiosadmin
stalking_options n
register 1
}
define service {
host_name win_host
service_description mem
use passive_service
max_check_attempts 1
check_interval 1
retry_interval 1
check_period 24x7
notification_interval 60
notification_period 24x7
contacts nagiosadmin
stalking_options n
register 1
}
define service {
host_name win_host
service_description service
use passive_service
max_check_attempts 1
check_interval 1
retry_interval 1
check_period 24x7
notification_interval 60
notification_period 24x7
contacts nagiosadmin
stalking_options n
register 1
}
You may have to tweak them for your system, but give it a try: add them one at a time and verify that the configuration works after each one.
Good Luck.
Re: Nagios and off site Windows monitoring
Posted: Thu Jul 30, 2015 3:57 pm
by Jam1987
Passive only for our Windows clients. I just want to monitor that they are online and get alerts if any of them go down. Thanks for the configs; I will set them up and test them like crazy.
Re: Nagios and off site Windows monitoring
Posted: Thu Jul 30, 2015 4:30 pm
by tgriep
OK, then you can delete the check_nrpe commands from your windows.cfg file; that will fix that issue.
Re: Nagios and off site Windows monitoring
Posted: Wed Aug 05, 2015 2:09 pm
by Jam1987
Hello gents,
I've been trying to get this done without bothering you again, but I have another quandary for you. I have successfully added the configs tgriep gave me, and Nagios restarts without reporting any errors and starts up fine. Once they were in, I left it running for a day while I was out of the office. When I came back, to my disappointment the passive check was not green and hadn't changed since I started the service. I checked the /var/nagios.log log and noticed the following:
Code:
[1438802189] Error: Template 'generic_host' specified in host definition could not be not found (config file '/usr/local/nagios/etc/objects/templates.cfg', starting on line 141)
[1438802189] Error: Template 'generic_service' specified in service definition could not be not found (config file '/usr/local/nagios/etc/objects/templates.cfg', starting on line 205)
I've gone through them, and maybe it's my lack of understanding, but both the host template and the service template are listed in the templates.cfg file. I'm unsure why Nagios can't see them. Am I missing something?
Re: Nagios and off site Windows monitoring
Posted: Thu Aug 06, 2015 8:52 am
by tgriep
Sorry about that, they should be generic-host and generic-service with dashes and not underscores.
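Since a one-character difference is enough to break the template chain, a quick check before restarting can help. Nagios's own verification run (nagios -v followed by the path to your main nagios.cfg) is the authoritative way to catch this. The Python below is only a hypothetical helper: it collects the names defined with name and the ones referenced with use, and suggests a dashed spelling when an underscored one is missing. It does not follow cfg_file or cfg_dir, so it assumes you feed it the concatenated text of your object files.

```python
def find_missing_templates(config_text):
    """Map each `use` target with no matching `name` to a dashed guess (or None)."""
    defined, used = set(), set()
    for raw in config_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # ';' starts an inline comment
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # e.g. a lone '}'
        key, value = parts
        if key == "name":
            defined.add(value.strip())
        elif key == "use":
            # `use` may list several comma-separated templates.
            used.update(t.strip() for t in value.split(",") if t.strip())
    missing = {}
    for template in sorted(used - defined):
        guess = template.replace("_", "-")
        missing[template] = guess if guess in defined else None
    return missing


config = """
define host {
    name        generic-host
    register    0
}
define host {
    name        passive_host
    use         generic_host   ; typo: should be generic-host
    register    0
}
"""
print(find_missing_templates(config))  # {'generic_host': 'generic-host'}
```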
Re: Nagios and off site Windows monitoring
Posted: Thu Aug 06, 2015 10:23 am
by Jam1987
tgriep wrote:
Sorry about that, they should be generic-host and generic-service with dashes and not underscores.
Oh geez, don't be sorry; I should have been paying more attention. I have made the changes, that error no longer appears, and Nagios starts fine. I guess I have to play the waiting game now. It's been about 3 minutes, but the settings are set to 5-minute intervals. Hopefully I get a green line soon!
Re: Nagios and off site Windows monitoring
Posted: Thu Aug 06, 2015 11:02 am
by hsmith
Jam1987 wrote:
Hopefully I get a green line soon!
Thank you! Let us know either way.
Re: Nagios and off site Windows monitoring
Posted: Thu Aug 06, 2015 12:44 pm
by Jam1987
Bad news, everyone! (Instead of Prof. Farnsworth's usual "Good news, everyone!") I'm still getting nothing. I've checked tcpdump on port 5667, and my test unit is sending data to the server:
Code:
13:33:51.175584 IP GAWArena.49807 > storage.nsca: Flags [S], seq 3050019611, win 8192, options [mss 1460,nop,wscale 2,nop,nop,sackOK], length 0
13:33:51.175655 IP storage.nsca > GAWArena.49807: Flags [S.], seq 511529549, ack 3050019612, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
13:33:51.176458 IP GAWArena.49807 > storage.nsca: Flags [.], ack 1, win 16425, length 0
13:33:51.177569 IP storage.nsca > GAWArena.49807: Flags [P.], seq 1:133, ack 1, win 229, length 132
13:33:51.179452 IP GAWArena.49807 > storage.nsca: Flags [P.], seq 1:721, ack 133, win 16392, length 720
13:33:51.179481 IP storage.nsca > GAWArena.49807: Flags [.], ack 721, win 240, length 0
13:33:51.179496 IP GAWArena.49807 > storage.nsca: Flags [F.], seq 721, ack 133, win 16392, length 0
13:33:51.179568 IP storage.nsca > GAWArena.49807: Flags [F.], seq 133, ack 722, win 240, length 0
13:33:51.180342 IP GAWArena.49807 > storage.nsca: Flags [.], ack 134, win 16392, length 0
9 packets captured
9 packets received by filter
0 packets dropped by kernel
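Reading that capture: the server sends 132 bytes first, the client answers with 720 bytes, and both sides then close the connection. Assuming NSClient++ speaks the standard NSCA 2.x protocol with its default buffer sizes (a 128-byte IV, 64-byte host name, 128-byte service description and 512-byte plugin output), those numbers line up with one init packet and one check-result packet, which suggests the transfer itself completed. The ctypes layout below is a sketch of those two structs under that assumption; treat the constants as assumptions to check against your nsca build. If the sizes match but nothing reaches Nagios, a password or encryption-method mismatch between NSClient++ and nsca.cfg, or a host or service name that doesn't match exactly, would be the usual suspects.

```python
import ctypes

# Assumed NSCA 2.x default sizes (verify against your nsca build).
TRANSMITTED_IV_SIZE = 128
MAX_HOSTNAME_LENGTH = 64
MAX_DESCRIPTION_LENGTH = 128
MAX_PLUGINOUTPUT_LENGTH = 512


class InitPacket(ctypes.Structure):
    """Sent by the nsca server right after it accepts the connection."""
    _fields_ = [("iv", ctypes.c_char * TRANSMITTED_IV_SIZE),
                ("timestamp", ctypes.c_uint32)]


class DataPacket(ctypes.Structure):
    """One check result sent by the client (possibly encrypted on the wire)."""
    _fields_ = [("packet_version", ctypes.c_int16),
                ("crc32_value", ctypes.c_uint32),   # 2 padding bytes precede this
                ("timestamp", ctypes.c_uint32),
                ("return_code", ctypes.c_int16),
                ("host_name", ctypes.c_char * MAX_HOSTNAME_LENGTH),
                ("svc_description", ctypes.c_char * MAX_DESCRIPTION_LENGTH),
                ("plugin_output", ctypes.c_char * MAX_PLUGINOUTPUT_LENGTH)]


print(ctypes.sizeof(InitPacket))  # 132: matches "seq 1:133 ... length 132"
print(ctypes.sizeof(DataPacket))  # 720: matches "seq 1:721 ... length 720"
```

The 720 comes from 718 bytes of fields plus 2 bytes of alignment padding after packet_version, with the struct size rounded to a multiple of 4.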
My unit's config, windows.cfg, uses the same host name:
Code:
define host {
host_name GAWArena
use passive_host
address GAWArena
max_check_attempts 5
check_interval 5
retry_interval 1
check_period 24x7
contacts nagiosadmin
notification_interval 60
notification_period 24x7
register 1
}
define service {
host_name GAWArena
service_description cpu
use passive_service
max_check_attempts 1
check_interval 1
retry_interval 1
check_period 24x7
notification_interval 60
notification_period 24x7
contacts nagiosadmin
stalking_options n
register 1
}
As far as I've read, the host_name needs to be the same everywhere so Nagios knows which host a result belongs to, and I have done that. All the great configs posted earlier are in and running, but nagios.log doesn't report as much as I expected, and /var/log doesn't seem to have any useful logs showing what's happening. Is there a better log to check whether Nagios even notices the Windows unit sending data?
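One way to split the problem in half is to bypass NSCA and hand Nagios a passive result directly through its external command file. If the cpu service goes green after this, the Nagios side (host and service definitions, templates) is fine and the problem lies between NSClient++ and the nsca daemon. This sketch assumes check_external_commands=1 in nagios.cfg and the source-install command file path /usr/local/nagios/var/rw/nagios.cmd; adjust both for your system. service_result_line and submit are hypothetical helpers, but the PROCESS_SERVICE_CHECK_RESULT line format is the one Nagios documents for external commands. On the logging question: with log_passive_checks=1 in nagios.cfg, Nagios logs incoming passive results to nagios.log, and the nsca daemon writes its own messages to syslog (more of them with debug=1 in nsca.cfg).

```python
import time

COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"  # assumed source-install path


def service_result_line(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if return_code not in (0, 1, 2, 3):  # OK, WARNING, CRITICAL, UNKNOWN
        raise ValueError("return_code must be 0, 1, 2 or 3")
    if ";" in host or ";" in service:
        raise ValueError("';' separates fields and cannot appear in host or service names")
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}\n"


def submit(line, command_file=COMMAND_FILE):
    """Write one command to the Nagios command pipe (blocks if Nagios isn't reading)."""
    with open(command_file, "w") as pipe:
        pipe.write(line)


# Fixed timestamp only so the printed line is predictable.
print(service_result_line("GAWArena", "cpu", 0, "Manual test", now=1438880000), end="")
```

On the Nagios server, submit(service_result_line("GAWArena", "cpu", 0, "Manual test")) would queue the result. The host and service names must match host_name and service_description exactly, including case.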
Sorry to be a pain with all these problems!
Hopefully, once all this is up and running, I'll upload every Nagios and NSClient++ config file so others can easily do the same thing!