Nagios and off site Windows monitoring

Support forum for Nagios Core, Nagios Plugins, NCPA, NRPE, NSCA, NDOUtils and more. Engage with the community of users including those using the open source solutions.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Nagios and off site Windows monitoring

Post by jdalrymple »

Jam1987 wrote:Error: Service check command 'check_nrpe!alias_cpu' specified in service 'CPU Load' for host 'windowshost' not defined anywhere!
Error: Service check command 'check_nrpe!alias_disk' specified in service 'Free Space' for host 'windowshost' not defined anywhere!
That is odd. Can you post the service definitions? The ! should be interpreted as a separator between the command name and its arguments, so Nagios should be looking for a command named check_nrpe.
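To illustrate the separator rule, here is a minimal Python sketch of how a check_command value breaks into a command name plus arguments. It is a simplification, not Nagios's actual parser: the function names are hypothetical and escaped separators (\!) are ignored.

```python
# Simplified sketch of splitting a check_command value on '!'.
# Assumption: escaped separators ("\!") are not handled here.

def split_check_command(check_command):
    """Return (command_name, [ARG1, ARG2, ...]) for a check_command value."""
    name, *args = check_command.split("!")
    return name, args


def resolve(check_command, defined_commands):
    """Look up only the command name; the arguments play no part in the lookup."""
    name, args = split_check_command(check_command)
    if name not in defined_commands:
        raise LookupError(f"command '{name}' is not defined")
    return defined_commands[name], args


if __name__ == "__main__":
    print(split_check_command("check_nrpe!alias_cpu"))
```

Under this reading, the error in the quote points at a missing check_nrpe command definition rather than at the alias_cpu argument.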
Jam1987
Posts: 54
Joined: Mon Jul 27, 2015 2:06 pm

Re: Nagios and off site Windows monitoring

Post by Jam1987 »

Here is my whole windows.cfg file. I tried to keep it simple.

Code: Select all

define host{
      use             tpl-windows-servers ; Inherit default values from a template
      host_name       windowshost ; The name we're giving to this server
      alias           My First Windows Server ; A longer name for the server
      address         10.0.0.2 ; IP address of the server
      active_checks_enabled   0 ; Active host checks are disabled
      passive_checks_enabled  1 ; Passive host checks are enabled/accepted
}

###############################################################################
###############################################################################
#
# HOST GROUP DEFINITIONS
#
###############################################################################
###############################################################################


# Define a hostgroup for Windows machines
# All hosts that use the windows-server template will automatically be a member of this group

define hostgroup{
        hostgroup_name  windows-servers ; The name of the hostgroup
        alias           Windows Servers ; Long name of the group
        }

###############################################################################

define service{
      use                     generic-service
      host_name               windowshost
      service_description     CPU Load
      check_command           check_nrpe!alias_cpu
      active_checks_enabled   0 ; Active service checks are disabled
      passive_checks_enabled  1 ; Passive service checks are enabled/accepted
}

define service{
      use                     generic-service
      host_name               windowshost
      service_description     Free Space
      check_command           check_nrpe!alias_disk
      active_checks_enabled   0 ; Active service checks are disabled
      passive_checks_enabled  1 ; Passive service checks are enabled/accepted
}
I'm following this tutorial:

http://docs.nsclient.org/tutorial/nagios/nsca.html
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios and off site Windows monitoring

Post by tgriep »

Just to clarify: do you want passive-only checks of the Windows system, or both active and passive checks?

To fix your check_nrpe error (check_nrpe is an active check), you can define the check_nrpe command as follows.

Code: Select all

define command {
       command_name                  		check_nrpe
       command_line                  		$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ $ARG2$
}
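As a rough illustration of what that command_line turns into at check time, here is a minimal Python sketch of macro substitution. The plugin directory used for $USER1$ is an assumption, the helper name is hypothetical, and macros missing from the dictionary are simply replaced with an empty string in this sketch (which mirrors an unset $ARG2$).

```python
import re


def expand_macros(command_line, macros):
    """Replace each $NAME$ with its value; names not in `macros` become ''."""
    return re.sub(r"\$(\w+)\$", lambda m: macros.get(m.group(1), ""), command_line)


command_line = "$USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ $ARG2$"
macros = {
    "USER1": "/usr/local/nagios/libexec",  # assumed plugin directory
    "HOSTADDRESS": "10.0.0.2",             # address from the host definition
    "ARG1": "alias_cpu",                   # first value after '!' in check_command
}

expanded = expand_macros(command_line, macros)

if __name__ == "__main__":
    print(expanded)
```

With check_nrpe!alias_cpu, only $ARG1$ is filled in, so the expanded line ends right after alias_cpu (plus a trailing space where $ARG2$ was).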
To get the passive checks, you would have to set up the checks on the Nagios system. I am running Nagios XI, so the examples below might have to be edited to work on your system.

You would have to create a passive_host template similar to the one below:

Code: Select all

define host {
       name                          		passive_host
       check_command                 		check_dummy!0!"No data received yet."
       use                           		generic_host
       max_check_attempts            		1
       active_checks_enabled         		0
       passive_checks_enabled        		1
       register                    		0
}	
Then a passive_service template

Code: Select all

define service {
       name                          		passive_service
       service_description           		Passive Service
       use                           		generic_service
       check_command                 		check_dummy!0!"No data received yet."
       is_volatile                   		0
       initial_state                 		o
       max_check_attempts            		1
       active_checks_enabled         		0
       passive_checks_enabled        		1
       flap_detection_enabled        		0
       stalking_options              		o,w,u,c
       register                    		0
}	
Then the host check for your windows system

Code: Select all

define host {
	host_name			win_host
	use				passive_host
	address				win_host
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	check_period			24x7
	contacts			nagiosadmin
	notification_interval		60
	notification_period		24x7
	register			1
	}
Then the Service checks for the windows hosts.

Code: Select all

define service {
	host_name			win_host
	service_description		cpu
	use				passive_service
	max_check_attempts		1
	check_interval			1
	retry_interval			1
	check_period			24x7
	notification_interval		60
	notification_period		24x7
	contacts			nagiosadmin
	stalking_options		n
	register			1
	}	

define service {
	host_name			win_host
	service_description		disk
	use				passive_service
	max_check_attempts		1
	check_interval			1
	retry_interval			1
	check_period			24x7
	notification_interval		60
	notification_period		24x7
	contacts			nagiosadmin
	stalking_options		n
	register			1
	}	

define service {
	host_name			win_host
	service_description		mem
	use				passive_service
	max_check_attempts		1
	check_interval			1
	retry_interval			1
	check_period			24x7
	notification_interval		60
	notification_period		24x7
	contacts			nagiosadmin
	stalking_options		n
	register			1
	}	

define service {
	host_name			win_host
	service_description		service
	use				passive_service
	max_check_attempts		1
	check_interval			1
	retry_interval			1
	check_period			24x7
	notification_interval		60
	notification_period		24x7
	contacts			nagiosadmin
	stalking_options		n
	register			1
	}
You may have to tweak them for your system, but give it a try: add them one at a time and verify that the configs work after each change.
Good luck.
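For context on how passive results reach definitions like these: they arrive in Nagios as external commands named PROCESS_HOST_CHECK_RESULT and PROCESS_SERVICE_CHECK_RESULT. The Python sketch below only builds those command strings so the fields are visible; the helper names are hypothetical and nothing is written to a command file.

```python
import time


def service_result(host, service, return_code, output, now=None):
    """Build a PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}"


def host_result(host, status_code, output, now=None):
    """Build a PROCESS_HOST_CHECK_RESULT external command line."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] PROCESS_HOST_CHECK_RESULT;{host};{status_code};{output}"


if __name__ == "__main__":
    # host and service names must match host_name / service_description exactly
    print(service_result("win_host", "cpu", 0, "OK: CPU load is fine"))
    print(host_result("win_host", 0, "Host is up"))
```

The host and service fields have to match host_name and service_description in the definitions above character for character, including case; otherwise the result cannot be attached to anything.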
Be sure to check out our Knowledgebase for helpful articles and solutions!
Jam1987
Posts: 54
Joined: Mon Jul 27, 2015 2:06 pm

Re: Nagios and off site Windows monitoring

Post by Jam1987 »

Passive only for our Windows clients. I just want to monitor that they are online and get alerts if any of them go down. Thanks for the configs; I will set them up and test them like crazy.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios and off site Windows monitoring

Post by tgriep »

OK, then you can delete the check_nrpe checks from your windows.cfg file; that will fix that issue.
Jam1987
Posts: 54
Joined: Mon Jul 27, 2015 2:06 pm

Re: Nagios and off site Windows monitoring

Post by Jam1987 »

Hello gents,

I've been trying to get this done without bothering you again, but I have another quandary for you. I got the configs tgriep gave me in place; Nagios restarts without posting any errors and starts up fine. I then left it running for a day while I was out of the office. When I came back, to my disappointment, the passive check was not green and had not changed since I started the service. I checked the /var/nagios.log log and noticed the following:

Code: Select all

[1438802189] Error: Template 'generic_host' specified in host definition could not be not found (config file '/usr/local/nagios/etc/objects/templates.cfg', starting on line 141)
[1438802189] Error: Template 'generic_service' specified in service definition could not be not found (config file '/usr/local/nagios/etc/objects/templates.cfg', starting on line 205)
I've gone through them, and I guess it's my lack of understanding: as far as I can tell the host template and the service template are both listed in the templates.cfg file, so I'm unsure why Nagios can't see them. Am I missing something?
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Nagios and off site Windows monitoring

Post by tgriep »

Sorry about that. They should be generic-host and generic-service, with dashes and not underscores.
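A mismatch like this can be caught before restarting by checking every use value against the template names that are actually defined. The Python sketch below is a deliberately small parser written for illustration (hypothetical function names, no support for values containing ; or }), not a replacement for Nagios's own config verification.

```python
import re

# Matches "define <type> { ... }" blocks; the body ends at the first '}'.
DEFINE_RE = re.compile(r"define\s+(\w+)\s*\{(.*?)\}", re.DOTALL)


def parse_objects(text):
    """Return a list of (object_type, {directive: value}) tuples."""
    objects = []
    for obj_type, body in DEFINE_RE.findall(text):
        directives = {}
        for line in body.splitlines():
            line = line.split(";", 1)[0].strip()  # drop trailing ; comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            directives[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
        objects.append((obj_type, directives))
    return objects


def missing_templates(text):
    """List (object_type, template) pairs whose 'use' target has no matching 'name'."""
    objects = parse_objects(text)
    defined = {(t, d["name"]) for t, d in objects if "name" in d}
    missing = []
    for obj_type, d in objects:
        for tpl in (n.strip() for n in d.get("use", "").split(",")):
            if tpl and (obj_type, tpl) not in defined:
                missing.append((obj_type, tpl))
    return missing


if __name__ == "__main__":
    sample = """
    define host {
        name      generic-host
        register  0
    }
    define host {
        name      passive_host
        use       generic_host
        register  0
    }
    """
    print(missing_templates(sample))
```

Run against the concatenated config files, it would flag generic_host and generic_service here because only the dashed names exist.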
Jam1987
Posts: 54
Joined: Mon Jul 27, 2015 2:06 pm

Re: Nagios and off site Windows monitoring

Post by Jam1987 »

tgriep wrote: Sorry about that, they should be generic-host and generic-service with dashes and not underscores.
Oh geez, don't be sorry, I should have been paying more attention. I have made the changes; that error no longer appears and Nagios starts fine. I guess I have to play the waiting game now for it to work. It's been about 3 minutes, but the checks are set to 5-minute intervals. Hopefully I get a green line soon!
hsmith
Agent Smith
Posts: 3539
Joined: Thu Jul 30, 2015 11:09 am
Location: 127.0.0.1

Re: Nagios and off site Windows monitoring

Post by hsmith »

Thank you! Let us know either way.
Former Nagios Employee.
me.
Jam1987
Posts: 54
Joined: Mon Jul 27, 2015 2:06 pm

Re: Nagios and off site Windows monitoring

Post by Jam1987 »

Bad news, everyone! (Instead of Prof. Farnsworth's usual "Good news, everyone!") I'm still getting nothing. I ran tcpdump on port 5667 and my test unit is sending data to the server:

Code: Select all

13:33:51.175584 IP GAWArena.49807 > storage.nsca: Flags [S], seq 3050019611, win 8192, options [mss 1460,nop,wscale 2,nop,nop,sackOK], length 0
13:33:51.175655 IP storage.nsca > GAWArena.49807: Flags [S.], seq 511529549, ack 3050019612, win 29200, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
13:33:51.176458 IP GAWArena.49807 > storage.nsca: Flags [.], ack 1, win 16425, length 0
13:33:51.177569 IP storage.nsca > GAWArena.49807: Flags [P.], seq 1:133, ack 1, win 229, length 132
13:33:51.179452 IP GAWArena.49807 > storage.nsca: Flags [P.], seq 1:721, ack 133, win 16392, length 720
13:33:51.179481 IP storage.nsca > GAWArena.49807: Flags [.], ack 721, win 240, length 0
13:33:51.179496 IP GAWArena.49807 > storage.nsca: Flags [F.], seq 721, ack 133, win 16392, length 0
13:33:51.179568 IP storage.nsca > GAWArena.49807: Flags [F.], seq 133, ack 722, win 240, length 0
13:33:51.180342 IP GAWArena.49807 > storage.nsca: Flags [.], ack 134, win 16392, length 0
9 packets captured
9 packets received by filter
0 packets dropped by kernel
My unit's config in windows.cfg uses the same host name:

Code: Select all

define host {
   host_name         GAWArena
   use            passive_host
   address           GAWArena
   max_check_attempts      5
   check_interval         5
   retry_interval         1
   check_period         24x7
   contacts         nagiosadmin
   notification_interval      60
   notification_period      24x7
   register         1
   }


define service {
   host_name         GAWArena
   service_description      cpu
   use            passive_service
   max_check_attempts      1
   check_interval         1
   retry_interval         1
   check_period         24x7
   notification_interval      60
   notification_period      24x7
   contacts         nagiosadmin
   stalking_options      n
   register         1
}
As far as I've read, the host_name has to be the same everywhere so that Nagios knows which host a result belongs to, and I have done that. All the configs that were posted earlier are in and running, but nagios.log doesn't report as much as I thought it would, and /var/log doesn't seem to contain any logs that show what's happening. Is there a better log to check whether Nagios even notices the Windows unit sending data?
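One way to approach the "does Nagios even notice" question is to filter nagios.log for lines that mention passive results for a given host. The Python sketch below assumes the log uses the markers listed in the code, which Nagios Core typically writes only when log_passive_checks or log_external_commands is enabled in nagios.cfg; the markers, the helper name, and the sample lines are assumptions to verify against a real log.

```python
# Assumed markers for passive-check activity in nagios.log.
MARKERS = (
    "PASSIVE SERVICE CHECK:",
    "PASSIVE HOST CHECK:",
    "EXTERNAL COMMAND: PROCESS_",
    "Passive check result was received",
)


def passive_lines(log_lines, host):
    """Return the log lines that mention `host` and contain one of MARKERS."""
    hits = []
    for line in log_lines:
        if host in line and any(marker in line for marker in MARKERS):
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "[1438802189] SERVICE ALERT: otherhost;cpu;OK;HARD;1;fine\n",
        "[1438802250] PASSIVE SERVICE CHECK: GAWArena;cpu;0;OK: CPU fine\n",
    ]
    for line in passive_lines(sample, "GAWArena"):
        print(line)
```

In practice the list would come from reading the real log file, for example with open(path) and iterating over its lines. An empty result while tcpdump shows traffic would suggest the data is being dropped before it reaches Nagios, or that passive-check logging is switched off.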

Sorry to be a pain with all these problems!

Hopefully when all this is up and running I'll be uploading every config file in Nagios and NSClient++ so others can easily do the same thing!