Re: Duration resets for all Services with Apply Config

Posted: Tue Jul 07, 2015 11:26 am
by SavaSC
Ha - sorry about that. I didn't bother looking at the date.
Yea, I didn't either until after the fact. :D

I have renamed both of those log files. I am assuming that Nagios will recreate them.

The nagios.log file shows only the service checks and almost nothing else. Nothing in it looks wrong or reports a failure or error.

Re: Duration resets for all Services with Apply Config

Posted: Tue Jul 07, 2015 11:33 am
by jdalrymple
grep for "Nagios <version> starting" and look in that general area for errors:

Code: Select all

[1436286677] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1436286678] Nagios 4.0.8 starting... (PID=41172)
[1436286678] Local time is Tue Jul 07 11:31:18 CDT 2015
[1436286678] LOG VERSION: 2.0
[1436286678] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1436286678] qh: core query handler registered
[1436286678] nerd: Channel hostchecks registered successfully
[1436286678] nerd: Channel servicechecks registered successfully
[1436286678] nerd: Channel opathchecks registered successfully
[1436286678] nerd: Fully initialized and ready to rock!
[1436286678] wproc: Successfully registered manager as @wproc with query handler
Also, take a look for any errors in /var/log/mysqld.log
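
As a rough sketch of the grep suggested above: the helper below prints each startup banner and any line within the next 60 lines that mentions an error, warning, or failure. The function name `startup_errors` and the 60-line window are assumptions made for illustration, and the log path in the usage note assumes a source install.

```shell
# Print each "Nagios <version> starting" banner plus any nearby lines
# that mention an error, warning, or failure.
# Usage: startup_errors LOGFILE [LINES_AFTER]
# (startup_errors is a hypothetical helper name, not a Nagios tool.)
startup_errors() {
    se_log=$1
    se_after=${2:-60}   # how many lines after each banner to inspect
    grep -A "$se_after" -E 'Nagios [0-9.]+ starting' "$se_log" \
        | grep -iE 'Nagios [0-9.]+ starting|error|warning|fail'
}
```

For example, `startup_errors /usr/local/nagios/var/nagios.log` should list every restart and whatever complaints were logged right after it.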

Re: Duration resets for all Services with Apply Config

Posted: Tue Jul 07, 2015 2:42 pm
by SavaSC
Here is the last section on a reboot from the nagios.log. This is from shutdown till the service alerts start.

Code: Select all

[1436297400] Auto-save of retention data completed successfully.
[1436297523] Caught SIGTERM, shutting down...
[1436297523] Successfully shutdown... (PID=28873)
[1436297523] Event broker module 'NERD' deinitialized successfully.
[1436297523] ndomod: Shutdown complete.
[1436297523] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1436297524] Nagios 4.0.8 starting... (PID=13744)
[1436297524] Local time is Tue Jul 07 14:32:04 CDT 2015
[1436297524] LOG VERSION: 2.0
[1436297524] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1436297524] qh: core query handler registered
[1436297524] nerd: Channel hostchecks registered successfully
[1436297524] nerd: Channel servicechecks registered successfully
[1436297524] nerd: Channel opathchecks registered successfully
[1436297524] nerd: Fully initialized and ready to rock!
[1436297524] wproc: Successfully registered manager as @wproc with query handler
[1436297524] wproc: Registry request: name=Core Worker 13745;pid=13745
[1436297524] wproc: Registry request: name=Core Worker 13746;pid=13746
[1436297524] wproc: Registry request: name=Core Worker 13750;pid=13750
[1436297524] wproc: Registry request: name=Core Worker 13749;pid=13749
[1436297524] wproc: Registry request: name=Core Worker 13747;pid=13747
[1436297524] wproc: Registry request: name=Core Worker 13751;pid=13751
[1436297524] ndomod: NDOMOD 2.0.0 (02-28-2014) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1436297524] ndomod: Successfully connected to data sink.  0 queued items to flush.
[1436297524] ndomod registered for process data
[1436297524] ndomod registered for log data'
[1436297524] ndomod registered for system command data'
[1436297524] ndomod registered for event handler data'
[1436297524] ndomod registered for notification data'
[1436297524] ndomod registered for comment data'
[1436297524] ndomod registered for downtime data'
[1436297524] ndomod registered for flapping data'
[1436297524] ndomod registered for program status data'
[1436297524] ndomod registered for host status data'
[1436297524] ndomod registered for service status data'
[1436297524] ndomod registered for adaptive program data'
[1436297524] ndomod registered for adaptive host data'
[1436297524] ndomod registered for adaptive service data'
[1436297524] ndomod registered for external command data'
[1436297524] ndomod registered for aggregated status data'
[1436297524] ndomod registered for retention data'
[1436297524] ndomod registered for contact data'
[1436297524] ndomod registered for contact notification data'
[1436297524] ndomod registered for acknowledgement data'
[1436297524] ndomod registered for state change data'
[1436297524] ndomod registered for contact status data'
[1436297524] ndomod registered for adaptive contact data'
[1436297524] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1436297524] Warning: Host 'www.google.com' has no default contacts or contactgroups defined!
[1436297524] Successfully launched command file worker with pid 13756
The /var/log/mysqld.log file is quite large. I have renamed it in order to get Nagios to recreate it. I will wait until tomorrow and then look to see what errors have been thrown.
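
The bracketed numbers in nagios.log are seconds since the epoch, which makes it hard to line a restart up with the time Apply Config was run. Here is a minimal sketch for converting them, assuming GNU date (it relies on `-d @SECONDS`); the function names are made up for illustration.

```shell
# Convert one Nagios log timestamp (seconds since the epoch) to UTC.
# Assumes GNU date, which accepts "-d @SECONDS".
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Rewrite the leading [epoch] on every line of a nagios.log-style file.
# Lines that do not start with [digits] are passed through unchanged.
humanize_log() {
    while IFS= read -r hl_line; do
        case $hl_line in
            \[[0-9]*\]\ *)
                hl_ts=${hl_line#\[}      # drop the leading "["
                hl_ts=${hl_ts%%\]*}      # keep only the digits before "]"
                printf '[%s UTC]%s\n' "$(epoch_to_utc "$hl_ts")" "${hl_line#*\]}"
                ;;
            *)
                printf '%s\n' "$hl_line"
                ;;
        esac
    done < "$1"
}
```

Running `epoch_to_utc 1436297524` gives 2015-07-07 19:32:04, which matches the "Local time is Tue Jul 07 14:32:04 CDT 2015" line above once the five-hour CDT offset is applied. The loop starts one `date` process per line, so it is best used on a short excerpt rather than the whole log.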

Re: Duration resets for all Services with Apply Config

Posted: Wed Jul 08, 2015 8:59 am
by jdalrymple
SavaSC wrote:The /var/log/mysqld.log file is quite large.
Typically that's not a good thing - we'll wait to see whatcha got.

Re: Duration resets for all Services with Apply Config

Posted: Wed Jul 08, 2015 11:28 am
by SavaSC
The /var/log/mysqld.log did not recreate itself. Any suggestions?

Re: Duration resets for all Services with Apply Config

Posted: Wed Jul 08, 2015 11:57 am
by jolson
You may need to restart mysqld for that log file to be regenerated.

Code: Select all

service mysqld restart
On another note, the only difference between your nagios.cfg and the default install is your defined flap thresholds.

Default install:

Code: Select all

low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
Your install:

Code: Select all

low_host_flap_threshold=30.0
low_service_flap_threshold=30.0
high_host_flap_threshold=50.0
high_service_flap_threshold=50.0
I doubt that your flapping thresholds are responsible for this behavior, though if you wouldn't mind changing the values to default and restarting nagios I would be interested in your results.
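
To confirm which values are actually in effect before and after the change, something like the following may help. It is only a sketch: the helper name is hypothetical and the default path assumes a source install under /usr/local/nagios.

```shell
# Show the four flap threshold settings from a nagios.cfg-style file.
# The default path is an assumption based on a source install.
show_flap_thresholds() {
    sf_cfg=${1:-/usr/local/nagios/etc/nagios.cfg}
    grep -E '^(low|high)_(host|service)_flap_threshold=' "$sf_cfg"
}
```

Calling `show_flap_thresholds` with no argument reads the assumed default file; pass a path to check a different copy.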

Was your retention.dat file regenerated properly after it was deleted?

Code: Select all

ls -l /usr/local/nagios/var/retention.dat
If you stop the nagios process, does retention.dat persist? Retention.dat is responsible for keeping track of your service states, which affects your duration value.
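
Since duration is derived from the stored time of the last state change, it may be worth looking at what retention.dat holds for an affected service right after an Apply Config. The sketch below assumes the usual retention.dat layout of `service { ... }` blocks containing `host_name=`, `service_description=` and `last_state_change=` lines; the helper name is hypothetical.

```shell
# List host;service;last_state_change for every service block in a
# retention.dat-style file.
# Usage: service_state_changes [RETENTION_FILE]
service_state_changes() {
    sc_file=${1:-/usr/local/nagios/var/retention.dat}
    awk '
        # Return everything after the first "=" on the current line.
        function val() { return substr($0, index($0, "=") + 1) }
        /^service [{]/ { in_svc = 1; host = svc = changed = ""; next }
        in_svc && /^[ \t]*host_name=/           { host = val() }
        in_svc && /^[ \t]*service_description=/ { svc = val() }
        in_svc && /^[ \t]*last_state_change=/   { changed = val() }
        in_svc && /^[ \t]*[}]/ { print host ";" svc ";" changed; in_svc = 0 }
    ' "$sc_file"
}
```

If the `last_state_change` values for the Windows services jump to the time of the restart while the Novell ones stay put, that would suggest the reset happens when the retained state is loaded or saved rather than in the web interface.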

Re: Duration resets for all Services with Apply Config

Posted: Wed Jul 08, 2015 2:25 pm
by SavaSC
I restarted the MySQL service and it did recreate the file. There's not really anything in it yet.
I have also changed the flap settings and restarted the Nagios service. Still doing the same thing.
The retention.dat file does persist. I deleted it and applied the config, and it came back. I stopped the Nagios service and it was still there.

Oddly, the only services that do not seem to lose their timing are those on the Novell servers and one external HTTPS check for a vendor. All of the Windows and Linux machines lose their durations.

Re: Duration resets for all Services with Apply Config

Posted: Wed Jul 08, 2015 2:32 pm
by tgriep
Could you post the service configurations for a system that is resetting the duration and one from a system that is not resetting the duration?

Re: Duration resets for all Services with Apply Config

Posted: Wed Jul 08, 2015 3:00 pm
by SavaSC
I couldn't get the files to attach, so I'm putting the data right in here. I've taken out most of the commented-out lines.

LTC023N (Novell server)

Code: Select all

###############################################################################

define host {
	host_name			LTC023N
	use				HOU Hosts
	alias				Netware File Server
	address				10.90.18.123
	parents				HOU-RTR-INT
	hostgroups			Netware Servers
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	icon_image			novell.png
	statusmap_image			novell.png
	register			1
	}	
###############################################################################
LTC045M (Windows server)

Code: Select all

###############################################################################

define host {
	host_name			LTC043M
	use				HOU Hosts
	alias				CareTracker R2 Reporting Servers
	address				10.90.18.178
	hostgroups			CT_R2 Report Server
	icon_image			win_server.png
	statusmap_image			win_server.png
	_xiwizard			windowsserver
	register			1
	}	

###############################################################################
netware_Service.cfg (Novell Netware services)

Code: Select all

###############################################################################

define service {
	service_description		Check DATA Volume
	hostgroup_name			Netware Servers
	display_name			Check DATA Volume
	check_command			check_nw_disk!DATA!15!10!!!!!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	active_checks_enabled		1
	check_period			24x7
	notification_interval		60
	notification_period		24x7
	notification_options		c,u,r,
	notifications_enabled		1
	contact_groups			Oncall
	register			1
	}	

define service {
	service_description		Check Netware Abends
	hostgroup_name			Netware Servers
	display_name			Check Netware Abends
	check_command			check_nw_abends!!!!!!!!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	active_checks_enabled		1
	check_period			24x7
	notification_interval		60
	notification_period		24x7
	notification_options		c,u,r,
	notifications_enabled		1
	contact_groups			Oncall
	register			1
	}	

define service {
	service_description		Check Netware LDAP
	use				Sava Service Settings
	hostgroup_name			Netware Servers
	display_name			Check Netware Abends
	check_command			check_nw_ldap!!!!!!!!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	active_checks_enabled		1
	check_period			24x7
	notification_interval		60
	notification_period		24x7
	notification_options		c,u,r,
	notifications_enabled		1
	contact_groups			Oncall
	register			1
	}	

define service {
	service_description		Check SYS Volume
	hostgroup_name			Netware Servers
	display_name			Check SYS Volume
	check_command			check_nw_disk!SYS!10!5!!!!!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	active_checks_enabled		1
	check_period			24x7
	notification_interval		60
	notification_period		24x7
	notification_options		c,u,r,
	notifications_enabled		1
	contact_groups			Oncall
	register			1
	}	

define service {
	service_description		Check Time Sync
	hostgroup_name			Netware Servers
	display_name			Check Time Sync
	check_command			check_nw_timesync!!!!!!!!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	active_checks_enabled		1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		24x7
	notification_options		c,u,r,
	notifications_enabled		1
	contact_groups			Oncall
	register			1
	}	

###############################################################################
Windows_services.cfg (Windows services)

Code: Select all

###############################################################################

define service {
	service_description		CPU Usage
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!CPULOAD!-l 15,90,95!!!!!
	first_notification_delay	30
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		DHCP service
	use				Sava Service Settings
	hostgroup_name			Windows DHCP
	check_command			check_xi_service_nsclient!!SERVICESTATE!-l "DHCPServer" -d SHOWALL!!!!!
	contact_groups			Oncall
	register			1
	}	

define service {
	service_description		Drive C: Disk Usage
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!USEDDISKSPACE!-l C -w 90 -c 95!!!!!
	max_check_attempts		25
	check_interval			30
	retry_interval			5
	check_freshness			null
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Drive D: Disk Usage
	use				Sava Service Settings
	hostgroup_name			Windows Servers D:
	check_command			check_xi_service_nsclient!!USEDDISKSPACE!-l D -w 85 -c 90!!!!!
	max_check_attempts		25
	check_interval			30
	retry_interval			5
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Drive E: Disk Usage
	use				Sava Service Settings
	hostgroup_name			Windows Servers E:
	check_command			check_xi_service_nsclient!!USEDDISKSPACE!-l E -w 90 -c 95!!!!!
	max_check_attempts		25
	check_interval			30
	retry_interval			5
	_xiwizard			windowsdesktop
	register			1
	}	

define service {
	service_description		Drive F: Disk Usage
	use				Sava Service Settings
	hostgroup_name			Windows Servers F:
	check_command			check_xi_service_nsclient!!USEDDISKSPACE!-l F -w 90 -c 95!!!!!
	max_check_attempts		25
	check_interval			30
	retry_interval			5
	contact_groups			SQL Admins
	register			1
	}	

define service {
	service_description		Drive I: Disk Usage
	use				Sava Service Settings
	hostgroup_name			Windows Servers I:
	check_command			check_xi_service_nsclient!!USEDDISKSPACE!-l I -w 80 -c 95!!!!!
	max_check_attempts		25
	check_interval			30
	retry_interval			5
	notification_interval		60
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Drive T: Disk Usage
	use				Sava Service Settings
	hostgroup_name			Windows Servers T:
	check_command			check_xi_service_nsclient!!USEDDISKSPACE!-l T -w 80 -c 95!!!!!
	max_check_attempts		25
	check_interval			30
	retry_interval			5
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Logon Errors
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!COUNTER!-l "\\Server\\Errors System","Login Errors since last reboot is %.f" -w 2 -c 20!!!!!
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Memory Usage
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!MEMUSE!-w 85 -c 99!!!!!
	max_check_attempts		30
	retry_interval			5
	first_notification_delay	30
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Page File Usage
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!COUNTER!-l "\\Paging File(_Total)\\% Usage","Paging File usage is %.2f %%" -w 85 -c 95!!!!!
	first_notification_delay	30
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Server Work Queues
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!COUNTER!-l "\\Server Work Queues(0)\\Queue Length","Current work queue (an indication of processing load) is %.f " -w 4 -c 7!!!!!
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Uptime
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!UPTIME!!!!!!
	notification_options		c,
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		VNC Server
	use				Sava Service Settings
	hostgroup_name			ROX Process Schedulers
	display_name			VNC Service
	check_command			check_xi_service_nsclient!!SERVICESTATE!-l "VNC Server" -d SHOWALL!!!!!
	register			1
	}	

define service {
	service_description		VNC Server Port
	use				Sava Service Settings
	hostgroup_name			ROX Process Schedulers
	display_name			VNC Server Port
	check_command			check_tcp!5900!!!!!!!
	register			1
	}	

define service {
	service_description		Windows Deployment Service Check
	use				Sava Service Settings
	hostgroup_name			Deployment Servers
	check_command			check_xi_service_nsclient!!SERVICESTATE!-l WDSServer -d SHOWALL!!!!!
	contacts			+actalley,harrisr
	register			1
	}	

define service {
	service_description		Windows Update Service Check
	use				Sava Service Settings
	hostgroup_name			WSUS Servers
	check_command			check_xi_service_nsclient!!SERVICESTATE!-l WSUSService -d SHOWALL!!!!!
	register			1
	}	

###############################################################################

Re: Duration resets for all Services with Apply Config

Posted: Wed Jul 08, 2015 4:36 pm
by tgriep
Could you post the service template called "Sava Service Settings"?
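
If it is not obvious which file holds the template, a sketch like this can pull it out. It assumes the template is a `define service` block whose `name` directive matches exactly, and that the configs live in `*.cfg` files under /usr/local/nagios/etc; both the helper name and the default directory are assumptions.

```shell
# Print every "define service { ... }" block whose "name" directive
# equals the given template name, searching *.cfg files under a directory.
# Usage: show_service_template "Sava Service Settings" [CONFIG_DIR]
show_service_template() {
    st_name=$1
    st_dir=${2:-/usr/local/nagios/etc}
    find "$st_dir" -type f -name '*.cfg' -exec awk -v want="$st_name" '
        /^[ \t]*define[ \t]+service[ \t]*[{]/ { block = ""; hit = 0; in_def = 1 }
        in_def {
            block = block $0 "\n"
            line = $0
            sub(/^[ \t]+/, "", line); sub(/[ \t]+$/, "", line)
            if (line ~ /^name[ \t]+/) {
                sub(/^name[ \t]+/, "", line)
                if (line == want) hit = 1
            }
            if (line ~ /^[}]/) {
                # End of block: print it only if the name matched.
                if (hit) printf "%s:\n%s", FILENAME, block
                in_def = 0
            }
        }
    ' {} +
}
```

The output starts with the file name, followed by the full definition, which can then be pasted into a reply.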