Duration resets for all Services with Apply Config

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
SavaSC
Posts: 238
Joined: Wed Feb 23, 2011 4:49 pm

Re: Duration resets for all Services with Apply Config

Post by SavaSC »

Ha - sorry about that. I didn't bother looking at the date.
Yea, I didn't either until after the fact. :D

I have renamed both of those log files. I am assuming that Nagios will recreate them.

The nagios.log file shows only the service checks and almost nothing else. Nothing looks wrong, and there are no failures or errors.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Duration resets for all Services with Apply Config

Post by jdalrymple »

grep for "Nagios <version> starting" and look in that general area for errors:

Code: Select all

[1436286677] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1436286678] Nagios 4.0.8 starting... (PID=41172)
[1436286678] Local time is Tue Jul 07 11:31:18 CDT 2015
[1436286678] LOG VERSION: 2.0
[1436286678] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1436286678] qh: core query handler registered
[1436286678] nerd: Channel hostchecks registered successfully
[1436286678] nerd: Channel servicechecks registered successfully
[1436286678] nerd: Channel opathchecks registered successfully
[1436286678] nerd: Fully initialized and ready to rock!
[1436286678] wproc: Successfully registered manager as @wproc with query handler
Also, take a look for any errors in /var/log/mysqld.log
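
A rough sketch of that search, in case it saves some scrolling. It assumes the log lives at /usr/local/nagios/var/nagios.log (the usual location, but check yours); the function names and the 40-line window are just illustrative choices:

```shell
# Print every "Nagios <version> starting" line plus the lines that follow it.
# Assumption: default log path; pass a different path as the first argument.
show_startup() {
    local log="${1:-/usr/local/nagios/var/nagios.log}"
    local context="${2:-40}"
    grep -A "$context" -E 'Nagios [0-9.]+ starting' "$log"
}

# Narrow the startup section down to lines that mention a problem.
show_startup_problems() {
    show_startup "$@" | grep -iE 'error|warning|fail'
}
```

Calling show_startup_problems with no arguments checks the default path; the same grep -i error idea works on /var/log/mysqld.log.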
SavaSC
Posts: 238
Joined: Wed Feb 23, 2011 4:49 pm

Re: Duration resets for all Services with Apply Config

Post by SavaSC »

Here is the last section on a reboot from the nagios.log. This is from shutdown till the service alerts start.

Code: Select all

[1436297400] Auto-save of retention data completed successfully.
[1436297523] Caught SIGTERM, shutting down...
[1436297523] Successfully shutdown... (PID=28873)
[1436297523] Event broker module 'NERD' deinitialized successfully.
[1436297523] ndomod: Shutdown complete.
[1436297523] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1436297524] Nagios 4.0.8 starting... (PID=13744)
[1436297524] Local time is Tue Jul 07 14:32:04 CDT 2015
[1436297524] LOG VERSION: 2.0
[1436297524] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1436297524] qh: core query handler registered
[1436297524] nerd: Channel hostchecks registered successfully
[1436297524] nerd: Channel servicechecks registered successfully
[1436297524] nerd: Channel opathchecks registered successfully
[1436297524] nerd: Fully initialized and ready to rock!
[1436297524] wproc: Successfully registered manager as @wproc with query handler
[1436297524] wproc: Registry request: name=Core Worker 13745;pid=13745
[1436297524] wproc: Registry request: name=Core Worker 13746;pid=13746
[1436297524] wproc: Registry request: name=Core Worker 13750;pid=13750
[1436297524] wproc: Registry request: name=Core Worker 13749;pid=13749
[1436297524] wproc: Registry request: name=Core Worker 13747;pid=13747
[1436297524] wproc: Registry request: name=Core Worker 13751;pid=13751
[1436297524] ndomod: NDOMOD 2.0.0 (02-28-2014) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1436297524] ndomod: Successfully connected to data sink.  0 queued items to flush.
[1436297524] ndomod registered for process data
[1436297524] ndomod registered for log data'
[1436297524] ndomod registered for system command data'
[1436297524] ndomod registered for event handler data'
[1436297524] ndomod registered for notification data'
[1436297524] ndomod registered for comment data'
[1436297524] ndomod registered for downtime data'
[1436297524] ndomod registered for flapping data'
[1436297524] ndomod registered for program status data'
[1436297524] ndomod registered for host status data'
[1436297524] ndomod registered for service status data'
[1436297524] ndomod registered for adaptive program data'
[1436297524] ndomod registered for adaptive host data'
[1436297524] ndomod registered for adaptive service data'
[1436297524] ndomod registered for external command data'
[1436297524] ndomod registered for aggregated status data'
[1436297524] ndomod registered for retention data'
[1436297524] ndomod registered for contact data'
[1436297524] ndomod registered for contact notification data'
[1436297524] ndomod registered for acknowledgement data'
[1436297524] ndomod registered for state change data'
[1436297524] ndomod registered for contact status data'
[1436297524] ndomod registered for adaptive contact data'
[1436297524] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1436297524] Warning: Host 'www.google.com' has no default contacts or contactgroups defined!
[1436297524] Successfully launched command file worker with pid 13756
The /var/log/mysqld.log file is quite large. I have renamed it in order to get Nagios to recreate it. I will wait until tomorrow and then look to see what errors have been thrown.
jdalrymple
Skynet Drone
Posts: 2620
Joined: Wed Feb 11, 2015 1:56 pm

Re: Duration resets for all Services with Apply Config

Post by jdalrymple »

SavaSC wrote:The /var/log/mysqld.log file is quite large.
Typically that's not a good thing - we'll wait to see whatcha got.
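
If you want a quick look at the renamed file in the meantime, something like this gives its size and a count of error lines. It is only a sketch: the default path is the one mentioned above, the function name is made up, and matching on the word "error" is a crude guess at what is filling the log.

```shell
# Report the file size, count lines mentioning "error" (case-insensitive),
# and show the last few of them.
# Assumption: pass the renamed file's path as the first argument.
log_error_summary() {
    local log="${1:-/var/log/mysqld.log}"
    ls -lh "$log"
    echo "error lines: $(grep -ci 'error' "$log")"
    grep -i 'error' "$log" | tail -n 5
}
```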
SavaSC
Posts: 238
Joined: Wed Feb 23, 2011 4:49 pm

Re: Duration resets for all Services with Apply Config

Post by SavaSC »

The /var/log/mysqld.log did not recreate itself. Any suggestions?
jolson
Attack Rabbit
Posts: 2560
Joined: Thu Feb 12, 2015 12:40 pm

Re: Duration resets for all Services with Apply Config

Post by jolson »

You may need to restart mysqld for that log file to be regenerated.

Code: Select all

service mysqld restart
On another note, the only difference between your nagios.cfg and the default install is your defined flap thresholds.

Default install:

Code: Select all

low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
Your install:

Code: Select all

low_host_flap_threshold=30.0
low_service_flap_threshold=30.0
high_host_flap_threshold=50.0
high_service_flap_threshold=50.0
I doubt that your flapping thresholds are responsible for this behavior, though if you wouldn't mind changing the values to default and restarting nagios I would be interested in your results.
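
To confirm which values are actually active before and after the change, a small check like this should do. It assumes the main config is at /usr/local/nagios/etc/nagios.cfg; the function name is only for illustration.

```shell
# Print the four flap threshold directives from nagios.cfg.
# Commented-out lines are skipped because the pattern is anchored at line start.
show_flap_thresholds() {
    local cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    grep -E '^(low|high)_(host|service)_flap_threshold=' "$cfg"
}
```

After editing the values, restart with service nagios restart so they take effect.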

Was your retention.dat file regenerated properly after it was deleted?

Code: Select all

ls -l /usr/local/nagios/var/retention.dat
If you stop the nagios process, does retention.dat persist? Retention.dat is responsible for keeping track of your service states, which affects your duration value.
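
If you want to see what is actually stored, here is a rough sketch that pulls the last state change time for every service out of retention.dat. It assumes the file consists of "service {" ... "}" blocks with key=value lines inside, which is how I remember the format; the function names are made up for this example.

```shell
# Show the state-retention directives from nagios.cfg.
show_retention_settings() {
    local cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    grep -E '^(retain_state_information|state_retention_file|retention_update_interval|use_retained_program_state)=' "$cfg"
}

# Print "host;service;last_state_change" for every service block.
# Assumption: blocks open with "service {" and close with "}" on their own line.
list_last_state_change() {
    local ret="${1:-/usr/local/nagios/var/retention.dat}"
    awk '
        function val() { return substr($0, index($0, "=") + 1) }
        { sub(/^[ \t]+/, "") }                      # tolerate indented lines
        $0 == "service {" { in_svc = 1; host = svc = lsc = ""; next }
        in_svc && /^host_name=/           { host = val() }
        in_svc && /^service_description=/ { svc  = val() }
        in_svc && /^last_state_change=/   { lsc  = val() }
        in_svc && $0 == "}" { print host ";" svc ";" lsc; in_svc = 0 }
    ' "$ret"
}
```

Running list_last_state_change before and after an Apply Config and comparing the output should show whether the timestamps for the affected services are being rewritten.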
SavaSC
Posts: 238
Joined: Wed Feb 23, 2011 4:49 pm

Re: Duration resets for all Services with Apply Config

Post by SavaSC »

I restarted the MySQL service and it did recreate the file. There's not really anything in it yet.
I have also changed the flap settings and restarted the Nagios service. Still doing the same thing.
The retention.dat file does persist. I deleted it and applied config. It came back. I stopped the Nagios service and it was still there.

Oddly, the only services that seem to not lose their timing are all the Novell servers and one external HTTPS check for a vendor. All of the Windows and Linux machines lose their durations.
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Duration resets for all Services with Apply Config

Post by tgriep »

Could you post the service configurations for a system that is resetting the duration and one from a system that is not resetting the duration?
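
If the files are large, a summary may be enough to start with. This sketch lists each service definition in one or more .cfg files together with the template it inherits from via "use". The function name is hypothetical and you would pass whichever files hold the two sets of services.

```shell
# Print "<service_description> -> <template>" for each "define service" block.
# Services without a "use" line are shown as "(none)".
list_service_templates() {
    awk '
        $1 == "define" && ($2 == "service" || $2 == "service{") {
            in_svc = 1; desc = ""; tmpl = "(none)"; next
        }
        in_svc && $1 == "service_description" { $1 = ""; sub(/^[ \t]+/, ""); desc = $0; next }
        in_svc && $1 == "use"                 { $1 = ""; sub(/^[ \t]+/, ""); tmpl = $0; next }
        in_svc && $1 == "}"                   { print desc " -> " tmpl; in_svc = 0 }
    ' "$@"
}
```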
Be sure to check out our Knowledgebase for helpful articles and solutions!
SavaSC
Posts: 238
Joined: Wed Feb 23, 2011 4:49 pm

Re: Duration resets for all Services with Apply Config

Post by SavaSC »

I couldn't get the files to attach, so I'm putting the data right in here. I've taken out most of the remarked lines.

LTC023N (Novell server)

Code: Select all

###############################################################################

define host {
	host_name			LTC023N
	use				HOU Hosts
	alias				Netware File Server
	address				10.90.18.123
	parents				HOU-RTR-INT
	hostgroups			Netware Servers
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	icon_image			novell.png
	statusmap_image			novell.png
	register			1
	}	
###############################################################################
LTC045M (Windows server)

Code: Select all

###############################################################################

define host {
	host_name			LTC043M
	use				HOU Hosts
	alias				CareTracker R2 Reporting Servers
	address				10.90.18.178
	hostgroups			CT_R2 Report Server
	icon_image			win_server.png
	statusmap_image			win_server.png
	_xiwizard			windowsserver
	register			1
	}	

###############################################################################
netware_Service.cfg (Novell Netware services)

Code: Select all

###############################################################################

define service {
	service_description		Check DATA Volume
	hostgroup_name			Netware Servers
	display_name			Check DATA Volume
	check_command			check_nw_disk!DATA!15!10!!!!!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	active_checks_enabled		1
	check_period			24x7
	notification_interval		60
	notification_period		24x7
	notification_options		c,u,r,
	notifications_enabled		1
	contact_groups			Oncall
	register			1
	}	

define service {
	service_description		Check Netware Abends
	hostgroup_name			Netware Servers
	display_name			Check Netware Abends
	check_command			check_nw_abends!!!!!!!!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	active_checks_enabled		1
	check_period			24x7
	notification_interval		60
	notification_period		24x7
	notification_options		c,u,r,
	notifications_enabled		1
	contact_groups			Oncall
	register			1
	}	

define service {
	service_description		Check Netware LDAP
	use				Sava Service Settings
	hostgroup_name			Netware Servers
	display_name			Check Netware Abends
	check_command			check_nw_ldap!!!!!!!!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	active_checks_enabled		1
	check_period			24x7
	notification_interval		60
	notification_period		24x7
	notification_options		c,u,r,
	notifications_enabled		1
	contact_groups			Oncall
	register			1
	}	

define service {
	service_description		Check SYS Volume
	hostgroup_name			Netware Servers
	display_name			Check SYS Volume
	check_command			check_nw_disk!SYS!10!5!!!!!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	active_checks_enabled		1
	check_period			24x7
	notification_interval		60
	notification_period		24x7
	notification_options		c,u,r,
	notifications_enabled		1
	contact_groups			Oncall
	register			1
	}	

define service {
	service_description		Check Time Sync
	hostgroup_name			Netware Servers
	display_name			Check Time Sync
	check_command			check_nw_timesync!!!!!!!!
	max_check_attempts		5
	check_interval			5
	retry_interval			1
	active_checks_enabled		1
	check_period			xi_timeperiod_24x7
	notification_interval		60
	notification_period		24x7
	notification_options		c,u,r,
	notifications_enabled		1
	contact_groups			Oncall
	register			1
	}	

###############################################################################
Windows_services.cfg (Windows services)

Code: Select all

###############################################################################

define service {
	service_description		CPU Usage
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!CPULOAD!-l 15,90,95!!!!!
	first_notification_delay	30
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		DHCP service
	use				Sava Service Settings
	hostgroup_name			Windows DHCP
	check_command			check_xi_service_nsclient!!SERVICESTATE!-l "DHCPServer" -d SHOWALL!!!!!
	contact_groups			Oncall
	register			1
	}	

define service {
	service_description		Drive C: Disk Usage
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!USEDDISKSPACE!-l C -w 90 -c 95!!!!!
	max_check_attempts		25
	check_interval			30
	retry_interval			5
	check_freshness			null
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Drive D: Disk Usage
	use				Sava Service Settings
	hostgroup_name			Windows Servers D:
	check_command			check_xi_service_nsclient!!USEDDISKSPACE!-l D -w 85 -c 90!!!!!
	max_check_attempts		25
	check_interval			30
	retry_interval			5
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Drive E: Disk Usage
	use				Sava Service Settings
	hostgroup_name			Windows Servers E:
	check_command			check_xi_service_nsclient!!USEDDISKSPACE!-l E -w 90 -c 95!!!!!
	max_check_attempts		25
	check_interval			30
	retry_interval			5
	_xiwizard			windowsdesktop
	register			1
	}	

define service {
	service_description		Drive F: Disk Usage
	use				Sava Service Settings
	hostgroup_name			Windows Servers F:
	check_command			check_xi_service_nsclient!!USEDDISKSPACE!-l F -w 90 -c 95!!!!!
	max_check_attempts		25
	check_interval			30
	retry_interval			5
	contact_groups			SQL Admins
	register			1
	}	

define service {
	service_description		Drive I: Disk Usage
	use				Sava Service Settings
	hostgroup_name			Windows Servers I:
	check_command			check_xi_service_nsclient!!USEDDISKSPACE!-l I -w 80 -c 95!!!!!
	max_check_attempts		25
	check_interval			30
	retry_interval			5
	notification_interval		60
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Drive T: Disk Usage
	use				Sava Service Settings
	hostgroup_name			Windows Servers T:
	check_command			check_xi_service_nsclient!!USEDDISKSPACE!-l T -w 80 -c 95!!!!!
	max_check_attempts		25
	check_interval			30
	retry_interval			5
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Logon Errors
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!COUNTER!-l "\\Server\\Errors System","Login Errors since last reboot is %.f" -w 2 -c 20!!!!!
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Memory Usage
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!MEMUSE!-w 85 -c 99!!!!!
	max_check_attempts		30
	retry_interval			5
	first_notification_delay	30
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Page File Usage
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!COUNTER!-l "\\Paging File(_Total)\\% Usage","Paging File usage is %.2f %%" -w 85 -c 95!!!!!
	first_notification_delay	30
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Server Work Queues
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!COUNTER!-l "\\Server Work Queues(0)\\Queue Length","Current work queue (an indication of processing load) is %.f " -w 4 -c 7!!!!!
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		Uptime
	use				Sava Service Settings
	hostgroup_name			Windows All Servers
	check_command			check_xi_service_nsclient!!UPTIME!!!!!!
	notification_options		c,
	_xiwizard			windowsserver
	register			1
	}	

define service {
	service_description		VNC Server
	use				Sava Service Settings
	hostgroup_name			ROX Process Schedulers
	display_name			VNC Service
	check_command			check_xi_service_nsclient!!SERVICESTATE!-l "VNC Server" -d SHOWALL!!!!!
	register			1
	}	

define service {
	service_description		VNC Server Port
	use				Sava Service Settings
	hostgroup_name			ROX Process Schedulers
	display_name			VNC Server Port
	check_command			check_tcp!5900!!!!!!!
	register			1
	}	

define service {
	service_description		Windows Deployment Service Check
	use				Sava Service Settings
	hostgroup_name			Deployment Servers
	check_command			check_xi_service_nsclient!!SERVICESTATE!-l WDSServer -d SHOWALL!!!!!
	contacts			+actalley,harrisr
	register			1
	}	

define service {
	service_description		Windows Update Service Check
	use				Sava Service Settings
	hostgroup_name			WSUS Servers
	check_command			check_xi_service_nsclient!!SERVICESTATE!-l WSUSService -d SHOWALL!!!!!
	register			1
	}	

###############################################################################
tgriep
Madmin
Posts: 9190
Joined: Thu Oct 30, 2014 9:02 am

Re: Duration resets for all Services with Apply Config

Post by tgriep »

Could you post the service template called "Sava Service Settings"?
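
If you are not sure which file holds it, a recursive search along these lines should find it. This is a sketch that assumes your configs sit under /usr/local/nagios/etc and that the template is declared with a "name" line; the function name and the 25-line window are arbitrary.

```shell
# Find the definition whose "name" directive matches the given template name
# and print the lines that follow it.
# Assumption: the name contains no regex metacharacters.
find_template() {
    local name="$1"
    local dir="${2:-/usr/local/nagios/etc}"
    grep -rn -A 25 -E "^[[:space:]]*name[[:space:]]+${name}" "$dir"
}
```

For example, find_template 'Sava Service Settings'. Directives such as retain_status_information and retain_nonstatus_information are worth a look in the output, since they control whether a service's state is kept across restarts.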