Re: Duration resets for all Services with Apply Config
Posted: Tue Jul 07, 2015 11:26 am
by SavaSC
Ha - sorry about that. I didn't bother looking at the date.
Yeah, I didn't either until after the fact.
I have renamed both of those log files. I am assuming that Nagios will recreate them.
The nagios.log file is only showing the service checks and almost nothing else. Nothing in it looks wrong, and there are no failures or errors.
Re: Duration resets for all Services with Apply Config
Posted: Tue Jul 07, 2015 11:33 am
by jdalrymple
Grep nagios.log for "Nagios <version> starting" and look in that general area for errors:
Code:
[1436286677] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1436286678] Nagios 4.0.8 starting... (PID=41172)
[1436286678] Local time is Tue Jul 07 11:31:18 CDT 2015
[1436286678] LOG VERSION: 2.0
[1436286678] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1436286678] qh: core query handler registered
[1436286678] nerd: Channel hostchecks registered successfully
[1436286678] nerd: Channel servicechecks registered successfully
[1436286678] nerd: Channel opathchecks registered successfully
[1436286678] nerd: Fully initialized and ready to rock!
[1436286678] wproc: Successfully registered manager as @wproc with query handler
Also, take a look for any errors in /var/log/mysqld.log
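The grep-and-look-around check above can also be scripted. This is a minimal Python sketch, assuming the log is at the Nagios Core default /usr/local/nagios/var/nagios.log and that problems are logged with the words "Error" or "Warning" (as in the "Warning: Host ..." line posted later in this thread). The function names startup_problems and scan_log are hypothetical, and the 100-line window is an arbitrary guess at "that general area".

```python
import re

# Matches startup banners such as "Nagios 4.0.8 starting... (PID=41172)".
START_RE = re.compile(r"Nagios \S+ starting")
# Assumption: problem lines contain the word "error" or "warning" (any case).
PROBLEM_RE = re.compile(r"\b(error|warning)\b", re.IGNORECASE)


def startup_problems(lines, window=100):
    """Return (banner_line, [problem_lines]) for each startup banner.

    Only the `window` lines that follow each banner are scanned.
    """
    results = []
    for i, line in enumerate(lines):
        if START_RE.search(line):
            following = lines[i + 1 : i + 1 + window]
            problems = [l for l in following if PROBLEM_RE.search(l)]
            results.append((line, problems))
    return results


def scan_log(path="/usr/local/nagios/var/nagios.log", window=100):
    """Print each startup banner in the log file, then any nearby problems."""
    with open(path, errors="replace") as f:
        lines = f.read().splitlines()
    for banner, problems in startup_problems(lines, window):
        print(banner)
        for p in problems:
            print("    " + p)
```

Calling scan_log() on the server prints one banner per restart, so a restart that coincides with an Apply Config and logs a warning stands out quickly.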
Re: Duration resets for all Services with Apply Config
Posted: Tue Jul 07, 2015 2:42 pm
by SavaSC
Here is the section of nagios.log from the most recent restart, from the shutdown until the service alerts start.
Code:
[1436297400] Auto-save of retention data completed successfully.
[1436297523] Caught SIGTERM, shutting down...
[1436297523] Successfully shutdown... (PID=28873)
[1436297523] Event broker module 'NERD' deinitialized successfully.
[1436297523] ndomod: Shutdown complete.
[1436297523] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully.
[1436297524] Nagios 4.0.8 starting... (PID=13744)
[1436297524] Local time is Tue Jul 07 14:32:04 CDT 2015
[1436297524] LOG VERSION: 2.0
[1436297524] qh: Socket '/usr/local/nagios/var/rw/nagios.qh' successfully initialized
[1436297524] qh: core query handler registered
[1436297524] nerd: Channel hostchecks registered successfully
[1436297524] nerd: Channel servicechecks registered successfully
[1436297524] nerd: Channel opathchecks registered successfully
[1436297524] nerd: Fully initialized and ready to rock!
[1436297524] wproc: Successfully registered manager as @wproc with query handler
[1436297524] wproc: Registry request: name=Core Worker 13745;pid=13745
[1436297524] wproc: Registry request: name=Core Worker 13746;pid=13746
[1436297524] wproc: Registry request: name=Core Worker 13750;pid=13750
[1436297524] wproc: Registry request: name=Core Worker 13749;pid=13749
[1436297524] wproc: Registry request: name=Core Worker 13747;pid=13747
[1436297524] wproc: Registry request: name=Core Worker 13751;pid=13751
[1436297524] ndomod: NDOMOD 2.0.0 (02-28-2014) Copyright (c) 2009 Nagios Core Development Team and Community Contributors
[1436297524] ndomod: Successfully connected to data sink. 0 queued items to flush.
[1436297524] ndomod registered for process data
[1436297524] ndomod registered for log data'
[1436297524] ndomod registered for system command data'
[1436297524] ndomod registered for event handler data'
[1436297524] ndomod registered for notification data'
[1436297524] ndomod registered for comment data'
[1436297524] ndomod registered for downtime data'
[1436297524] ndomod registered for flapping data'
[1436297524] ndomod registered for program status data'
[1436297524] ndomod registered for host status data'
[1436297524] ndomod registered for service status data'
[1436297524] ndomod registered for adaptive program data'
[1436297524] ndomod registered for adaptive host data'
[1436297524] ndomod registered for adaptive service data'
[1436297524] ndomod registered for external command data'
[1436297524] ndomod registered for aggregated status data'
[1436297524] ndomod registered for retention data'
[1436297524] ndomod registered for contact data'
[1436297524] ndomod registered for contact notification data'
[1436297524] ndomod registered for acknowledgement data'
[1436297524] ndomod registered for state change data'
[1436297524] ndomod registered for contact status data'
[1436297524] ndomod registered for adaptive contact data'
[1436297524] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully.
[1436297524] Warning: Host 'www.google.com' has no default contacts or contactgroups defined!
[1436297524] Successfully launched command file worker with pid 13756
The /var/log/mysqld.log file is quite large. I have renamed it so that a fresh one will be created. I will wait until tomorrow and then look to see what errors have been logged.
Re: Duration resets for all Services with Apply Config
Posted: Wed Jul 08, 2015 8:59 am
by jdalrymple
SavaSC wrote: The /var/log/mysqld.log file is quite large.
Typically that's not a good thing - we'll wait to see whatcha got.
Re: Duration resets for all Services with Apply Config
Posted: Wed Jul 08, 2015 11:28 am
by SavaSC
The /var/log/mysqld.log did not recreate itself. Any suggestions?
Re: Duration resets for all Services with Apply Config
Posted: Wed Jul 08, 2015 11:57 am
by jolson
You may need to restart mysqld for that log file to be regenerated.
On another note, the only difference between your nagios.cfg and the default install is your defined flap thresholds.
Default install:
Code:
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
Your install:
Code:
low_host_flap_threshold=30.0
low_service_flap_threshold=30.0
high_host_flap_threshold=50.0
high_service_flap_threshold=50.0
I doubt that your flapping thresholds are responsible for this behavior, but if you don't mind changing them back to the defaults and restarting Nagios, I would be interested in the results.
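For context on what those four settings do: Nagios compares the percent state change over a host's or service's recent check history against the low and high thresholds, and in between the two it keeps whatever flapping state it already had. The Python sketch below is a simplified, unweighted model of that hysteresis (real Nagios weights recent changes more heavily), so treat the numbers as illustrative only; percent_state_change and update_flapping are hypothetical names.

```python
def percent_state_change(history):
    """Share of possible transitions in `history` that were state changes, in %.

    Simplified: every change counts equally. Nagios itself weights newer
    changes more heavily, so its figures will differ somewhat.
    """
    if len(history) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return 100.0 * changes / (len(history) - 1)


def update_flapping(was_flapping, pct, low, high):
    """Apply the low/high thresholds with hysteresis."""
    if pct >= high:
        return True  # at or above the high threshold: flapping starts
    if pct <= low:
        return False  # at or below the low threshold: flapping stops
    return was_flapping  # in between: keep the previous state
```

For example, a 21-result history with 4 state changes gives 20%. With the default thresholds (5.0/20.0) that service would start flapping, while with 30.0/50.0 it would not, which is why raised thresholds mostly suppress flapping notifications rather than touch state durations.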
Was your retention.dat file regenerated properly after it was deleted?
Code:
ls -l /usr/local/nagios/var/retention.dat
If you stop the Nagios process, does retention.dat persist? retention.dat keeps track of your service states across restarts, and those states determine the duration value.
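To see what retention.dat actually holds for each service, the Python sketch below reads its service blocks and computes how long each service has been in its current state. It assumes the plain-text layout Nagios Core writes: a `service {` line, one `key=value` pair per line including `host_name`, `service_description` and `last_state_change` (epoch seconds), then a closing `}`. The function names parse_retention and durations are hypothetical. Nagios derives the displayed duration from the time since the last state change, so a reset duration should show up here as a very recent `last_state_change`.

```python
import time


def parse_retention(text):
    """Map (host_name, service_description) -> last_state_change (epoch secs).

    Assumes retention.dat blocks of the form "service {" ... "}" with one
    key=value pair per line; other block types (host, info, ...) are skipped.
    """
    services = {}
    current = None  # key/value pairs while inside a service block
    for raw in text.splitlines():
        line = raw.strip()
        if line == "service {":
            current = {}
        elif line == "}":
            if current is not None and "last_state_change" in current:
                key = (current.get("host_name", ""),
                       current.get("service_description", ""))
                services[key] = int(current["last_state_change"])
            current = None
        elif current is not None and "=" in line:
            k, v = line.split("=", 1)
            current[k] = v
    return services


def durations(services, now=None):
    """Seconds each service has spent in its current state."""
    now = int(time.time()) if now is None else now
    return {key: now - changed for key, changed in services.items()}
```

Usage on the server would be along the lines of reading /usr/local/nagios/var/retention.dat into a string, passing it to parse_retention, and printing durations() sorted by value.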
Re: Duration resets for all Services with Apply Config
Posted: Wed Jul 08, 2015 2:25 pm
by SavaSC
I restarted the MySQL service and it did recreate the file. There's not really anything in it yet.
I have also changed the flap settings back to the defaults and restarted the Nagios service. It is still doing the same thing.
The retention.dat file does persist. I deleted it and applied the config, and it came back. I stopped the Nagios service and it was still there.
Oddly, the only services that do not seem to lose their timing are the ones on the Novell servers and one external HTTPS check for a vendor. All of the Windows and Linux machines lose their durations.
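To pin down exactly which services lose their duration, one option is to record each service's `last_state_change` from retention.dat before and after an Apply Config and compare the two snapshots per host. The Python sketch below assumes both snapshots are already dictionaries keyed by (host, service); reset_services is a hypothetical name. A genuine state change also moves `last_state_change`, so services that really changed state in between will appear as false positives.

```python
from collections import defaultdict


def reset_services(before, after):
    """Return {host: [services]} whose last_state_change differs between snapshots.

    `before` and `after` map (host_name, service_description) to a
    last_state_change timestamp; services missing from either side are skipped.
    """
    changed = defaultdict(list)
    for key in sorted(before.keys() & after.keys()):
        if before[key] != after[key]:
            host, service = key
            changed[host].append(service)
    return dict(changed)
```

Grouping by host makes a Novell versus Windows/Linux split like the one described above easy to confirm or rule out.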
Re: Duration resets for all Services with Apply Config
Posted: Wed Jul 08, 2015 2:32 pm
by tgriep
Could you post the service configurations for a system that is resetting the duration and one from a system that is not resetting the duration?
Re: Duration resets for all Services with Apply Config
Posted: Wed Jul 08, 2015 3:00 pm
by SavaSC
I couldn't get the files to attach, so I'm pasting the data right in here. I've taken out most of the commented-out lines.
LTC023N (Novell server)
Code:
###############################################################################
define host {
host_name LTC023N
use HOU Hosts
alias Netware File Server
address 10.90.18.123
parents HOU-RTR-INT
hostgroups Netware Servers
max_check_attempts 5
check_interval 5
retry_interval 1
icon_image novell.png
statusmap_image novell.png
register 1
}
###############################################################################
LTC045M (Windows server)
Code:
###############################################################################
define host {
host_name LTC043M
use HOU Hosts
alias CareTracker R2 Reporting Servers
address 10.90.18.178
hostgroups CT_R2 Report Server
icon_image win_server.png
statusmap_image win_server.png
_xiwizard windowsserver
register 1
}
###############################################################################
netware_Service.cfg (Novell Netware services)
Code:
###############################################################################
define service {
service_description Check DATA Volume
hostgroup_name Netware Servers
display_name Check DATA Volume
check_command check_nw_disk!DATA!15!10!!!!!
max_check_attempts 5
check_interval 5
retry_interval 1
active_checks_enabled 1
check_period 24x7
notification_interval 60
notification_period 24x7
notification_options c,u,r,
notifications_enabled 1
contact_groups Oncall
register 1
}
define service {
service_description Check Netware Abends
hostgroup_name Netware Servers
display_name Check Netware Abends
check_command check_nw_abends!!!!!!!!
max_check_attempts 5
check_interval 5
retry_interval 1
active_checks_enabled 1
check_period 24x7
notification_interval 60
notification_period 24x7
notification_options c,u,r,
notifications_enabled 1
contact_groups Oncall
register 1
}
define service {
service_description Check Netware LDAP
use Sava Service Settings
hostgroup_name Netware Servers
display_name Check Netware Abends
check_command check_nw_ldap!!!!!!!!
max_check_attempts 5
check_interval 5
retry_interval 1
active_checks_enabled 1
check_period 24x7
notification_interval 60
notification_period 24x7
notification_options c,u,r,
notifications_enabled 1
contact_groups Oncall
register 1
}
define service {
service_description Check SYS Volume
hostgroup_name Netware Servers
display_name Check SYS Volume
check_command check_nw_disk!SYS!10!5!!!!!
max_check_attempts 5
check_interval 5
retry_interval 1
active_checks_enabled 1
check_period 24x7
notification_interval 60
notification_period 24x7
notification_options c,u,r,
notifications_enabled 1
contact_groups Oncall
register 1
}
define service {
service_description Check Time Sync
hostgroup_name Netware Servers
display_name Check Time Sync
check_command check_nw_timesync!!!!!!!!
max_check_attempts 5
check_interval 5
retry_interval 1
active_checks_enabled 1
check_period xi_timeperiod_24x7
notification_interval 60
notification_period 24x7
notification_options c,u,r,
notifications_enabled 1
contact_groups Oncall
register 1
}
###############################################################################
Windows_services.cfg (Windows services)
Code:
###############################################################################
define service {
service_description CPU Usage
use Sava Service Settings
hostgroup_name Windows All Servers
check_command check_xi_service_nsclient!!CPULOAD!-l 15,90,95!!!!!
first_notification_delay 30
_xiwizard windowsserver
register 1
}
define service {
service_description DHCP service
use Sava Service Settings
hostgroup_name Windows DHCP
check_command check_xi_service_nsclient!!SERVICESTATE!-l "DHCPServer" -d SHOWALL!!!!!
contact_groups Oncall
register 1
}
define service {
service_description Drive C: Disk Usage
use Sava Service Settings
hostgroup_name Windows All Servers
check_command check_xi_service_nsclient!!USEDDISKSPACE!-l C -w 90 -c 95!!!!!
max_check_attempts 25
check_interval 30
retry_interval 5
check_freshness null
_xiwizard windowsserver
register 1
}
define service {
service_description Drive D: Disk Usage
use Sava Service Settings
hostgroup_name Windows Servers D:
check_command check_xi_service_nsclient!!USEDDISKSPACE!-l D -w 85 -c 90!!!!!
max_check_attempts 25
check_interval 30
retry_interval 5
_xiwizard windowsserver
register 1
}
define service {
service_description Drive E: Disk Usage
use Sava Service Settings
hostgroup_name Windows Servers E:
check_command check_xi_service_nsclient!!USEDDISKSPACE!-l E -w 90 -c 95!!!!!
max_check_attempts 25
check_interval 30
retry_interval 5
_xiwizard windowsdesktop
register 1
}
define service {
service_description Drive F: Disk Usage
use Sava Service Settings
hostgroup_name Windows Servers F:
check_command check_xi_service_nsclient!!USEDDISKSPACE!-l F -w 90 -c 95!!!!!
max_check_attempts 25
check_interval 30
retry_interval 5
contact_groups SQL Admins
register 1
}
define service {
service_description Drive I: Disk Usage
use Sava Service Settings
hostgroup_name Windows Servers I:
check_command check_xi_service_nsclient!!USEDDISKSPACE!-l I -w 80 -c 95!!!!!
max_check_attempts 25
check_interval 30
retry_interval 5
notification_interval 60
_xiwizard windowsserver
register 1
}
define service {
service_description Drive T: Disk Usage
use Sava Service Settings
hostgroup_name Windows Servers T:
check_command check_xi_service_nsclient!!USEDDISKSPACE!-l T -w 80 -c 95!!!!!
max_check_attempts 25
check_interval 30
retry_interval 5
_xiwizard windowsserver
register 1
}
define service {
service_description Logon Errors
use Sava Service Settings
hostgroup_name Windows All Servers
check_command check_xi_service_nsclient!!COUNTER!-l "\\Server\\Errors System","Login Errors since last reboot is %.f" -w 2 -c 20!!!!!
_xiwizard windowsserver
register 1
}
define service {
service_description Memory Usage
use Sava Service Settings
hostgroup_name Windows All Servers
check_command check_xi_service_nsclient!!MEMUSE!-w 85 -c 99!!!!!
max_check_attempts 30
retry_interval 5
first_notification_delay 30
_xiwizard windowsserver
register 1
}
define service {
service_description Page File Usage
use Sava Service Settings
hostgroup_name Windows All Servers
check_command check_xi_service_nsclient!!COUNTER!-l "\\Paging File(_Total)\\% Usage","Paging File usage is %.2f %%" -w 85 -c 95!!!!!
first_notification_delay 30
_xiwizard windowsserver
register 1
}
define service {
service_description Server Work Queues
use Sava Service Settings
hostgroup_name Windows All Servers
check_command check_xi_service_nsclient!!COUNTER!-l "\\Server Work Queues(0)\\Queue Length","Current work queue (an indication of processing load) is %.f " -w 4 -c 7!!!!!
_xiwizard windowsserver
register 1
}
define service {
service_description Uptime
use Sava Service Settings
hostgroup_name Windows All Servers
check_command check_xi_service_nsclient!!UPTIME!!!!!!
notification_options c,
_xiwizard windowsserver
register 1
}
define service {
service_description VNC Server
use Sava Service Settings
hostgroup_name ROX Process Schedulers
display_name VNC Service
check_command check_xi_service_nsclient!!SERVICESTATE!-l "VNC Server" -d SHOWALL!!!!!
register 1
}
define service {
service_description VNC Server Port
use Sava Service Settings
hostgroup_name ROX Process Schedulers
display_name VNC Server Port
check_command check_tcp!5900!!!!!!!
register 1
}
define service {
service_description Windows Deployment Service Check
use Sava Service Settings
hostgroup_name Deployment Servers
check_command check_xi_service_nsclient!!SERVICESTATE!-l WDSServer -d SHOWALL!!!!!
contacts +actalley,harrisr
register 1
}
define service {
service_description Windows Update Service Check
use Sava Service Settings
hostgroup_name WSUS Servers
check_command check_xi_service_nsclient!!SERVICESTATE!-l WSUSService -d SHOWALL!!!!!
register 1
}
###############################################################################
Re: Duration resets for all Services with Apply Config
Posted: Wed Jul 08, 2015 4:36 pm
by tgriep
Could you post the service template called "Sava Service Settings"?
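Since the question is now about the "Sava Service Settings" template, a quick way to see which of the posted services inherit from it is to group the service definitions by their `use` directive. The Python sketch below is a rough parser for the simple "directive value" layout shown in these files: it skips whole-line `#`/`;` comments, treats the `use` value as a single raw string, and does not follow multi-level template chains. services_by_template is a hypothetical name.

```python
import re
from collections import defaultdict

# Matches "define service {" / "define service{" but not "define serviceescalation {".
DEFINE_SERVICE_RE = re.compile(r"define\s+service\s*\{")


def services_by_template(config_text):
    """Group service_description values by the template named in `use`.

    Values may contain spaces (e.g. "use Sava Service Settings"). Services
    without a `use` line are grouped under "(none)".
    """
    groups = defaultdict(list)
    current = None  # directives of the service block being read
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank or comment line
        if DEFINE_SERVICE_RE.match(line):
            current = {}
        elif line == "}" and current is not None:
            template = current.get("use", "(none)")
            groups[template].append(current.get("service_description", "?"))
            current = None
        elif current is not None:
            parts = line.split(None, 1)
            if len(parts) == 2:
                current[parts[0]] = parts[1].strip()
    return dict(groups)
```

Run against netware_Service.cfg and Windows_services.cfg, this lists the services that do and do not use the template, which can then be compared with the list of services whose durations reset.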