Duration
Posted: Sat Sep 07, 2019 2:34 pm
by orani
How can I maintain the duration after a reboot of the Nagios server?
Example: I have a host which has been down for 3d4h32m. If I restart the Nagios server, the host will still be down after the reboot, but the duration will show 0d0h1m, etc.
Re: Duration
Posted: Mon Sep 09, 2019 8:01 am
by mcapra
Check the following configuration directives in your main Nagios Core configuration file to make sure they reflect your desired behavior:
https://assets.nagios.com/downloads/nag ... nformation
Code: Select all
retain_state_information
state_retention_file
retention_update_interval
If your state_retention_file is held on a ramdisk, or in some other ephemeral storage (like /tmp on some Linux flavors), it is likely being removed on system restarts regardless of anything Nagios Core attempts.
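One way to check the ephemeral-storage case is to look at the filesystem type behind the directory that holds the retention file. The following is only a sketch: `fs_type_of` and `is_ephemeral_fstype` are hypothetical helper names, and it assumes GNU coreutils `stat` on Linux.

```shell
# Hypothetical helper: print the filesystem type backing the directory of a file.
# Assumes GNU coreutils stat (-f queries the filesystem, %T prints its type name).
fs_type_of() {
    stat -f -c %T "$(dirname "$1")" 2>/dev/null
}

# Succeeds (exit status 0) when the given type is RAM-backed, which means
# its contents are gone after every reboot.
is_ephemeral_fstype() {
    case "$1" in
        tmpfs|ramfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (path is an example, use your own state_retention_file value):
#   is_ephemeral_fstype "$(fs_type_of /usr/local/nagios/var/retention.dat)" \
#       && echo "retention file is on RAM-backed storage"
```

If the check reports RAM-backed storage, pointing state_retention_file at a directory on persistent disk would be the thing to try.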
Re: Duration
Posted: Mon Sep 09, 2019 8:26 am
by orani
Code: Select all
retain_state_information=1
state_retention_file=/usr/local/nagios/var/retention.dat
retention_update_interval=60
Those are my settings. I think that with these settings Nagios should keep the duration after a restart, but it does not.
Re: Duration
Posted: Mon Sep 09, 2019 9:06 am
by scottwilkerson
orani wrote: How can I maintain the duration after a reboot of the Nagios server?
Example: I have a host which has been down for 3d4h32m. If I restart the Nagios server, the host will still be down after the reboot, but the duration will show 0d0h1m, etc.
Can you show the check command that is producing this for this host?
Re: Duration
Posted: Mon Sep 09, 2019 9:12 am
by orani
This is not happening for a single check but for all checks.
These are some of my definitions:
Code: Select all
define host{
    host_name               Infra1
    address                 10.0.6.11
    use                     host-pnp
    contacts                nagiosadmin,systemadmin
    check_command           check-host-alive
    max_check_attempts      5
    check_interval          1
    retry_interval          1
    parents                 SMCSW
    check_period            24x7
    hostgroups              KDS PHYSICAL SERVERS
    statusmap_image         rack-server.gd2
    notification_interval   0
    notification_options    d,u,r,s
    notification_period     24x7
}
define service{
    host_name               Infra1
    use                     service-pnp
    check_command           check_nt!CPULOAD! -l 5,80,90 -s nagios
    contacts                nagiosadmin,systemadmin
    max_check_attempts      5
    service_description     CPU Load
    check_interval          1
    retry_interval          1
    check_period            24x7
    servicegroups           CPU Load
    notification_interval   0
    notification_options    w,u,c,r,s
}
Re: Duration
Posted: Mon Sep 09, 2019 9:18 am
by scottwilkerson
Can you show the output of the following command?
Code: Select all
grep reta /usr/local/nagios/etc/nagios.cfg
Re: Duration
Posted: Mon Sep 09, 2019 10:14 am
by orani
[root@nagios ~]# grep reta /usr/local/nagios/etc/nagios.cfg
retain_state_information=1
# This file is used only if the retain_state_information
# retention file. If you want to use retained program status
use_retained_program_state=1
# This setting determines whether or not Nagios will retain
# If you want to use retained scheduling info, set this
use_retained_scheduling_info=1
# service attributes that should *not* be retained by Nagios during
# of flap detection and event handlers for hosts to be retained, you
# This mask determines what host attributes are not retained
retained_host_attribute_mask=0
# This mask determines what service attributes are not retained
retained_service_attribute_mask=0
# These two masks determine what process attributes are not retained.
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
# These two masks determine what contact attributes are not retained.
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
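Note that `grep reta` also returns comment lines, and because the string `reta` does not occur in `retention`, it cannot show `state_retention_file` or `retention_update_interval`. A narrower filter could look like the sketch below; `show_retention_directives` is a hypothetical helper name, and the pattern assumes plain `name=value` lines in nagios.cfg.

```shell
# Hypothetical helper: list only active (uncommented) directives whose name
# contains "retain" or "retention" in a Nagios main config file.
show_retention_directives() {
    grep -E '^[[:space:]]*[A-Za-z_]*ret(ain|ention)[A-Za-z_]*[[:space:]]*=' "$1"
}

# Usage (path taken from the thread):
#   show_retention_directives /usr/local/nagios/etc/nagios.cfg
```

With the settings posted earlier in the thread, this should list `state_retention_file` and `retention_update_interval` alongside the `retain*` lines.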
Re: Duration
Posted: Mon Sep 09, 2019 10:31 am
by scottwilkerson
That all looks correct. Let's look at the permissions on retention.dat:
Code: Select all
ls -al /usr/local/nagios/var/retention.dat
Re: Duration
Posted: Mon Sep 09, 2019 11:50 am
by orani
Code: Select all
[root@nagios ~]# ls -al /usr/local/nagios/var/retention.dat
-rw------- 1 nagios nagios 2857104 Sep 9 19:06 /usr/local/nagios/var/retention.dat
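Besides ownership and mode, the modification time is worth watching: retention_update_interval is given in minutes, so with a value of 60 the file should be rewritten at least hourly while Nagios is running. A rough freshness check is sketched below. `file_age_seconds` and `updated_within_minutes` are hypothetical helper names, and GNU coreutils `stat` is assumed.

```shell
# Hypothetical helper: seconds since a file was last modified.
# Assumes GNU coreutils stat (-c %Y prints the mtime as epoch seconds).
file_age_seconds() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1" 2>/dev/null) || return 1
    echo $((now - mtime))
}

# Succeeds when the file was modified within the given number of minutes.
updated_within_minutes() {
    age=$(file_age_seconds "$1") || return 1
    [ "$age" -le $(( $2 * 60 )) ]
}

# Usage (path and interval taken from the thread):
#   updated_within_minutes /usr/local/nagios/var/retention.dat 60 \
#       || echo "retention.dat looks stale or is missing"
```

Running it shortly before and shortly after a reboot would show whether the file survives and keeps being updated.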
Re: Duration
Posted: Mon Sep 09, 2019 12:41 pm
by scottwilkerson
Does it reset the duration if you just restart the Nagios service?
All looks proper so far. The only thing I don't know is whether, for some reason, you are losing the /usr/local/nagios/var/retention.dat file when rebooting the Nagios server.
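To see whether the stored state really survives, one could read the host's last state change straight from the retention file before and after the reboot. The sketch below rests on assumptions: that the file contains `host {` ... `}` blocks with `host_name=` and `last_state_change=` lines, and that the displayed duration is derived from that timestamp. `host_last_state_change` and `format_duration` are hypothetical helper names.

```shell
# Hypothetical helper: print last_state_change for one host.
# Assumes "host {" ... "}" blocks with host_name= and last_state_change= lines.
host_last_state_change() {
    awk -v want="$2" '
        /^[ \t]*host[ \t]*[{]/ { in_host = 1; name = ""; ts = ""; next }
        in_host && /^[ \t]*host_name=/         { sub(/^[^=]*=/, ""); name = $0 }
        in_host && /^[ \t]*last_state_change=/ { sub(/^[^=]*=/, ""); ts = $0 }
        in_host && /^[ \t]*[}]/ { if (name == want) print ts; in_host = 0 }
    ' "$1"
}

# Format a number of seconds in the style used in this thread, e.g. 3d4h32m.
format_duration() {
    s=$1
    echo "$((s / 86400))d$((s % 86400 / 3600))h$((s % 3600 / 60))m"
}

# Usage (path and host name taken from the thread):
#   ts=$(host_last_state_change /usr/local/nagios/var/retention.dat Infra1)
#   format_duration $(( $(date +%s) - ts ))
```

If the timestamp is unchanged after the reboot but the duration still starts from zero, the file is surviving and the problem lies elsewhere. If the timestamp is reset or the file is missing, the retention data is being lost across the reboot.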