Duration

Posted: Sat Sep 07, 2019 2:34 pm
by orani
How can I maintain the duration after a reboot of the Nagios server?

Example: I have a host that has been down for 3d4h32m. If I restart the Nagios server, the host is still down after the reboot, but the duration shows 0d0h1m, etc.

Re: Duration

Posted: Mon Sep 09, 2019 8:01 am
by mcapra
Check the following configuration directives in your main Nagios Core configuration file to make sure they reflect your desired behavior:
https://assets.nagios.com/downloads/nag ... nformation

Code: Select all
retain_state_information
state_retention_file
retention_update_interval


If your state_retention_file is held on a ramdisk, or in some other ephemeral storage (like /tmp on some Linux flavors), it is likely being removed on system restarts regardless of anything Nagios Core attempts.
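
One way to check this is to look up which mounted filesystem holds the retention file. The following is a minimal sketch, assuming a Linux host that exposes /proc/mounts; the function name `fs_type_for` and the `EPHEMERAL_FS` list are made up for illustration and are not part of Nagios Core.

```python
import os

# Filesystem types that do not survive a reboot (illustrative, not exhaustive).
EPHEMERAL_FS = {"tmpfs", "ramfs", "devtmpfs"}


def fs_type_for(path, mounts_text):
    """Return (mount_point, fs_type) for the deepest mount that contains path.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    path = os.path.normpath(path)
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same mount point win.
            if len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best


if __name__ == "__main__":
    # Path taken from the thread; adjust to your state_retention_file value.
    retention = "/usr/local/nagios/var/retention.dat"
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            mount_point, fs_type = fs_type_for(os.path.realpath(retention), f.read())
        note = "ephemeral, lost on reboot" if fs_type in EPHEMERAL_FS else "persistent type"
        print(f"{retention}: mounted under {mount_point} ({fs_type}, {note})")
```

If the reported type is tmpfs or ramfs, the file cannot survive a reboot no matter how the retention directives are set.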

Re: Duration

Posted: Mon Sep 09, 2019 8:26 am
by orani
Code: Select all
retain_state_information =1
state_retention_file=/usr/local/nagios/var/retention.dat
retention_update_interval=60


Those are my settings. I think that with those settings Nagios should keep the duration after a restart, but it does not.

Re: Duration

Posted: Mon Sep 09, 2019 9:06 am
by scottwilkerson
orani wrote: How can I maintain the duration after a reboot of the Nagios server?

Example: I have a host that has been down for 3d4h32m. If I restart the Nagios server, the host is still down after the reboot, but the duration shows 0d0h1m, etc.


Can you show the check command that is producing this for this host?

Re: Duration

Posted: Mon Sep 09, 2019 9:12 am
by orani
This is not happening with a single check but with all checks.

These are some of my definitions:
Code: Select all
define host{
        host_name               Infra1
        address                 10.0.6.11
        use                     host-pnp
        contacts                nagiosadmin,systemadmin
        check_command           check-host-alive
        max_check_attempts      5
        check_interval          1
        retry_interval          1
        parents                 SMCSW
        check_period            24x7
        hostgroups              KDS PHYSICAL SERVERS
        statusmap_image         rack-server.gd2
        notification_interval   0
        notification_options    d,u,r,s
        notification_period     24x7
}

define service{
        host_name               Infra1
        use                     service-pnp
        check_command           check_nt!CPULOAD! -l 5,80,90 -s nagios
        contacts                nagiosadmin,systemadmin
        max_check_attempts      5
        service_description     CPU Load
        check_interval          1
        retry_interval          1
        check_period            24x7
        servicegroups           CPU Load
        notification_interval   0
        notification_options    w,u,c,r,s
}

Re: Duration

Posted: Mon Sep 09, 2019 9:18 am
by scottwilkerson
Can you show the output of the following?
Code: Select all
grep reta /usr/local/nagios/etc/nagios.cfg
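
Note that the pattern `reta` also matches comment lines and does not match `state_retention_file` or `retention_update_interval`, since neither name contains that substring. As a rough alternative, the sketch below prints only the active retention directives. The helper name `read_directives` is made up for illustration, and stripping whitespace around `=` is a choice of this sketch, not a statement about how Nagios Core parses the file.

```python
import os

# Directives named earlier in this thread.
RETENTION_KEYS = (
    "retain_state_information",
    "state_retention_file",
    "retention_update_interval",
)


def read_directives(cfg_text, keys=RETENTION_KEYS):
    """Return {name: value} for the requested directives, skipping comments."""
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue
        name = name.strip()
        if name in keys:
            # A later occurrence overwrites an earlier one in this sketch.
            found[name] = value.strip()
    return found


if __name__ == "__main__":
    cfg = "/usr/local/nagios/etc/nagios.cfg"  # path used in this thread
    if os.access(cfg, os.R_OK):
        with open(cfg, errors="replace") as f:
            settings = read_directives(f.read())
        for key in RETENTION_KEYS:
            print(f"{key}={settings.get(key, '<not set>')}")
```

A directive reported as `<not set>` is either commented out or absent from the file. Keep in mind that retention_update_interval is expressed in minutes.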

Re: Duration

Posted: Mon Sep 09, 2019 10:14 am
by orani
[root@nagios ~]# grep reta /usr/local/nagios/etc/nagios.cfg
retain_state_information=1
# This file is used only if the retain_state_information
# retention file. If you want to use retained program status
use_retained_program_state=1
# This setting determines whether or not Nagios will retain
# If you want to use retained scheduling info, set this
use_retained_scheduling_info=1
# service attributes that should *not* be retained by Nagios during
# of flap detection and event handlers for hosts to be retained, you
# This mask determines what host attributes are not retained
retained_host_attribute_mask=0
# This mask determines what service attributes are not retained
retained_service_attribute_mask=0
# These two masks determine what process attributes are not retained.
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
# These two masks determine what contact attributes are not retained.
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0

Re: Duration

Posted: Mon Sep 09, 2019 10:31 am
by scottwilkerson
That all looks correct. Let's look at the permissions on retention.dat:

Code: Select all
ls -al /usr/local/nagios/var/retention.dat

Re: Duration

Posted: Mon Sep 09, 2019 11:50 am
by orani
Code: Select all
[root@nagios ~]# ls -al /usr/local/nagios/var/retention.dat
-rw------- 1 nagios nagios 2857104 Sep  9 19:06 /usr/local/nagios/var/retention.dat

Re: Duration

Posted: Mon Sep 09, 2019 12:41 pm
by scottwilkerson
Does it reset the duration if you just restart the nagios service?

Code: Select all
service nagios restart


Everything looks proper so far. The only thing I don't know is whether, for some reason, you are losing the /usr/local/nagios/var/retention.dat file when the Nagios server reboots.
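
To tell a lost file apart from a file that survives but is not being read, it may help to record what retention.dat holds just before and just after the reboot. The sketch below rests on two assumptions that should be verified against your Nagios Core version: that retention.dat stores each host in a `host { ... }` block of `key=value` lines, and that the displayed duration is derived from `last_state_change`. The helper names are made up for illustration.

```python
import os
import time


def host_state_changes(retention_text):
    """Map host_name -> last_state_change (epoch seconds) from `host {` blocks."""
    result = {}
    current = None  # key/value pairs of the host block being read, if any
    for raw in retention_text.splitlines():
        line = raw.strip()
        if line == "host {":
            current = {}
        elif line == "}":
            if current is not None and "host_name" in current:
                name = current["host_name"]
                result[name] = int(current.get("last_state_change", "0"))
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return result


def format_duration(seconds):
    """Render seconds in the 3d4h32m style used in this thread."""
    days, rest = divmod(int(seconds), 86400)
    hours, rest = divmod(rest, 3600)
    return f"{days}d{hours}h{rest // 60}m"


if __name__ == "__main__":
    path = "/usr/local/nagios/var/retention.dat"  # path used in this thread
    if os.access(path, os.R_OK):  # the file is mode 0600, owned by nagios
        with open(path, errors="replace") as f:
            changes = host_state_changes(f.read())
        now = time.time()
        for host, changed in sorted(changes.items()):
            if changed:
                print(host, format_duration(now - changed))
            else:
                print(host, "last_state_change is 0")
```

If the values look right before the reboot but the file is missing or much smaller afterwards, the file itself is being lost. If the values are intact after the reboot and the duration still resets, the problem is more likely in how the file is read at startup.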