Problem "Services/host attended" in Nagios Core

Posted: Mon May 16, 2016 4:44 am
by redesgtt
Hello,

First of all, I am sorry for my English; it is a little poor. I am going to try to explain my problem.

For some time now, the Duration field has been reset daily for all attended (acknowledged) services and hosts. This coincides with the scheduled restart of the Nagios service (/etc/init.d/nagios) that we run every day at 06:00.

Image

Our Nagios Core is: Nagios® Core™ 4.0.5

This isn't normal. We have a lot of services and hosts that have had problems for days but, as I said, they are reset daily.

All these problems then show as unattended, so we have to attend to them again every day.

I don't know where I should look to fix this. I would be grateful if someone could help.

Regards.

Re: Problem "Services/host attended" in Nagios Core

Posted: Mon May 16, 2016 3:12 pm
by tgriep
I just want to verify your issue is that your checks are not running and the durations are not getting updated, is that correct?

Let's try restarting Nagios by running the following commands as root on the server.

service nagios stop
killall -9 nagios
service nagios start
Wait a few minutes. Do the checks start to update?

If not, run this command and post the output.

 /usr/local/nagios/bin/nagiostats

Re: Problem "Services/host attended" in Nagios Core

Posted: Tue May 17, 2016 2:12 am
by redesgtt
No, I am sorry, I think I explained it badly.

The checks run. Daily at 6 am we run "kill -9 $PID" and then "/usr/local/nagios/bin/nagios -d $main_config_file", and likewise for the "xinet" service.

The problem is that when we restart these services daily, everything shown in the Nagios web interface is cleared. The hosts and services that were attended the previous day show as unattended after the restart, and the Duration field counts time only since the last restart. This is wrong. We have a lot of hosts and services that have had problems for many days or months, but every day we have to attend to them again because of this.

For example, it is now 09:05.

The last restart was at 06:00 and everything shows a Duration of 0d 03h 5m and is unattended, but yesterday before 06:00 we attended to it.

Image

Re: Problem "Services/host attended" in Nagios Core

Posted: Tue May 17, 2016 1:18 pm
by tgriep
I just want to verify: you have a problem, you fix it, and Nagios reports it as OK, but when you restart the server, it reports it as bad.
Is that true?
If so, it could be that the retention.dat file is corrupted. You could delete this file; its default location is as follows.
/usr/local/nagios/var/retention.dat

What you would have to do is stop the nagios process, delete the file and then start up the nagios process.
One thing to note: when this file is deleted, all of the notes and downtime schedules will be gone, and the server will start to check all of the hosts and services from that point.
Let us know if this is what you are looking for.
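
The stop / delete / start sequence above could be sketched roughly as follows. This is only a sketch, and it moves the file aside instead of deleting it so the old state can still be inspected; `backup_retention` is a hypothetical helper name, not a Nagios tool, and the path is the default location mentioned above.

```shell
#!/bin/sh
# Move a retention file aside (timestamped) instead of deleting it outright.
# Nagios should be stopped before this runs.
# backup_retention is a made-up helper name used for illustration only.
backup_retention() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "no retention file at $file" >&2
        return 1
    fi
    backup="$file.bak.$(date +%Y%m%d%H%M%S)"
    # Print the backup path on success so the caller knows where it went.
    mv "$file" "$backup" && echo "$backup"
}

# Usage (as root), in the same order as described above:
#   service nagios stop
#   backup_retention /usr/local/nagios/var/retention.dat
#   service nagios start
```

If the retention file lives somewhere else (see `state_retention_file` in nagios.cfg), pass that path instead.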

Re: Problem "Services/host attended" in Nagios Core

Posted: Wed May 18, 2016 5:13 am
by redesgtt
That is not what I meant.

For example: today, if I attend a problem on a host/service, it is shown correctly as attended. Tomorrow at 06:00, after Nagios restarts, the host/service will no longer show as attended. Its fields, such as the "Duration" field, will be reset to zero.

This happens with all hosts and services attended the previous day.

Re: Problem "Services/host attended" in Nagios Core

Posted: Wed May 18, 2016 9:38 am
by tgriep
Could you post your nagios.cfg file and one of the host / service config files for one of the hosts that you are having issues with?
Can you provide a screen capture or more information on the before and after of something being attended to?
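
One possible way to collect that before/after information without screenshots is to count the acknowledged entries in the retention file just before and just after the restart. The sketch below assumes the file stores acknowledgements as `problem_has_been_acknowledged=1` lines; `count_acknowledged` is a hypothetical helper name, and the path in the usage comment should be whatever `state_retention_file` is set to.

```shell
#!/bin/sh
# Count host/service entries flagged as acknowledged in a retention file.
# count_acknowledged is a made-up helper name used for illustration only.
count_acknowledged() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c '^[[:space:]]*problem_has_been_acknowledged=1' "$1"
}

# Usage: run once before the 06:00 restart and once after, then compare.
#   count_acknowledged /path/to/retention.dat
```

If the count drops to 0 after the restart, the acknowledgements were not read back from the file.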

Re: Problem "Services/host attended" in Nagios Core

Posted: Fri May 20, 2016 1:38 am
by redesgtt
This is the "nagios.cfg" file:


bash-3.2$ cat nagios.cfg
##############################################################################
#
# NAGIOS.CFG - Sample Main Config File for Nagios 3.2.3
#
# Read the documentation for more information on this configuration
# file. I've provided some comments here, but things may not be so
# clear without further explanation.
#
##############################################################################
log_file=/usr/local/nagios/ramdisk/nagios.log
log_rotation_method=d
log_archive_path=/usr/local/nagios/var/archives
cfg_file=/usr/local/nagios/etc/localhost.cfg
cfg_file=/usr/local/nagios/etc/check_commands.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/notify_commands.cfg
cfg_file=/usr/local/nagios/etc/host_templates.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/service_templates.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
object_cache_file=/usr/local/nagios/ramdisk/objects.cache
precached_object_file=/usr/local/nagios/ramdisk/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
status_file=/usr/local/nagios/ramdisk/status.dat
status_update_interval=10
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/ramdisk/nagios.tmp
temp_path=/tmp
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_file=/usr/local/nagios/var/rw/nagios.cmd
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_path=/usr/local/nagios/ramdisk/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
service_check_timeout=60
host_check_timeout=60
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=60
retain_state_information=1
state_retention_file=/usr/local/nagios/ramdisk/retention.dat
retention_update_interval=30
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=60
check_for_updates=0
bare_update_check=0
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=1
service_perfdata_file=/usr/local/nagios/ramdisk/perfdata.log
host_perfdata_file_template=$DATE$;$TIME$;$HOSTNAME$;$SERVICEDESC$;$SERVICEPERFDATA$
service_perfdata_file_template=$TIMET$;$HOSTNAME$;$SERVICEDESC$;$SERVICEPERFDATA$
host_perfdata_file_mode=a
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=60
service_perfdata_file_processing_command=perfdata_file_rrdtool
obsess_over_services=0
obsess_over_hosts=0
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=1
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=0
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=euro

######################################################################
######################################################################


This is a service with "acknowledge problem" set yesterday at 5:35 pm:
Image

And this is the same service today at 8:15 am, after the daily restart of the Nagios service at 6:00 am:
Image

Re: Problem "Services/host attended" in Nagios Core

Posted: Fri May 20, 2016 2:11 am
by Box293
state_retention_file=/usr/local/nagios/ramdisk/retention.dat

I would move this file off the ramdisk. If the ramdisk is being re-created as part of the daily restart, this would result in losing all the information.

https://assets.nagios.com/downloads/nag ... ntion_file
This is the file that Nagios will use for storing status, downtime, and comment information before it shuts down. When Nagios is restarted it will use the information stored in this file for setting the initial states of services and hosts before it starts monitoring anything. In order to make Nagios retain state information between program restarts, you must enable the retain_state_information option.
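
To see where the retention file currently lives and what is backing that directory, something like the following could be used. It is a sketch only: `retention_path` and `backing_filesystem` are hypothetical helper names, and the nagios.cfg location in the usage comment is an assumption based on the other paths in the posted config.

```shell
#!/bin/sh
# Print the state_retention_file value(s) found in a nagios.cfg.
# Both helper names here are made up for illustration.
retention_path() {
    sed -n 's/^[[:space:]]*state_retention_file[[:space:]]*=[[:space:]]*//p' "$1"
}

# Print the device or filesystem name that holds the given directory
# (first column of POSIX "df -P" output, e.g. tmpfs or a /dev/... device).
backing_filesystem() {
    df -P "$1" | awk 'NR==2 { print $1 }'
}

# Usage (config path is an assumption):
#   path=$(retention_path /usr/local/nagios/etc/nagios.cfg)
#   backing_filesystem "$(dirname "$path")"
```

If the second command reports a RAM-backed filesystem, the file will not survive anything that re-creates or clears that ramdisk.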

Re: Problem "Services/host attended" in Nagios Core

Posted: Tue Sep 06, 2016 2:57 am
by redesgtt
I apologize for not responding sooner.

In the end we deleted retention.dat in /usr/local/nagios/ramdisk/, and after we restarted Nagios it was created again, but smaller.

The problem was that the ramdisk partition was full. So in the future we will delete retention.dat again, or another option would be to increase the ramdisk partition.

You can close this post.

Thanks a lot for your advice :)

Re: Problem "Services/host attended" in Nagios Core

Posted: Tue Sep 06, 2016 11:28 am
by rkennedy
As a recommendation, I would increase the ram disk so that it doesn't happen again in the future.
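
A small check along these lines could also warn before the ram disk fills up again. It is a sketch, not a tested Nagios plugin: `fs_use_percent` and `warn_if_full` are hypothetical helper names, and the 90% default threshold is an arbitrary assumption.

```shell
#!/bin/sh
# Print how full the filesystem holding a directory is, as a bare number.
# Both helper names here are made up for illustration.
fs_use_percent() {
    # POSIX "df -P" prints one line per filesystem; column 5 is capacity used.
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Warn (and return non-zero) when usage reaches the limit; default limit 90.
warn_if_full() {
    dir="$1"
    limit="${2:-90}"
    used=$(fs_use_percent "$dir")
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $dir is ${used}% full"
        return 1
    fi
    echo "OK: $dir is ${used}% full"
}

# Usage, for example from cron shortly before the daily restart:
#   warn_if_full /usr/local/nagios/ramdisk 90
```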

Going to close this thread out, feel free to create a new one if you have questions in the future!