Problem "Services/host attended" in Nagios Core

Post by redesgtt »

Hello,

First of all, I am sorry for my English; it is a little poor. I am going to try to explain my problem.

For some time now, the Duration field has been reset daily for all attended services and hosts. This coincides with the scheduled restart of the Nagios service (/etc/init.d/nagios) that we run every day at 06:00.

Image

Our Nagios Core version is: Nagios® Core™ 4.0.5

This isn't normal. We have many services and hosts that have had problems for days, but as I said, they are reset daily.

All these problems then show as unattended, so we have to attend to them again every day.

I don't know where I should look to fix this. I would be grateful if someone could help.

Regards.
Attachments
duration_field.jpg

Post by tgriep »

I just want to verify: your issue is that your checks are not running and the durations are not getting updated. Is that correct?

Let's try restarting Nagios by running the following commands as root on the server.

Code:

service nagios stop
killall -9 nagios
service nagios start

Wait a few minutes. Do the checks start to update?

If not, run this command and post the output.

Code:

/usr/local/nagios/bin/nagiostats

Post by redesgtt »

No, I am sorry, I think I explained it badly.

The checks do run. Every day at 6 am we run "kill -9 $PID" and then "/usr/local/nagios/bin/nagios -d $main_config_file", and we do likewise for the "xinet" service.

The problem is that when we restart these services daily, everything shown in the Nagios web interface is reset. The hosts and services that were attended the day before are shown as unattended after the restart, and the "Duration" field counts time only since the last restart. This is wrong. We have many hosts and services that have had problems for days or months, but we have to attend to them again every day because of this.

For example, it is now 09:05.

The last restart was at 06:00, and everything shows a "Duration" of 0d 03h 5m and is unattended, even though we attended to it yesterday before 06:00.

Image
Attachments
nagios1.jpg

Post by tgriep »

I just want to verify: you have a problem, you fix it and Nagios reports it as OK, but when you restart the server, it reports it as bad again.
Is that true?
If so, it could be that the retention.dat file is corrupted. You could delete this file. Its default location is:
/usr/local/nagios/var/retention.dat

To do this, stop the nagios process, delete the file, and then start the nagios process again.
One thing to note: when this file is deleted, all of the notes and downtime schedules will be gone, and the server will start checking all of the hosts and services at that point.
Let us know if this is what you are looking for.
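
The stop, delete, start sequence described here could be sketched as below. This is only a cautious variant, not an official procedure: instead of deleting retention.dat outright it moves the file aside so it can be inspected or restored later. The function name backup_retention and the ".bak.<timestamp>" suffix are assumptions made for illustration, and the path passed in should be whatever state_retention_file points to in your own nagios.cfg.

```shell
#!/bin/sh
# Hypothetical helper: move a retention file aside instead of deleting it.
# The function name and the ".bak.<timestamp>" suffix are illustrative assumptions.
backup_retention() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    backup="$file.bak.$(date +%Y%m%d%H%M%S)"
    # Print the backup path on success so the caller can record it.
    mv -- "$file" "$backup" && echo "$backup"
}

# Intended sequence, run as root (commented out so the sketch has no side effects):
#   service nagios stop
#   backup_retention /usr/local/nagios/var/retention.dat
#   service nagios start
```

Keeping the old file costs some disk space, so the backup should be removed once the restarted Nagios is confirmed to behave as expected.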

Post by redesgtt »

That is not what I meant.

For example: today, if I attend to a problem on a host/service, it is shown correctly as attended. Tomorrow at 06:00, after Nagios restarts, the host/service will no longer be shown as attended. Its fields, such as the "Duration" field, will be reset to zero.

This happens with all hosts and services attended the previous day.

Post by tgriep »

Could you post your nagios.cfg file and one of the host / service config files for one of the hosts that you are having issues with?
Can you provide a screen capture or more information showing the state before and after something was attended to?

Post by redesgtt »

This is the "nagios.cfg" file:


bash-3.2$ cat nagios.cfg
##############################################################################
#
# NAGIOS.CFG - Sample Main Config File for Nagios 3.2.3
#
# Read the documentation for more information on this configuration
# file. I've provided some comments here, but things may not be so
# clear without further explanation.
#
##############################################################################
log_file=/usr/local/nagios/ramdisk/nagios.log
log_rotation_method=d
log_archive_path=/usr/local/nagios/var/archives
cfg_file=/usr/local/nagios/etc/localhost.cfg
cfg_file=/usr/local/nagios/etc/check_commands.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/notify_commands.cfg
cfg_file=/usr/local/nagios/etc/host_templates.cfg
cfg_file=/usr/local/nagios/etc/hosts.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/service_templates.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/services.cfg
object_cache_file=/usr/local/nagios/ramdisk/objects.cache
precached_object_file=/usr/local/nagios/ramdisk/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
status_file=/usr/local/nagios/ramdisk/status.dat
status_update_interval=10
lock_file=/usr/local/nagios/var/nagios.lock
temp_file=/usr/local/nagios/ramdisk/nagios.tmp
temp_path=/tmp
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_file=/usr/local/nagios/var/rw/nagios.cmd
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_path=/usr/local/nagios/ramdisk/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
service_check_timeout=60
host_check_timeout=60
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=60
retain_state_information=1
state_retention_file=/usr/local/nagios/ramdisk/retention.dat
retention_update_interval=30
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=60
check_for_updates=0
bare_update_check=0
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=1
service_perfdata_file=/usr/local/nagios/ramdisk/perfdata.log
host_perfdata_file_template=$DATE$;$TIME$;$HOSTNAME$;$SERVICEDESC$;$SERVICEPERFDATA$
service_perfdata_file_template=$TIMET$;$HOSTNAME$;$SERVICEDESC$;$SERVICEPERFDATA$
host_perfdata_file_mode=a
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=60
service_perfdata_file_processing_command=perfdata_file_rrdtool
obsess_over_services=0
obsess_over_hosts=0
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=1
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=0
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=euro

######################################################################
######################################################################


This is a service whose problem was acknowledged yesterday at 5:35 pm:
Image

And this is the same service today at 8:15 am, after the daily Nagios restart at 6:00 am:
Image
Attachments
service_no_acknowledgejpg.jpg
service_acknowledgejpg.jpg

Post by Box293 »

state_retention_file=/usr/local/nagios/ramdisk/retention.dat

I would move this file off the ramdisk. If the ramdisk is being re-created as part of the daily restart, this would result in losing all the information.

https://assets.nagios.com/downloads/nag ... ntion_file
This is the file that Nagios will use for storing status, downtime, and comment information before it shuts down. When Nagios is restarted it will use the information stored in this file for setting the initial states of services and hosts before it starts monitoring anything. In order to make Nagios retain state information between program restarts, you must enable the retain_state_information option.

Post by redesgtt »

I apologize for not responding sooner.

In the end we deleted retention.dat in /usr/local/nagios/ramdisk/, and after we restarted Nagios the file was created again, but smaller.

This was the cause of the problem: the ramdisk partition was full. So in the future we will either delete retention.dat again or increase the size of the ramdisk partition.

You can close this post.

Thanks a lot for your advice :)
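
Since the root cause was a full ramdisk partition, a small check like the following could give early warning before the retention file can no longer be written. This is a sketch only: the function names fs_usage_pct and warn_if_full and the default threshold of 90% are assumptions, and it relies on the POSIX output format of "df -P".

```shell
#!/bin/sh
# Print the usage percentage (an integer without "%") of the filesystem holding a path.
fs_usage_pct() {
    # With df -P, each filesystem is reported on one line; column 5 is the capacity, e.g. "87%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical check: report whether the filesystem holding a path is at or above a limit.
warn_if_full() {
    pct=$(fs_usage_pct "$1")
    limit="${2:-90}"
    # An empty result means df could not report on the path.
    [ -n "$pct" ] || return 2
    if [ "$pct" -ge "$limit" ]; then
        echo "WARNING: $1 is ${pct}% full"
        return 1
    fi
    echo "OK: $1 is ${pct}% full"
}

# Example: warn_if_full /usr/local/nagios/ramdisk 90
```

Run from cron shortly before the daily 06:00 restart, a check like this would show whether the ramdisk is close to full at the moment Nagios needs to save its state.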

Post by rkennedy »

As a recommendation, I would increase the size of the ramdisk so that this doesn't happen again in the future.

I'm going to close this thread out. Feel free to create a new one if you have questions in the future!