Issue with services re-enabling after restart

GaWd
Posts: 51
Joined: Wed Dec 15, 2010 1:45 pm

Issue with services re-enabling after restart

Post by GaWd »

Hello!

I have a few servers that I have disabled (including all services on them). After I restart Nagios, one process check on each server has checks and notifications enabled again, even though they have been disabled hundreds of times. Every other service on these boxes, including the check_host_alive check, stays with active checks and notifications disabled. Every time I restart the Nagios service, that one process check on each of these servers goes back to active checking and notifications.

I've checked everything I can think of, and the only explanation I can come up with is some odd database issue.
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Issue with services re-enabling after restart

Post by agriffin »

Make sure you have enabled use_retained_program_state in your nagios.cfg and that Nagios has read/write access to the state_retention_file.
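
A quick way to confirm these settings is to pull the retention-related directives out of nagios.cfg. The following is only a minimal sketch: the helper name `retention_settings` and the sample configuration text (including the file path and values) are assumptions for illustration, and besides the two directives mentioned above it also lists `retain_state_information`, `use_retained_scheduling_info` and `retention_update_interval`, which are related nagios.cfg options.

```python
# Directives in nagios.cfg that relate to state retention.
WANTED = {
    "retain_state_information",
    "state_retention_file",
    "retention_update_interval",
    "use_retained_program_state",
    "use_retained_scheduling_info",
}


def retention_settings(cfg_text):
    """Return the retention-related directives found in nagios.cfg text."""
    found = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() in WANTED:
            found[key.strip()] = value.strip()
    return found


# Hypothetical excerpt of a nagios.cfg; values and path are assumptions.
sample_cfg = """
# state retention
retain_state_information=1
state_retention_file=/usr/local/nagios/var/retention.dat
use_retained_program_state=1
"""

settings = retention_settings(sample_cfg)
for name in sorted(WANTED):
    print(name, "=", settings.get(name, "<not set>"))
```

To run it against a real installation, read the actual file (for example with `open(path).read()`) instead of using the sample text. Any directive reported as not set falls back to the Nagios default.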
GaWd
Posts: 51
Joined: Wed Dec 15, 2010 1:45 pm

Re: Issue with services re-enabling after restart

Post by GaWd »

Thanks for your reply. I enabled state retention and verified that the retention file contents have changed as of the last restart. It still doesn't work. I compared the retention entry for one of the affected services with the entry for a nearly identical check of a similar process on the same server, but I can't see anything that looks like a problem.

Non-working check:
service {
host_name=epos8-phx
service_description=SPWWWxmlimport
modified_attributes=3
check_command=check_spwwwxmlimport
check_period=24x7
notification_period=24x7
event_handler=
has_been_checked=0
check_execution_time=0.000
check_latency=0.000
check_type=0
current_state=0
last_state=0
last_hard_state=0
last_event_id=0
current_event_id=0
current_problem_id=0
last_problem_id=0
current_attempt=1
max_attempts=5
normal_check_interval=10.000000
retry_check_interval=2.000000
state_type=1
last_state_change=0
last_hard_state_change=0
last_time_ok=0
last_time_warning=0
last_time_unknown=0
last_time_critical=0
plugin_output=
long_plugin_output=
performance_data=
last_check=0
next_check=1343421350
check_options=0
notified_on_unknown=0
notified_on_warning=0
notified_on_critical=0
current_notification_number=0
current_notification_id=0
last_notification=0
notifications_enabled=0
active_checks_enabled=0
passive_checks_enabled=1
event_handler_enabled=1
problem_has_been_acknowledged=0
acknowledgement_type=0
flap_detection_enabled=1
failure_prediction_enabled=1
process_performance_data=0
obsess_over_service=1
is_flapping=0
percent_state_change=0.00
check_flapping_recovery_notification=0
state_history=0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
}

Working check:
service {
host_name=epos8-phx
service_description=SPWWWxmlexport
modified_attributes=3
check_command=check_spwwwxmlexport
check_period=24x7
notification_period=24x7
event_handler=
has_been_checked=1
check_execution_time=0.035
check_latency=0.226
check_type=0
current_state=1
last_state=1
last_hard_state=1
last_event_id=130703
current_event_id=131491
current_problem_id=53573
last_problem_id=53266
current_attempt=5
max_attempts=5
normal_check_interval=10.000000
retry_check_interval=2.000000
state_type=1
last_state_change=1342494302
last_hard_state_change=1342494782
last_time_ok=1342493703
last_time_warning=1343320281
last_time_unknown=0
last_time_critical=0
plugin_output=PROCS WARNING: 0 processes with command name 'spwwwxmlexport'
long_plugin_output=
performance_data=
last_check=1343320281
next_check=0
check_options=0
notified_on_unknown=0
notified_on_warning=0
notified_on_critical=0
current_notification_number=0
current_notification_id=0
last_notification=0
notifications_enabled=0
active_checks_enabled=0
passive_checks_enabled=1
event_handler_enabled=1
problem_has_been_acknowledged=0
acknowledgement_type=0
flap_detection_enabled=1
failure_prediction_enabled=1
process_performance_data=1
obsess_over_service=1
is_flapping=0
percent_state_change=0.00
check_flapping_recovery_notification=0
state_history=1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
}

Here are the permissions on the Nagios var directory:

[root@XXXnagios ~]# ls -lh /usr/local/nagios/var/
total 52M
drwxrwxr-x 2 nagios nagios 22M Mar 23 2010 archives
-rw-r--r-- 1 nagios nagios 0 Jul 27 13:42 nagios.lock
-rw-rw-r-- 1 nagios nagcmd 19M Jul 27 13:53 nagios.log
-rw-rw-r-- 1 nagios nagcmd 2.1M Jul 22 04:25 nagios.log.1.gz
-rw-rw-r-- 1 nagios nagcmd 2.1M Jul 15 04:24 nagios.log.2.gz
-rw-rw-r-- 1 nagios nagcmd 2.0M Jul 8 04:24 nagios.log.3.gz
-rw-rw-r-- 1 nagios nagcmd 2.0M Jul 1 04:23 nagios.log.4.gz
-rw-rw-r-- 1 nagios nagcmd 2.0M Jun 24 04:24 nagios.log.5.gz
-rw-r--r-- 1 nagios nagcmd 6 Jul 27 13:42 nagios.pid
-rw-r--r-- 1 nagios nagios 507K Jul 27 13:42 objects.cache
-rw------- 1 nagios nagcmd 760K Jul 27 13:42 retention.dat
drwxrwsr-x 2 nagios nagios 4.0K Jul 27 13:42 rw
drwxrwxr-x 3 nagios nagios 4.0K Jun 3 2009 spool
-rw-rw-r-- 1 nagios nagcmd 757K Jul 27 13:53 status.dat

Do you see something that I'm missing?
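
Comparing two retention entries by eye is error-prone with this many fields, so a small script can list only the fields that differ. This is a rough sketch, not an official Nagios tool: the function names `parse_services` and `diff_services` are made up for illustration, and the sample text is a trimmed excerpt of the two entries posted above rather than a full retention.dat.

```python
def parse_services(text):
    """Parse 'service { ... }' blocks into dicts keyed by (host, description)."""
    services = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "service {":
            current = {}
        elif line == "}" and current is not None:
            key = (current.get("host_name"), current.get("service_description"))
            services[key] = current
            current = None
        elif current is not None and "=" in line:
            name, _, value = line.partition("=")
            current[name] = value
    return services


def diff_services(a, b, ignore=()):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    fields = sorted(set(a) | set(b))
    return {
        f: (a.get(f), b.get(f))
        for f in fields
        if f not in ignore and a.get(f) != b.get(f)
    }


# Trimmed excerpt of the two entries from this thread.
sample = """
service {
host_name=epos8-phx
service_description=SPWWWxmlimport
modified_attributes=3
has_been_checked=0
next_check=1343421350
notifications_enabled=0
active_checks_enabled=0
process_performance_data=0
}
service {
host_name=epos8-phx
service_description=SPWWWxmlexport
modified_attributes=3
has_been_checked=1
next_check=0
notifications_enabled=0
active_checks_enabled=0
process_performance_data=1
}
"""

services = parse_services(sample)
broken = services[("epos8-phx", "SPWWWxmlimport")]
working = services[("epos8-phx", "SPWWWxmlexport")]
changes = diff_services(broken, working, ignore=("service_description",))
for field, (bad, good) in changes.items():
    print(f"{field}: non-working={bad} working={good}")
```

On this excerpt the only differing fields are has_been_checked, next_check and process_performance_data. Both entries carry the same modified_attributes, notifications_enabled and active_checks_enabled values, so the retention file itself records both services as disabled.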
GaWd
Posts: 51
Joined: Wed Dec 15, 2010 1:45 pm

Re: Issue with services re-enabling after restart

Post by GaWd »

Also, understand that I monitor over 500 services, and only 2 of them (1 on each of 2 servers) do this. So the system works well overall, with the exception of these 2 services.
agriffin
Posts: 876
Joined: Mon May 09, 2011 9:36 am

Re: Issue with services re-enabling after restart

Post by agriffin »

I'm not really sure what could be going on based on what you've posted so far. Could you post the service definitions of the non-working services and one working service?