External commands only get processed after several minutes

Posted: Wed Jul 30, 2014 2:01 am
by Peter.Hoogendijk
We operate more than 50 Nagios Core 3 servers and have discovered strange behavior on several of them: external commands written to the command pipe are only processed after several minutes. On one server we saw a delay of over 20 minutes. We are running Nagios Core 3.2.3 on 64-bit Red Hat 5 or 6. The debug log on one of those servers showed the following:

Code:

[1406623333.323090] [001.0] [pid=24222] process_external_command1()
[1406623333.323095] [128.2] [pid=24222] Raw command entry: [1406622495] PROCESS_SERVICE_CHECK_RESULT;58CCE91A-32B8-408C-B1A8-023B96806A9E;f36fa53c-1c03-410f-98cd-319d0b16c566;0;OK - Reset to OK manually by Nagiosadmin
[1406623333.323147] [001.0] [pid=24222] process_external_command2()
[1406623333.323155] [128.1] [pid=24222] External Command Type: 30
[1406623333.323160] [128.1] [pid=24222] Command Entry Time: 1406622495
[1406623333.323165] [128.1] [pid=24222] Command Arguments: 58CCE91A-32B8-408C-B1A8-023B96806A9E;f36fa53c-1c03-410f-98cd-319d0b16c566;0;OK - Reset to OK manually by Nagiosadmin
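
The delay can be read directly from an excerpt like the one above: the first bracketed field is the time Nagios processed the command, and "Command Entry Time" is the timestamp that was written into the pipe. Below is a minimal sketch for pulling that difference out of a debug file. It assumes the line layout shown in this excerpt; the function name `command_delays` is made up for illustration and is not part of Nagios.

```python
import re

# Matches debug-log lines of the form seen in this thread, e.g.
# [1406623333.323160] [128.1] [pid=24222] Command Entry Time: 1406622495
# (assumption: other Nagios versions may format these lines differently)
ENTRY_RE = re.compile(
    r"^\[(\d+)\.\d+\]\s+\[\d+\.\d+\]\s+\[pid=\d+\]\s+Command Entry Time:\s+(\d+)"
)


def command_delays(lines):
    """Yield (processed_at, entered_at, delay_seconds) for each matching line."""
    for line in lines:
        match = ENTRY_RE.match(line.strip())
        if match:
            processed_at = int(match.group(1))
            entered_at = int(match.group(2))
            yield processed_at, entered_at, processed_at - entered_at


sample = [
    "[1406623333.323155] [128.1] [pid=24222] External Command Type: 30",
    "[1406623333.323160] [128.1] [pid=24222] Command Entry Time: 1406622495",
]

for processed_at, entered_at, delay in command_delays(sample):
    print(f"entered={entered_at} processed={processed_at} delay={delay}s")
# prints: entered=1406622495 processed=1406623333 delay=838s
```

For the command shown here that works out to 838 seconds, or roughly 14 minutes. Running the same function over a whole debug file (for example `command_delays(open(path))`) would show whether the delay grows steadily after a restart or jumps at particular moments.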
When updating configurations (from a central database) we send a SIGHUP, so the Nagios process is never stopped. After stopping and restarting Nagios, external commands are processed without any delay, but sometimes, even within a day, the delay starts growing again. Some things we have already determined:
  • No issue with the number of command buffer slots (we had that issue in the past and solved it).
  • No issue with high iowait (we had issues on VMs with slow SAN storage, but the delay also occurs on physical hardware with very fast local storage).
Please let us know if you have any ideas for investigating this issue. We have a workaround (restarting Nagios every 24 hours), but then we are unable to investigate the issue any further.

Regards, Peter Hoogendijk.

Re: External commands only get processed after several minut

Posted: Wed Jul 30, 2014 4:41 pm
by lmiltchev
Can you post the nagios.cfg file?

Re: External commands only get processed after several minut

Posted: Tue Aug 05, 2014 2:59 am
by Peter.Hoogendijk
Sorry for the delay; some urgent operational issues took a lot of time. This is the nagios.cfg file used on all (over 50) Nagios nodes:

Code:

log_file=/var/Nagios/Node/NagiosNode/nagios.log
cfg_file=/var/Nagios/Node/NagiosNode/objects.cfg

object_cache_file=/var/Nagios/Node/NagiosNode/objects.cache
precached_object_file=/var/Nagios/Node/NagiosNode/objects.precache
status_file=/var/Nagios/Node/NagiosNode/status.dat
command_file=/var/Nagios/Node/NagiosNode/rw/nagios.cmd
lock_file=/var/Nagios/Node/NagiosNode/nagios.pid
temp_file=/var/Nagios/Node/NagiosNode/nagios.tmp
temp_path=/tmp/
log_archive_path=/var/Nagios/Node/NagiosNode/log/

status_update_interval=10
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
external_command_buffer_slots=16384
event_broker_options=0

log_rotation_method=d
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1

service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_reaper_frequency=10
max_check_result_reaper_time=30
check_result_path=/var/Nagios/Node/NagiosNode/checkresults/
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0

auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180

sleep_time=0.25

service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5

retain_state_information=1
state_retention_file=/var/Nagios/Node/NagiosNode/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0

interval_length=60
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1

process_performance_data=0

obsess_over_services=0
obsess_over_hosts=0

translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=0
host_freshness_check_interval=60
additional_freshness_latency=15

enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0

date_format=iso8601

p1_file=/opt/Nagios/Node/p1.pl
enable_embedded_perl=0
use_embedded_perl_implicitly=0

illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=root@localhost
admin_pager=pageroot@localhost
daemon_dumps_core=0
use_large_installation_tweaks=0
enable_environment_macros=1

debug_level=0
debug_verbosity=1
debug_file=/var/Nagios/Node/NagiosNode/nagios.debug
max_debug_file_size=1000000
On two or three larger nodes external_command_buffer_slots has been increased (but not on the system this topic is about). SELinux has been set to permissive; we have not yet tried disabling it completely.

Re: External commands only get processed after several minut

Posted: Thu Aug 07, 2014 12:09 pm
by abrist
Very odd. command_check_interval is set to -1, so Nagios should be checking for external commands as often as possible. Let's try forcing it to a smallish number:

Code:

 command_check_interval=30s
Restart nagios and watch the behavior.
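
For reference, here is a rough sketch of how the command_check_interval value is interpreted, as I understand the Nagios Core 3 documentation: -1 means "as often as possible", a trailing "s" means seconds, and a bare number is counted in time units of interval_length. The helper name `command_check_seconds` is hypothetical and only illustrates the rule; it is not Nagios code.

```python
def command_check_seconds(value, interval_length=60):
    """Return the external command check interval in seconds.

    Returns None for -1, which means "check as often as possible".
    Illustrative only: this mirrors the documented rule and is not taken
    from the Nagios source.
    """
    value = value.strip()
    if value == "-1":
        return None
    if value.endswith("s"):
        # Explicit seconds, e.g. "30s" -> 30
        return int(value[:-1])
    # Bare number: counted in time units of interval_length seconds
    return int(value) * interval_length


for setting in ("-1", "30s", "30"):
    print(setting, "->", command_check_seconds(setting))
# prints:
# -1 -> None
# 30s -> 30
# 30 -> 1800
```

With interval_length=60 as in the posted config, leaving off the "s" would turn 30 into 30 minutes, so the suffix matters for this test.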