External commands only get processed after several minutes

Peter.Hoogendijk
Posts: 2
Joined: Wed Jul 30, 2014 1:20 am

Post by Peter.Hoogendijk »

We operate more than 50 Nagios Core 3 servers and have noticed strange behavior on several of them: external commands written to the command pipe are only processed after several minutes. On one server we saw a delay of more than 20 minutes. We are running Nagios Core 3.2.3 on 64-bit Red Hat 5 or 6. The debug log on one of the affected servers shows the following:

Code:

[1406623333.323090] [001.0] [pid=24222] process_external_command1()
[1406623333.323095] [128.2] [pid=24222] Raw command entry: [1406622495] PROCESS_SERVICE_CHECK_RESULT;58CCE91A-32B8-408C-B1A8-023B96806A9E;f36fa53c-1c03-410f-98cd-319d0b16c566;0;OK - Reset to OK manually by Nagiosadmin
[1406623333.323147] [001.0] [pid=24222] process_external_command2()
[1406623333.323155] [128.1] [pid=24222] External Command Type: 30
[1406623333.323160] [128.1] [pid=24222] Command Entry Time: 1406622495
[1406623333.323165] [128.1] [pid=24222] Command Arguments: 58CCE91A-32B8-408C-B1A8-023B96806A9E;f36fa53c-1c03-410f-98cd-319d0b16c566;0;OK - Reset to OK manually by Nagiosadmin
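
For reference, the lag in an excerpt like this can be read off by comparing the debug-line timestamp (when Nagios processed the command) with the bracketed entry time inside the raw command (when the writer stamped it). The following is only a minimal sketch: the regular expression is inferred from the lines shown here, the names `RAW_ENTRY` and `command_delays` are made up for illustration, and the result also includes any clock skew or queueing on the writer's side.

```python
import re

# Pattern inferred from the debug-log lines in this post (an assumption,
# not an official description of the Nagios debug-log format).
RAW_ENTRY = re.compile(
    r"^\[(?P<processed>\d+)\.\d+\]\s+\[\d+\.\d+\]\s+\[pid=\d+\]\s+"
    r"Raw command entry:\s+\[(?P<entered>\d+)\]\s+(?P<command>[^;\s]+)"
)


def command_delays(lines):
    """Yield (command_name, delay_in_seconds) for each 'Raw command entry' line."""
    for line in lines:
        match = RAW_ENTRY.match(line)
        if match:
            processed = int(match.group("processed"))  # when Nagios handled it
            entered = int(match.group("entered"))      # when the writer stamped it
            yield match.group("command"), processed - entered


# The two relevant lines from the excerpt above.
sample = [
    "[1406623333.323090] [001.0] [pid=24222] process_external_command1()",
    "[1406623333.323095] [128.2] [pid=24222] Raw command entry: [1406622495] "
    "PROCESS_SERVICE_CHECK_RESULT;58CCE91A-32B8-408C-B1A8-023B96806A9E;"
    "f36fa53c-1c03-410f-98cd-319d0b16c566;0;OK - Reset to OK manually by Nagiosadmin",
]

for name, delay in command_delays(sample):
    print(f"{name}: processed {delay} s after entry")
```

For the excerpt above this gives 838 seconds, i.e. just under 14 minutes between entry and processing.
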
When updating configurations (from a central database) we send a SIGHUP, so the Nagios process is never stopped. After stopping and restarting Nagios, external commands are processed without any delay, but sometimes the delay starts growing again within a day. Some things we have already ruled out:
  • The number of command buffer slots is not the problem (we had that issue in the past and solved it).
  • High iowait is not the problem (we had issues on VMs with slow SAN storage, but the delay also occurs on physical hardware with very fast local storage).
Please let us know if you have any ideas for investigating this issue. We have a workaround (restarting Nagios every 24 hours), but then we are unable to investigate the issue any further.

Regards, Peter Hoogendijk.
lmiltchev
Former Nagios Staff
Posts: 13587
Joined: Mon May 23, 2011 12:15 pm

Post by lmiltchev »

Can you post the nagios.cfg file?
Peter.Hoogendijk
Posts: 2
Joined: Wed Jul 30, 2014 1:20 am

Post by Peter.Hoogendijk »

Sorry for the delay; some urgent operational issues took a lot of time. This is the nagios.cfg file used on all of our (more than 50) Nagios nodes:

Code:


log_file=/var/Nagios/Node/NagiosNode/nagios.log
cfg_file=/var/Nagios/Node/NagiosNode/objects.cfg

object_cache_file=/var/Nagios/Node/NagiosNode/objects.cache
precached_object_file=/var/Nagios/Node/NagiosNode/objects.precache
status_file=/var/Nagios/Node/NagiosNode/status.dat
command_file=/var/Nagios/Node/NagiosNode/rw/nagios.cmd
lock_file=/var/Nagios/Node/NagiosNode/nagios.pid
temp_file=/var/Nagios/Node/NagiosNode/nagios.tmp
temp_path=/tmp/
log_archive_path=/var/Nagios/Node/NagiosNode/log/

status_update_interval=10
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
external_command_buffer_slots=16384
event_broker_options=0

log_rotation_method=d
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1

service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_reaper_frequency=10
max_check_result_reaper_time=30
check_result_path=/var/Nagios/Node/NagiosNode/checkresults/
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0

auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180

sleep_time=0.25

service_check_timeout=60
host_check_timeout=30
event_handler_timeout=30
notification_timeout=30
ocsp_timeout=5
perfdata_timeout=5

retain_state_information=1
state_retention_file=/var/Nagios/Node/NagiosNode/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0

interval_length=60
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1

process_performance_data=0

obsess_over_services=0
obsess_over_hosts=0

translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
check_host_freshness=0
host_freshness_check_interval=60
additional_freshness_latency=15

enable_flap_detection=1
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0

date_format=iso8601

p1_file=/opt/Nagios/Node/p1.pl
enable_embedded_perl=0
use_embedded_perl_implicitly=0

illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=root@localhost
admin_pager=pageroot@localhost
daemon_dumps_core=0
use_large_installation_tweaks=0
enable_environment_macros=1

debug_level=0
debug_verbosity=1
debug_file=/var/Nagios/Node/NagiosNode/nagios.debug
max_debug_file_size=1000000
On two or three larger nodes external_command_buffer_slots has been increased (but not on the system this topic is about). SELinux is set to permissive; we have not yet tried disabling it completely.
abrist
Red Shirt
Posts: 8334
Joined: Thu Nov 15, 2012 1:20 pm

Post by abrist »

Very odd. command_check_interval is set to -1, so Nagios should be checking for external commands as often as possible. Let's try forcing it to a smallish number:

Code:

command_check_interval=30s
Restart Nagios and watch the behavior.
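
One way to watch the behavior is to drop a timestamped probe command into the command file and then see when it turns up in the log. The following is a rough sketch only: the path is the command_file value from the posted nagios.cfg, `probe-host` and `probe-service` are hypothetical object names that would have to exist in the configuration, and the helper names are made up for illustration. The line format `[time] COMMAND;arguments` is the one visible in the debug log earlier in the thread.

```python
import errno
import os
import time

# command_file value from the posted nagios.cfg; adjust per node.
COMMAND_FILE = "/var/Nagios/Node/NagiosNode/rw/nagios.cmd"


def format_external_command(name, args, now=None):
    """Build one '[entry_time] NAME;arg1;arg2...' line for the command pipe."""
    entry_time = int(time.time() if now is None else now)
    return "[{}] {}\n".format(entry_time, ";".join([name, *args]))


def submit(command_file, line):
    """Write one line to the pipe; return False if the pipe is missing or unread."""
    try:
        # O_NONBLOCK makes open() fail with ENXIO instead of hanging
        # when no process has the FIFO open for reading.
        fd = os.open(command_file, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno in (errno.ENXIO, errno.ENOENT):
            return False
        raise
    try:
        os.write(fd, line.encode())
    finally:
        os.close(fd)
    return True


if __name__ == "__main__":
    # Hypothetical host and service names, used only as a probe.
    probe = format_external_command(
        "PROCESS_SERVICE_CHECK_RESULT",
        ["probe-host", "probe-service", "0", "OK - delay probe"],
    )
    print("submitted" if submit(COMMAND_FILE, probe) else "pipe not available")
```

Since log_external_commands=1 is set in the posted configuration, the probe should appear in nagios.log once it is processed, and the difference between its entry time and the log timestamp gives the delay.
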