
Orphaned checks.

Posted: Mon Jul 02, 2018 1:04 am
by linuxnerd
Hi,

I'm running Nagios 3.5.1 (3.5.1.dfsg-2.1ubuntu1.3) on Ubuntu 16.04.4. After Nagios runs for a while, I start seeing a whole bunch of these in the log:
Warning: The check of service 'XXX' on host 'XXX' looks like it was orphaned. I'm scheduling an immediate check of the service...
When that happens, I start missing alerts.

I've seen similar threads on the issue suggesting checking disk space, cleaning out the spool dir, etc.
I've tried all of those, but I'm still getting the problem.
I have disabled the embedded Perl interpreter.

enable_embedded_perl=0
use_embedded_perl_implicitly=0
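
To see whether the orphaned checks cluster on particular hosts, the warnings can be tallied from the log. This is only a rough sketch: the function name count_orphans is made up for illustration, and it assumes the log lines contain the warning text exactly as quoted above.

```shell
# Tally "orphaned" service-check warnings per host from a Nagios log file.
# count_orphans is a hypothetical helper name, not part of Nagios.
count_orphans() {
    grep "looks like it was orphaned" "$1" \
        | sed -n "s/.*on host '\([^']*\)'.*/\1/p" \
        | sort | uniq -c | sort -rn
}

# Example (log path is an assumption; use the log_file value from nagios.cfg):
# count_orphans /var/log/nagios3/nagios.log
```

The output is one line per host, with the number of warnings first and the busiest host at the top.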

Regards,
LN.

Re: Orphaned checks.

Posted: Mon Jul 02, 2018 1:34 pm
by scottwilkerson
What is the output of the following?

Code:

ps -ef|grep nagios.cfg

Also, is this server using any Nagios addons such as Livestatus or mod_gearman?

Re: Orphaned checks.

Posted: Tue Jul 03, 2018 4:43 am
by linuxnerd

Code:

nagios    1116     1  0 06:00 ?        00:00:38 /usr/sbin/nagios3 -x -d /etc/nagios/conf/nagios/nagios.cfg
nagios   11263  1116  0 09:22 ?        00:00:00 /usr/sbin/nagios3 -x -d /etc/nagios/conf/nagios/nagios.cfg

No addons.
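
In the output above, only PID 1116 has a parent PID of 1; PID 11263 is a child of 1116. A small sketch for counting top-level daemons from ps output follows. count_daemons is a made-up name, and the match on nagios.cfg assumes the daemon is started with that config file on its command line, as it is here.

```shell
# Count processes whose parent is PID 1 and whose command line mentions nagios.cfg.
# Reads `ps -eo pid,ppid,args` style input on stdin (PID in column 1, PPID in column 2).
count_daemons() {
    awk '$2 == 1 && /nagios\.cfg/ { n++ } END { print n + 0 }'
}

# Example:
# ps -eo pid,ppid,args | count_daemons
```

A result greater than 1 would mean more than one top-level Nagios daemon is running against that config.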

Re: Orphaned checks.

Posted: Tue Jul 03, 2018 9:30 am
by scottwilkerson
I would recommend completely stopping the service and starting again.

Code:

service nagios stop
service nagios start

How many hosts/services are on this installation? How much memory and how many CPUs do you have?

Re: Orphaned checks.

Posted: Tue Jul 03, 2018 11:40 am
by linuxnerd
Tried restarting many times, but it doesn't fix anything.
Even tried deleting all the object cache files.

I have about 1500 services and 250 hosts.
4 CPUs, 8 GB RAM.

I've monitored my RAM usage and it rarely goes above 3 GB.
CPU usage also looks normal.

load average: 0.14, 0.38, 0.23

Re: Orphaned checks.

Posted: Tue Jul 03, 2018 12:01 pm
by scottwilkerson
Can you share your nagios.cfg?

Re: Orphaned checks.

Posted: Wed Jul 04, 2018 8:25 am
by linuxnerd
my cfg:

Code:

log_file=/var/log/nagios3/nagios.log
cfg_dir=/data/XXXX/nagios/conf/nagios/objects/
object_cache_file=/var/cache/nagios3/objects.cache
precached_object_file=/var/lib/nagios3/objects.precache
resource_file=/etc/nagios3/resource.cfg
resource_file=/data/XXXX/nagios/conf/private/resource.cfg
status_file=/var/cache/nagios3/status.dat
status_update_interval=10
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/var/lib/nagios3/rw/nagios.cmd
external_command_buffer_slots=16384
lock_file=/var/run/nagios3/nagios3.pid
temp_file=/var/cache/nagios3/nagios.tmp
temp_path=/tmp
event_broker_options=-1
log_rotation_method=d
log_archive_path=/var/log/nagios3/archives
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_reaper_frequency=60
max_check_result_reaper_time=180
check_result_path=/var/lib/nagios3/spool/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
allow_empty_hostgroup_assignment=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=90
host_check_timeout=30
event_handler_timeout=60
notification_timeout=60
ocsp_timeout=30
perfdata_timeout=30
retain_state_information=1
state_retention_file=/var/lib/nagios3/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=60
check_for_updates=1
bare_update_check=0
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=1
service_perfdata_file=/data/XXXX/nagios/data/pnp4nagios/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=30
service_perfdata_file_processing_command=XXXX_process_service_perfdata_file
service_perfdata_process_empty_results=1
obsess_over_services=1
ocsp_command=XXXX_ocsp_nsca
obsess_over_hosts=0
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
service_check_timeout_state=c
check_host_freshness=0
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=0
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=iso8601
p1_file=/usr/lib/nagios3/p1.pl
enable_embedded_perl=0
use_embedded_perl_implicitly=0
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=XXXX
admin_pager=XXXX
daemon_dumps_core=0
use_large_installation_tweaks=1
enable_environment_macros=1
debug_level=0
debug_verbosity=1
debug_file=/var/log/nagios3/nagios.debug
max_debug_file_size=1000000
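
Since cleaning out the spool dir came up earlier, here is a rough sketch for counting result files older than a given age. The directory and the 60-minute default mirror check_result_path and max_check_result_file_age=3600 from the config above; count_stale_results is a hypothetical helper name.

```shell
# Count regular files under a directory that were last modified more than
# N minutes ago (default 60, matching max_check_result_file_age=3600 seconds).
count_stale_results() {
    dir=$1
    max_age_min=${2:-60}
    find "$dir" -type f -mmin +"$max_age_min" | wc -l | tr -d ' '
}

# Example (path taken from check_result_path in the config):
# count_stale_results /var/lib/nagios3/spool/checkresults 60
```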

Re: Orphaned checks.

Posted: Thu Jul 05, 2018 9:39 am
by scottwilkerson
Do these happen for specific types of checks, such as check_mk checks or something similar, or across the board?

Re: Orphaned checks.

Posted: Thu Jul 05, 2018 9:13 pm
by linuxnerd
scottwilkerson wrote: Do these happen for specific types of checks, such as check_mk checks or something similar, or across the board?
Across the board.

Re: Orphaned checks.

Posted: Fri Jul 06, 2018 1:27 pm
by scottwilkerson
Can you change this

Code:

auto_reschedule_checks=0

to this

Code:

auto_reschedule_checks=1

and restart Nagios.
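
One way to make that change from the shell, as a sketch: enable_auto_reschedule is a made-up helper name, and the config path in the example is taken from the ps output earlier in the thread. It leaves a .bak copy next to the file.

```shell
# Flip auto_reschedule_checks from 0 to 1 in a Nagios config file,
# keeping a backup copy with a .bak suffix.
# Only an exact "auto_reschedule_checks=0" line (no trailing spaces) is changed.
enable_auto_reschedule() {
    sed -i.bak 's/^auto_reschedule_checks=0$/auto_reschedule_checks=1/' "$1"
}

# Example (path assumed from the ps output; adjust as needed):
# enable_auto_reschedule /etc/nagios/conf/nagios/nagios.cfg
```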

If you still have an issue, please send the output of this, along with the number of hosts/services on this system:

Code:

grep -v '^#' /etc/security/limits.conf
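
limits.conf shows what is configured; what a running process actually got can differ. As a sketch (Linux only, and show_limits is a made-up name), the effective limits can be read from the limits file under /proc for the Nagios daemon's PID.

```shell
# Print the process-count and open-file limits from a /proc/<pid>/limits file.
show_limits() {
    grep -E '^Max (processes|open files)' "$1"
}

# Example (1116 was the daemon PID in the ps output earlier; it will differ after a restart):
# show_limits /proc/1116/limits
```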