Orphaned checks.


Postby linuxnerd » Mon Jul 02, 2018 1:04 am

Hi,

I'm running Nagios 3.5.1 (3.5.1.dfsg-2.1ubuntu1.3) on Ubuntu 16.04.4. After Nagios has been running for a while, I start seeing a lot of messages like this in the log:
Warning: The check of service 'XXX' on host 'XXX' looks like it was orphaned. I'm scheduling an immediate check of the service...
When that happens, I start missing alerts.
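
To put a number on "a lot", something like the following could summarize the warnings. This is only a sketch: the function names are made up for illustration, and the match string is taken from the warning quoted above.

```shell
# Sketch: summarize orphaned-check warnings in a Nagios log.
# Function names are illustrative, not part of Nagios.

# Print the total number of orphaned-check warnings in the given log file.
count_orphans() {
    # grep -c still prints 0 when nothing matches but exits non-zero,
    # so the exit status is ignored here.
    grep -c 'looks like it was orphaned' "$1" || true
}

# Print "count host" pairs, most affected host first.
orphans_by_host() {
    sed -n "s/.*on host '\([^']*\)' looks like it was orphaned.*/\1/p" "$1" |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

With the log path used by this installation, the calls would be `count_orphans /var/log/nagios3/nagios.log` and `orphans_by_host /var/log/nagios3/nagios.log`.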

I've seen similar threads on this issue that suggest checking disk space, cleaning out the spool directory, etc.
I've done all of that, but I'm still getting the problem.
I have also disabled the embedded Perl interpreter:

enable_embedded_perl=0
use_embedded_perl_implicitly=0
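
For the spool directory cleanup mentioned above, a quick way to confirm that nothing stale is left behind might look like this. It is a sketch under assumptions: the helper name is hypothetical, and it relies on `find -mmin`, which GNU and BSD find provide but POSIX does not.

```shell
# Sketch: count check-result files older than a given age in minutes.
# The helper name is illustrative; -mmin is a GNU/BSD find extension.
count_stale_results() {
    dir=$1
    minutes=${2:-60}   # default: 60 minutes
    # tr strips the padding some wc implementations add.
    find "$dir" -type f -mmin "+$minutes" | wc -l | tr -d ' '
}
```

On this installation the call would be `count_stale_results /var/lib/nagios3/spool/checkresults 60`, since `check_result_path` points there and `max_check_result_file_age=3600` seconds corresponds to 60 minutes.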

Regards,
LN.
linuxnerd
 
Posts: 13
Joined: Mon Jul 02, 2018 12:57 am

Re: Orphaned checks.

Postby scottwilkerson » Mon Jul 02, 2018 1:34 pm

What is the output of the following

Code:
ps -ef|grep nagios.cfg


Also, is this server using any nagios addons such as livestatus or mod_gearman?
As of May 25th, 2018, all communications with Nagios Enterprises and its employees are covered under our new Privacy Policy.
scottwilkerson
DevOps Engineer
 
Posts: 11563
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Orphaned checks.

Postby linuxnerd » Tue Jul 03, 2018 4:43 am

Code:
nagios    1116     1  0 06:00 ?        00:00:38 /usr/sbin/nagios3 -x -d /etc/nagios/conf/nagios/nagios.cfg
nagios   11263  1116  0 09:22 ?        00:00:00 /usr/sbin/nagios3 -x -d /etc/nagios/conf/nagios/nagios.cfg


no addons.
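
A small sketch for reading that `ps -ef` output: it counts how many Nagios processes are top-level daemons, meaning their parent PID is 1. The function name is made up, and the PPID-of-1 test assumes a classic daemonized process; under a container or a per-user service manager the parent may be something else.

```shell
# Sketch: count top-level Nagios daemons in `ps -ef` output read from stdin.
# In `ps -ef` output, column 2 is the PID and column 3 is the parent PID.
count_nagios_daemons() {
    awk '$3 == 1 && /nagios\.cfg/ { n++ } END { print n + 0 }'
}
```

Usage would be `ps -ef | count_nagios_daemons`. For the two lines posted above it prints 1, because PID 11263 has PID 1116 as its parent and so is a child of the one daemon rather than a second instance.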
linuxnerd
 
Posts: 13
Joined: Mon Jul 02, 2018 12:57 am

Re: Orphaned checks.

Postby scottwilkerson » Tue Jul 03, 2018 9:30 am

I would recommend completely stopping the service and starting again.

Code:
service nagios stop
service nagios start


How many hosts and services are on this installation? How much memory and how many CPUs do you have?
scottwilkerson
DevOps Engineer
 
Posts: 11563
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Orphaned checks.

Postby linuxnerd » Tue Jul 03, 2018 11:40 am

I've tried restarting many times, but it doesn't fix anything.
I even tried deleting all the object cache files.

I have about 1500 services and 250 hosts,
with 4 CPUs and 8 GB of RAM.

I've monitored my RAM usage and it rarely goes above 3 GB.
CPU usage also looks normal.

load average: 0.14, 0.38, 0.23
linuxnerd
 
Posts: 13
Joined: Mon Jul 02, 2018 12:57 am

Re: Orphaned checks.

Postby scottwilkerson » Tue Jul 03, 2018 12:01 pm

Can you share your nagios.cfg?
scottwilkerson
DevOps Engineer
 
Posts: 11563
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Orphaned checks.

Postby linuxnerd » Wed Jul 04, 2018 8:25 am

my cfg:

Code:
log_file=/var/log/nagios3/nagios.log
cfg_dir=/data/XXXX/nagios/conf/nagios/objects/
object_cache_file=/var/cache/nagios3/objects.cache
precached_object_file=/var/lib/nagios3/objects.precache
resource_file=/etc/nagios3/resource.cfg
resource_file=/data/XXXX/nagios/conf/private/resource.cfg
status_file=/var/cache/nagios3/status.dat
status_update_interval=10
nagios_user=nagios
nagios_group=nagios
check_external_commands=1
command_check_interval=-1
command_file=/var/lib/nagios3/rw/nagios.cmd
external_command_buffer_slots=16384
lock_file=/var/run/nagios3/nagios3.pid
temp_file=/var/cache/nagios3/nagios.tmp
temp_path=/tmp
event_broker_options=-1
log_rotation_method=d
log_archive_path=/var/log/nagios3/archives
use_syslog=0
log_notifications=1
log_service_retries=1
log_host_retries=1
log_event_handlers=1
log_initial_states=0
log_external_commands=1
log_passive_checks=1
service_inter_check_delay_method=s
max_service_check_spread=30
service_interleave_factor=s
host_inter_check_delay_method=s
max_host_check_spread=30
max_concurrent_checks=0
check_result_reaper_frequency=60
max_check_result_reaper_time=180
check_result_path=/var/lib/nagios3/spool/checkresults
max_check_result_file_age=3600
cached_host_check_horizon=15
cached_service_check_horizon=15
enable_predictive_host_dependency_checks=1
allow_empty_hostgroup_assignment=1
enable_predictive_service_dependency_checks=1
soft_state_dependencies=0
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
sleep_time=0.25
service_check_timeout=90
host_check_timeout=30
event_handler_timeout=60
notification_timeout=60
ocsp_timeout=30
perfdata_timeout=30
retain_state_information=1
state_retention_file=/var/lib/nagios3/retention.dat
retention_update_interval=60
use_retained_program_state=1
use_retained_scheduling_info=1
retained_host_attribute_mask=0
retained_service_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
interval_length=60
check_for_updates=1
bare_update_check=0
use_aggressive_host_checking=0
execute_service_checks=1
accept_passive_service_checks=1
execute_host_checks=1
accept_passive_host_checks=1
enable_notifications=1
enable_event_handlers=1
process_performance_data=1
service_perfdata_file=/data/XXXX/nagios/data/pnp4nagios/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=30
service_perfdata_file_processing_command=XXXX_process_service_perfdata_file
service_perfdata_process_empty_results=1
obsess_over_services=1
ocsp_command=XXXX_ocsp_nsca
obsess_over_hosts=0
translate_passive_host_checks=0
passive_host_checks_are_soft=0
check_for_orphaned_services=1
check_for_orphaned_hosts=1
check_service_freshness=1
service_freshness_check_interval=60
service_check_timeout_state=c
check_host_freshness=0
host_freshness_check_interval=60
additional_freshness_latency=15
enable_flap_detection=0
low_service_flap_threshold=5.0
high_service_flap_threshold=20.0
low_host_flap_threshold=5.0
high_host_flap_threshold=20.0
date_format=iso8601
p1_file=/usr/lib/nagios3/p1.pl
enable_embedded_perl=0
use_embedded_perl_implicitly=0
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
illegal_macro_output_chars=`~$&|'"<>
use_regexp_matching=0
use_true_regexp_matching=0
admin_email=XXXX
admin_pager=XXXX
daemon_dumps_core=0
use_large_installation_tweaks=1
enable_environment_macros=1
debug_level=0
debug_verbosity=1
debug_file=/var/log/nagios3/nagios.debug
max_debug_file_size=1000000
Last edited by tmcdonald on Thu Jul 05, 2018 9:40 am, edited 1 time in total.
Reason: Please use [code][/code] tags around long output
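
With a config this long, it can help to pull out only the directives that touch result reaping and orphan detection. The following is a sketch; `cfg_get` and `orphan_settings` are hypothetical helper names, and the list of directives is just a selection from the file above, not an authoritative set.

```shell
# Sketch: print the value(s) of one directive from a nagios.cfg-style file.
cfg_get() {
    key=$1
    file=$2
    # Match "key=" at the start of a line and print everything after the first "=".
    awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2) }' "$file"
}

# Print a selection of directives related to check results and orphan detection.
orphan_settings() {
    for k in check_for_orphaned_services check_result_reaper_frequency \
             max_check_result_reaper_time max_check_result_file_age \
             service_check_timeout max_concurrent_checks; do
        printf '%s=%s\n' "$k" "$(cfg_get "$k" "$1")"
    done
}
```

Called as `orphan_settings /etc/nagios/conf/nagios/nagios.cfg` (the path shown in the `ps` output earlier), this prints six `key=value` lines. Matching on `key=` at the start of the line keeps `service_check_timeout` from also picking up `service_check_timeout_state`.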
linuxnerd
 
Posts: 13
Joined: Mon Jul 02, 2018 12:57 am

Re: Orphaned checks.

Postby scottwilkerson » Thu Jul 05, 2018 9:39 am

Do these happen for specific types of checks, such as check_mk checks or something similar, or across the board?
scottwilkerson
DevOps Engineer
 
Posts: 11563
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises

Re: Orphaned checks.

Postby linuxnerd » Thu Jul 05, 2018 9:13 pm

scottwilkerson wrote: Do these happen for specific types of checks, such as check_mk checks or something similar, or across the board?


across the board.
linuxnerd
 
Posts: 13
Joined: Mon Jul 02, 2018 12:57 am

Re: Orphaned checks.

Postby scottwilkerson » Fri Jul 06, 2018 1:27 pm

Can I have you change this
Code:
auto_reschedule_checks=0

to this
Code:
auto_reschedule_checks=1


and restart nagios.
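
If editing by hand is inconvenient, the change could be scripted roughly like this. It is a sketch, not an official tool: `cfg_set` is a made-up helper that keeps a `.bak` copy and rewrites only lines beginning with the given key.

```shell
# Sketch: set "key=value" in a config file, keeping a .bak copy of the original.
cfg_set() {
    key=$1
    value=$2
    file=$3
    cp -p "$file" "$file.bak" || return 1
    # Rewrite only lines that start with "key="; every other line passes through unchanged.
    awk -v k="$key" -v v="$value" \
        'index($0, k "=") == 1 { print k "=" v; next } { print }' \
        "$file.bak" > "$file"
}
```

For the change requested here that would be `cfg_set auto_reschedule_checks 1 /etc/nagios/conf/nagios/nagios.cfg`, using the config path from the earlier `ps` output. Running `/usr/sbin/nagios3 -v` against the same file before restarting checks that the config still parses.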

If you still have an issue, please send the output of this along with the number of hosts/services on this system
Code:
cat /etc/security/limits.conf|grep -v ^#
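
As a complement to the command above, note that limits.conf is applied through PAM sessions, so a daemon started at boot may be running with different limits than the file suggests. The sketch below shows both views; the function names are hypothetical, and the second function assumes a Linux `/proc` filesystem.

```shell
# Sketch: show limits as configured and as actually applied to a process.

# Print the active (non-comment, non-blank) lines of a limits.conf-style file.
active_limits() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Print the process and open-file limits of a PID from the Linux /proc filesystem.
# The optional second argument (proc root) exists only to make the function testable.
effective_limits() {
    grep -E '^Max (processes|open files)' "${2:-/proc}/$1/limits"
}
```

Usage would be `active_limits /etc/security/limits.conf` and `effective_limits "$(cat /var/run/nagios3/nagios3.pid)"`, the latter reading the PID from the `lock_file` set in the posted config.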
scottwilkerson
DevOps Engineer
 
Posts: 11563
Joined: Tue Nov 15, 2011 3:11 pm
Location: Nagios Enterprises
