
Re: Monitoring Engine Process failing to start

Posted: Wed Jul 30, 2014 11:40 am
by BanditBBS
I just applied changes and it is doing it again. I happened to be looking at top and ndo2db is taking 100% cpu during the long restart. 10+ minutes later, ndo2db drops to 1% and all 6 are green. ndo2db keeps cycling to 100% for a while and then drops. So this seems to be the issue.
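Watching `top` shows the spike, but to record ndo2db's CPU usage over time you could sample it from `/proc` instead. Below is a minimal sketch that assumes a Linux host with procfs. The function names are hypothetical helpers, not part of Nagios or NDOUtils. It reads the `utime` and `stime` fields (fields 14 and 15) of `/proc/<pid>/stat` twice and converts the difference to a percentage of one core, which is how `top` reports it by default.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so utime (field 14) is rest[11]
    # and stime (field 15) is rest[12].
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before: int, ticks_after: int, elapsed_s: float, clk_tck: int) -> float:
    """CPU usage over an interval; 100.0 means one full core, as in top."""
    return (ticks_after - ticks_before) / clk_tck / elapsed_s * 100.0


def sample_pid(pid: int, interval_s: float = 5.0) -> float:
    """Measure one process's CPU usage over roughly interval_s seconds (Linux only)."""
    clk_tck = os.sysconf("SC_CLK_TCK")  # kernel clock ticks per second
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = parse_cpu_ticks(f.read())
    start = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_cpu_ticks(f.read())
    elapsed = time.monotonic() - start
    return cpu_percent(before, after, elapsed, clk_tck)
```

For example, call `sample_pid(pid)` with a PID from `pidof ndo2db`. ndo2db may run as more than one process, so if `pidof` returns several PIDs, sample each one.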

Re: Monitoring Engine Process failing to start

Posted: Thu Jul 31, 2014 11:12 am
by lmiltchev
I am pretty sure you checked for database errors, but I need to ask anyway... just in case. :)
BanditBBS wrote:ndo2db keeps cycling to 100% for a while and then drops. So this seems to be the issue.
It takes 10+ minutes before ndo2db's CPU usage drops, right? How long does it take to go up again? Is there any pattern at all, or does the CPU usage go through the roof only on reboot?

Re: Monitoring Engine Process failing to start

Posted: Thu Jul 31, 2014 11:20 am
by BanditBBS
Databases seem fine.

Yes, when it spikes it lasts for 10-15 minutes. If it is not spiking and we apply config or reboot the system, the spike starts immediately. If we don't do anything, it normally spikes again within a few minutes of the previous spike ending. So technically it is spiking all day.
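To put numbers on this pattern (spikes lasting 10-15 minutes, then returning a few minutes after the previous one ends), a sketch like the following could turn periodic CPU samples into spike intervals. The `(timestamp, cpu_percent)` sample format, the 90% threshold, and the function names are assumptions for illustration.

```python
from typing import List, Tuple

# Each sample is (unix_timestamp_seconds, cpu_percent), in time order.
Sample = Tuple[float, float]


def find_spikes(samples: List[Sample], threshold: float = 90.0) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps of runs where CPU stays at or above threshold."""
    spikes = []
    start = None
    prev_ts = None
    for ts, cpu in samples:
        if cpu >= threshold and start is None:
            start = ts  # a new spike begins
        elif cpu < threshold and start is not None:
            # The spike ended at the last sample that was still high.
            spikes.append((start, prev_ts))
            start = None
        prev_ts = ts
    if start is not None:
        spikes.append((start, prev_ts))  # still spiking at the last sample
    return spikes


def summarize(spikes: List[Tuple[float, float]]) -> Tuple[List[float], List[float]]:
    """Spike durations and the quiet gaps between spikes, both in minutes."""
    durations = [(end - start) / 60 for start, end in spikes]
    gaps = [(spikes[i + 1][0] - spikes[i][1]) / 60 for i in range(len(spikes) - 1)]
    return durations, gaps
```

Durations are measured between the first and last high sample, so they can undercount by up to one sampling interval. Sampling every minute or so is fine for spikes of this length.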

Re: Monitoring Engine Process failing to start

Posted: Thu Jul 31, 2014 8:06 pm
by BanditBBS
More information....

It seems this is a daytime-only issue. Yesterday and today, during work hours ndo2db is at 100% CPU, but last night and this evening (and I assume all night) everything is fine. I just applied changes about 5 times in a row (testing my escaping issue in another thread) and instantly everything is green. Any ideas why ndo2db would be 100% CPU only during the day?
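One way to confirm a daytime-only pattern is to bucket CPU samples by hour of day. This is a sketch that assumes samples are `(unix_timestamp, cpu_percent)` pairs. `mean_cpu_by_hour` is a hypothetical helper, and `utc_offset_hours` is a fixed offset that ignores daylight saving (Central time is UTC-6 in winter and UTC-5 in summer). Leaving it as `None` uses the server's local time zone.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Optional, Tuple


def mean_cpu_by_hour(
    samples: Iterable[Tuple[float, float]],
    utc_offset_hours: Optional[float] = None,
) -> Dict[int, float]:
    """Average CPU percent per hour of day (0-23)."""
    tz = None if utc_offset_hours is None else timezone(timedelta(hours=utc_offset_hours))
    buckets = defaultdict(list)
    for ts, cpu in samples:
        # With tz=None, fromtimestamp converts to the server's local time.
        buckets[datetime.fromtimestamp(ts, tz).hour].append(cpu)
    return {hour: sum(vals) / len(vals) for hour, vals in sorted(buckets.items())}
```

If the high averages line up with business hours rather than with config applies, that would point toward something external, such as daytime load on the database or on the checks themselves, rather than the restart.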

Re: Monitoring Engine Process failing to start

Posted: Fri Aug 01, 2014 9:20 am
by tmcdonald
BanditBBS wrote:Any ideas why ndo2db would be 100% CPU only during the day?
Is it a reverse vampire?

More to the point, do you have any custom timeperiods in use that fit that range?

EDIT: Fixed typo. Dealing with timezone issues all week will do that to you.

Re: Monitoring Engine Process failing to start

Posted: Fri Aug 01, 2014 9:24 am
by BanditBBS
tmcdonald wrote:
BanditBBS wrote:Any ideas why ndo2db would be 100% CPU only during the day?
Is it a reverse vampire?

More to the point, do you have any custom timezones in use that fit that range?
Timezones? Do you mean timeperiods in Nagios, or do you actually mean timezones for the system/PHP? The answer to both would be nope! I haven't created any timeperiods, and the timezones are CST.

Re: Monitoring Engine Process failing to start

Posted: Fri Aug 01, 2014 1:47 pm
by sreinhardt
Between the log file showing no issues, the CPU spikes still occurring, and the other NDO-style issues we have been seeing recently, I would tend to agree this is still an artifact of the load issues. Not that that helps much, but at least you are not alone, and it's an issue we have at the front of the line. Could you post or send over your nagios.cfg and ndo2db.cfg, please? I want to have a look and make sure there aren't any tweaks that might influence this, but knowing you, I doubt it.

Re: Monitoring Engine Process failing to start

Posted: Fri Aug 01, 2014 1:54 pm
by BanditBBS
nagios.cfg


# MODIFIED
admin_email=root@localhost
admin_pager=root@localhost
translate_passive_host_checks=1
log_event_handlers=0
use_large_installation_tweaks=1
enable_environment_macros=0


# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg


# PNP settings - bulk mode with NCPD
process_performance_data=1
# service performance data
service_perfdata_file=/var/nagiosramdisk/service-perfdata

service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
# host performance data
host_perfdata_file=/var/nagiosramdisk/host-perfdata

host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk


# OBJECTS - UNMODIFIED
#cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg


# STATIC OBJECT DEFINITIONS (THESE DON'T GET EXPORTED/IMPORTED BY NAGIOSQL)
cfg_dir=/usr/local/nagios/etc/static

# OBJECTS EXPORTED FROM NAGIOSQL
cfg_file=/usr/local/nagios/etc/contacttemplates.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/hosttemplates.cfg
cfg_file=/usr/local/nagios/etc/servicetemplates.cfg
cfg_file=/usr/local/nagios/etc/servicedependencies.cfg
cfg_file=/usr/local/nagios/etc/serviceescalations.cfg
cfg_file=/usr/local/nagios/etc/hostdependencies.cfg
cfg_file=/usr/local/nagios/etc/hostescalations.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

# GLOBAL EVENT HANDLERS
global_host_event_handler=xi_host_event_handler
global_service_event_handler=xi_service_event_handler



# UNMODIFIED
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
bare_update_check=0
cached_host_check_horizon=15
cached_service_check_horizon=15
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_for_updates=1
check_host_freshness=0
check_result_path=/var/nagiosramdisk/spool/checkresults
check_result_reaper_frequency=10
check_service_freshness=1
check_workers=6
#command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/usr/local/nagios/var/nagios.debug
debug_level=0
debug_verbosity=1
#enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
#external_command_buffer_slots=4096
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_external_commands=0
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=0
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_check_result_file_age=3600
max_check_result_reaper_time=30
max_concurrent_checks=0
max_debug_file_size=1000000
#max_host_check_spread=30
max_host_check_spread=60
#max_service_check_spread=30
max_service_check_spread=60
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
object_cache_file=/var/nagiosramdisk/objects.cache
status_file=/var/nagiosramdisk/status.dat
temp_path=/var/nagiosramdisk/tmp
obsess_over_hosts=0
obsess_over_services=0
ocsp_timeout=5
#p1_file=/usr/local/nagios/bin/p1.pl
passive_host_checks_are_soft=0
perfdata_timeout=5
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retain_state_information=1
retention_update_interval=60
service_check_timeout=60
service_freshness_check_interval=60
service_inter_check_delay_method=s
service_interleave_factor=s
#sleep_time=0.25
soft_state_dependencies=0
state_retention_file=/usr/local/nagios/var/retention.dat
status_update_interval=10
temp_file=/usr/local/nagios/var/nagios.tmp
use_aggressive_host_checking=0
###use_embedded_perl_implicitly=1
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=1
use_true_regexp_matching=0
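With the perfdata templates above, each line written to `/var/nagiosramdisk/service-perfdata` and `/var/nagiosramdisk/host-perfdata` is a set of `KEY::VALUE` fields separated by tabs (Nagios expands `\t` in the template to a tab character). Below is a minimal sketch for reading one such line back, for example to spot-check what the bulk processing command is being fed. It assumes no plugin output contains a literal tab. `parse_perfdata_line` is a hypothetical helper, not part of PNP or Nagios, and the sample values in the usage below are made up.

```python
from typing import Dict


def parse_perfdata_line(line: str) -> Dict[str, str]:
    """Split one tab-separated KEY::VALUE perfdata line into a dict."""
    record = {}
    for field in line.rstrip("\n").split("\t"):
        # partition splits on the first '::' only, so values may contain '::'.
        key, sep, value = field.partition("::")
        if sep:
            record[key] = value
    return record
```

For a made-up line such as `DATATYPE::SERVICEPERFDATA<TAB>HOSTNAME::web01<TAB>SERVICEPERFDATA::rta=0.5ms;100;500;0`, the result maps `HOSTNAME` to `web01` and `SERVICEPERFDATA` to the raw perfdata string.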
ndo2db.cfg


#####################################################################
# NDO2DB DAEMON CONFIG FILE
#####################################################################


lock_file=/usr/local/nagios/var/ndo2db.lock

ndo2db_user=nagios
ndo2db_group=nagios

socket_type=unix

socket_name=/usr/local/nagios/var/ndo.sock

tcp_port=5668


db_servertype=mysql
db_host=localhost
db_port=3306

db_name=nagios
db_prefix=nagios_

db_user=ndoutils
db_pass=n@gweb



## TABLE TRIMMING OPTIONS
# Several database tables containing Nagios event data can become quite large
# over time.  Most admins will want to trim these tables and keep only a
# certain amount of data in them.  The options below are used to specify the
# age (in MINUTES) that data should be allowed to remain in various tables
# before it is deleted.  Using a value of zero (0) for any value means that
# that particular table should NOT be automatically trimmed.

# Keep timed events for 24 hours
max_timedevents_age=1440

# Keep system commands for 1 week
max_systemcommands_age=10080

# Keep service checks for 1 week
max_servicechecks_age=10080

# Keep host checks for 1 week
max_hostchecks_age=10080

# Keep event handlers for 31 days
max_eventhandlers_age=44640





# DEBUG LEVEL
# This option determines how much (if any) debugging information will
# be written to the debug file.  OR values together to log multiple
# types of information.
# Values: -1 = Everything
#          0 = Nothing
#          1 = Process info
#	   2 = SQL queries

debug_level=0



# DEBUG VERBOSITY
# This option determines how verbose the debug log output will be.
# Values: 0 = Brief output
#         1 = More detailed
#         2 = Very detailed

debug_verbosity=1



# DEBUG FILE
# This option determines where the daemon should write debugging information.

debug_file=/usr/local/nagios/var/ndo2db.debug



# MAX DEBUG FILE SIZE
# This option determines the maximum size (in bytes) of the debug file.  If
# the file grows larger than this size, it will be renamed with a .old
# extension.  If a file already exists with a .old extension it will
# automatically be deleted.  This helps ensure your disk space usage doesn't
# get out of control when debugging.

max_debug_file_size=1000000
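The trimming ages in ndo2db.cfg are in minutes, so the comments can be cross-checked with a little arithmetic: 1440 = 60 × 24 (one day), 10080 = 7 × 1440 (one week), and 44640 = 31 × 1440 (31 days). Below is a small sketch that reads `key=value` lines and does the conversion. `parse_cfg` and `trimming_ages_in_days` are hypothetical helpers, and the parser only handles the simple `key=value` and `#` comment layout shown here.

```python
from typing import Dict


def parse_cfg(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def trimming_ages_in_days(settings: Dict[str, str]) -> Dict[str, float]:
    """Convert max_*_age values (minutes) to days; 0 means the table is never trimmed."""
    return {
        key: int(value) / 1440  # 60 minutes * 24 hours per day
        for key, value in settings.items()
        if key.startswith("max_") and key.endswith("_age")
    }
```

`max_debug_file_size` is in bytes, not minutes, so the `_age` suffix check leaves it out.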

Re: Monitoring Engine Process failing to start

Posted: Fri Aug 01, 2014 2:35 pm
by sreinhardt
Dang, just leave it mostly stock then! I'm going to have to think about this one. You just took away my last idea at the moment.

Re: Monitoring Engine Process failing to start

Posted: Tue Aug 19, 2014 12:14 pm
by BanditBBS
Over the past 24 hours, every time we have applied configs it has taken 1 minute for everything to be green. I can live with that, so hopefully it doesn't revert to the 15-minute time frame.

Also, the scheduled checks are so flat it's hard to tell if the dashlet is actually updating sometimes, so that seems much better.