next check

This support forum board is for support questions relating to Nagios XI, our flagship commercial network monitoring solution.
Claro
Posts: 8
Joined: Wed Jul 11, 2012 1:34 pm

Re: next check

Post by Claro »

Greetings,

Engineers, we updated to the latest version of Nagios (Nagios XI 2011R3.2 Copyright © 2008-2012 Nagios Enterprises, LLC.) and restarted the server 4 times, but the platform remains fully degraded.

We have an XI license; how can we use it to get support? What are the limitations of that support?

Could you connect to our platform and validate our setup? Otherwise, what kind of support can we arrange with you to resolve this as quickly as possible?

I look forward to your prompt response.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: next check

Post by mguthrie »

From reviewing the earlier screenshots you sent from the XI dashboard, it seems like your issue could be disk I/O. VMs don't perform as well on disk I/O as a physical server does. Your CPU load seems healthy for the number of CPU cores that you have. I think the disk writes are not able to keep up with the check schedule. Can you post your /usr/local/nagios/etc/nagios.cfg file? I would also review our documentation on using a RAM disk to try to free up some disk activity:
http://assets.nagios.com/downloads/nagi ... giosXI.pdf
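To put rough numbers on the disk I/O theory, the sketch below estimates how many checks per second the scheduler must start and, from that, a lower bound on small file operations caused by check results. The host and service counts are hypothetical, and the "3 file operations per check" figure is a simplifying assumption (one result file written to the spool, one deleted by the reaper, one perfdata line appended), not a measured value.

```python
from dataclasses import dataclass

@dataclass
class CheckBatch:
    """Checks sharing one check_interval (illustrative inputs only)."""
    count: int               # hosts or services in the batch
    interval_minutes: float  # check_interval; minutes because interval_length=60

def checks_per_second(batches):
    """Average number of checks the scheduler must start each second."""
    return sum(b.count / (b.interval_minutes * 60.0) for b in batches)

def disk_ops_per_minute(batches, ops_per_check=3):
    """Rough lower bound on small file operations caused by check results.

    The default of 3 assumes one spool file create, one spool file delete
    by the reaper, and one perfdata line append per check (a simplification).
    """
    return checks_per_second(batches) * 60.0 * ops_per_check

if __name__ == "__main__":
    # Hypothetical install: 300 hosts and 3000 services, all every 5 minutes.
    batches = [CheckBatch(300, 5), CheckBatch(3000, 5)]
    print(f"{checks_per_second(batches):.1f} checks/s")
    print(f"~{disk_ops_per_minute(batches):.0f} file operations/min")
```

With these made-up counts the scheduler must start 11 checks per second, or roughly 1980 small file operations per minute under the stated assumption, all on a disk shared with other VMs.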
Claro
Posts: 8
Joined: Wed Jul 11, 2012 1:34 pm

Re: next check

Post by Claro »

Greetings,

Engineers, below is our nagios.cfg configuration file. We tried to set up the RAM disk scheme, but it caused problems: we configured it correctly, but after rebooting the server the new configuration was not recognized. So instead we mitigated the problem at the virtualization level, configuring the datastore disk in vCenter with unlimited resources.
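One common reason a RAM disk setup appears to vanish after a reboot is that tmpfs contents are never persistent: the mount itself has to come back (for example via /etc/fstab or an init script), and every directory Nagios expects inside it has to be recreated before Nagios starts. The sketch below covers only the directory step. The mount point /var/nagiosramdisk, the subdirectory names, and the nagios owner are assumptions for illustration; use the values from the XI RAM disk document.

```python
import os
import shutil

# Hypothetical values for illustration; use the mount point and
# directories from your own RAM disk configuration.
RAMDISK = "/var/nagiosramdisk"
SUBDIRS = ["checkresults", "perfdata", "tmp"]

def ensure_ramdisk_dirs(base, subdirs, owner=None, group=None, mode=0o775):
    """Recreate directories that are lost whenever the tmpfs is remounted.

    Safe to run repeatedly (exist_ok=True). The requested mode is still
    filtered by the process umask. Ownership is only changed when owner or
    group is given, which normally requires root.
    """
    paths = []
    for sub in subdirs:
        path = os.path.join(base, sub)
        os.makedirs(path, mode=mode, exist_ok=True)
        if owner is not None or group is not None:
            shutil.chown(path, user=owner, group=group)
        paths.append(path)
    return paths

# Example, run as root before the nagios service starts at boot:
# ensure_ramdisk_dirs(RAMDISK, SUBDIRS, owner="nagios", group="nagios")
```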

In addition, the vendor who sold us the solution (http://www.fware.pro) argued that you do not support this kind of solution.

As always, we look forward to your response.

_____________________________________
# MODIFIED
admin_email=root@localhost
admin_pager=root@localhost
translate_passive_host_checks=1
log_event_handlers=0
use_large_installation_tweaks=1
enable_environment_macros=0


# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg


# PNP settings - bulk mode with NPCD
process_performance_data=1
# service performance data
service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
# host performance data
host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk


# OBJECTS - UNMODIFIED
#cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg


# STATIC OBJECT DEFINITIONS (THESE DON'T GET EXPORTED/IMPORTED BY NAGIOSQL)
cfg_dir=/usr/local/nagios/etc/static

# OBJECTS EXPORTED FROM NAGIOSQL
cfg_file=/usr/local/nagios/etc/contacttemplates.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/hosttemplates.cfg
cfg_file=/usr/local/nagios/etc/servicetemplates.cfg
cfg_file=/usr/local/nagios/etc/servicedependencies.cfg
cfg_file=/usr/local/nagios/etc/serviceescalations.cfg
cfg_file=/usr/local/nagios/etc/hostdependencies.cfg
cfg_file=/usr/local/nagios/etc/hostescalations.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

# GLOBAL EVENT HANDLERS
global_host_event_handler=xi_host_event_handler
global_service_event_handler=xi_service_event_handler



# UNMODIFIED
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
bare_update_check=0
cached_host_check_horizon=15
cached_service_check_horizon=15
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_for_updates=1
check_host_freshness=0
check_result_path=/usr/local/nagios/var/spool/checkresults
#check_result_reaper_frequency=10
check_result_reaper_frequency=3
check_service_freshness=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/usr/local/nagios/var/nagios.debug
debug_level=0
debug_verbosity=0
enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
external_command_buffer_slots=4096
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=s
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=60
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_external_commands=0
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=0
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_check_result_file_age=3600
max_check_result_reaper_time=30
#max_check_result_reaper_time=10
max_concurrent_checks=0
max_debug_file_size=1000000
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
object_cache_file=/usr/local/nagios/var/objects.cache
obsess_over_hosts=0
obsess_over_services=0
ocsp_timeout=5
p1_file=/usr/local/nagios/bin/p1.pl
passive_host_checks_are_soft=0
perfdata_timeout=5
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retain_state_information=1
retention_update_interval=60
service_check_timeout=60
service_freshness_check_interval=60
service_inter_check_delay_method=s
service_interleave_factor=s
sleep_time=0.25
soft_state_dependencies=0
state_retention_file=/usr/local/nagios/var/retention.dat
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10
temp_file=/usr/local/nagios/var/nagios.tmp
temp_path=/tmp
use_aggressive_host_checking=0
use_embedded_perl_implicitly=1
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=0
use_true_regexp_matching=0

# DNX plugin
#broker_module=/usr/local/nagios/lib/dnxPlugin.so /usr/local/nagios/etc/dnxServer.cfg
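To make a file like the one above easier to review, here is a minimal sketch that parses nagios.cfg-style key=value lines and pulls out the settings that drive periodic disk activity. The helper names and the choice of directives are assumptions for illustration; the units (seconds, except retention_update_interval in minutes) follow the Nagios main configuration documentation.

```python
from collections import defaultdict

# Directives that may legally appear more than once in nagios.cfg.
MULTI_KEYS = {"cfg_file", "cfg_dir", "broker_module"}

# (directive, unit) pairs for settings that drive periodic disk activity.
WRITE_INTERVALS = [
    ("status_update_interval", "s"),                     # status.dat rewrite
    ("check_result_reaper_frequency", "s"),              # spool directory scan
    ("service_perfdata_file_processing_interval", "s"),
    ("host_perfdata_file_processing_interval", "s"),
    ("retention_update_interval", "min"),                # retention.dat save
]

def parse_nagios_cfg(text):
    """Parse main-config key=value lines into a dict.

    Comments (#) and blank lines are skipped. Repeatable directives are
    collected into lists; for all others the last value wins.
    """
    cfg = {}
    multi = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        key, value = key.strip(), value.strip()
        if key in MULTI_KEYS:
            multi[key].append(value)
        else:
            cfg[key] = value
    cfg.update(multi)
    return cfg

def write_schedule(cfg):
    """Map each disk-related interval present in cfg to (value, unit)."""
    return {k: (float(cfg[k]), unit) for k, unit in WRITE_INTERVALS if k in cfg}
```

Fed the file posted above, write_schedule would report status_update_interval at 10 s, check_result_reaper_frequency at 3 s, both perfdata processing intervals at 15 s, and retention_update_interval at 60 min.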
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: next check

Post by mguthrie »

Can you run top, sorted by CPU time, and tell us what the top 5 processes are?
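The same information can also be gathered without top. This sketch reads /proc/&lt;pid&gt;/stat on Linux, adds each process's user and system CPU ticks (fields 14 and 15 per proc(5)), and prints the five largest, which is roughly what top shows when sorted by its TIME+ column. It is Linux-only and illustrative; the function names are hypothetical.

```python
import os

def parse_proc_stat(line):
    """Return (pid, comm, cpu_ticks) from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may contain spaces, so split on the
    last ')' rather than on whitespace alone.
    """
    pid_part, rest = line.split(" (", 1)
    comm, tail = rest.rsplit(")", 1)
    fields = tail.split()  # fields[0] is field 3 (state), so field k is fields[k - 3]
    utime, stime = int(fields[11]), int(fields[12])
    return int(pid_part), comm, utime + stime

def top_by_cpu_time(entries, n=5):
    """Sort (pid, comm, ticks) tuples by cumulative CPU ticks, descending."""
    return sorted(entries, key=lambda e: e[2], reverse=True)[:n]

def read_all_procs():
    entries = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/stat") as fh:
                entries.append(parse_proc_stat(fh.read()))
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited or is unreadable; skip it
    return entries

if __name__ == "__main__" and os.path.isdir("/proc"):
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    for pid, comm, ticks in top_by_cpu_time(read_all_procs()):
        print(f"{pid:>7} {ticks / hz:>10.1f}s {comm}")
```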

We do support the RAM disk solution. Even with unlimited disk usage allocated to the VM, the limit is the physical disk, which this VM still has to share with others. We have updated and improved the RAM disk documentation since it was first written, and I would consider it for your scenario.
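As an illustration of what moving high-churn files onto a RAM disk means for nagios.cfg, the sketch below rewrites a few directives to point under a tmpfs mount. Which directives to move, the target file names, and the /var/nagiosramdisk mount are all assumptions; the XI RAM disk document is the authority. Command definitions that reference these paths (the perfdata processing commands, for example) would likely need matching changes, and the target directories must exist.

```python
import re

# Hypothetical mapping of high-churn directives to files under the RAM disk.
# Check the XI RAM disk documentation for the authoritative list.
RAMDISK_TARGETS = {
    "check_result_path": "checkresults",
    "status_file": "status.dat",
    "temp_file": "nagios.tmp",
    "service_perfdata_file": "service-perfdata",
    "host_perfdata_file": "host-perfdata",
}

DIRECTIVE = re.compile(r"^(\w+)=")

def repoint_to_ramdisk(text, mount="/var/nagiosramdisk", targets=RAMDISK_TARGETS):
    """Return nagios.cfg text with selected directives moved under `mount`.

    Only active key=value lines are rewritten; comments, blank lines, and
    ordering are left untouched.
    """
    out = []
    for line in text.splitlines():
        m = DIRECTIVE.match(line)
        if m and m.group(1) in targets:
            key = m.group(1)
            line = f"{key}={mount.rstrip('/')}/{targets[key]}"
        out.append(line)
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```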
Claro
Posts: 8
Joined: Wed Jul 11, 2012 1:34 pm

Re: next check

Post by Claro »

Greetings,

Engineers, to which email address can I send the requested screenshots? I will also send screenshots showing that the Nagios virtual machine is in its own resource pool, which guarantees its memory, CPU, storage, and network card resources.


I look forward to your confirmation email.
mguthrie
Posts: 4380
Joined: Mon Jun 14, 2010 10:21 am

Re: next check

Post by mguthrie »

Got the screenshots. Change of theory: SNMP polling involves a certain amount of wait time on the device's end, so it's possible that the lag of these checks is gradually blocking the main Nagios loop while it waits for them to return results. I still think disk I/O issues are just as likely, but you could start with a simple experiment: increase the check_interval for a large batch of your SNMP checks from 5 minutes to 7-10 minutes and see whether that makes a difference in latency.
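The proposed experiment can be reasoned about with Little's law: the average number of checks in flight equals the check start rate multiplied by the average check duration. Lengthening check_interval lowers the rate, so it lowers both the scheduling load and the number of SNMP checks waiting on devices at once. The 2000 checks and 8-second average response below are hypothetical inputs.

```python
def check_rate(num_checks, interval_minutes):
    """Average checks started per second (assumes interval_length=60)."""
    return num_checks / (interval_minutes * 60.0)

def avg_in_flight(num_checks, interval_minutes, avg_exec_seconds):
    """Little's law, L = lambda * W: mean number of checks running at once."""
    return check_rate(num_checks, interval_minutes) * avg_exec_seconds

if __name__ == "__main__":
    n, wait = 2000, 8.0  # hypothetical SNMP check count and mean duration (s)
    for interval in (5, 7, 10):
        print(f"{interval:>2} min: {check_rate(n, interval):5.2f} checks/s, "
              f"~{avg_in_flight(n, interval, wait):4.1f} in flight")
```

With these inputs, moving from 5 to 10 minutes drops the rate from about 6.7 to about 3.3 checks/s, and the average number of SNMP checks waiting at once from about 53 to about 27.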

Are you using any custom event handlers, notification commands, or OCSP commands?
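Part of this question can be answered from nagios.cfg alone by checking which global hook settings are active. The sketch assumes the file has already been read into a dict of directive to string value; the helper name is hypothetical. Notification commands and per-object event handlers live in contact and object definitions, not in nagios.cfg, so they are not covered here.

```python
def enabled_hooks(cfg):
    """List global nagios.cfg settings that make Nagios run extra commands.

    `cfg` maps directive names to string values; missing directives are
    treated as unset.
    """
    hooks = []
    if cfg.get("enable_event_handlers") == "1":
        for key in ("global_host_event_handler", "global_service_event_handler"):
            if cfg.get(key):
                hooks.append(f"{key}={cfg[key]}")
    if cfg.get("obsess_over_services") == "1":
        hooks.append("OCSP: " + cfg.get("ocsp_command", "(no ocsp_command set)"))
    if cfg.get("obsess_over_hosts") == "1":
        hooks.append("OCHP: " + cfg.get("ochp_command", "(no ochp_command set)"))
    return hooks
```

Applied to the posted configuration, it would list only the two global XI event handlers, since obsess_over_hosts and obsess_over_services are both 0.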