
Blank Performance Reports

Posted: Mon Mar 04, 2013 9:20 am
by kotterbein
I have had blank performance reports for a while now, so I took the time to upgrade to XI 2012R1.6 this weekend in the hopes of resolving the issue. However, it is still occurring across the board with performance data. How can I check my configuration to be sure XI is pointed to the correct performance data archives? I THINK this may have to do with performance updates (using tmpfs to increase speed), and it coincides with the two nodes I have done extensive performance work on (other test instances are graphing fine with no issues).

I searched the forum to try to find a quick answer but was unable to.

Thanks-

Re: Blank Performance Reports

Posted: Mon Mar 04, 2013 10:59 am
by abrist
Just to clarify - are all of your performance graphs blank, or are you just having problems with specific reports?

Could you tail the following logs and post the results in a code wrap?

Code: Select all

tail -50 /usr/local/nagios/var/perfdata.log
tail -50 /usr/local/nagios/var/npcd.log

Re: Blank Performance Reports

Posted: Mon Mar 04, 2013 11:00 am
by slansing
By default, performance data is read out of /usr/local/nagios/var/service-perfdata and /usr/local/nagios/var/host-perfdata (the stock paths on a Nagios system running with defaults).

Can you please post your nagios.cfg file as well as the output of the following commands:

Let's see if the data is being picked up:

Code: Select all

ll /usr/local/nagios/var/spool/perfdata/| wc -l

Code: Select all

ll /usr/local/nagios/var/spool/xidpe|wc -l

Re: Blank Performance Reports

Posted: Mon Mar 04, 2013 11:10 am
by kotterbein
nagios.cfg:

Code: Select all

# MODIFIED
admin_email=root@localhost
admin_pager=root@localhost
translate_passive_host_checks=1
log_event_handlers=0
use_large_installation_tweaks=1
enable_environment_macros=0


# NDOUtils module
broker_module=/usr/local/nagios/bin/ndomod.o config_file=/usr/local/nagios/etc/ndomod.cfg
broker_module=/usr/lib64/mod_gearman/mod_gearman.o config=/etc/mod_gearman/mod_gearman_neb.conf

# PNP settings - bulk mode with NPCD
process_performance_data=1
# service performance data
service_perfdata_file=/usr/local/nagios/var/service-perfdata
service_perfdata_file_template=DATATYPE::SERVICEPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tSERVICEDESC::$SERVICEDESC$\tSERVICEPERFDATA::$SERVICEPERFDATA$\tSERVICECHECKCOMMAND::$SERVICECHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tSERVICESTATE::$SERVICESTATE$\tSERVICESTATETYPE::$SERVICESTATETYPE$\tSERVICEOUTPUT::$SERVICEOUTPUT$
service_perfdata_file_mode=a
service_perfdata_file_processing_interval=15
service_perfdata_file_processing_command=process-service-perfdata-file-bulk
# host performance data
host_perfdata_file=/usr/local/nagios/var/host-perfdata
host_perfdata_file_template=DATATYPE::HOSTPERFDATA\tTIMET::$TIMET$\tHOSTNAME::$HOSTNAME$\tHOSTPERFDATA::$HOSTPERFDATA$\tHOSTCHECKCOMMAND::$HOSTCHECKCOMMAND$\tHOSTSTATE::$HOSTSTATE$\tHOSTSTATETYPE::$HOSTSTATETYPE$\tHOSTOUTPUT::$HOSTOUTPUT$
host_perfdata_file_mode=a
host_perfdata_file_processing_interval=15
host_perfdata_file_processing_command=process-host-perfdata-file-bulk


# OBJECTS - UNMODIFIED
#cfg_file=/usr/local/nagios/etc/objects/commands.cfg
#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
#cfg_file=/usr/local/nagios/etc/objects/templates.cfg
#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg


# STATIC OBJECT DEFINITIONS (THESE DON'T GET EXPORTED/IMPORTED BY NAGIOSQL)
cfg_dir=/usr/local/nagios/etc/static

# OBJECTS EXPORTED FROM NAGIOSQL
cfg_file=/usr/local/nagios/etc/contacttemplates.cfg
cfg_file=/usr/local/nagios/etc/contactgroups.cfg
cfg_file=/usr/local/nagios/etc/contacts.cfg
cfg_file=/usr/local/nagios/etc/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/commands.cfg
cfg_file=/usr/local/nagios/etc/hostgroups.cfg
cfg_file=/usr/local/nagios/etc/servicegroups.cfg
cfg_file=/usr/local/nagios/etc/hosttemplates.cfg
cfg_file=/usr/local/nagios/etc/servicetemplates.cfg
cfg_file=/usr/local/nagios/etc/servicedependencies.cfg
cfg_file=/usr/local/nagios/etc/serviceescalations.cfg
cfg_file=/usr/local/nagios/etc/hostdependencies.cfg
cfg_file=/usr/local/nagios/etc/hostescalations.cfg
cfg_file=/usr/local/nagios/etc/hostextinfo.cfg
cfg_file=/usr/local/nagios/etc/serviceextinfo.cfg
cfg_dir=/usr/local/nagios/etc/hosts
cfg_dir=/usr/local/nagios/etc/services

# GLOBAL EVENT HANDLERS
global_host_event_handler=xi_host_event_handler
global_service_event_handler=xi_service_event_handler



# UNMODIFIED
accept_passive_host_checks=1
accept_passive_service_checks=1
additional_freshness_latency=15
auto_reschedule_checks=0
auto_rescheduling_interval=30
auto_rescheduling_window=180
bare_update_check=0
cached_host_check_horizon=15
cached_service_check_horizon=15
check_external_commands=1
check_for_orphaned_hosts=1
check_for_orphaned_services=1
check_for_updates=1
check_host_freshness=0
check_result_path=/usr/local/nagios/var/spool/checkresults
check_result_reaper_frequency=2
check_service_freshness=1
command_check_interval=-1
command_file=/usr/local/nagios/var/rw/nagios.cmd
daemon_dumps_core=0
date_format=us
debug_file=/usr/local/nagios/var/nagios.debug
debug_level=0
debug_verbosity=1
enable_embedded_perl=1
enable_event_handlers=1
enable_flap_detection=1
enable_notifications=1
enable_predictive_host_dependency_checks=1
enable_predictive_service_dependency_checks=1
event_broker_options=-1
event_handler_timeout=30
execute_host_checks=1
execute_service_checks=1
external_command_buffer_slots=4096
high_host_flap_threshold=20.0
high_service_flap_threshold=20.0
host_check_timeout=30
host_freshness_check_interval=60
host_inter_check_delay_method=n
illegal_macro_output_chars=`~$&|'"<>
illegal_object_name_chars=`~!$%^&*|'"<>?,()=
interval_length=1
lock_file=/usr/local/nagios/var/nagios.lock
log_archive_path=/usr/local/nagios/var/archives
log_external_commands=0
log_file=/usr/local/nagios/var/nagios.log
log_host_retries=1
log_initial_states=0
log_notifications=1
log_passive_checks=0
log_rotation_method=d
log_service_retries=1
low_host_flap_threshold=5.0
low_service_flap_threshold=5.0
max_check_result_file_age=3600
max_check_result_reaper_time=10
max_concurrent_checks=0
max_debug_file_size=1000000
max_host_check_spread=30
max_service_check_spread=30
nagios_group=nagios
nagios_user=nagios
notification_timeout=30
object_cache_file=/ramdisk/objects.cache
obsess_over_hosts=1
obsess_over_services=1
ocsp_timeout=5
p1_file=/usr/local/nagios/bin/p1.pl
passive_host_checks_are_soft=0
perfdata_timeout=5
precached_object_file=/usr/local/nagios/var/objects.precache
resource_file=/usr/local/nagios/etc/resource.cfg
retained_contact_host_attribute_mask=0
retained_contact_service_attribute_mask=0
retained_host_attribute_mask=0
retained_process_host_attribute_mask=0
retained_process_service_attribute_mask=0
retained_service_attribute_mask=0
retain_state_information=1
retention_update_interval=60
service_check_timeout=60
service_freshness_check_interval=60
service_inter_check_delay_method=n
service_interleave_factor=s
sleep_time=0.1
soft_state_dependencies=0
state_retention_file=/usr/local/nagios/var/retention.dat
status_file=/ramdisk/status.dat
status_update_interval=10
temp_file=/usr/local/nagios/var/nagios.tmp
temp_path=/tmp
use_aggressive_host_checking=0
use_embedded_perl_implicitly=1
use_regexp_matching=0
use_retained_program_state=1
use_retained_scheduling_info=1
use_syslog=1
use_true_regexp_matching=0
Output:

Code: Select all

[root@ etc]# ll /usr/local/nagios/var/spool/perfdata/| wc -l
1
[root@ etc]# ll /usr/local/nagios/var/spool/xidpe|wc -l
15431
[root@ etc]#
However, I think this is where we are actually spooling perfdata:

Code: Select all

[root@ perfdata]# pwd
/ramdisk/spool/perfdata
[root@ perfdata]# ls -al
total 41540
drwxrwxr-x 2 nagios nagios       80 Feb 11 13:29 .
drwxr-xr-x 5 nagios nagios      100 Feb 11 11:23 ..
-rw-r--r-- 1 nagios nagios  6358930 Feb 11 13:29 host-perfdata.1360607346
-rw-r--r-- 1 nagios nagios 36076073 Mar  4 11:08 service-perfdata.1360607346
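One likely explanation for files landing under /ramdisk even though nagios.cfg names /usr/local/nagios/var is that the ramdisk setup replaces those paths with symlinks into the ramdisk. A minimal sketch to check (the paths are the stock ones from this thread, and the `show_link` helper is illustrative, not part of Nagios):

```shell
#!/bin/sh
# Sketch: report whether a path is a symlink and where it points, so a
# ramdisk redirection of the stock perfdata paths is easy to spot.
show_link() {
    if [ -L "$1" ]; then
        # Symlink: show its target.
        printf '%s -> %s\n' "$1" "$(readlink "$1")"
    elif [ -e "$1" ]; then
        printf '%s is a regular path (not a symlink)\n' "$1"
    else
        printf '%s does not exist\n' "$1"
    fi
}

# Stock locations from this thread; adjust to your install.
show_link /usr/local/nagios/var/service-perfdata
show_link /usr/local/nagios/var/spool/perfdata
```

If either path resolves into /ramdisk, the nagios.cfg entries and the on-disk layout are both "correct" at once, which would explain the confusion above.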


Re: Blank Performance Reports

Posted: Mon Mar 04, 2013 11:16 am
by abrist
Could you tail the following logs and post the results in a code wrap as well? Looks like your XI box is not reaping perfdata from the checkresults.

Code: Select all

tail -50 /usr/local/nagios/var/perfdata.log
tail -50 /usr/local/nagios/var/npcd.log

Re: Blank Performance Reports

Posted: Mon Mar 04, 2013 11:20 am
by kotterbein
tail -50 /usr/local/nagios/var/perfdata.log

Code: Select all

2013-02-11 11:19:13 [12300] [0] *** Timeout while processing Host: "ic-sur01c" Service: "Total_CPU_Util"
2013-02-11 11:19:13 [12300] [0] *** process_perfdata.pl terminated on signal ALRM
2013-02-11 11:20:07 [13466] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-02-11 11:20:07 [13466] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-02-11 11:20:07 [13466] [0] *** TIMEOUT: Please check your npcd.cfg
2013-02-11 11:20:07 [13466] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1360365780-PID-13466 deleted
2013-02-11 11:20:07 [13466] [0] *** Timeout while processing Host: "td-lss01" Service: "Check__var_log_messages"
2013-02-11 11:20:07 [13466] [0] *** process_perfdata.pl terminated on signal ALRM
2013-02-11 11:21:10 [14710] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-02-11 11:21:10 [14710] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-02-11 11:21:10 [14710] [0] *** TIMEOUT: Please check your npcd.cfg
2013-02-11 11:21:10 [14710] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1360365780-PID-14710 deleted
2013-02-11 11:21:10 [14710] [0] *** Timeout while processing Host: "pc-mdcx02" Service: "Memory_Util"
2013-02-11 11:21:10 [14710] [0] *** process_perfdata.pl terminated on signal ALRM
2013-02-11 11:22:09 [15942] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-02-11 11:22:09 [15942] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-02-11 11:22:09 [15942] [0] *** TIMEOUT: Please check your npcd.cfg
2013-02-11 11:22:09 [15942] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1360365780-PID-15942 deleted
2013-02-11 11:22:09 [15942] [0] *** Timeout while processing Host: "pc-mgw61" Service: "Swap_Usage"
2013-02-11 11:22:09 [15942] [0] *** process_perfdata.pl terminated on signal ALRM
2013-02-11 11:23:11 [17152] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-02-11 11:23:11 [17152] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-02-11 11:23:11 [17152] [0] *** TIMEOUT: Please check your npcd.cfg
2013-02-11 11:23:11 [17152] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1360365780-PID-17152 deleted
2013-02-11 11:23:11 [17152] [0] *** Timeout while processing Host: "pc-lvc05" Service: "ISE.MDCX.LVC2_ODS.03p_Msg_per_Sec"
2013-02-11 11:23:11 [17152] [0] *** process_perfdata.pl terminated on signal ALRM
2013-02-11 11:24:20 [19037] [0] *** TIMEOUT: Timeout after 5 secs. ***
2013-02-11 11:24:20 [19037] [0] *** TIMEOUT: Deleting current file to avoid NPCD loops
2013-02-11 11:24:20 [19037] [0] *** TIMEOUT: Please check your npcd.cfg
2013-02-11 11:24:20 [19037] [0] *** TIMEOUT: /usr/local/nagios/var/spool/perfdata//service-perfdata.1360365780-PID-19037 deleted
2013-02-11 11:24:20 [19037] [0] *** Timeout while processing Host: "pc-mgw17" Service: "CPU_Util"
2013-02-11 11:24:20 [19037] [0] *** process_perfdata.pl terminated on signal ALRM
tail -50 /usr/local/nagios/var/npcd.log

Code: Select all

10.580000/10.000000 at i=0
[02-28-2013 20:16:40] NPCD: WARN: MAX load reached: load 14.280000/10.000000 at i=1
[02-28-2013 20:16:55] NPCD: WARN: MAX load reached: load 12.660000/10.000000 at i=1
[02-28-2013 20:17:10] NPCD: WARN: MAX load reached: load 11.100000/10.000000 at i=1
[02-28-2013 20:28:40] NPCD: WARN: MAX load reached: load 11.520000/10.000000 at i=0
[02-28-2013 20:35:10] NPCD: WARN: MAX load reached: load 12.070000/10.000000 at i=0
[02-28-2013 20:35:40] NPCD: WARN: MAX load reached: load 10.560000/10.000000 at i=0
[02-28-2013 20:38:40] NPCD: WARN: MAX load reached: load 10.680000/10.000000 at i=0
[02-28-2013 21:21:25] NPCD: WARN: MAX load reached: load 10.590000/10.000000 at i=0
[02-28-2013 21:21:40] NPCD: WARN: MAX load reached: load 12.400000/10.000000 at i=1
[02-28-2013 21:21:55] NPCD: WARN: MAX load reached: load 12.760000/10.000000 at i=1
[02-28-2013 21:22:10] NPCD: WARN: MAX load reached: load 10.530000/10.000000 at i=1
[03-01-2013 04:56:12] NPCD: WARN: MAX load reached: load 10.120000/10.000000 at i=0
[03-01-2013 04:56:27] NPCD: WARN: MAX load reached: load 13.220000/10.000000 at i=1
[03-01-2013 04:56:43] NPCD: WARN: MAX load reached: load 15.010000/10.000000 at i=1
[03-01-2013 04:56:58] NPCD: WARN: MAX load reached: load 13.040000/10.000000 at i=1
[03-01-2013 04:57:13] NPCD: WARN: MAX load reached: load 10.780000/10.000000 at i=1
[03-01-2013 06:01:43] NPCD: WARN: MAX load reached: load 11.380000/10.000000 at i=0
[03-01-2013 07:06:43] NPCD: WARN: MAX load reached: load 10.400000/10.000000 at i=0
[03-01-2013 09:16:44] NPCD: WARN: MAX load reached: load 10.570000/10.000000 at i=0
[03-01-2013 10:21:14] NPCD: WARN: MAX load reached: load 10.660000/10.000000 at i=0
[03-01-2013 10:21:29] NPCD: WARN: MAX load reached: load 11.830000/10.000000 at i=1
[03-01-2013 10:21:44] NPCD: WARN: MAX load reached: load 12.970000/10.000000 at i=1
[03-01-2013 10:21:59] NPCD: WARN: MAX load reached: load 10.590000/10.000000 at i=1
[03-01-2013 11:26:45] NPCD: WARN: MAX load reached: load 10.090000/10.000000 at i=0
[03-01-2013 13:36:45] NPCD: WARN: MAX load reached: load 10.600000/10.000000 at i=0
[03-01-2013 14:41:30] NPCD: WARN: MAX load reached: load 11.080000/10.000000 at i=0
[03-01-2013 14:41:45] NPCD: WARN: MAX load reached: load 12.670000/10.000000 at i=1
[03-01-2013 14:42:00] NPCD: WARN: MAX load reached: load 11.500000/10.000000 at i=1
[03-01-2013 14:42:15] NPCD: WARN: MAX load reached: load 10.590000/10.000000 at i=1
[03-01-2013 15:46:46] NPCD: WARN: MAX load reached: load 10.260000/10.000000 at i=0
[03-01-2013 19:01:32] NPCD: WARN: MAX load reached: load 11.890000/10.000000 at i=0
[03-01-2013 19:01:47] NPCD: WARN: MAX load reached: load 13.600000/10.000000 at i=1
[03-01-2013 19:02:02] NPCD: WARN: MAX load reached: load 11.700000/10.000000 at i=1
[03-01-2013 22:01:18] NPCD: WARN: MAX load reached: load 10.760000/10.000000 at i=0
[03-01-2013 22:01:33] NPCD: WARN: MAX load reached: load 10.280000/10.000000 at i=1
[03-02-2013 00:26:18] NPCD: WARN: MAX load reached: load 10.030000/10.000000 at i=0
[03-02-2013 00:26:33] NPCD: WARN: MAX load reached: load 12.460000/10.000000 at i=1
[03-02-2013 00:26:48] NPCD: WARN: MAX load reached: load 12.960000/10.000000 at i=1
[03-02-2013 00:27:03] NPCD: WARN: MAX load reached: load 10.330000/10.000000 at i=1
[03-02-2013 01:31:34] NPCD: WARN: MAX load reached: load 10.980000/10.000000 at i=0
[03-02-2013 01:31:50] NPCD: WARN: MAX load reached: load 10.430000/10.000000 at i=1
[03-02-2013 02:36:50] NPCD: WARN: MAX load reached: load 10.390000/10.000000 at i=0
[03-02-2013 06:56:21] NPCD: WARN: MAX load reached: load 11.480000/10.000000 at i=0
[03-02-2013 06:56:36] NPCD: WARN: MAX load reached: load 11.400000/10.000000 at i=1
[03-02-2013 06:56:51] NPCD: WARN: MAX load reached: load 10.230000/10.000000 at i=1
[03-02-2013 12:16:38] NPCD: WARN: MAX load reached: load 10.520000/10.000000 at i=0
[03-02-2013 17:42:09] NPCD: WARN: MAX load reached: load 10.430000/10.000000 at i=0
[03-02-2013 19:51:25] NPCD: WARN: MAX load reached: load 10.010000/10.000000 at i=0
[03-03-2013 16:21:31] NPCD: WARN: MAX load reached: load 10.360000/10.000000 at i=0
[03-03-2013 16:21:46] NPCD: WARN: MAX load reached: load 10.010000/10.000000 at i=1
[03-04-2013 03:11:19] NPCD: WARN: MAX load reached: load 10.430000/10.000000 at i=0
[03-04-2013 03:11:34] NPCD: WARN: MAX load reached: load 12.300000/10.000000 at i=1
[03-04-2013 03:11:49] NPCD: WARN: MAX load reached: load 11.110000/10.000000 at i=1
[03-04-2013 03:12:04] NPCD: WARN: MAX load reached: load 10.220000/10.000000 at i=1
[03-04-2013 10:46:21] NPCD: WARN: MAX load reached: load 10.910000/10.000000 at i=0
[03-04-2013 10:46:36] NPCD: WARN: MAX load reached: load 10.340000/10.000000 at i=1

Re: Blank Performance Reports

Posted: Mon Mar 04, 2013 11:41 am
by abrist
It looks like you are hitting the max load and timeout settings for performance data processing:

In the file: /usr/local/nagios/etc/pnp/process_perfdata.cfg
Change:

Code: Select all

TIMEOUT = 5
to:

Code: Select all

TIMEOUT = 15
In the file: /usr/local/nagios/etc/pnp/npcd.cfg
Change:

Code: Select all

load_threshold = 10.0
to:

Code: Select all

load_threshold = 30.0
Restart npcd:

Code: Select all

service npcd restart
Wait 10 minutes or so and run slansing's commands again and post the outputs here. You may notice a substantial increase in load and IO while the XI server tries to process all of the previous checkresults.
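The two edits above can also be scripted. A sketch only, assuming the settings appear exactly as `TIMEOUT = 5` and `load_threshold = 10.0` in the stock files (the `bump_setting` helper is hypothetical, not part of PNP):

```shell
#!/bin/sh
# Sketch: apply abrist's two edits with GNU sed -i. PNP_ETC defaults to the
# stock Nagios XI location; override it if your install differs.
PNP_ETC=${PNP_ETC:-/usr/local/nagios/etc/pnp}

bump_setting() {  # bump_setting FILE OLD_LINE NEW_LINE
    [ -f "$1" ] || { echo "skip: $1 not found"; return 0; }
    sed -i "s/^$2\$/$3/" "$1"
}

bump_setting "$PNP_ETC/process_perfdata.cfg" 'TIMEOUT = 5' 'TIMEOUT = 15'
bump_setting "$PNP_ETC/npcd.cfg" 'load_threshold = 10.0' 'load_threshold = 30.0'

# Then restart npcd so the new limits take effect:
#   service npcd restart
```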

Re: Blank Performance Reports

Posted: Mon Mar 04, 2013 12:06 pm
by kotterbein
Still not updating.

The only file that seems to be updating is:

Code: Select all

-rw-r--r-- 1 nagios nagios 36076073 Mar  4 12:04 service-perfdata.1360607346
This is in /ramdisk/spool/perfdata; I'm not sure why it would be updating there when the nagios.cfg file sets all perfdata paths to /usr/local/nagios...

I'm going to reload nagios and see if it helps. nagios.cfg file is above.

Re: Blank Performance Reports

Posted: Mon Mar 04, 2013 12:16 pm
by slansing
What are the logs saying now?

Also, it may be worth reading through the ramdisk document again and comparing it to your settings, as this can also be caused by missing a few commands here and there in the ramdisk creation process:

http://assets.nagios.com/downloads/nagi ... giosXI.pdf

Re: Blank Performance Reports

Posted: Mon Mar 04, 2013 12:55 pm
by kotterbein
I have gone back through and repeated the steps from the document cited above... I did have some outstanding configuration issues. One thing I also noticed is that it has us change the following commands:

Code: Select all

command_name process-host-perfdata-file-bulk
command_line /bin/mv /var/nagiosramdisk/host-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.host
command_name process-service-perfdata-file-bulk
command_line /bin/mv /var/nagiosramdisk/service-perfdata /var/nagiosramdisk/spool/xidpe/$TIMET$.perfdata.service
There are two other commands that I was wondering whether they need to be changed as well:

Code: Select all

command_name process-host-perfdata-file-pnp-bulk
command_line /bin/mv /usr/local/nagios/var/host-perfdata /usr/local/nagios/var/spool/perfdata/host-perfdata.$TIMET$
command_name process-service-perfdata-file-pnp-bulk
command_line /bin/mv /usr/local/nagios/var/service-perfdata /usr/local/nagios/var/spool/perfdata/service-perfdata.$TIMET$
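A quick way to spot a disk-vs-ramdisk mismatch like the one in this thread is to print the perfdata targets from nagios.cfg next to the source paths in the bulk commands; if nagios.cfg writes to one location and the commands `mv` from another, the spool never fills. A sketch under the stock layout (the `list_perfdata_paths` helper is illustrative, not part of Nagios):

```shell
#!/bin/sh
# Sketch: list where nagios.cfg writes perfdata next to where the bulk
# process-*-perfdata-file-* commands move it from, so mismatches stand out.
list_perfdata_paths() {  # list_perfdata_paths ETC_DIR
    echo "--- nagios.cfg perfdata targets ---"
    grep -h '_perfdata_file=' "$1/nagios.cfg"
    echo "--- bulk command sources ---"
    # Show just "/bin/mv <source>" from each command_line entry.
    grep -rh 'command_line.*perfdata' "$1" | grep -o '/bin/mv [^ ]*'
}

# Stock location from this thread; adjust to your install.
[ -d /usr/local/nagios/etc ] && list_perfdata_paths /usr/local/nagios/etc \
    || echo "stock etc dir not present on this machine"
```

The target of each `_perfdata_file=` line should be the same path (or a symlink to the same path) as the `mv` source in the matching bulk command.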